# 🚀 MAGAIL Complete Training Guide

## ✅ Implemented Features

### 1. Full training loop

- ✅ Multi-agent buffer storage
- ✅ GAIL discriminator updates
- ✅ PPO policy optimization
- ✅ TensorBoard logging
- ✅ Model saving and loading
- ✅ Expert data loading (7,805 trajectories)

### 2. Environment system

- ✅ Multi-agent scenario environment
- ✅ Dynamic vehicle spawning
- ✅ Multi-dimensional observations (108 dimensions)
- ✅ Rendering and visualization

## 🎮 Quick Start

### Method 1: Basic training (recommended for beginners)

```bash
# Small-scale test (10 episodes, no rendering)
python train_magail.py --episodes 10 --horizon 200
```

### Method 2: Training with visualization

```bash
# 5 episodes with rendering
python train_magail.py --episodes 5 --render --horizon 200
```

### Method 3: Full training

```bash
# Long training run (1000 episodes)
python train_magail.py \
    --episodes 1000 \
    --horizon 300 \
    --rollout-length 512 \
    --batch-size 128 \
    --lr-actor 3e-4 \
    --device cuda
```

### Method 4: Use the test script

```bash
bash test_training.sh
```

## 📊 Training Process

### Data flow

```
Episode starts
    ↓
Collect observations (108 dims × N vehicles)
    ↓
Actor samples actions ([steering, throttle])
    ↓
Environment step
    ↓
Store in buffer (state, action, reward, next_state, ...)
    ↓
Every 512 steps (the default --rollout-length):
    ├─ Update discriminator (distinguish policy vs. expert)
    ├─ Compute GAIL reward
    └─ Update PPO (Actor + Critic)
    ↓
Episode ends
    ↓
Save model (if it is the best so far)
```
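
The update schedule in the diagram can be sketched as a small simulation. This is an illustration only, not the code in `train_magail.py`: the environment and networks are stubbed out, and it assumes that each vehicle contributes one transition per step and that the buffer carries over between episodes. The function name `count_updates` is hypothetical.

```python
def count_updates(episodes: int, horizon: int, n_vehicles: int,
                  rollout_length: int = 512) -> int:
    """Count how often the update phase fires for a given schedule."""
    buffer = []  # hypothetical stand-in for the multi-agent buffer
    updates = 0
    for _ in range(episodes):
        for step in range(horizon):
            for vehicle in range(n_vehicles):
                # A real transition would hold (state, action, reward, next_state, ...).
                buffer.append((step, vehicle))
                if len(buffer) >= rollout_length:
                    # Update phase, in the order shown in the diagram:
                    # 1) update discriminator, 2) compute GAIL reward, 3) update PPO.
                    updates += 1
                    buffer.clear()
    return updates


# Assumed scenario: 10 episodes of 200 steps with 5 vehicles each.
print(count_updates(episodes=10, horizon=200, n_vehicles=5))
```

Under these assumptions a short run triggers only a handful of updates, which is worth keeping in mind when judging curves from a 5 or 10 episode smoke test.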

### Key parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `--episodes` | 1000 | Number of training episodes |
| `--horizon` | 300 | Maximum steps per episode |
| `--rollout-length` | 512 | Steps collected between updates |
| `--batch-size` | 128 | Mini-batch size |
| `--lr-actor` | 3e-4 | Actor learning rate |
| `--lr-critic` | 3e-4 | Critic learning rate |
| `--lr-disc` | 3e-4 | Discriminator learning rate |
| `--epoch-disc` | 5 | Discriminator epochs per update |
| `--epoch-ppo` | 10 | PPO epochs per update |
| `--render` | False | Enable visualization |
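
For reference, the table maps onto an `argparse` definition along the following lines. This is a reconstruction from the table, not a copy of the parser in `train_magail.py`; the help strings and the `build_parser` name are assumptions, and flags that appear elsewhere in this guide without a documented default (`--device`, `--load-model`) are left out.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose flags and defaults mirror the table above."""
    parser = argparse.ArgumentParser(description="MAGAIL training (illustrative sketch)")
    parser.add_argument("--episodes", type=int, default=1000, help="number of training episodes")
    parser.add_argument("--horizon", type=int, default=300, help="maximum steps per episode")
    parser.add_argument("--rollout-length", type=int, default=512, help="steps between updates")
    parser.add_argument("--batch-size", type=int, default=128, help="mini-batch size")
    parser.add_argument("--lr-actor", type=float, default=3e-4, help="actor learning rate")
    parser.add_argument("--lr-critic", type=float, default=3e-4, help="critic learning rate")
    parser.add_argument("--lr-disc", type=float, default=3e-4, help="discriminator learning rate")
    parser.add_argument("--epoch-disc", type=int, default=5, help="discriminator epochs per update")
    parser.add_argument("--epoch-ppo", type=int, default=10, help="PPO epochs per update")
    parser.add_argument("--render", action="store_true", help="enable visualization")
    return parser


# Equivalent to: python train_magail.py --episodes 10 --horizon 200
args = build_parser().parse_args(["--episodes", "10", "--horizon", "200"])
print(args.episodes, args.horizon, args.rollout_length, args.render)
# prints: 10 200 512 False
```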

## 📈 Monitoring Training

### Using TensorBoard

```bash
# Start TensorBoard
tensorboard --logdir outputs/

# Then open in a browser:
# http://localhost:6006
```

### Key metrics

1. **Episode/Reward** - total reward per episode
2. **Training/GAILReward** - imitation reward provided by GAIL
3. **Loss/disc** - discriminator loss
4. **Acc/acc_pi** - discriminator accuracy on policy data
5. **Acc/acc_exp** - discriminator accuracy on expert data
6. **Loss/actor** - Actor loss
7. **Loss/critic** - Critic loss

### Expected training curves

```
Episode Reward → gradually rises from 0 (stays at 0 if it tracks only the environment reward; see Q1)
GAIL Reward    → rises first, then stabilizes
Disc Accuracy  → approaches 50% (the policy is becoming hard to tell from the expert)
Actor Loss     → gradually decreases
Critic Loss    → gradually decreases
```

## 🔍 Checking Training Status

### Reading the output log

During training the script prints progress like the following (messages translated here):
```
📍 Episode 1/10
Controllable vehicles: 5

🔄 Step 512: updating model...
GAIL reward: 0.5234

✅ Episode 1 finished:
Steps: 200
Total reward: 0.00
Mean reward: 0.0000
Vehicles: 5
💾 Saved best model (reward: 0.00)
```

### Checking model files

```bash
ls outputs/magail_*/models/
# You should see:
# - best_model/model.pth
# - checkpoint_50/model.pth
# - checkpoint_100/model.pth
```
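
The same check can be scripted when several runs accumulate under `outputs/`. The sketch below assumes only the directory layout shown in this guide (`outputs/magail_*/models/<name>/model.pth`); the helper name `list_checkpoints` is hypothetical.

```python
from pathlib import Path


def list_checkpoints(outputs_dir: str = "outputs") -> list:
    """Return every model.pth found under outputs/magail_*/models/, sorted by path."""
    root = Path(outputs_dir)
    if not root.is_dir():
        return []  # nothing has been trained yet
    return sorted(root.glob("magail_*/models/*/model.pth"))


for checkpoint in list_checkpoints():
    print(checkpoint)
```

An empty result after a run that should have saved a model usually points to a different working directory or to training that stopped before the first save.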

## ⚠️ FAQ

### Q1: Why is the reward always 0?

**A:** This is expected.
- The environment reward is 0 by design
- The actual learning signal is the intrinsic reward provided by GAIL
- Check the `Training/GAILReward` metric instead
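
To make the intrinsic reward concrete, here is one common GAIL formulation, r = -log(1 - D(s, a)), where D is the sigmoid of the discriminator logit. It is an assumption that this project uses this exact form and sign convention (higher logit means "more expert-like"); other GAIL implementations use log D(s, a) or flip the labels. The function name `gail_reward` is hypothetical.

```python
import math


def gail_reward(logit: float) -> float:
    """Compute -log(1 - sigmoid(logit)), which equals softplus(logit).

    The max/log1p form avoids overflow for large positive or negative logits.
    """
    return max(logit, 0.0) + math.log1p(math.exp(-abs(logit)))


# The reward is positive everywhere and grows as the discriminator
# rates a (state, action) pair as more expert-like.
for logit in (-2.0, 0.0, 2.0):
    print(f"logit={logit:+.1f}  reward={gail_reward(logit):.4f}")
```

With this form the reward is never zero, so a flat environment reward does not leave the policy without a gradient signal.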
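
### Q2: What do the discriminator accuracies mean?

**A:**
- `acc_pi`: how often the discriminator correctly labels policy data as "fake"
- `acc_exp`: how often the discriminator correctly labels expert data as "real"
- Early in training: both are close to 100% (the policy is poor and easy to tell apart)
- Late in training: both approach 50% (the policy resembles the expert and is hard to tell apart)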
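
The two accuracies can be computed from raw discriminator outputs roughly as follows. The sketch assumes the discriminator emits a logit where a positive value means "expert" and a negative value means "policy"; the project may use the opposite sign, and the sample logits are made up for illustration.

```python
def discriminator_accuracy(logits_pi, logits_exp):
    """Return (acc_pi, acc_exp) under the convention that a positive logit means 'expert'."""
    acc_pi = sum(x < 0 for x in logits_pi) / len(logits_pi)     # policy data called "fake"
    acc_exp = sum(x > 0 for x in logits_exp) / len(logits_exp)  # expert data called "real"
    return acc_pi, acc_exp


# Early training: the two distributions are well separated.
print(discriminator_accuracy([-3.0, -2.5, -4.0, -1.0], [2.0, 3.5, 1.2, 4.0]))
# prints: (1.0, 1.0)

# Late training: logits hover around zero for both sources.
print(discriminator_accuracy([-0.2, 0.3, 0.1, -0.4], [0.2, -0.1, 0.4, -0.3]))
# prints: (0.5, 0.5)
```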
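
### Q3: Why do vehicles stay still or move erratically?

**A:**
- Early in training the policy is essentially random, so vehicle behavior is chaotic
- Behavior only improves after many episodes of training
- Run `python test_vehicle_movement.py` to confirm the environment itself works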

### Q4: Running out of GPU memory?

**A:** Reduce these parameters:
```bash
python train_magail.py \
    --batch-size 64 \
    --rollout-length 256 \
    --epoch-disc 3 \
    --epoch-ppo 5
```

### Q5: Training too slow?

**A:**
- Drop `--render` (visualization is expensive)
- Reduce `--horizon`
- Use a larger `--rollout-length` (fewer update phases for the same number of steps)

## 🎯 Training Recommendations

### First run

1. **Start with a small-scale test**
   ```bash
   python train_magail.py --episodes 5 --horizon 100
   ```

2. **Watch for errors**

3. **Check TensorBoard**
   ```bash
   tensorboard --logdir outputs/
   ```

### Full training

1. **Medium-scale warm-up**
   ```bash
   python train_magail.py --episodes 100 --horizon 200
   ```

2. **Inspect the learning curves**
   - Is the discriminator accuracy falling?
   - Is the GAIL reward changing?

3. **Long training run**
   ```bash
   python train_magail.py --episodes 1000 --horizon 300
   ```

### Hyperparameter tuning

Ranges worth trying:
- Learning rate: `1e-4` to `1e-3`
- Rollout length: `256` to `1024`
- Batch size: `64` to `256`
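
A small script can turn these ranges into a list of runs to launch. The grid points below are illustrative picks from the ranges above, and applying the learning rate to `--lr-actor` only is an assumption (the guide does not say which of the three learning rates to vary). Only flags documented in this guide are used; `sweep_commands` is a hypothetical helper name.

```python
from itertools import product

LEARNING_RATES = ["1e-4", "3e-4", "1e-3"]   # within 1e-4 .. 1e-3
ROLLOUT_LENGTHS = [256, 512, 1024]          # within 256 .. 1024
BATCH_SIZES = [64, 128, 256]                # within 64 .. 256


def sweep_commands(episodes: int = 100, horizon: int = 200) -> list:
    """Build one train_magail.py command line per grid point."""
    commands = []
    for lr, rollout, batch in product(LEARNING_RATES, ROLLOUT_LENGTHS, BATCH_SIZES):
        commands.append(
            f"python train_magail.py --episodes {episodes} --horizon {horizon} "
            f"--lr-actor {lr} --rollout-length {rollout} --batch-size {batch}"
        )
    return commands


commands = sweep_commands()
print(len(commands), "runs")
print(commands[0])
```

A full grid is 27 runs, so in practice it may be more economical to vary one parameter at a time around the defaults and compare the resulting curves in TensorBoard.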

## 📁 Output File Structure

```
outputs/
└── magail_YYYYMMDD_HHMMSS/
    ├── models/
    │   ├── best_model/
    │   │   └── model.pth
    │   ├── checkpoint_50/
    │   └── checkpoint_100/
    └── logs/
        └── events.out.tfevents.*    # TensorBoard logs
```
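
Because each run directory encodes its start time in the name, the most recent run can be located programmatically. The sketch assumes the `magail_YYYYMMDD_HHMMSS` naming shown above and nothing else; `run_timestamp` and `latest_run` are hypothetical helper names.

```python
from datetime import datetime
from pathlib import Path


def run_timestamp(run_dir_name: str) -> datetime:
    """Parse 'magail_YYYYMMDD_HHMMSS' into a datetime (raises ValueError otherwise)."""
    return datetime.strptime(run_dir_name, "magail_%Y%m%d_%H%M%S")


def latest_run(outputs_dir: str = "outputs"):
    """Return the newest run directory as a Path, or None if there are no runs."""
    root = Path(outputs_dir)
    if not root.is_dir():
        return None
    runs = [p for p in root.glob("magail_*") if p.is_dir()]
    return max(runs, key=lambda p: run_timestamp(p.name), default=None)


print(run_timestamp("magail_20240131_235959"))
print(latest_run())
```

Resolving a single run this way is safer than passing a `magail_*` glob to a command when more than one run exists, since the shell would expand the glob to several paths.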

## 🚀 Next Steps

After training finishes:

1. **Evaluate the model**
   ```bash
   # TODO: the evaluation script has not been created yet
   python evaluate.py --model outputs/magail_*/models/best_model
   ```

2. **Visualize behavior**
   ```bash
   python train_magail.py --episodes 1 --render \
       --load-model outputs/magail_*/models/best_model/model.pth
   ```

3. **Analyze the logs**
   - Inspect TensorBoard
   - Compare the effect of different hyperparameters

## 💡 Tips

- 💾 Back up the `outputs/` directory regularly
- 📊 Monitor training with TensorBoard
- ⏰ For long runs, use `nohup` or `screen`
- 🔍 When an error occurs, read the full stack trace

---

**Happy training!** 🎉

If you run into problems, see:
- `技术说明文档.md` - technical details
- `MAGAIL算法应用指南.md` - usage guide
- `问题解决记录.md` - common problems