4.3: MLE.3 ML Training and Learning
Process Definition
Purpose
MLE.3 Purpose: To train the ML model according to the architecture and data requirements.
Outcomes
| Outcome | Description |
|---|---|
| O1 | An ML training and validation approach is specified |
| O2 | The data set for ML training and ML validation is created |
| O3 | The ML model, including hyperparameter values, is optimized to meet the defined ML requirements |
| O4 | Consistency and bidirectional traceability are established between the ML training and validation data set and the ML data requirements |
| O5 | Results of optimization are summarized, and the trained ML model is agreed and communicated to all affected parties |
Base Practices with AI Integration
| BP | Base Practice | AI Level | AI Application |
|---|---|---|---|
| BP1 | Specify ML training and validation approach | L1-L2 | Approach definition |
| BP2 | Create ML training and validation data set | L2 | Data selection, augmentation |
| BP3 | Create and optimize ML model | L2-L3 | Automated training, hyperparameter tuning |
| BP4 | Ensure consistency and establish bidirectional traceability | L2 | Data-to-requirements tracing |
| BP5 | Summarize and communicate agreed trained ML model | L1 | Results documentation |
Training Pipeline
End-to-End Training Process
The following diagram illustrates the end-to-end ML training pipeline, from data ingestion and augmentation through model training, validation, and artifact versioning.
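The stage sequence described above can be sketched as a minimal pipeline driver. This is an illustrative skeleton only: the stage functions are hypothetical placeholders, and real implementations (ingestion, augmentation, training, validation, versioning) are project-specific.

```python
from dataclasses import dataclass, field

@dataclass
class PipelineRun:
    """Accumulates the artifacts produced by each pipeline stage."""
    stages_completed: list = field(default_factory=list)
    artifacts: dict = field(default_factory=dict)

def run_training_pipeline(config: dict) -> PipelineRun:
    """Execute the pipeline stages in order; each stage records its output."""
    run = PipelineRun()
    # Hypothetical stage functions; real implementations are project-specific.
    stages = [
        ("ingest",   lambda c: {"raw_samples": c["dataset"]}),
        ("augment",  lambda c: {"aug_config": c["augmentation"]}),
        ("train",    lambda c: {"weights": "best.pt"}),
        ("validate", lambda c: {"metrics": {"mAP_50": None}}),
        ("version",  lambda c: {"model_id": c["model_id"]}),
    ]
    for name, stage in stages:
        run.artifacts[name] = stage(config)
        run.stages_completed.append(name)
    return run
```

Running each stage through a single driver makes the stage order explicit and gives one place to attach the logging and artifact versioning discussed later in this section.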
Data Augmentation Strategy
Automotive-Specific Augmentation
```yaml
# Data augmentation configuration
augmentation:
  geometric:
    horizontal_flip:
      enabled: true
      probability: 0.5
    scale:
      enabled: true
      range: [0.8, 1.2]
    rotation:
      enabled: true
      range: [-5, 5]  # degrees - limited for driving context
    crop:
      enabled: true
      min_area: 0.7
  photometric:
    brightness:
      enabled: true
      range: [-0.2, 0.2]
    contrast:
      enabled: true
      range: [0.8, 1.2]
    saturation:
      enabled: true
      range: [0.8, 1.2]
    hue:
      enabled: true
      range: [-0.1, 0.1]
  weather_simulation:
    rain:
      enabled: true
      intensity_range: [0.1, 0.5]
    fog:
      enabled: true
      density_range: [0.1, 0.3]
    snow:
      enabled: false  # Use real snow data instead
  automotive_specific:
    sun_flare:
      enabled: true
      probability: 0.1
    headlight_glare:
      enabled: true
      probability: 0.1  # Night scenes only
    sensor_noise:
      enabled: true
      noise_level: 0.01
  mosaic:
    enabled: true
    probability: 0.5
    grid_size: 2
  mixup:
    enabled: true
    probability: 0.1
    alpha: 0.5
```
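Once parsed (e.g. with a YAML loader), the configuration above can be flattened into the list of active transforms that the augmentation pipeline should apply. The helper below is a minimal sketch of that step; the function name and tuple layout are illustrative, not part of any specific library API.

```python
def enabled_transforms(aug_config: dict) -> list:
    """Flatten an augmentation config into (name, params) pairs for every
    transform whose 'enabled' flag is true. Handles both nested categories
    (geometric, photometric, ...) and top-level transforms (mosaic, mixup)."""
    result = []
    for category, value in aug_config.items():
        if "enabled" in value:
            # The category is itself a single transform (e.g. mosaic, mixup).
            if value["enabled"]:
                params = {k: v for k, v in value.items() if k != "enabled"}
                result.append((category, params))
        else:
            # The category groups several transforms.
            for name, params in value.items():
                if params.get("enabled"):
                    extra = {k: v for k, v in params.items() if k != "enabled"}
                    result.append((f"{category}.{name}", extra))
    return result
```

Filtering on the `enabled` flag in one place keeps disabled transforms (such as synthetic snow, deliberately switched off above) out of the pipeline without deleting their configuration.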
Training Configuration
Hyperparameter Specification
```yaml
# Training configuration
training:
  id: MLE-TRAIN-001
  model: YOLOv8-nano
  dataset: MLE-DATA-001

  # Base training parameters
  epochs: 300
  batch_size: 64
  image_size: [640, 384]

  # Optimizer configuration
  optimizer:
    type: AdamW
    lr: 0.001
    weight_decay: 0.0005
    momentum: 0.937

  # Learning rate schedule
  scheduler:
    type: cosine
    warmup_epochs: 3
    warmup_bias_lr: 0.1
    warmup_momentum: 0.8
    min_lr: 0.0001

  # Transfer learning
  transfer:
    pretrained: "yolov8n.pt"
    freeze_backbone: false
    freeze_epochs: 0

  # Loss function
  loss:
    box_loss: 7.5
    cls_loss: 0.5
    dfl_loss: 1.5

  # Early stopping
  early_stopping:
    patience: 50
    metric: "mAP@0.5"
    mode: "max"

  # Checkpointing
  checkpointing:
    save_period: 10  # epochs
    save_best: true
    save_last: true

  # Hardware
  hardware:
    device: "cuda:0"
    workers: 8
    pin_memory: true
    amp: true  # Automatic Mixed Precision

  # Reproducibility
  reproducibility:
    seed: 42
    deterministic: true
```
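The scheduler block above combines a short linear warmup with cosine decay. A minimal sketch of the resulting per-epoch learning rate, using the values from the configuration as defaults (the function itself is illustrative, not a framework API):

```python
import math

def lr_at_epoch(epoch, total_epochs=300, base_lr=0.001,
                min_lr=0.0001, warmup_epochs=3):
    """Learning rate under linear warmup followed by cosine decay."""
    if epoch < warmup_epochs:
        # Linear warmup from base_lr/warmup_epochs up to base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine decay from base_lr down to min_lr over the remaining epochs.
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

With these defaults the rate reaches the full 0.001 at the end of warmup and approaches `min_lr` (0.0001) by the final epoch, which is the usual shape frameworks produce for a `cosine` scheduler with warmup.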
Training Monitoring
Metrics Dashboard
The diagram below shows a training monitoring dashboard, displaying real-time loss curves, accuracy metrics, and early stopping indicators that enable engineers to detect training issues promptly.
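One of the indicators such a dashboard relies on is the early-stopping criterion from the training configuration (patience 50, monitoring `mAP@0.5` in `max` mode). A minimal, framework-agnostic sketch of that logic:

```python
class EarlyStopping:
    """Stop training when the monitored metric stops improving.
    Mirrors the early_stopping block of the training configuration."""

    def __init__(self, patience=50, mode="max"):
        self.patience = patience
        self.mode = mode
        self.best = None
        self.bad_epochs = 0

    def step(self, value) -> bool:
        """Record one epoch's metric; return True when training should stop."""
        improved = (self.best is None
                    or (self.mode == "max" and value > self.best)
                    or (self.mode == "min" and value < self.best))
        if improved:
            self.best = value
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

Tracking `bad_epochs` explicitly makes the stopping decision auditable in the training report, e.g. the run below stopped at epoch 287 with its best checkpoint at epoch 237.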
Model Versioning
Version Control for ML
```yaml
# Model version specification (illustrative example)
model_version:
  id: MLE-MODEL-001-v2.3.1
  created: "(timestamp)"
  training_run: MLE-TRAIN-001
  artifacts:
    weights: "models/yolov8n_adas_v2.3.1.pt"
    config: "configs/train_v2.3.1.yaml"
    onnx: "models/yolov8n_adas_v2.3.1.onnx"
    trt_int8: "models/yolov8n_adas_v2.3.1.engine"
  metrics:
    validation:
      mAP_50: 0.894
      precision: 0.965
      recall: 0.989
      f1: 0.977
    test:  # Held-out test set
      mAP_50: 0.887
      precision: 0.958
      recall: 0.984
      f1: 0.971
  training_data:
    dataset_version: MLE-DATA-001-v1.2
    train_samples: 700000
    val_samples: 150000
    augmentation: "aug_config_v2.yaml"
  training_params:
    epochs: 287  # Early stopped
    best_epoch: 237
    training_time: "36h 42m"
    gpu: "NVIDIA A100"
  dependencies:
    pytorch: "2.1.0"
    ultralytics: "8.0.200"
    cuda: "12.1"
  traceability:
    requirements:
      - MLE-ADAS-001
      - MLE-ADAS-002
    architecture: MLE-ARCH-001
  previous_version: MLE-MODEL-001-v2.2.0
  status: "validated"  # draft, training, validated, released
  approver: "(Approver Name)"
  approval_date: "(Approval Date)"
```
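Before a version record's status moves from `training` to `validated`, its held-out test metrics and traceability links can be checked automatically. The sketch below is illustrative: the threshold values are hypothetical project-specific acceptance criteria, not values defined by this process.

```python
def check_release_gate(version: dict, thresholds: dict) -> list:
    """Return a list of human-readable failures; an empty list means the
    model version may move from 'training' to 'validated'."""
    failures = []
    test_metrics = version["metrics"]["test"]
    for metric, minimum in thresholds.items():
        value = test_metrics.get(metric)
        if value is None or value < minimum:
            failures.append(f"{metric}: {value} < required {minimum}")
    # Traceability must be populated before release (Outcome O4).
    if not version.get("traceability", {}).get("requirements"):
        failures.append("traceability.requirements is empty")
    return failures
```

Gating on the held-out test set rather than the validation set avoids promoting a model that was tuned to its own selection metric.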
Quantization-Aware Training
QAT Process
Note: Python code examples are illustrative and require project-specific implementation (optimizer, loss functions, etc.).
"""
Quantization-Aware Training for Automotive Deployment
"""
import torch
from torch.quantization import QuantStub, DeQuantStub
class QuantizedModel(torch.nn.Module):
"""Model wrapper for quantization-aware training."""
def __init__(self, base_model):
super().__init__()
self.quant = QuantStub()
self.model = base_model
self.dequant = DeQuantStub()
def forward(self, x):
x = self.quant(x)
x = self.model(x)
x = self.dequant(x)
return x
def train_qat(model, train_loader, val_loader, config):
"""Quantization-aware training loop."""
# Prepare model for QAT
model.train()
model.qconfig = torch.quantization.get_default_qat_qconfig('fbgemm')
torch.quantization.prepare_qat(model, inplace=True)
# Training loop
for epoch in range(config['qat_epochs']):
for batch in train_loader:
images, labels = batch
# Forward pass
outputs = model(images)
loss = compute_loss(outputs, labels)
# Backward pass
optimizer.zero_grad()
loss.backward()
optimizer.step()
# Validation
val_metrics = validate(model, val_loader)
print(f"Epoch {epoch}: mAP={val_metrics['mAP']:.4f}")
# Check quantization degradation
if epoch == 0:
baseline_map = val_metrics['mAP']
if baseline_map - val_metrics['mAP'] > 0.01:
print("WARNING: QAT causing >1% accuracy drop")
# Convert to quantized model
model.eval()
quantized_model = torch.quantization.convert(model, inplace=False)
return quantized_model
Work Products
| WP ID | Work Product | AI Role |
|---|---|---|
| 11-06 | Trained model | Training output |
| 11-07 | Training dataset | Data preparation |
| 13-65 | Training report | Metrics tracking |
| 04-11 | Training configuration | Setup documentation |
Summary
MLE.3 ML Training and Learning:
- AI Level: L2-L3 (high automation for training)
- Primary AI Value: Automated training, hyperparameter optimization
- Human Essential: Data quality, final model selection
- Key Outputs: Trained model, training report
- Focus: Reproducibility, versioning, traceability