5.1: MLE Process Application
MLE (Machine Learning Engineering) Process Overview
MLE Lifecycle for Safety-Critical ML
MLE Process: Extension of traditional software engineering for ML systems
6-Phase MLE Lifecycle (adapted for ASPICE context): MLE.1 ML Requirements Analysis → MLE.2 Dataset Management → MLE.3 Model Development → MLE.4 Model Verification → MLE.5 Deployment → MLE.6 Monitoring
MLE.1: ML Requirements Analysis
Operational Design Domain (ODD)
ODD: Conditions under which ML model is designed to operate safely
IEC 62304 Analogy: Similar to "intended use" for medical devices
LKA ODD Definition:
Operational Design Domain (ODD) - Lane Keeping Assist
─────────────────────────────────────────────────────────
Geographic:
- Road Type: Highway, rural roads (NOT urban city streets)
- Lane Markings: Visible lane lines (white/yellow, solid/dashed)
- Lane Width: 2.5 - 3.7 meters (standard lane widths)
Environmental:
- Weather: Dry, light rain (NOT heavy rain, snow, fog)
- Lighting: Daytime (6am-8pm), dusk (NOT night with poor visibility)
- Visibility: ≥100 meters (NOT heavy fog, smoke)
Operational:
- Speed: 60-130 km/h (NOT stop-and-go traffic, parking)
- Curvature: Radius ≥150 meters (NOT sharp curves, roundabouts)
- Traffic: Light-moderate (NOT construction zones, lane merges)
Exclusions (Out of ODD):
[FAIL] Tunnel entrances (lighting transition)
[FAIL] Faded/missing lane markings
[FAIL] Snow-covered roads
[FAIL] Unpaved roads
[FAIL] Parking lots
Traceability: ODD → ML Requirements → Test Cases
Example Requirement Derivation:
[MLE-REQ-001] Lane Detection Accuracy within ODD
Description:
The lane detection model shall achieve ≥92% IoU (Intersection over Union)
when tested on images captured within the defined ODD.
Rationale:
92% IoU ensures lane line segmentation is accurate enough for steering control
(lateral offset error ≤0.15 meters, acceptable for ASIL-B)
Acceptance Criteria:
1. Test on 25,000 held-out images (sampled from ODD conditions)
2. Calculate IoU for each image: |true lane pixels ∩ predicted| / |true ∪ predicted|
3. Mean IoU ≥ 92% (95% confidence interval)
4. Worst-case IoU ≥ 70% (no catastrophic failures)
Traceability:
- Derived from: [SYS-REQ-LKA-003] "LKA shall keep vehicle in lane center ±0.2m"
- Verified by: [TC-MLE-001-1] "Test set evaluation (25,000 images)"
Safety Class: ASIL-B (lane detection critical to LKA safety function)
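The acceptance criteria above can be sketched as a per-image evaluation. This is a minimal NumPy sketch, assuming ground-truth and predicted masks are available as boolean arrays; function names are illustrative, not the project's actual API:

```python
import numpy as np

def image_iou(gt: np.ndarray, pred: np.ndarray) -> float:
    """IoU for one image: |gt ∩ pred| / |gt ∪ pred| over lane pixels."""
    intersection = np.logical_and(gt, pred).sum()
    union = np.logical_or(gt, pred).sum()
    return float(intersection / union) if union > 0 else 1.0

def evaluate_requirement(gt_masks, pred_masks):
    """Check MLE-REQ-001: mean IoU >= 92% (95% CI) and worst-case IoU >= 70%."""
    ious = np.array([image_iou(g, p) for g, p in zip(gt_masks, pred_masks)])
    mean_iou = ious.mean()
    # Normal-approximation 95% confidence interval on the mean
    ci_half_width = 1.96 * ious.std(ddof=1) / np.sqrt(len(ious))
    ci_lower = mean_iou - ci_half_width
    return {
        "mean_iou": mean_iou,
        "ci_lower_95": ci_lower,
        "worst_case_iou": ious.min(),
        "mean_pass": ci_lower >= 0.92,          # criterion 3
        "worst_case_pass": ious.min() >= 0.70,  # criterion 4
    }
```

Evaluating the CI lower bound rather than the raw mean ensures the 92% claim holds with statistical confidence, as the acceptance criteria require.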
[MLE-REQ-002] Inference Latency Requirement
Description:
The lane detection model shall process camera frames with latency ≤30ms
on NVIDIA Jetson AGX Orin target hardware.
Rationale:
At 130 km/h (36 m/s), a 30ms latency corresponds to 1.08 meters of vehicle travel,
which is acceptable for steering control (the PID controller can compensate for ~1m of delay-induced travel)
Acceptance Criteria:
1. Measure end-to-end latency: Image capture → CNN inference → Output
2. Average latency ≤ 25ms (target), max latency ≤ 30ms (requirement)
3. Test under CPU load (other ECU tasks running)
Verification Method:
- Benchmark on Jetson Orin (TensorRT optimized model)
- 1,000 frames, log timestamps, calculate P95 and max latency
Traceability:
- Derived from: [SYS-REQ-LKA-012] "LKA response time ≤100ms end-to-end"
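The verification method above can be sketched as a small benchmarking harness. `run_inference` is a hypothetical stand-in for the TensorRT-optimized inference call; the statistics logic is the part that matters:

```python
import time
import numpy as np

def benchmark_latency(run_inference, frames, warmup=50):
    """Measure per-frame end-to-end latency; report mean/P95/max in ms."""
    # Warm-up runs so lazy initialization and caching don't skew results
    for frame in frames[:warmup]:
        run_inference(frame)
    latencies_ms = []
    for frame in frames:
        start = time.perf_counter()
        run_inference(frame)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    latencies = np.array(latencies_ms)
    return {
        "mean_ms": latencies.mean(),
        "p95_ms": np.percentile(latencies, 95),
        "max_ms": latencies.max(),
    }

def latency_requirement_met(stats):
    """MLE-REQ-002: average <= 25 ms (target), max <= 30 ms (requirement)."""
    return stats["mean_ms"] <= 25.0 and stats["max_ms"] <= 30.0
```

In practice this would run on the Jetson Orin target under representative CPU load, not on a development host.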
MLE.2: Dataset Management
Dataset Collection Strategy
Goal: 250,000 annotated images covering ODD conditions
Data Sources:
1. Public Datasets (40%, ~100,000 images):
- TuSimple (USA highways, 6,408 images)
- CULane (China urban/rural, 88,880 images)
- BDD100K (USA diverse, 5,712 lane-marked images)
- Advantage: Pre-labeled, diverse scenarios
- Disadvantage: May not match our ODD (e.g., China roads differ from Europe)
2. Proprietary Data Collection (60%, 150,000 images):
- Method: 3 test vehicles, 500,000 km driven over 6 months
- Routes: German Autobahn (40%), French highways (30%), Italian rural (30%)
- Conditions: Dry (70%), light rain (20%), dusk (10%)
- Sampling: Extract 1 frame per 10 meters (avoid redundant similar frames)
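The distance-based sampling described above can be sketched as follows, assuming each frame record carries cumulative odometry in meters (the `odometer_m` field name is an assumption for illustration):

```python
def sample_frames_by_distance(frames, min_gap_m=10.0):
    """Keep one frame per min_gap_m meters of travel to avoid near-duplicate
    frames from consecutive camera captures.

    frames: list of dicts with an 'odometer_m' key (cumulative distance driven).
    """
    kept = []
    last_kept_m = None
    for frame in frames:
        if last_kept_m is None or frame["odometer_m"] - last_kept_m >= min_gap_m:
            kept.append(frame)
            last_kept_m = frame["odometer_m"]
    return kept
```

Distance-based (rather than time-based) sampling keeps frame density constant regardless of vehicle speed, which matters when mixing Autobahn and rural-road recordings.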
Dataset Composition (ensuring ODD coverage):
| Condition | Images | % | Purpose |
|---|---|---|---|
| Daytime, dry, straight highway | 150,000 | 60% | Nominal operation (most common) |
| Daytime, light rain | 40,000 | 16% | Robustness (lane lines less visible) |
| Dusk lighting | 25,000 | 10% | Edge case (shadows, glare) |
| Curved roads (R≥150m) | 20,000 | 8% | ODD boundary (test curvature limit) |
| Faded lane markings | 10,000 | 4% | Corner case (near ODD exit condition) |
| Construction zones (with markings) | 5,000 | 2% | Rare scenario (temporary lane lines) |
Total: 250,000 images
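To preserve the ODD coverage proportions above in the train/val/test splits (a lineage item TÜV assessors ask about), the split can be stratified by condition. A minimal sketch, assuming each image record carries a `condition` label (field name hypothetical):

```python
import random
from collections import defaultdict

def stratified_split(records, train=0.8, val=0.1, seed=42):
    """Split per condition so each split mirrors the dataset's ODD proportions."""
    by_condition = defaultdict(list)
    for rec in records:
        by_condition[rec["condition"]].append(rec)
    splits = {"train": [], "val": [], "test": []}
    rng = random.Random(seed)  # fixed seed for reproducible splits
    for condition, items in by_condition.items():
        rng.shuffle(items)
        n_train = int(len(items) * train)
        n_val = int(len(items) * val)
        splits["train"] += items[:n_train]
        splits["val"] += items[n_train:n_train + n_val]
        splits["test"] += items[n_train + n_val:]
    return splits
```

Without stratification, a random split could leave rare conditions (e.g., construction zones, 2%) almost absent from the test set, making the verification results unrepresentative.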
Data Annotation Process
Tool: CVAT (Computer Vision Annotation Tool) - open-source, web-based
Annotation Task: Pixel-wise lane line segmentation (binary mask)
Annotation Guidelines (55-page manual):
## Lane Line Annotation Guidelines v2.3
### Objective
Create binary segmentation masks where:
- White pixels (255): Lane line (left/right boundaries)
- Black pixels (0): Everything else (road, sky, vehicles)
### Rules
1. **Lane Line Definition**: Paint road markings delineating lane boundaries
- Solid white/yellow lines: Full width (typically 10-15 cm)
- Dashed lines: Paint dashed segments only (not gaps)
- Double lines: Paint both lines
2. **Occlusions**: If vehicle/shadow partially occludes lane line:
- Paint visible portions only
- Do NOT interpolate occluded segments (let model learn to handle occlusions)
3. **Faded Markings**: If lane line barely visible:
- Paint what you can see (even if low contrast)
- Quality control: 2nd annotator reviews faded cases
4. **Edge Cases**:
- Construction zone temporary markings: Paint if visible
- Road repairs (black patches over lines): Do NOT paint (line not visible)
- Reflective markers (Botts' dots): Do NOT paint (not painted lines)
### Quality Control
- Each image reviewed by 2nd annotator
- Inter-annotator agreement target: ≥95% (IoU between 2 annotators)
- Disputed cases escalated to senior annotator
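Inter-annotator agreement for a mask pair reduces to plain pixel-wise IoU. A minimal sketch over NumPy boolean masks (function names illustrative):

```python
import numpy as np

def annotator_agreement_iou(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """IoU between two annotators' binary masks for the same image."""
    a = mask_a.astype(bool)
    b = mask_b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:  # neither annotator marked any lane pixels: full agreement
        return 1.0
    return float(np.logical_and(a, b).sum() / union)

def flag_for_review(mask_a, mask_b, threshold=0.95):
    """Escalate to the senior annotator when agreement falls below target."""
    return annotator_agreement_iou(mask_a, mask_b) < threshold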
Annotation Metrics:
- Time per Image: 4.8 minutes (average)
- Total Annotation Effort: 250,000 images × 4.8 min = 20,000 hours
- Cost: 20,000 hours × €15/hour = €300,000
- Team: 10 annotators (2,000 hours each over 6 months)
Quality Control:
- Inter-Annotator Agreement: 96.2% IoU (exceeds 95% target)
- Re-annotation Rate: 8% (20,000 images re-annotated after QC review)
Dataset Versioning (DVC)
Tool: DVC (Data Version Control) - Git-like for ML datasets
Why DVC?:
- [FAIL] Git doesn't scale to 250,000 images (~50 GB dataset; Git performance degrades with large binaries, and hosting services typically cap repositories at ~1-2 GB)
- [PASS] DVC tracks data in remote storage (S3), Git tracks metadata (hashes, versions)
- [PASS] Reproducibility: Checkout dataset v1.0 → Train model → Reproduce results
DVC Workflow:
# Initialize DVC in Git repository
cd lane-detection-project
dvc init
# Add dataset to DVC (stores in S3, tracks hash in Git)
dvc add data/train_images/
dvc add data/train_annotations/
# Commit DVC metadata to Git (not actual images)
git add data/train_images.dvc data/train_annotations.dvc
git commit -m "Dataset v1.0: 250k images, 96% inter-annotator agreement"
git tag dataset-v1.0
# Push data to S3 (DVC remote storage)
dvc remote add -d s3-storage s3://lane-detection-datasets/
dvc push
# Later: Reproduce training with exact same dataset
git checkout dataset-v1.0
dvc pull # Downloads data from S3
python train.py # Uses dataset v1.0
Dataset Versioning History:
| Version | Date | Images | Changes | Model Trained |
|---|---|---|---|---|
| v0.1 | 2024-04 | 50,000 | Initial pilot dataset | Baseline (85% IoU) |
| v0.5 | 2024-07 | 150,000 | Added rain, dusk scenarios | Improved (91% IoU) |
| v1.0 | 2024-10 | 250,000 | Full ODD coverage, QC pass | Final (95.2% IoU) [PASS] |
Traceability: Model performance → Dataset version (reproducibility for safety assessment)
Dataset Lineage for Safety Assessment: TÜV assessors require complete dataset lineage: (1) Source provenance (public datasets vs proprietary), (2) Annotation methodology (guidelines, QC process), (3) Data splits (train/val/test), (4) Augmentation applied. Document this in the MLE.2 Dataset Management work product.
MLE.3: Model Development
Architecture Selection
Task: Pixel-wise lane line segmentation (semantic segmentation)
Candidate Architectures:
| Model | Params | Latency (Jetson Orin) | IoU (val) | Decision |
|---|---|---|---|---|
| U-Net | 31M | 45ms | 93.1% | [FAIL] Too slow (>30ms) |
| FCN-ResNet50 | 35M | 50ms | 92.8% | [FAIL] Too slow |
| DeepLabV3-MobileNetV2 | 5M | 18ms | 90.2% | [WARN] Fast but low accuracy |
| EfficientNet-Lite4 + DeepLabV3 | 12M | 25ms | 95.2% | [PASS] Selected (best accuracy/latency trade-off) |
Rationale: EfficientNet-Lite4 is optimized for mobile/embedded targets (Jetson Orin), and DeepLabV3 provides a state-of-the-art segmentation head
Training Process
Hyperparameter Search (150 experiments tracked in MLflow):
Experiment Tracker: MLflow
Hyperparameters Tuned:
- Learning rate: {1e-4, 5e-4, 1e-3, 5e-3}
- Batch size: {16, 32, 64}
- Optimizer: {Adam, AdamW, SGD with momentum}
- Data augmentation: {Random brightness/contrast, Gaussian blur, synthetic rain}
- Loss function: {Dice loss, Focal loss, Combo (Dice + Focal)}
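The combined loss named above can be sketched in NumPy for a single image; the production version would be the PyTorch equivalent operating on batched tensors. The α/γ focal-loss defaults are common values, not confirmed by the source:

```python
import numpy as np

def dice_loss(pred_prob, target, eps=1e-6):
    """Soft Dice loss: 1 - 2|P∩T| / (|P| + |T|). Penalizes region overlap error."""
    intersection = (pred_prob * target).sum()
    return 1.0 - (2.0 * intersection + eps) / (pred_prob.sum() + target.sum() + eps)

def focal_loss(pred_prob, target, alpha=0.25, gamma=2.0, eps=1e-6):
    """Focal loss: down-weights easy pixels, focusing training on hard ones."""
    p = np.clip(pred_prob, eps, 1.0 - eps)
    pt = np.where(target == 1, p, 1.0 - p)             # prob of the true class
    weight = np.where(target == 1, alpha, 1.0 - alpha)
    return np.mean(-weight * (1.0 - pt) ** gamma * np.log(pt))

def combo_loss(pred_prob, target):
    """Experiment #127 configuration: 0.5 * Dice + 0.5 * Focal."""
    return 0.5 * dice_loss(pred_prob, target) + 0.5 * focal_loss(pred_prob, target)
```

Dice handles the heavy class imbalance (lane pixels are a small fraction of each image), while the focal term keeps gradients flowing on hard, low-contrast pixels.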
Best Configuration (Experiment #127):
# MLflow experiment tracking
import mlflow
import torch

mlflow.start_run(run_name="exp-127-efficientnet-deeplabv3")

config = {
    "architecture": "EfficientNet-Lite4 + DeepLabV3",
    "learning_rate": 5e-4,
    "batch_size": 32,
    "optimizer": "AdamW",
    "epochs": 200,
    "loss_function": "Combo (0.5 Dice + 0.5 Focal)",
    "data_augmentation": {
        "brightness": [-0.2, +0.2],
        "contrast": [0.8, 1.2],
        "gaussian_blur": 0.1,   # applied to 10% of images
        "synthetic_rain": 0.05  # applied to 5% of images
    }
}
mlflow.log_params(config)

# Training loop (200 epochs, ~120 hours on 8x A100 GPUs)
best_iou = 0.0
for epoch in range(200):
    train_loss, train_iou = train_one_epoch(model, train_loader)
    val_loss, val_iou = validate(model, val_loader)
    mlflow.log_metrics({
        "train_loss": train_loss,
        "train_iou": train_iou,
        "val_loss": val_loss,
        "val_iou": val_iou
    }, step=epoch)
    # Save best model (checkpoint)
    if val_iou > best_iou:
        best_iou = val_iou
        torch.save(model.state_dict(), "best_model.pth")
        mlflow.pytorch.log_model(model, "lane_detection_cnn")

mlflow.end_run()
Training Results (Experiment #127):
- Final Validation IoU: 95.2%
- Training Time: 120 hours (8x NVIDIA A100 GPUs)
- Convergence: Epoch 180 (plateaued, early stopping at epoch 200)
MLE.4: Model Verification
Test Set Evaluation
Test Set: 25,000 images (held-out, never seen during training)
Evaluation Metrics:
import numpy as np
from sklearn.metrics import jaccard_score, precision_score, recall_score

# Load test set (25,000 images + ground truth masks)
test_images, test_masks = load_test_set("data/test/")

# Inference on test set
predictions = []
for image in test_images:
    pred_mask = model.predict(image)  # Output: binary mask (lane pixels)
    predictions.append(pred_mask)

# Flatten all masks into 1-D pixel label vectors for pixel-wise metrics
y_true = np.concatenate([m.flatten() for m in test_masks])
y_pred = np.concatenate([p.flatten() for p in predictions])

# Calculate metrics
iou = jaccard_score(y_true, y_pred, average='binary')
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)

print(f"Test Set IoU: {iou:.3f}")     # 0.952 (95.2%)
print(f"Precision: {precision:.3f}")  # 0.968 (96.8%)
print(f"Recall: {recall:.3f}")        # 0.937 (93.7%)
Results:
- IoU: 95.2% [PASS] (exceeds 92% requirement)
- Precision: 96.8% (few false positives, predicted lane pixels are correct)
- Recall: 93.7% (6.3% of actual lane pixels missed, acceptable)
Failure Analysis (worst 100 images, IoU <70%):
| Failure Mode | Count | IoU Range | Root Cause |
|---|---|---|---|
| Severe occlusion (truck blocks view) | 35 | 50-65% | Model can't see lane lines (expected failure) |
| Heavy shadows (trees, overpasses) | 28 | 55-70% | Low contrast, model confuses shadows with lanes |
| Construction zone (temporary yellow lines) | 22 | 60-68% | Yellow lines not in training data (dataset bias) |
| Worn markings (barely visible) | 15 | 58-70% | At ODD boundary (faded lanes), marginal detection |
Mitigation:
- Occlusion: Temporal smoothing (use previous frames to interpolate)
- Shadows: Data augmentation (add synthetic shadows during training)
- Construction zones: Add 5,000 construction images to dataset v1.1 (future retraining)
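The temporal-smoothing mitigation can be sketched as an exponential moving average over consecutive per-frame lane-probability masks. This is a design sketch under assumptions (smoothing factor, class name); the production filter may differ:

```python
import numpy as np

class TemporalMaskSmoother:
    """Blend each frame's lane-probability mask with recent history so brief
    occlusions (e.g., a truck blocking the camera view) don't drop the lane."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha   # weight of the current frame (assumed value)
        self.state = None    # running average of past masks

    def update(self, prob_mask: np.ndarray) -> np.ndarray:
        if self.state is None:
            self.state = prob_mask.astype(float)
        else:
            self.state = self.alpha * prob_mask + (1 - self.alpha) * self.state
        return self.state

    def binary_mask(self, threshold=0.5) -> np.ndarray:
        return self.state >= threshold
```

With α = 0.3, a single fully occluded frame only drops a previously confident lane pixel to 0.7, so the binarized output survives short occlusions while still tracking genuine lane changes over a few frames.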
Failure Mode Categorization: Document failure modes by root cause category: (1) Dataset gaps (missing scenarios), (2) Model architecture limitations (receptive field too small), (3) Sensor limitations (camera dynamic range), (4) Labeling errors (annotation mistakes). This categorization guides targeted improvements.
Corner Case Testing (SOTIF)
Goal: Test model on 10,000 edge cases (ISO 21448 SOTIF requirement)
Corner Case Categories:
| Category | Scenarios | Purpose |
|---|---|---|
| ODD Boundary | Near-faded markings, max curvature (R=150m) | Test model at ODD limits |
| Rare Events | Roadkill on lane line, glare from wet road | Low-probability scenarios |
| Adversarial | Synthetic perturbations (add noise to image) | Robustness to attacks |
| Out-of-ODD | Snow, night, construction | Verify model degrades gracefully |
Example Corner Case Test: ODD Boundary (Faded Lane Markings)
# Test Case: TC-MLE-SOTIF-042
# Description: Faded lane markings (visibility ≈30%, near ODD exit)
import numpy as np

# Load faded lane test images (collected from rural roads, poor maintenance)
faded_test_images = load_images("data/corner_cases/faded_lanes/")  # 500 images

# Inference
confidence_scores = []
for image in faded_test_images:
    pred_mask = model.predict(image)
    confidence = calculate_confidence(pred_mask)  # project helper, 0.0-1.0 score
    confidence_scores.append(confidence)

# Acceptance Criteria:
# 1. Model outputs low confidence (<0.5) for faded lanes → Triggers LKA disable
# 2. No high-confidence false detections (precision ≥ 80%)
mean_confidence = np.mean(confidence_scores)  # 0.42 (low, as expected)
low_conf_rate = np.sum(np.array(confidence_scores) < 0.5) / len(confidence_scores)

assert mean_confidence < 0.6, f"Expected low confidence for faded lanes, got {mean_confidence}"
assert low_conf_rate > 0.7, f"Expected 70%+ low-confidence predictions, got {low_conf_rate}"
print("[PASS] PASS: Model correctly identifies faded lanes as low-confidence")
Result: 78% of faded lane images → low confidence (<0.5) → LKA disables (safe degradation) [PASS]
Summary
MLE Process Deliverables:
| MLE Phase | Work Product | Tool | ASPICE Mapping |
|---|---|---|---|
| MLE.1 Requirements | ML Requirements Spec (120 reqs) | Jama Connect | SWE.1 (extended for ML) |
| MLE.2 Dataset | Versioned dataset (250k images) | DVC, CVAT | - (ML-specific) |
| MLE.3 Development | Trained model (95.2% IoU) | PyTorch, MLflow | SWE.3 (model as "code") |
| MLE.4 Verification | Model Verification Report | Python (test scripts) | SWE.4 (extended for ML) |
| MLE.5 Deployment | TensorRT optimized model | NVIDIA TensorRT | SWE.5 Integration |
| MLE.6 Monitoring | Field performance dashboard | MLflow, Prometheus | - (post-market) |
AI Contribution to MLE:
- Dataset annotation: 20,000 hours (manual, no AI replacement yet)
- Hyperparameter tuning: Optuna (Bayesian optimization) saved 50% trial-and-error time
- Model architecture selection: Literature review (ChatGPT-4 summarized 30 papers)
Next: SOTIF considerations for ML-based perception (28.02).