3.3: HITL Decision Making
Human-in-the-Loop for Safety-Critical Systems
When to Trust AI vs. Human Judgment
ISO 26262 Requirement: Human approval is mandatory for safety-critical decisions (ASIL-B and above) per ISO 26262-6:2018, Section 9.4.3.
Decision Matrix:
| Task | AI Autonomy | Human Role | Rationale |
|---|---|---|---|
| Boilerplate code | High (AI decides) | Review only | Low risk, repetitive |
| Unit tests | Medium (AI suggests) | Approve/modify | AI generates, human adds edge cases |
| Requirements extraction | Low (AI assists) | Final approval | High risk, customer-facing |
| Architecture decisions | None | Human decides | Critical trade-offs, context-specific |
| Safety logic | None | Human implements | ASIL-B+ requires human design |
| Code review | Medium (AI flags issues) | Final decision | AI finds violations, human approves fix |
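The matrix above can be encoded directly in code so tooling can enforce it. This is a hypothetical sketch: the names (`HitlLevel`, `HITL_LevelForTask`, the task keys) are illustrative, not from any real codebase, and unknown tasks deliberately default to the most restrictive level.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative encoding of the HITL decision matrix. */
typedef enum {
    HITL_AI_AUTONOMOUS,   /* human reviews after the fact  */
    HITL_AI_SUGGESTS,     /* human approves before merge   */
    HITL_HUMAN_DECIDES    /* AI assists, human owns result */
} HitlLevel;

typedef struct {
    const char *task;
    HitlLevel   level;
} HitlRule;

static const HitlRule kMatrix[] = {
    { "boilerplate",  HITL_AI_AUTONOMOUS },
    { "unit-tests",   HITL_AI_SUGGESTS   },
    { "requirements", HITL_AI_SUGGESTS   },
    { "architecture", HITL_HUMAN_DECIDES },
    { "safety-logic", HITL_HUMAN_DECIDES }
};

/* Unknown tasks fall back to the most restrictive level. */
static HitlLevel HITL_LevelForTask(const char *task)
{
    for (size_t i = 0U; i < sizeof(kMatrix) / sizeof(kMatrix[0]); i++) {
        if (strcmp(kMatrix[i].task, task) == 0) {
            return kMatrix[i].level;
        }
    }
    return HITL_HUMAN_DECIDES;
}
```

Defaulting to `HITL_HUMAN_DECIDES` keeps the fail-safe bias: a task that was never classified is treated as high-risk until someone classifies it.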
Decision Framework
The 3-Level HITL Model
Level 1: AI Autonomous (Review Only)
- Tasks: Code formatting, documentation generation, simple refactoring
- AI Action: Generates output automatically
- Human Action: Review after the fact (no prior approval needed)
- Approval: Post-review (code review process)
Example:
/* AI generates Doxygen header automatically */
/**
* @brief Calculate safe following distance
* @implements [SWE-045-11]
* @param[in] speed_kmh Vehicle speed in km/h
* @return Safe distance in meters
*/
→ Human reviews in code review (low risk)
Level 2: AI Suggests, Human Approves (Pre-Approval)
- Tasks: Code generation, test generation, requirements extraction
- AI Action: Generates suggestion
- Human Action: Review and approve before merging
- Approval: Pre-merge approval required
Example:
/* AI generates function */
float ACC_CalculateSafeDistance(float speed_kmh) {
return (speed_kmh / 3.6F) * 2.0F;
}
→ Human reviews, adds error handling, approves
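The review step can be made concrete. A sketch of what the function might look like after the human adds error handling; clamping negative speed to zero is an assumed validation policy (consistent with the decision log later in this section), not the only possible one.

```c
#include <assert.h>

/* Sketch: AI-generated function after human review.
 * Added: input validation (negative speed clamped to 0). */
float ACC_CalculateSafeDistance(float speed_kmh)
{
    if (speed_kmh < 0.0F) {
        speed_kmh = 0.0F;  /* invalid input: treat as stationary */
    }
    /* 2-second rule: distance [m] = speed [m/s] * 2 s */
    return (speed_kmh / 3.6F) * 2.0F;
}
```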
Level 3: Human Decides, AI Assists (Human-Led)
- Tasks: Architecture decisions, safety requirements, trade-off analysis
- AI Action: Provides information, alternatives
- Human Action: Makes final decision
- Approval: Human owns decision (ADR documented)
Example:
Question: "Should I use Kalman filter or ML for sensor fusion?"
AI provides:
- Option A: Kalman filter (95% accuracy, €0 cost, proven)
- Option B: ML (98% accuracy, €50k cost, novel)
Human decides:
- Chooses Kalman filter (meets requirement ≥95%, lower cost/risk)
- Documents in ADR-007
Decision Triggers
When to Override AI
Trigger 1: Safety-Critical Code
/* AI Output: */
void EmergencyBrake(void) {
Brake_Apply(100); /* Full braking */
}
/* Human Override: Add safety checks */
void EmergencyBrake(void) {
/* Safety: Check sensor validity before braking */
if (!Sensor_IsValid()) {
Log_Error(ERROR_SENSOR_INVALID);
return; /* Don't brake if sensors failed */
}
/* Safety: Check vehicle speed (don't brake if stopped) */
if (GetVehicleSpeed() < 5.0F) {
return;
}
Brake_Apply(100); /* Full braking */
Log_SafetyEvent(EVENT_EMERGENCY_BRAKE);
}
Rationale: AI doesn't understand safety implications (sensor failure, redundancy)
Trigger 2: Context-Specific Requirements
AI suggests: "Use ML for obstacle detection (98% accuracy)"
Human overrides: "Use Kalman filter instead"
Rationale:
- Customer doesn't require 98% (95% sufficient)
- ML adds €50k cost (exceeds budget)
- Kalman filter proven, easier ASIL-B verification
- Documented in ADR-007
Underlying issue: the AI doesn't know the project budget, schedule, or customer requirements
Trigger 3: Compliance/Standards
/* AI Output: Uses malloc (dynamic memory) */
void* AllocateBuffer(size_t size) {
/* [FAIL] MISRA Rule 21.3: Avoid dynamic memory allocation */
/* (malloc/free prohibited in safety-critical embedded systems) */
return malloc(size);
}
/* Human Override: Static allocation */
#define BUFFER_SIZE 1024
static uint8_t g_buffer[BUFFER_SIZE];
void* AllocateBuffer(void) {
return g_buffer; /* [PASS] Static allocation (MISRA-compliant) */
}
Rationale: AI doesn't always respect safety standards (MISRA, CERT)
When to Escalate (Trust Neither AI Nor Yourself)
Escalation Triggers:
- Architectural Decision: Affects multiple modules, long-term impact
- Safety Trade-off: Impacts ASIL classification, hazard analysis
- Regulatory Compliance: ISO 26262, FDA, CE marking
- Customer Requirement: Changes to contractual obligations
- Significant Cost/Schedule: >€10k or >2 weeks impact
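The last trigger is purely numeric, so it can be sketched as a predicate. The thresholds (>€10k, >2 weeks) come from the list above; the function name is illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch: the cost/schedule escalation trigger as a predicate.
 * Thresholds from the escalation-trigger list: >EUR 10k or >2 weeks. */
static bool HITL_RequiresEscalation(double cost_eur, double schedule_weeks)
{
    return (cost_eur > 10000.0) || (schedule_weeks > 2.0);
}
```

The other triggers (architecture, safety, regulatory, contractual) are judgment calls and don't reduce to a threshold check.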
Escalation Path: Engineer → Senior Engineer → Architect → Project Manager → Customer — from routine AI-assisted decisions through team-level reviews up to management and customer approval for high-impact changes.
Example:
Situation: AI suggests using Adaptive AUTOSAR (€150k tooling cost)
Question: "Should I switch from Classic to Adaptive AUTOSAR?"
Escalation:
1. Discuss with senior engineer (technical feasibility)
2. Escalate to architect (system-wide impact)
3. Escalate to project manager (cost, schedule)
4. Escalate to customer (contractual requirements)
Decision: Stick with Classic (customer doesn't require OTA updates)
HITL Quality Gates
Mandatory Human Approvals
Quality Gate 1: Requirements Baseline
- AI Role: Extract requirements from customer spec
- Human Approval: Systems engineer reviews, clarifies ambiguities, gets customer sign-off
- Gate: Requirements baselined in DOORS
Quality Gate 2: Architecture Review
- AI Role: Generate architecture diagrams, suggest patterns
- Human Approval: Architect reviews, makes trade-off decisions, documents ADRs
- Gate: Architecture review meeting (stakeholders sign-off)
Quality Gate 3: Code Review
- AI Role: Generate code, flag MISRA violations
- Human Approval: Engineer reviews, tests, approves merge
- Gate: Code review checklist completed, PR approved
Quality Gate 4: Safety Review (ASIL-B and above)
- AI Role: None (AI cannot approve safety-critical code)
- Human Approval: Safety engineer reviews, verifies fail-safe behavior, approves
- Gate: Safety review report signed
Quality Gate 5: Release Approval
- AI Role: Generate release notes, changelog
- Human Approval: Project manager approves release to customer
- Gate: Release tag created, artifacts published
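The five gates form an ordered checklist: release is blocked until every gate carries a human approval. A minimal sketch, with gates tracked as a bitmask (all names are illustrative).

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch: the five HITL quality gates as bit positions. */
typedef enum {
    GATE_REQUIREMENTS,   /* requirements baselined in DOORS   */
    GATE_ARCHITECTURE,   /* architecture review signed off    */
    GATE_CODE_REVIEW,    /* PR approved, checklist complete   */
    GATE_SAFETY_REVIEW,  /* safety review report signed       */
    GATE_RELEASE,        /* project manager release approval  */
    GATE_COUNT
} QualityGate;

/* Release is allowed only when every gate bit is set. */
static bool HITL_ReleaseAllowed(unsigned approved_mask)
{
    const unsigned all = (1U << GATE_COUNT) - 1U;
    return (approved_mask & all) == all;
}
```

A single missing human approval (one cleared bit) blocks the release, mirroring the "mandatory human approvals" rule above.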
AI Confidence Scoring
How to Assess AI Reliability
Question to Ask: "How confident should I be in this AI output?"
Confidence Indicators:
High Confidence (Trust with Light Review):
- Simple, well-defined task (code formatting, documentation)
- AI has seen many examples (common patterns like CAN parsing)
- Output compiles and passes tests
- Static analysis clean (no MISRA violations)
Medium Confidence (Review Carefully):
- Moderate complexity (PID controller, sensor fusion)
- Domain-specific knowledge required
- Some MISRA violations or test failures
- Edge cases may be missing
Low Confidence (Heavy Review or Rewrite):
- High complexity (safety-critical logic, novel algorithms)
- AI suggests unfamiliar APIs (possible hallucination)
- Many compilation errors or test failures
- Missing requirements traceability
Example Assessment:
Task: Generate CAN message parser
Confidence Score: 6/10 (Medium)
AI Output:
- Compiles: [PASS] (+2)
- Tests pass: [PASS] (+2)
- MISRA clean: [PASS] (+2)
- Handles null pointers: [PASS] (+1)
- Handles CAN timeout: [FAIL] (-1)
Decision: Medium confidence - add timeout handling, then approve
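The scoring in the example can be sketched as a weighted sum. Two assumptions here: a failing indicator subtracts its weight (the example only shows this for the weight-1 timeout check), and the listed five indicators sum to at most 8, so the document's 10-point scale presumably includes further indicators not shown; this sketch just returns the raw sum.

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch: confidence score as a weighted sum of pass/fail checks.
 * Assumed rule: pass adds the weight, fail subtracts it. */
typedef struct {
    bool compiles;        /* weight 2 */
    bool tests_pass;      /* weight 2 */
    bool misra_clean;     /* weight 2 */
    bool null_checked;    /* weight 1 */
    bool timeout_handled; /* weight 1 */
} AiChecks;

static int HITL_ConfidenceScore(const AiChecks *c)
{
    int score = 0;
    score += c->compiles        ? 2 : -2;
    score += c->tests_pass      ? 2 : -2;
    score += c->misra_clean     ? 2 : -2;
    score += c->null_checked    ? 1 : -1;
    score += c->timeout_handled ? 1 : -1;
    return score;
}
```

For the CAN-parser example above (all checks pass except the timeout), this yields 2 + 2 + 2 + 1 - 1 = 6, matching the 6/10 assessment.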
Decision Documentation
Record HITL Decisions (Traceability)
Why Document:
- ASPICE SUP.9 (problem resolution): Why was AI output rejected or modified?
- ISO 26262 (safety audit trail): Who approved safety-critical code?
- Continuous improvement: Learn from AI mistakes
Decision Log Template:
## HITL Decision Log
**Date**: 2025-12-18T14:30:00Z (ISO 8601 format)
**Engineer**: John Smith
**Task**: Implement [SWE-045-11] Safe Following Distance
### AI Suggestion
Function: ACC_CalculateSafeDistance
Code: [PASTE AI OUTPUT]
### Human Review
**Issues Found**:
1. Missing: Input validation (negative speed)
2. Missing: @implements tag for traceability
3. MISRA 10.4: Implicit float conversion
**Decision**: Modify AI output (not suitable as-is)
### Human Modifications
1. Added input validation (negative → 0)
2. Added @implements [SWE-045-11]
3. Fixed MISRA 10.4 (explicit cast)
### Final Approval
Reviewer: Alice Johnson (Senior Engineer)
Status: Approved for merge
PR: #145
Summary
3-Level HITL Model:
- AI Autonomous (low risk, post-review)
- AI Suggests, Human Approves (medium risk, pre-approval)
- Human Decides, AI Assists (high risk, human-led)
Override Triggers: Safety-critical code, context-specific requirements, compliance/standards
Escalation Triggers: Architectural decisions, safety trade-offs, regulatory compliance, significant cost/schedule
Quality Gates: Requirements baseline, architecture review, code review, safety review (ASIL-B and above), release approval
Confidence Scoring: High (trust with light review), Medium (review carefully), Low (heavy review or rewrite)
Decision Documentation: Record HITL decisions for traceability, audit trail, continuous improvement
Key Principle: AI assists, human decides — especially for safety-critical, context-specific, or compliance-sensitive tasks.
Next: AI Tool Selection (35.04) — Choosing the right AI tool for the task