3.3: HITL Decision Making

Human-in-the-Loop for Safety-Critical Systems

When to Trust AI vs. Human Judgment

ISO 26262 Requirement: Human approval is mandatory for safety-critical decisions (ASIL-B and above) per ISO 26262-6:2018, Section 9.4.3.

Decision Matrix:

| Task | AI Autonomy | Human Role | Rationale |
|---|---|---|---|
| Boilerplate code | High (AI decides) | Review only | Low risk, repetitive |
| Unit tests | Medium (AI suggests) | Approve/modify | AI generates, human adds edge cases |
| Requirements extraction | Low (AI assists) | Final approval | High risk, customer-facing |
| Architecture decisions | None | Human decides | Critical trade-offs, context-specific |
| Safety logic | None | Human implements | ASIL-B+ requires human design |
| Code review | Medium (AI flags issues) | Final decision | AI finds violations, human approves fix |

Decision Framework

The 3-Level HITL Model

Level 1: AI Autonomous (Review Only)

  • Tasks: Code formatting, documentation generation, simple refactoring
  • AI Action: Generates output automatically
  • Human Action: Review after the fact (no prior approval needed)
  • Approval: Post-review (code review process)

Example:

/* AI generates Doxygen header automatically */
/**
 * @brief Calculate safe following distance
 * @implements [SWE-045-11]
 * @param[in] speed_kmh Vehicle speed in km/h
 * @return Safe distance in meters
 */

→ Human reviews in code review (low risk)


Level 2: AI Suggests, Human Approves (Pre-Approval)

  • Tasks: Code generation, test generation, requirements extraction
  • AI Action: Generates suggestion
  • Human Action: Review and approve before merging
  • Approval: Pre-merge approval required

Example:

/* AI generates function */
float ACC_CalculateSafeDistance(float speed_kmh) {
    return (speed_kmh / 3.6F) * 2.0F;
}

→ Human reviews, adds error handling, approves
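One plausible post-review version of this function (the specific checks and the named constant are illustrative additions, not part of the AI draft):

```c
#include <assert.h>
#include <math.h>

#define ACC_TIME_GAP_S 2.0F  /* 2-second following rule, as in the AI draft */

/* Post-review version: input validation added, magic number named */
float ACC_CalculateSafeDistance(float speed_kmh)
{
    /* Reject invalid (negative) sensor readings instead of
     * propagating a negative distance downstream */
    if (speed_kmh < 0.0F) {
        return 0.0F;
    }
    /* Convert km/h to m/s, then apply the time-gap rule */
    return (speed_kmh / 3.6F) * ACC_TIME_GAP_S;
}
```

The negative-to-zero clamp mirrors the kind of modification recorded in the decision log later in this section.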


Level 3: Human Decides, AI Assists (Human-Led)

  • Tasks: Architecture decisions, safety requirements, trade-off analysis
  • AI Action: Provides information, alternatives
  • Human Action: Makes final decision
  • Approval: Human owns decision (ADR documented)

Example:

Question: "Should I use Kalman filter or ML for sensor fusion?"

AI provides:
- Option A: Kalman filter (95% accuracy, €0 cost, proven)
- Option B: ML (98% accuracy, €50k cost, novel)

Human decides:
- Chooses Kalman filter (meets requirement ≥95%, lower cost/risk)
- Documents in ADR-007
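The corresponding ADR entry might look like this (the structure below is illustrative; the source only names ADR-007):

```markdown
## ADR-007: Sensor Fusion Algorithm Selection

**Status**: Accepted

### Context
Customer requires ≥95% detection accuracy; budget excludes new tooling.

### Options Considered (AI-provided)
- Option A: Kalman filter (95% accuracy, €0 cost, proven)
- Option B: ML-based fusion (98% accuracy, €50k cost, novel)

### Decision
Kalman filter: meets the ≥95% requirement at lower cost and risk,
and is easier to verify for ASIL-B.

### Decision Owner
Human engineer (AI provided inputs only).
```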

Decision Triggers

When to Override AI

Trigger 1: Safety-Critical Code

/* AI Output: */
void EmergencyBrake(void) {
    Brake_Apply(100);  /* Full braking */
}

/* Human Override: Add safety checks */
void EmergencyBrake(void) {
    /* Safety: Check sensor validity before braking */
    if (!Sensor_IsValid()) {
        Log_Error(ERROR_SENSOR_INVALID);
        return;  /* Don't brake if sensors failed */
    }

    /* Safety: Check vehicle speed (don't brake if stopped) */
    if (GetVehicleSpeed() < 5.0F) {
        return;
    }

    Brake_Apply(100);  /* Full braking */
    Log_SafetyEvent(EVENT_EMERGENCY_BRAKE);
}

Rationale: AI doesn't understand safety implications (sensor failure, redundancy)


Trigger 2: Context-Specific Requirements

AI suggests: "Use ML for obstacle detection (98% accuracy)"

Human overrides: "Use Kalman filter instead"

Rationale:
- Customer doesn't require 98% (95% is sufficient)
- ML adds €50k cost (exceeds budget)
- Kalman filter is proven and easier to verify for ASIL-B
- Documented in ADR-007

Key point: AI doesn't know the project budget, schedule, or customer requirements


Trigger 3: Compliance/Standards

/* AI Output: Uses malloc (dynamic memory) */
void* AllocateBuffer(size_t size) {
    /* [FAIL] MISRA Rule 21.3: Avoid dynamic memory allocation */
    /* (malloc/free prohibited in safety-critical embedded systems) */
    return malloc(size);
}

/* Human Override: Static allocation */
#define BUFFER_SIZE 1024
static uint8_t g_buffer[BUFFER_SIZE];

void* AllocateBuffer(void) {
    return g_buffer;  /* [PASS] Static allocation (MISRA-compliant) */
}

Rationale: AI doesn't always respect safety standards (MISRA, CERT)
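A slightly more realistic static-allocation pattern (names and sizes below are illustrative assumptions) keeps all memory compile-time reserved while still supporting multiple callers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_BLOCK_SIZE  64U
#define POOL_BLOCK_COUNT 8U

/* Fixed-size block pool: all memory is reserved at compile time,
 * so there is no heap use and no fragmentation (avoids MISRA Rule 21.3). */
static uint8_t g_pool[POOL_BLOCK_COUNT][POOL_BLOCK_SIZE];
static bool    g_in_use[POOL_BLOCK_COUNT];

/* Returns a free block, or NULL if the pool is exhausted */
void *Pool_Alloc(void)
{
    for (size_t i = 0U; i < POOL_BLOCK_COUNT; i++) {
        if (!g_in_use[i]) {
            g_in_use[i] = true;
            return g_pool[i];
        }
    }
    return NULL;  /* Caller must handle exhaustion explicitly */
}

/* Marks a previously allocated block as free again */
void Pool_Free(void *block)
{
    for (size_t i = 0U; i < POOL_BLOCK_COUNT; i++) {
        if ((void *)g_pool[i] == block) {
            g_in_use[i] = false;
            return;
        }
    }
}
```

Exhaustion becomes a checkable return value rather than undefined behavior, and worst-case memory use is known at build time.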


When to Escalate (Trust Neither AI Nor Yourself Alone)

Escalation Triggers:

  1. Architectural Decision: Affects multiple modules, long-term impact
  2. Safety Trade-off: Impacts ASIL classification, hazard analysis
  3. Regulatory Compliance: ISO 26262, FDA, CE marking
  4. Customer Requirement: Changes to contractual obligations
  5. Significant Cost/Schedule: >€10k or >2 weeks impact

Escalation Path: The following diagram shows the decision escalation hierarchy, from routine AI-assisted decisions through team-level reviews to management approval for high-impact changes.

[Diagram: HITL decision escalation hierarchy]

Example:

Situation: AI suggests using Adaptive AUTOSAR (€150k tooling cost)

Question: "Should I switch from Classic to Adaptive AUTOSAR?"

Escalation:
1. Discuss with senior engineer (technical feasibility)
2. Escalate to architect (system-wide impact)
3. Escalate to project manager (cost, schedule)
4. Escalate to customer (contractual requirements)

Decision: Stick with Classic (customer doesn't require OTA updates)

HITL Quality Gates

Mandatory Human Approvals

Quality Gate 1: Requirements Baseline

  • AI Role: Extract requirements from customer spec
  • Human Approval: Systems engineer reviews, clarifies ambiguities, gets customer sign-off
  • Gate: Requirements baselined in DOORS

Quality Gate 2: Architecture Review

  • AI Role: Generate architecture diagrams, suggest patterns
  • Human Approval: Architect reviews, makes trade-off decisions, documents ADRs
  • Gate: Architecture review meeting (stakeholders sign-off)

Quality Gate 3: Code Review

  • AI Role: Generate code, flag MISRA violations
  • Human Approval: Engineer reviews, tests, approves merge
  • Gate: Code review checklist completed, PR approved

Quality Gate 4: Safety Review (ASIL-B and above)

  • AI Role: None (AI cannot approve safety-critical code)
  • Human Approval: Safety engineer reviews, verifies fail-safe behavior, approves
  • Gate: Safety review report signed

Quality Gate 5: Release Approval

  • AI Role: Generate release notes, changelog
  • Human Approval: Project manager approves release to customer
  • Gate: Release tag created, artifacts published

AI Confidence Scoring

How to Assess AI Reliability

Question to Ask: "How confident should I be in this AI output?"

Confidence Indicators:

High Confidence (Trust with Light Review):

  • Simple, well-defined task (code formatting, documentation)
  • AI has seen many examples (common patterns like CAN parsing)
  • Output compiles and passes tests
  • Static analysis clean (no MISRA violations)

Medium Confidence (Review Carefully):

  • Moderate complexity (PID controller, sensor fusion)
  • Domain-specific knowledge required
  • Some MISRA violations or test failures
  • Edge cases may be missing

Low Confidence (Heavy Review or Rewrite):

  • High complexity (safety-critical logic, novel algorithms)
  • AI suggests unfamiliar APIs (possible hallucination)
  • Many compilation errors or test failures
  • Missing requirements traceability

Example Assessment:

Task: Generate CAN message parser

Confidence Score: 6/10 (Medium)

AI Output:
- Compiles: [PASS] (+2)
- Tests pass: [PASS] (+2)
- MISRA clean: [PASS] (+2)
- Handles null pointers: [PASS] (+1)
- Handles CAN timeout: [FAIL] (-1)

Decision: Medium confidence - add timeout handling, then approve
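The timeout handling flagged in the assessment could look like this (the frame layout, signal encoding, and function names below are illustrative assumptions, not a real CAN API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CAN_TIMEOUT_MS 100U  /* assumed freshness budget for a frame */

typedef struct {
    uint32_t timestamp_ms;  /* when the frame was received */
    uint8_t  data[8];
    uint8_t  dlc;           /* data length code */
} CanFrame;

/* Returns false if the frame is stale, malformed, or missing;
 * the caller must treat that as a communication fault. */
bool CAN_ParseSpeed(const CanFrame *frame, uint32_t now_ms, float *speed_kmh)
{
    if ((frame == NULL) || (speed_kmh == NULL)) {
        return false;  /* Null-pointer check (the AI draft had this) */
    }
    if ((now_ms - frame->timestamp_ms) > CAN_TIMEOUT_MS) {
        return false;  /* Timeout check (the gap the human review found) */
    }
    if (frame->dlc < 2U) {
        return false;  /* Not enough bytes for a 16-bit speed signal */
    }
    /* Assumed encoding: 16-bit little-endian, 0.01 km/h per bit */
    uint16_t raw = (uint16_t)(((uint16_t)frame->data[1] << 8) | frame->data[0]);
    *speed_kmh = (float)raw * 0.01F;
    return true;
}
```

Making staleness an explicit failure path is what turns the -1 in the assessment back into a pass.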

Decision Documentation

Record HITL Decisions (Traceability)

Why Document:

  • ASPICE SUP.9 (problem resolution): Why was AI output rejected or modified?
  • ISO 26262 (safety audit trail): Who approved safety-critical code?
  • Continuous improvement: Learn from AI mistakes

Decision Log Template:

## HITL Decision Log

**Date**: 2025-12-18T14:30:00Z (ISO 8601 format)
**Engineer**: John Smith
**Task**: Implement [SWE-045-11] Safe Following Distance

### AI Suggestion
Function: ACC_CalculateSafeDistance
Code: [PASTE AI OUTPUT]

### Human Review
**Issues Found**:
1. Missing: Input validation (negative speed)
2. Missing: @implements tag for traceability
3. MISRA 10.4: Implicit float conversion

**Decision**: Modify AI output (not suitable as-is)

### Human Modifications
1. Added input validation (negative → 0)
2. Added @implements [SWE-045-11]
3. Fixed MISRA 10.4 (explicit cast)

### Final Approval
Reviewer: Alice Johnson (Senior Engineer)
Status: Approved for merge
PR: #145

Summary

3-Level HITL Model:

  1. AI Autonomous (low risk, post-review)
  2. AI Suggests, Human Approves (medium risk, pre-approval)
  3. Human Decides, AI Assists (high risk, human-led)

Override Triggers: Safety-critical code, context-specific requirements, compliance/standards

Escalation Triggers: Architectural decisions, safety trade-offs, regulatory compliance, significant cost/schedule

Quality Gates: Requirements baseline, architecture review, code review, safety review (ASIL-B and above), release approval

Confidence Scoring: High (trust with light review), Medium (review carefully), Low (heavy review or rewrite)

Decision Documentation: Record HITL decisions for traceability, audit trail, continuous improvement

Key Principle: AI assists, human decides — especially for safety-critical, context-specific, or compliance-sensitive tasks.

Next: AI Tool Selection (35.04) — Choosing the right AI tool for the task