3.3: HITL Decision Making

Human-in-the-Loop for Safety-Critical Systems

When to Trust AI vs. Human Judgment

ISO 26262 Requirement: Human approval is mandatory for safety-critical decisions (ASIL-B and above) per ISO 26262-6:2018, Section 9.4.3.

Decision Matrix:

| Task | AI Autonomy | Human Role | Rationale |
|---|---|---|---|
| Boilerplate code | High (AI decides) | Review only | Low risk, repetitive |
| Unit tests | Medium (AI suggests) | Approve/modify | AI generates, human adds edge cases |
| Requirements extraction | Low (AI assists) | Final approval | High risk, customer-facing |
| Architecture decisions | None | Human decides | Critical trade-offs, context-specific |
| Safety logic | None | Human implements | ASIL-B+ requires human design |
| Code review | Medium (AI flags issues) | Final decision | AI finds violations, human approves fix |

Decision Framework

The 3-Level HITL Model

Level 1: AI Autonomous (Review Only)

  • Tasks: Code formatting, documentation generation, simple refactoring
  • AI Action: Generates output automatically
  • Human Action: Review after the fact (no prior approval needed)
  • Approval: Post-review (code review process)

Example:

/* AI generates Doxygen header automatically */
/**
 * @brief Calculate safe following distance
 * @implements [SWE-045-11]
 * @param[in] speed_kmh Vehicle speed in km/h
 * @return Safe distance in meters
 */

→ Human reviews in code review (low risk)


Level 2: AI Suggests, Human Approves (Pre-Approval)

  • Tasks: Code generation, test generation, requirements extraction
  • AI Action: Generates suggestion
  • Human Action: Review and approve before merging
  • Approval: Pre-merge approval required

Example:

/* AI generates function */
float ACC_CalculateSafeDistance(float speed_kmh) {
    return (speed_kmh / 3.6F) * 2.0F;
}

→ Human reviews, adds error handling, approves
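One plausible post-review version of this function (the specific checks and the named constant are illustrative additions, not part of the AI draft):

```c
#include <assert.h>
#include <math.h>

#define ACC_TIME_GAP_S 2.0F  /* 2-second following rule, as in the AI draft */

/* Post-review version: input validation added, magic number named */
float ACC_CalculateSafeDistance(float speed_kmh)
{
    /* Reject invalid (negative) sensor readings instead of
     * propagating a negative distance downstream */
    if (speed_kmh < 0.0F) {
        return 0.0F;
    }
    /* Convert km/h to m/s, then apply the time-gap rule */
    return (speed_kmh / 3.6F) * ACC_TIME_GAP_S;
}
```

The negative-to-zero clamp mirrors the kind of modification recorded in the decision log later in this section.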


Level 3: Human Decides, AI Assists (Human-Led)

  • Tasks: Architecture decisions, safety requirements, trade-off analysis
  • AI Action: Provides information, alternatives
  • Human Action: Makes final decision
  • Approval: Human owns decision (ADR documented)

Example:

Question: "Should I use Kalman filter or ML for sensor fusion?"

AI provides:
- Option A: Kalman filter (95% accuracy, €0 cost, proven)
- Option B: ML (98% accuracy, €50k cost, novel)

Human decides:
- Chooses Kalman filter (meets requirement ≥95%, lower cost/risk)
- Documents in ADR-007
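The corresponding ADR entry might look like this (the structure below is illustrative; the source only names ADR-007):

```markdown
## ADR-007: Sensor Fusion Algorithm Selection

**Status**: Accepted

### Context
Customer requires ≥95% detection accuracy; budget excludes new tooling.

### Options Considered (AI-provided)
- Option A: Kalman filter (95% accuracy, €0 cost, proven)
- Option B: ML-based fusion (98% accuracy, €50k cost, novel)

### Decision
Kalman filter: meets the ≥95% requirement at lower cost and risk,
and is easier to verify for ASIL-B.

### Decision Owner
Human engineer (AI provided inputs only).
```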

Decision Triggers

When to Override AI

Trigger 1: Safety-Critical Code

/* AI Output: */
void EmergencyBrake(void) {
    Brake_Apply(100);  /* Full braking */
}

/* Human Override: Add safety checks */
void EmergencyBrake(void) {
    /* Safety: Check sensor validity before braking */
    if (!Sensor_IsValid()) {
        Log_Error(ERROR_SENSOR_INVALID);
        return;  /* Don't brake if sensors failed */
    }

    /* Safety: Check vehicle speed (don't brake if stopped) */
    if (GetVehicleSpeed() < 5.0F) {
        return;
    }

    Brake_Apply(100);  /* Full braking */
    Log_SafetyEvent(EVENT_EMERGENCY_BRAKE);
}

Rationale: AI doesn't understand safety implications (sensor failure, redundancy)


Trigger 2: Context-Specific Requirements

AI suggests: "Use ML for obstacle detection (98% accuracy)"

Human overrides: "Use Kalman filter instead"

Rationale:
- Customer doesn't require 98% (95% is sufficient)
- ML adds €50k cost (exceeds budget)
- Kalman filter is proven and easier to verify for ASIL-B
- Documented in ADR-007

Key point: AI doesn't know the project budget, schedule, or customer requirements


Trigger 3: Compliance/Standards

/* AI Output: Uses malloc (dynamic memory) */
void* AllocateBuffer(size_t size) {
    /* [FAIL] MISRA Rule 21.3: Avoid dynamic memory allocation */
    /* (malloc/free prohibited in safety-critical embedded systems) */
    return malloc(size);
}

/* Human Override: Static allocation */
#define BUFFER_SIZE 1024
static uint8_t g_buffer[BUFFER_SIZE];

void* AllocateBuffer(void) {
    return g_buffer;  /* [PASS] Static allocation (MISRA-compliant) */
}

Rationale: AI doesn't always respect safety standards (MISRA, CERT)
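A slightly more realistic static-allocation pattern (names and sizes below are illustrative assumptions) keeps all memory compile-time reserved while still supporting multiple callers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_BLOCK_SIZE  64U
#define POOL_BLOCK_COUNT 8U

/* Fixed-size block pool: all memory is reserved at compile time,
 * so there is no heap use and no fragmentation (avoids MISRA Rule 21.3). */
static uint8_t g_pool[POOL_BLOCK_COUNT][POOL_BLOCK_SIZE];
static bool    g_in_use[POOL_BLOCK_COUNT];

/* Returns a free block, or NULL if the pool is exhausted */
void *Pool_Alloc(void)
{
    for (size_t i = 0U; i < POOL_BLOCK_COUNT; i++) {
        if (!g_in_use[i]) {
            g_in_use[i] = true;
            return g_pool[i];
        }
    }
    return NULL;  /* Caller must handle exhaustion explicitly */
}

/* Marks a previously allocated block as free again */
void Pool_Free(void *block)
{
    for (size_t i = 0U; i < POOL_BLOCK_COUNT; i++) {
        if ((void *)g_pool[i] == block) {
            g_in_use[i] = false;
            return;
        }
    }
}
```

Exhaustion becomes a checkable return value rather than undefined behavior, and worst-case memory use is known at build time.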


When to Escalate (Trust Neither AI Nor Yourself Alone)

Escalation Triggers:

  1. Architectural Decision: Affects multiple modules, long-term impact
  2. Safety Trade-off: Impacts ASIL classification, hazard analysis
  3. Regulatory Compliance: ISO 26262, FDA, CE marking
  4. Customer Requirement: Changes to contractual obligations
  5. Significant Cost/Schedule: >€10k or >2 weeks impact

Escalation Path: The following diagram shows the decision escalation hierarchy, from routine AI-assisted decisions through team-level reviews to management approval for high-impact changes.

[Diagram: HITL decision escalation hierarchy]

Example:

Situation: AI suggests using Adaptive AUTOSAR (€150k tooling cost)

Question: "Should I switch from Classic to Adaptive AUTOSAR?"

Escalation:
1. Discuss with senior engineer (technical feasibility)
2. Escalate to architect (system-wide impact)
3. Escalate to project manager (cost, schedule)
4. Escalate to customer (contractual requirements)

Decision: Stick with Classic (customer doesn't require OTA updates)

HITL Quality Gates

Mandatory Human Approvals

Quality Gate 1: Requirements Baseline

  • AI Role: Extract requirements from customer spec
  • Human Approval: Systems engineer reviews, clarifies ambiguities, gets customer sign-off
  • Gate: Requirements baselined in DOORS

Quality Gate 2: Architecture Review

  • AI Role: Generate architecture diagrams, suggest patterns
  • Human Approval: Architect reviews, makes trade-off decisions, documents ADRs
  • Gate: Architecture review meeting (stakeholders sign-off)

Quality Gate 3: Code Review

  • AI Role: Generate code, flag MISRA violations
  • Human Approval: Engineer reviews, tests, approves merge
  • Gate: Code review checklist completed, PR approved

Quality Gate 4: Safety Review (ASIL-B and above)

  • AI Role: None (AI cannot approve safety-critical code)
  • Human Approval: Safety engineer reviews, verifies fail-safe behavior, approves
  • Gate: Safety review report signed

Quality Gate 5: Release Approval

  • AI Role: Generate release notes, changelog
  • Human Approval: Project manager approves release to customer
  • Gate: Release tag created, artifacts published

AI Confidence Scoring

How to Assess AI Reliability

Question to Ask: "How confident should I be in this AI output?"

Confidence Indicators:

High Confidence (Trust with Light Review):

  • Simple, well-defined task (code formatting, documentation)
  • AI has seen many examples (common patterns like CAN parsing)
  • Output compiles and passes tests
  • Static analysis clean (no MISRA violations)

Medium Confidence (Review Carefully):

  • Moderate complexity (PID controller, sensor fusion)
  • Domain-specific knowledge required
  • Some MISRA violations or test failures
  • Edge cases may be missing

Low Confidence (Heavy Review or Rewrite):

  • High complexity (safety-critical logic, novel algorithms)
  • AI suggests unfamiliar APIs (possible hallucination)
  • Many compilation errors or test failures
  • Missing requirements traceability

Example Assessment:

Task: Generate CAN message parser

Confidence Score: 6/10 (Medium)

AI Output:
- Compiles: [PASS] (+2)
- Tests pass: [PASS] (+2)
- MISRA clean: [PASS] (+2)
- Handles null pointers: [PASS] (+1)
- Handles CAN timeout: [FAIL] (-1)

Decision: Medium confidence - add timeout handling, then approve
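The timeout handling flagged in the assessment could look like this (the frame layout, signal encoding, and function names below are illustrative assumptions, not a real CAN API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CAN_TIMEOUT_MS 100U  /* assumed freshness budget for a frame */

typedef struct {
    uint32_t timestamp_ms;  /* when the frame was received */
    uint8_t  data[8];
    uint8_t  dlc;           /* data length code */
} CanFrame;

/* Returns false if the frame is stale, malformed, or missing;
 * the caller must treat that as a communication fault. */
bool CAN_ParseSpeed(const CanFrame *frame, uint32_t now_ms, float *speed_kmh)
{
    if ((frame == NULL) || (speed_kmh == NULL)) {
        return false;  /* Null-pointer check (the AI draft had this) */
    }
    if ((now_ms - frame->timestamp_ms) > CAN_TIMEOUT_MS) {
        return false;  /* Timeout check (the gap the human review found) */
    }
    if (frame->dlc < 2U) {
        return false;  /* Not enough bytes for a 16-bit speed signal */
    }
    /* Assumed encoding: 16-bit little-endian, 0.01 km/h per bit */
    uint16_t raw = (uint16_t)(((uint16_t)frame->data[1] << 8) | frame->data[0]);
    *speed_kmh = (float)raw * 0.01F;
    return true;
}
```

Making staleness an explicit failure path is what turns the -1 in the assessment back into a pass.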

Decision Documentation

Record HITL Decisions (Traceability)

Why Document:

  • ASPICE SUP.9 (problem resolution): Why was AI output rejected or modified?
  • ISO 26262 (safety audit trail): Who approved safety-critical code?
  • Continuous improvement: Learn from AI mistakes

Decision Log Template:

## HITL Decision Log

**Date**: 2025-12-18T14:30:00Z (ISO 8601 format)
**Engineer**: John Smith
**Task**: Implement [SWE-045-11] Safe Following Distance

### AI Suggestion
Function: ACC_CalculateSafeDistance
Code: [PASTE AI OUTPUT]

### Human Review
**Issues Found**:
1. Missing: Input validation (negative speed)
2. Missing: @implements tag for traceability
3. MISRA 10.4: Implicit float conversion

**Decision**: Modify AI output (not suitable as-is)

### Human Modifications
1. Added input validation (negative → 0)
2. Added @implements [SWE-045-11]
3. Fixed MISRA 10.4 (explicit cast)

### Final Approval
Reviewer: Alice Johnson (Senior Engineer)
Status: Approved for merge
PR: #145

Summary

3-Level HITL Model:

  1. AI Autonomous (low risk, post-review)
  2. AI Suggests, Human Approves (medium risk, pre-approval)
  3. Human Decides, AI Assists (high risk, human-led)

Override Triggers: Safety-critical code, context-specific requirements, compliance/standards

Escalation Triggers: Architectural decisions, safety trade-offs, regulatory compliance, significant cost/schedule

Quality Gates: Requirements baseline, architecture review, code review, safety review (ASIL-B and above), release approval

Confidence Scoring: High (trust with light review), Medium (review carefully), Low (heavy review or rewrite)

Decision Documentation: Record HITL decisions for traceability, audit trail, continuous improvement

Key Principle: AI assists, human decides — especially for safety-critical, context-specific, or compliance-sensitive tasks.

Next: AI Tool Selection (35.04) — Choosing the right AI tool for the task