3.2: Human-in-the-Loop Patterns
What You'll Learn
By the end of this chapter, you will be able to:
- Describe six HITL patterns for human-AI collaboration
- Select appropriate patterns for different activities
- Implement HITL patterns in development workflows
- Ensure accountability through pattern application
Why HITL Patterns Matter
AI systems can make errors that differ from human errors. HITL patterns ensure:
- Accountability: Humans remain responsible for outcomes
- Quality: Human judgment catches AI errors
- Safety: Critical decisions involve human oversight
- Trust: Stakeholders trust human-verified outputs
- Improvement: Human feedback improves AI performance
Note: In safety-critical domains governed by ISO 26262, IEC 61508, and DO-178C, human oversight is not optional. Standards mandate that qualified personnel verify and approve safety-relevant work products. HITL patterns formalize this mandate into repeatable, auditable workflows.
The Six HITL Patterns
Pattern 1: Reviewer
In the Reviewer pattern, AI generates a complete artifact and a qualified human evaluates it against acceptance criteria before it enters the workflow.
USE WHEN:
- AI generates substantial content
- Human can verify quality efficiently
- Output format is well-defined
EXAMPLES:
- Code generation → code review
- Test generation → test review
- Documentation draft → human editing
Detailed Description
The Reviewer pattern is the most commonly deployed HITL approach in software development. The AI produces a complete or near-complete artifact, and a qualified human examines it against defined acceptance criteria before it enters the workflow. The reviewer has full authority to accept, modify, or reject the output.
| Aspect | Detail |
|---|---|
| AI responsibility | Generate artifact (code, document, test case) |
| Human responsibility | Evaluate correctness, completeness, and compliance |
| Feedback loop | Reviewer comments feed back to AI for regeneration |
| Audit evidence | Review record with accept/reject decision and rationale |
| Typical throughput | 3-10 artifacts per hour depending on complexity |
Tip: Establish a review checklist specific to each artifact type. For AI-generated code, the checklist should include functional correctness, coding standard compliance, security considerations, and traceability to requirements.
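A review record for the Reviewer pattern can be sketched as a small data structure that ties the checklist, decision, and rationale together as audit evidence. The field and checklist item names below are illustrative, not a prescribed schema:

```python
from dataclasses import dataclass, field

# Hypothetical review record: every AI-generated artifact gets a checklist
# evaluation and a rationale before it enters the workflow.
@dataclass
class ReviewRecord:
    artifact_id: str
    reviewer: str
    checklist: dict[str, bool] = field(default_factory=dict)
    rationale: str = ""

    @property
    def accepted(self) -> bool:
        # Accept only if every checklist item passed and a rationale exists.
        return bool(self.checklist) and all(self.checklist.values()) and bool(self.rationale)

record = ReviewRecord(
    artifact_id="SWE3-gen-0042",   # hypothetical artifact identifier
    reviewer="j.doe",
    checklist={
        "functional_correctness": True,
        "coding_standard": True,
        "security": True,
        "traceability": False,     # missing requirement link -> reject
    },
    rationale="No trace link to the source requirement; regenerate with traceability tags.",
)
print(record.accepted)  # False: the traceability item failed
```

Because `accepted` is derived from the checklist rather than stored, the audit record cannot claim acceptance while a checklist item is failing.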
Pattern 2: Approver
The Approver pattern adds a binary authorization gate where the human grants or denies permission for an AI-prepared action to proceed.
USE WHEN:
- Action has significant impact
- Human authorization is required
- Reversibility is limited
EXAMPLES:
- Deployment recommendation → human approval
- Security fix → human authorization
- Configuration change → human sign-off
Detailed Description
The Approver pattern differs from the Reviewer in a critical respect: the human does not modify the output but instead provides a binary authorization gate. The AI prepares a recommendation along with supporting evidence, and the approver either grants or denies permission to proceed. This pattern maps directly to gate reviews in ASPICE process models.
| Aspect | Detail |
|---|---|
| AI responsibility | Prepare recommendation with evidence package |
| Human responsibility | Evaluate risk and authorize or deny the action |
| Feedback loop | Denial triggers AI to revise recommendation or escalate |
| Audit evidence | Signed approval record with timestamp and identity |
| Typical throughput | 5-20 decisions per hour |
Important: The Approver pattern is mandatory for deployment gates, release authorizations, and any action that modifies a production or safety-critical system. ASPICE SUP.10 (Change Request Management) and MAN.6 (Measurement) require documented approval evidence.
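The binary-gate behavior of the Approver pattern can be sketched as follows. The class and method names are assumptions; the point is that execution is structurally blocked until a named human records a decision, and that the decision itself becomes the audit evidence:

```python
import datetime

# Minimal sketch of an Approver gate: the AI prepares an action, and nothing
# runs until a human grants it. The audit record captures who, when, and why.
class ApprovalGate:
    def __init__(self, action: str):
        self.action = action
        self.approved = None   # None = pending, True/False = decided
        self.record = None     # audit evidence

    def decide(self, approver: str, grant: bool, rationale: str):
        self.approved = grant
        self.record = {
            "action": self.action,
            "approver": approver,
            "granted": grant,
            "rationale": rationale,
            "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        }

    def execute(self, run):
        # Fail closed: absence of a decision is treated as a denial.
        if self.approved is not True:
            raise PermissionError(f"'{self.action}' requires human approval")
        return run()

gate = ApprovalGate("deploy release to production")
gate.decide("qa.lead", grant=True, rationale="All qualification tests passed.")
print(gate.execute(lambda: "deployed"))  # deployed
```

Failing closed matters: a pending or denied gate must behave identically from the AI's perspective, so that a missing approval can never be mistaken for a granted one.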
Pattern 3: Monitor
In the Monitor pattern, AI operates continuously while the human observes aggregate metrics and intervenes only on threshold violations or anomalies.
USE WHEN:
- AI operates continuously
- Metrics indicate health
- Human intervention is the exception
EXAMPLES:
- CI/CD pipeline monitoring
- Automated test execution
- Static analysis batch runs
Detailed Description
The Monitor pattern applies when an AI system operates at L3 (Full Automation) and humans oversee its operation through dashboards and alerting systems. The human does not review each individual output. Instead, they observe aggregate metrics and intervene only when anomalies or threshold violations occur. This pattern is appropriate for high-volume, low-risk, highly repeatable activities.
| Aspect | Detail |
|---|---|
| AI responsibility | Execute tasks autonomously and report metrics |
| Human responsibility | Watch dashboards, respond to alerts, investigate anomalies |
| Feedback loop | Human adjustments to thresholds and configuration |
| Audit evidence | Dashboard snapshots, alert logs, intervention records |
| Typical throughput | Hundreds to thousands of items per hour |
Caution: The Monitor pattern carries "automation complacency" risk. When the system runs well for extended periods, humans may stop paying attention. Mitigate this with periodic forced reviews and randomized spot-checks.
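The Monitor pattern's alerting logic, including the randomized spot-checks recommended above, can be sketched like this. The threshold values and field names are illustrative defaults, not recommendations:

```python
import random

# Sketch of Monitor-pattern oversight: page a human only on threshold
# violations, but route a small random sample of passing items to human
# review anyway, to counter automation complacency.
def monitor_batch(results, failure_threshold=0.05, spot_check_rate=0.02, rng=None):
    rng = rng or random.Random(0)  # seeded for reproducible spot-checks
    failures = sum(1 for r in results if not r["passed"])
    failure_rate = failures / len(results)
    alerts = []
    if failure_rate > failure_threshold:
        alerts.append(f"ALERT: failure rate {failure_rate:.1%} exceeds {failure_threshold:.1%}")
    # Randomized spot-checks: a few passing items still go to a human.
    spot_checks = [r["id"] for r in results if r["passed"] and rng.random() < spot_check_rate]
    return alerts, spot_checks

# 200 test runs with a 10% failure rate -> alert fires.
results = [{"id": i, "passed": i % 10 != 0} for i in range(200)]
alerts, spot_checks = monitor_batch(results)
print(alerts[0])
```

The spot-check list is what makes the pattern auditable against complacency: even a "green" dashboard produces a small, documented stream of human-reviewed items.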
Pattern 4: Auditor
The Auditor pattern decouples execution from oversight in time: the AI operates autonomously while a human periodically samples and reviews logged decisions for compliance.
USE WHEN:
- Operations are high-volume
- Compliance verification needed
- Real-time review impractical
EXAMPLES:
- Code review decisions audit
- Security scan trend analysis
- Access control monitoring
Detailed Description
The Auditor pattern decouples execution from oversight in time. The AI operates autonomously, logging every decision and output. A human auditor periodically examines a sample of logs against compliance criteria. This pattern supports regulatory evidence collection and is essential for demonstrating process adherence during ASPICE assessments.
| Aspect | Detail |
|---|---|
| AI responsibility | Execute and log all actions with full traceability |
| Human responsibility | Periodic sampling, trend analysis, compliance checks |
| Feedback loop | Audit findings trigger process or model adjustments |
| Audit evidence | Audit reports, sample records, trend analyses |
| Typical review cycle | Weekly or sprint-aligned |
Note: The Auditor pattern is not a substitute for real-time oversight of safety-critical decisions. Use it for processes where the risk of any single AI error is low but cumulative drift must be detected.
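The periodic sampling at the heart of the Auditor pattern can be sketched as below. The sample size and log schema are assumptions; real audit sampling plans should be sized for the required statistical confidence:

```python
import random

# Sketch of an audit cycle: draw a random sample of logged AI decisions,
# apply a compliance predicate, and report a finding rate for trend analysis.
def audit_sample(decision_log, sample_size, is_compliant, seed=0):
    sample = random.Random(seed).sample(decision_log, min(sample_size, len(decision_log)))
    findings = [d for d in sample if not is_compliant(d)]
    return {
        "sampled": len(sample),
        "findings": len(findings),
        "finding_rate": len(findings) / len(sample),
        "finding_ids": [d["id"] for d in findings],  # follow-up queue
    }

# 500 logged decisions; compliance check: does the output carry a trace link?
log = [{"id": i, "trace_link": i % 25 != 0} for i in range(500)]
report = audit_sample(log, sample_size=50, is_compliant=lambda d: d["trace_link"])
print(report["sampled"], report["finding_rate"])
```

Tracking `finding_rate` across audit cycles yields exactly the downward trend the Metrics section later asks for; the `finding_ids` list is the input to follow-up actions.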
Pattern 5: Escalation
The Escalation pattern lets AI handle routine cases autonomously while routing uncertain or complex cases to a human expert based on defined confidence thresholds.
USE WHEN:
- Volume is high
- Most cases are routine
- Some cases need judgment
EXAMPLES:
- Bug triage (routine → AI, complex → human)
- Support tickets
- Test failure analysis
Detailed Description
The Escalation pattern enables AI to handle routine cases autonomously while routing uncertain or complex cases to a human expert. The AI must be equipped with explicit escalation criteria: confidence thresholds, domain rules, or pattern-matching conditions that trigger the handoff. Well-designed escalation criteria are the difference between a productive workflow and a bottleneck.
| Aspect | Detail |
|---|---|
| AI responsibility | Handle routine cases; detect escalation triggers |
| Human responsibility | Resolve escalated cases; refine escalation criteria |
| Feedback loop | Resolution of escalated cases trains AI on edge cases |
| Audit evidence | Escalation logs with trigger reason and resolution |
| Typical escalation rate | 5-20% of total volume (varies by domain maturity) |
Tip: Start with conservative escalation thresholds (more human involvement) and gradually relax them as the AI demonstrates competence on edge cases. Track the false-escalation rate to avoid overwhelming humans with unnecessary handoffs.
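Explicit escalation criteria can be sketched as an ordered set of checks, evaluated from highest to lowest priority. The trigger names mirror the Escalation Triggers table later in this chapter; the threshold value is an assumed default:

```python
# Sketch of escalation-criteria evaluation. Returns None when the AI may
# proceed autonomously, or the trigger that forces a human handoff.
# Checks are ordered by priority: safety first, then conflicts, then confidence.
def escalation_trigger(case, confidence_threshold=0.60):
    if case.get("asil") in ("B", "C", "D"):
        return "safety-based"
    if case.get("conflicts_with_artifact"):
        return "conflict-based"
    if case["confidence"] < confidence_threshold:
        return "confidence-based"
    return None

cases = [
    {"id": 1, "confidence": 0.91},                  # routine: AI handles it
    {"id": 2, "confidence": 0.45},                  # uncertain: escalate
    {"id": 3, "confidence": 0.88, "asil": "C"},     # safety-relevant: escalate
]
routed = {c["id"]: escalation_trigger(c) for c in cases}
print(routed)  # {1: None, 2: 'confidence-based', 3: 'safety-based'}
```

Note that case 3 escalates despite high confidence: safety-based triggers must dominate confidence-based ones, otherwise a confident model could quietly bypass safety oversight.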
Pattern 6: Collaborator
In the Collaborator pattern, human and AI work iteratively together, each building on the other's contributions through multiple refinement cycles.
USE WHEN:
- Creative/exploratory work
- Multiple iterations expected
- Human expertise + AI capability synergy
EXAMPLES:
- Architecture exploration
- Requirements refinement
- Design alternatives
- Problem investigation
Detailed Description
The Collaborator pattern is the most interactive pattern. Human and AI engage in a turn-based or concurrent working session where each contributes domain expertise and capability. The human provides direction, context, and judgment; the AI provides speed, breadth of knowledge, and consistent execution. This pattern is especially effective for creative, exploratory, and analytical tasks where neither party alone would produce an optimal result.
| Aspect | Detail |
|---|---|
| AI responsibility | Generate alternatives, analyze options, execute directed tasks |
| Human responsibility | Set direction, evaluate options, make decisions |
| Feedback loop | Real-time iterative refinement during the session |
| Audit evidence | Session transcript with decision points annotated |
| Typical session | 30-90 minutes with 5-15 iteration cycles |
Note: The Collaborator pattern produces the highest-quality results for complex tasks but has the lowest throughput. Reserve it for high-value activities such as architecture definition, safety analysis, and requirements decomposition.
Pattern Selection Guide
By Activity Type
| Activity | Recommended Pattern | Rationale |
|---|---|---|
| Code generation | Reviewer | Verify correctness |
| Deployment | Approver | Authorization required |
| CI/CD pipeline | Monitor | Continuous operation |
| Security scanning | Auditor | High volume, periodic review |
| Bug triage | Escalation | Route complex issues |
| Architecture design | Collaborator | Iterative refinement |
By ASIL Level (ISO 26262)
Safety integrity level significantly influences which HITL pattern is appropriate. Higher ASIL levels demand more direct human involvement.
| ASIL Level | Permitted Patterns | Restrictions |
|---|---|---|
| QM (non-safety) | All six patterns | No restrictions; Monitor and Auditor acceptable for routine tasks |
| ASIL A | Reviewer, Approver, Escalation, Collaborator, Auditor | Monitor permitted only for non-decision activities (e.g., build execution) |
| ASIL B | Reviewer, Approver, Escalation, Collaborator | Auditor permitted only with weekly review cycles; Monitor restricted to build/test execution |
| ASIL C | Reviewer, Approver, Collaborator | Escalation permitted only with <10% AI-autonomous resolution; human reviews all safety-related outputs |
| ASIL D | Reviewer, Approver, Collaborator | All AI outputs on safety-critical items require explicit human review or approval; no autonomous AI decisions |
Important: For ASIL C and ASIL D work products, the Reviewer and Approver patterns must include a qualified functional safety engineer in the loop. The engineer's qualifications must be documented per ISO 26262-2, Clause 6.4.
By Process Characteristics
| Characteristic | Best Pattern | Alternative |
|---|---|---|
| High volume, low risk | Monitor | Auditor |
| High volume, medium risk | Escalation | Reviewer |
| Low volume, high risk | Approver | Reviewer |
| Creative/exploratory | Collaborator | Reviewer |
| Compliance-sensitive | Auditor | Reviewer |
| Binary go/no-go decision | Approver | Escalation |
| Continuous operation | Monitor | Auditor |
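The process-characteristics table above can be encoded as a simple lookup, a sketch only; real selection must also apply the ASIL restrictions from the preceding table, and the fallback pair here is an assumption:

```python
# (volume, risk) -> (primary pattern, alternative), taken from the
# "By Process Characteristics" table rows that key on volume and risk.
SELECTION = {
    ("high", "low"):    ("Monitor", "Auditor"),
    ("high", "medium"): ("Escalation", "Reviewer"),
    ("low", "high"):    ("Approver", "Reviewer"),
}

def recommend_pattern(volume: str, risk: str):
    # Default to the most human-involved pairing when no row matches
    # (a conservative assumption, not part of the table).
    return SELECTION.get((volume, risk), ("Reviewer", "Collaborator"))

print(recommend_pattern("high", "low"))   # ('Monitor', 'Auditor')
print(recommend_pattern("low", "high"))   # ('Approver', 'Reviewer')
```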
ASPICE Process Mapping
Each ASPICE process group has different characteristics that favor specific HITL patterns. The table below maps recommended patterns to ASPICE 4.0 process groups.
System Engineering (SYS)
| Process | Process Name | Primary Pattern | Secondary Pattern | Rationale |
|---|---|---|---|---|
| SYS.1 | Requirements Elicitation | Collaborator | Reviewer | Stakeholder interaction requires human judgment; AI assists with analysis |
| SYS.2 | System Requirements Analysis | Reviewer | Collaborator | AI can draft requirements; human validates completeness and consistency |
| SYS.3 | System Architectural Design | Collaborator | Reviewer | Architecture exploration benefits from iterative human-AI sessions |
| SYS.4 | System Integration and Integration Test | Escalation | Monitor | Routine integration tests run autonomously; failures escalate to human |
| SYS.5 | System Qualification Test | Approver | Reviewer | Test results require formal human approval before release |
Software Engineering (SWE)
| Process | Process Name | Primary Pattern | Secondary Pattern | Rationale |
|---|---|---|---|---|
| SWE.1 | Software Requirements Analysis | Reviewer | Collaborator | AI generates derived requirements; human reviews for correctness |
| SWE.2 | Software Architectural Design | Collaborator | Reviewer | Design exploration benefits from human-AI iteration |
| SWE.3 | Software Detailed Design and Unit Construction | Reviewer | Escalation | AI generates code; human reviews; complex cases escalate |
| SWE.4 | Software Unit Verification | Monitor | Escalation | Automated test execution with escalation on failures |
| SWE.5 | Software Integration and Integration Test | Escalation | Monitor | Routine tests automated; integration failures escalate |
| SWE.6 | Software Qualification Test | Approver | Reviewer | Qualification results require formal sign-off |
Support Processes (SUP)
| Process | Process Name | Primary Pattern | Secondary Pattern | Rationale |
|---|---|---|---|---|
| SUP.1 | Quality Assurance | Auditor | Reviewer | Periodic quality audits of AI-generated artifacts |
| SUP.8 | Configuration Management | Monitor | Auditor | Automated CM with periodic human audits |
| SUP.9 | Problem Resolution Management | Escalation | Collaborator | AI triages routine issues; complex problems escalate |
| SUP.10 | Change Request Management | Approver | Escalation | Changes require human authorization |
Management Processes (MAN)
| Process | Process Name | Primary Pattern | Secondary Pattern | Rationale |
|---|---|---|---|---|
| MAN.3 | Project Management | Collaborator | Monitor | Human drives project decisions; AI provides analytics |
| MAN.5 | Risk Management | Collaborator | Reviewer | Risk identification benefits from human-AI collaboration |
| MAN.6 | Measurement | Monitor | Auditor | Automated metric collection with periodic human review |
Implementation Architecture
A robust HITL implementation requires four architectural layers working together.
Layer 1: AI Execution Layer
The AI execution layer produces artifacts, recommendations, or decisions. Every output from this layer must carry metadata enabling downstream oversight.
| Component | Responsibility |
|---|---|
| AI Engine | Execute the primary task (generate code, analyze data, classify items) |
| Confidence Scorer | Attach a confidence score (0.0-1.0) to each output |
| Metadata Tagger | Annotate outputs with traceability info (input hash, model version, timestamp) |
| Output Queue | Buffer outputs for routing to the appropriate HITL pattern handler |
Layer 2: Routing and Decision Layer
This layer evaluates AI outputs and routes them according to the configured HITL pattern.
| Component | Responsibility |
|---|---|
| Pattern Router | Apply pattern-specific rules to determine the next step |
| Threshold Engine | Compare confidence scores against configured thresholds |
| Escalation Manager | Route uncertain outputs to the appropriate human queue |
| Approval Gate | Block actions requiring human authorization until approval is received |
Layer 3: Human Interface Layer
This layer presents AI outputs to humans in a format optimized for efficient decision-making.
| Component | Responsibility |
|---|---|
| Review Dashboard | Present artifacts with context, diffs, and AI rationale |
| Approval Workflow | Structured accept/reject/modify interface with mandatory fields |
| Monitoring Console | Real-time metrics, alerts, and anomaly indicators |
| Audit Portal | Sampling interface for periodic compliance review |
Layer 4: Feedback and Learning Layer
This layer captures human decisions and feeds them back to improve AI performance.
| Component | Responsibility |
|---|---|
| Decision Logger | Record all human decisions with rationale |
| Feedback Aggregator | Compile acceptance/rejection data for model tuning |
| Drift Detector | Identify when AI accuracy degrades over time |
| Metrics Engine | Compute HITL effectiveness metrics (see Metrics section) |
Tip: The four-layer architecture can be implemented incrementally. Start with Layer 1 and Layer 3 (AI produces, human reviews). Add Layer 2 (routing) as volume grows. Add Layer 4 (feedback) once there is enough data to measure trends.
AI Confidence Thresholds
Confidence thresholds define the boundary between autonomous AI operation and human intervention. Setting them correctly is critical: too high, and humans are overwhelmed with unnecessary reviews; too low, and errors slip through.
Threshold Configuration
| Confidence Range | Action | Pattern Activated |
|---|---|---|
| 0.95 - 1.00 | AI proceeds autonomously | Monitor (human observes metrics) |
| 0.80 - 0.94 | AI proceeds with logging for audit | Auditor (periodic human review) |
| 0.60 - 0.79 | AI output queued for human review | Reviewer (human inspects output) |
| 0.40 - 0.59 | AI output escalated to human expert | Escalation (human resolves) |
| 0.00 - 0.39 | AI declines to act; human takes over | Collaborator or manual (L0) |
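The threshold table above can be expressed directly as a routing function. The band edges are copied from the table; treat them as defaults to be calibrated per project, not fixed values:

```python
# Route an AI output to a HITL pattern based on its confidence score.
# Band edges match the Threshold Configuration table.
def route_by_confidence(confidence: float) -> str:
    if confidence >= 0.95:
        return "Monitor"       # autonomous; human observes metrics
    if confidence >= 0.80:
        return "Auditor"       # proceed with logging for periodic review
    if confidence >= 0.60:
        return "Reviewer"      # queue output for human review
    if confidence >= 0.40:
        return "Escalation"    # hand off to a human expert
    return "Collaborator"      # AI declines; human takes over

print([route_by_confidence(c) for c in (0.97, 0.85, 0.70, 0.50, 0.20)])
# ['Monitor', 'Auditor', 'Reviewer', 'Escalation', 'Collaborator']
```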
Threshold Adjustment by Context
Thresholds must be adjusted based on the consequence of error.
| Context Factor | Threshold Adjustment | Example |
|---|---|---|
| Safety-critical output (ASIL C/D) | Raise all thresholds by +0.10 | Code for braking system requires 0.95+ for any autonomous action |
| Security-sensitive output | Raise all thresholds by +0.10 | Authentication logic reviewed at 0.70+ instead of 0.60+ |
| Well-tested domain | Lower thresholds by -0.05 | Formatting checks can proceed autonomously at 0.90+ |
| Novel domain or first deployment | Raise all thresholds by +0.15 | New project starts with conservative thresholds |
| High-volume routine task | Lower thresholds by -0.05 | Build status classification at 0.90+ |
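Applying the context adjustments above to a base threshold can be sketched as a clamped sum. The factor names mirror the table rows; the clamping to [0.0, 1.0] is an added assumption, since an adjusted threshold above 1.0 simply means no autonomous operation at all:

```python
# Adjustment deltas from the "Threshold Adjustment by Context" table.
ADJUSTMENTS = {
    "safety_critical":     +0.10,   # ASIL C/D output
    "security_sensitive":  +0.10,
    "well_tested_domain":  -0.05,
    "novel_domain":        +0.15,   # first deployment
    "high_volume_routine": -0.05,
}

def adjusted_threshold(base: float, factors: list[str]) -> float:
    # Sum all applicable adjustments, then clamp to a valid confidence range.
    value = base + sum(ADJUSTMENTS[f] for f in factors)
    return round(min(max(value, 0.0), 1.0), 2)

print(adjusted_threshold(0.60, ["safety_critical"]))     # 0.7
print(adjusted_threshold(0.95, ["novel_domain"]))        # 1.0 (clamped: nothing runs autonomously)
print(adjusted_threshold(0.60, ["well_tested_domain"]))  # 0.55
```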
Calibration Process
- Baseline: Run AI on a labeled dataset and record confidence vs. actual correctness
- Plot calibration curve: Confidence should correlate with accuracy (0.90 confidence should mean ~90% correct)
- Identify miscalibration: If 0.90 confidence yields only 75% accuracy, the model is overconfident
- Adjust thresholds: Set the autonomous-operation threshold at the confidence level where accuracy exceeds 95%
- Re-calibrate quarterly: Model drift and data changes require periodic threshold updates
Important: Never set the autonomous-operation threshold based solely on model-reported confidence. Always validate against ground truth. A well-calibrated model is one whose confidence scores accurately predict correctness rates.
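The calibration check described above can be sketched by bucketing labeled predictions by reported confidence and comparing each bucket's mean confidence to its observed accuracy. This is a simplified reliability-diagram computation, not a full calibration toolkit:

```python
# Bucket (confidence, was_correct) pairs and report, per bucket:
# (mean reported confidence, observed accuracy, sample count).
# A gap between the first two numbers indicates miscalibration.
def calibration_report(predictions, bucket_width=0.1):
    buckets = {}
    for conf, correct in predictions:
        key = min(int(conf / bucket_width), int(1 / bucket_width) - 1)
        buckets.setdefault(key, []).append((conf, correct))
    report = []
    for key in sorted(buckets):
        items = buckets[key]
        mean_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        report.append((round(mean_conf, 2), round(accuracy, 2), len(items)))
    return report

# An overconfident model: reports 0.9 but is right only 75% of the time --
# exactly the miscalibration case flagged in step 3 above.
preds = [(0.9, i % 4 != 0) for i in range(100)]
print(calibration_report(preds))  # [(0.9, 0.75, 100)]
```

On this evidence, the autonomous-operation threshold would need to sit well above 0.9, regardless of what the model reports.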
Escalation Protocols
Escalation protocols define the rules and procedures for routing AI outputs or decisions to human experts when predefined conditions are met.
Escalation Triggers
| Trigger Category | Specific Triggers | Priority |
|---|---|---|
| Confidence-based | AI confidence below threshold | Normal |
| Conflict-based | AI recommendation contradicts existing artifact | High |
| Novelty-based | Input falls outside training distribution | High |
| Safety-based | Output affects safety-critical function (ASIL B+) | Critical |
| Compliance-based | Output requires regulatory sign-off | Critical |
| Anomaly-based | Statistical anomaly in AI behavior detected | High |
| Time-based | Decision exceeds maximum allowed AI response time | Normal |
Escalation Levels
| Level | Recipient | Response Time | Decision Authority | Example Trigger |
|---|---|---|---|---|
| L1 - Team | Senior developer or team lead | < 4 hours | Accept, modify, or reject AI output | Confidence below review threshold |
| L2 - Expert | Domain expert or architect | < 8 hours | Override AI decision; adjust process | Contradicts architectural constraint |
| L3 - Safety | Functional safety engineer | < 24 hours | Halt process; mandate manual execution | ASIL C/D related output flagged |
| L4 - Management | Project manager or quality manager | < 48 hours | Process change; tool withdrawal | Systematic AI accuracy degradation |
Escalation Rules for Safety-Critical Decisions
| Decision Type | Mandatory Escalation Level | Additional Requirements |
|---|---|---|
| Safety requirement modification | L3 (Safety engineer) | Impact analysis on safety concept required |
| Safety architecture change | L3 (Safety engineer) + L2 (Architect) | Dual review; both must approve |
| ASIL decomposition | L3 (Safety engineer) + L4 (Management) | Formal review meeting documented |
| Hazard classification | L3 (Safety engineer) | Independent verification required |
| Release of safety-critical software | L3 + L4 | Formal gate review with documented evidence |
| Deviation from coding standard (ASIL C/D) | L2 (Expert) + L3 (Safety engineer) | Deviation record with justification |
Critical: Any escalation involving safety-critical decisions must produce a documented record that includes: the AI output that triggered escalation, the reason for escalation, the human decision, the rationale for that decision, and the identity and qualifications of the decision-maker. This record becomes part of the safety case.
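The documented record the callout requires can be sketched as an immutable structure whose fields map one-to-one onto the listed items. The field names and identifiers are assumptions to be adapted to the project's safety-case template:

```python
from dataclasses import dataclass, asdict
import datetime

# One record per safety-critical escalation; frozen so the safety-case
# evidence cannot be mutated after the fact.
@dataclass(frozen=True)
class SafetyEscalationRecord:
    ai_output_ref: str     # the AI output that triggered escalation
    trigger_reason: str    # why it escalated
    decision: str          # the human decision
    rationale: str         # the rationale for that decision
    decision_maker: str    # identity of the decision-maker
    qualifications: str    # documented per ISO 26262-2, Clause 6.4
    timestamp: str

rec = SafetyEscalationRecord(
    ai_output_ref="gen/asil-d/brake-ctrl-0017",          # hypothetical reference
    trigger_reason="safety-based: ASIL D unit affected",
    decision="rejected",
    rationale="Generated code bypasses the watchdog reset path.",
    decision_maker="f.safety.engineer",
    qualifications="FSE certification, ISO 26262 training record on file",
    timestamp=datetime.datetime.now(datetime.timezone.utc).isoformat(),
)
assert all(asdict(rec).values())  # no field may be empty in the safety case
```

Making the record frozen and asserting that every field is populated turns the callout's completeness requirement into a check the tooling enforces, rather than a convention reviewers must remember.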
Regulatory Requirements
Standards governing safety-critical development have explicit and implicit requirements for human oversight of tools, including AI-based tools.
ISO 26262 (Automotive Functional Safety)
| Clause | Requirement | HITL Implication |
|---|---|---|
| Part 8, Clause 11 | Software tool qualification | AI tools must be qualified; humans verify tool output in proportion to Tool Confidence Level (TCL) |
| Part 8, Clause 11.4.6 | Increased confidence from use | Validation of tool output by qualified personnel is an accepted method for TCL 2 and TCL 3 |
| Part 2, Clause 6.4 | Competence management | Humans in the loop must be qualified for their role; qualifications must be documented |
| Part 6, Clause 5 | Initiation of product development at the software level | A responsible person must be assigned; AI cannot hold this role |
ASPICE 4.0
| Process Area | Requirement | HITL Implication |
|---|---|---|
| Generic Practice GP 2.1.3 | Work products are reviewed | AI-generated work products require the same review rigor as human-generated ones |
| SUP.1 (Quality Assurance) | Independent evaluation | QA of AI outputs must be performed by personnel independent of the AI tool team |
| SUP.8 (Configuration Management) | Work product integrity | AI-generated work products must be placed under configuration management like any other artifact |
| SUP.10 (Change Request Management) | Change authorization | Changes to work products triggered by AI findings require the standard approval workflow |
IEC 61508 (Functional Safety - General)
| Clause | Requirement | HITL Implication |
|---|---|---|
| Part 3, Clause 7.4.4 | Software module testing | Test results must be reviewed by a competent person regardless of how tests were generated |
| Part 1, Clause 6 | Management of functional safety | A functional safety management plan must address AI tool use and human oversight |
DO-178C (Airborne Systems Software)
| Section | Requirement | HITL Implication |
|---|---|---|
| Section 4.2 | Software plans | Plans must describe all tools used, including AI, and their qualification status |
| Section 6.3 | Software review and analysis | All outputs, including AI-generated ones, must be reviewed per the defined review process |
| Section 12.3 | Tool qualification | AI tools with potential to introduce errors require TQL-5 qualification or output verification |
Note: Across all standards, the common thread is clear: AI tools do not reduce the burden of human oversight. They may change the nature of human work (from creation to review), but they do not eliminate the requirement for qualified human judgment on safety-relevant decisions.
Anti-Patterns
Common mistakes in HITL implementation that undermine effectiveness and compliance.
Anti-Pattern 1: Rubber-Stamp Reviewer
| Aspect | Description |
|---|---|
| Symptom | Reviewer approves AI outputs without meaningful examination |
| Root cause | Review fatigue; excessive volume; high AI accuracy creates overconfidence |
| Risk | Errors pass through uncaught; compliance evidence is hollow |
| Fix | Enforce time-per-review minimums; require reviewers to annotate specific sections; rotate reviewers; embed intentional AI errors as spot-checks |
Anti-Pattern 2: Over-Escalation
| Aspect | Description |
|---|---|
| Symptom | AI escalates 40%+ of cases to humans, negating automation benefits |
| Root cause | Thresholds set too conservatively; AI model not trained on enough edge cases |
| Risk | Human bottleneck; team frustration; return to fully manual process |
| Fix | Analyze escalated cases; retrain model on resolved escalations; gradually lower thresholds with evidence |
Anti-Pattern 3: Under-Escalation
| Aspect | Description |
|---|---|
| Symptom | AI handles cases it should not, resulting in undetected errors |
| Root cause | Thresholds set too aggressively; confidence scores poorly calibrated |
| Risk | Safety violations; compliance failures; loss of trust |
| Fix | Validate confidence calibration against ground truth; implement secondary checks; increase audit frequency |
Anti-Pattern 4: Missing Feedback Loop
| Aspect | Description |
|---|---|
| Symptom | Human corrections to AI outputs are not captured or used for improvement |
| Root cause | No feedback infrastructure; corrections applied only to the output, not to the process |
| Risk | AI repeats the same errors; no improvement over time; increasing human effort |
| Fix | Log every human correction with before/after data; aggregate corrections into retraining or prompt-tuning datasets; track error recurrence rates |
Anti-Pattern 5: Unqualified Human in the Loop
| Aspect | Description |
|---|---|
| Symptom | Persons assigned to review or approve AI outputs lack domain expertise |
| Root cause | Staffing pressure; assumption that AI does the "hard work" so review is easy |
| Risk | Compliance failure (standards require competent personnel); errors missed |
| Fix | Define competency requirements for each HITL role; verify qualifications before assignment; provide training on AI-specific review techniques |
Anti-Pattern 6: Audit Theater
| Aspect | Description |
|---|---|
| Symptom | Audit logs exist but are never meaningfully reviewed |
| Root cause | Audit process defined on paper but not resourced or scheduled |
| Risk | Cumulative drift goes undetected; false sense of compliance |
| Fix | Assign named auditors with allocated time; define audit cadence; require documented findings and follow-up actions |
Metrics
Measuring HITL effectiveness is essential for continuous improvement and for demonstrating process maturity during assessments.
Core Metrics
| Metric | Formula | Target | Interpretation |
|---|---|---|---|
| AI Acceptance Rate | Accepted outputs / Total AI outputs | 80-95% | Below 80%: AI quality insufficient. Above 95%: possible rubber-stamping |
| Review Turnaround Time | Time from AI output to human decision | < 4 hours | Measures human responsiveness and review queue health |
| Escalation Rate | Escalated cases / Total cases | 5-20% | Below 5%: possible under-escalation. Above 20%: AI not handling enough |
| Escalation Resolution Time | Time from escalation to resolution | < 24 hours | Measures expert availability and escalation process efficiency |
| False Escalation Rate | Escalations where AI was correct / Total escalations | < 30% | High rate indicates thresholds are too conservative |
| Post-Release Defect Escape Rate | Defects found post-release in AI-generated artifacts | < baseline | Must be equal to or better than pre-AI baseline |
| Feedback Loop Closure Rate | Corrections fed back into AI improvement / Total corrections | > 70% | Below 70%: losing improvement opportunities |
| Audit Finding Rate | Issues found during audit / Items audited | Trending down | Should decrease over time as AI and process improve |
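Three of the core metrics above can be computed directly from a decision log. The log schema here is an assumption about what the Decision Logger (Layer 4) records; the formulas match the table:

```python
# Compute acceptance rate, escalation rate, and false-escalation rate
# from a list of logged decisions.
def core_metrics(log):
    total = len(log)
    accepted = sum(1 for e in log if e["decision"] == "accept")
    escalated = sum(1 for e in log if e["escalated"])
    false_esc = sum(1 for e in log if e["escalated"] and e["ai_was_correct"])
    return {
        "acceptance_rate": accepted / total,
        "escalation_rate": escalated / total,
        "false_escalation_rate": (false_esc / escalated) if escalated else 0.0,
    }

# Synthetic sprint: 100 decisions, 10 escalations, 3 of which were unnecessary.
log = (
    [{"decision": "accept", "escalated": False, "ai_was_correct": True}] * 85
    + [{"decision": "reject", "escalated": False, "ai_was_correct": False}] * 5
    + [{"decision": "accept", "escalated": True,  "ai_was_correct": True}] * 3
    + [{"decision": "reject", "escalated": True,  "ai_was_correct": False}] * 7
)
m = core_metrics(log)
print(m)  # acceptance 0.88, escalation 0.10, false escalation 0.30
```

Against the targets in the table, this sprint is healthy on acceptance (80-95%) and escalation (5-20%) but sits exactly at the 30% false-escalation limit, which would justify a look at the thresholds.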
Safety-Specific Metrics
| Metric | Description | Target |
|---|---|---|
| Safety Escalation Response Time | Time to resolve safety-triggered escalations | < 8 hours |
| ASIL Coverage | Percentage of ASIL-rated outputs reviewed by safety engineer | 100% for ASIL C/D |
| Tool Confidence Validation Rate | Frequency of AI confidence calibration checks | Quarterly minimum |
| Human Override Rate (Safety) | Percentage of safety-related AI recommendations overridden by human | Track trend; investigate spikes |
Metrics Dashboard
Organize metrics into a tiered dashboard for different audiences.
| Dashboard Level | Audience | Metrics Shown | Refresh Rate |
|---|---|---|---|
| Operational | Development team | Acceptance rate, review turnaround, escalation rate | Real-time |
| Tactical | Project management | Defect escape rate, escalation resolution time, feedback closure | Weekly |
| Strategic | Quality/Safety management | Audit findings, safety escalation response, ASIL coverage | Monthly |
Tip: Plot metrics as trend lines, not just point-in-time values. A single sprint with 85% acceptance rate is meaningless without context. A trend from 70% to 85% over six sprints tells a story of improvement.
Implementation Checklist
For Any Pattern
- Define human role clearly
- Establish feedback mechanism
- Create audit trail
- Define escalation criteria
- Measure pattern effectiveness
Pattern-Specific
| Pattern | Key Implementation Element |
|---|---|
| Reviewer | Review checklist |
| Approver | Authorization workflow |
| Monitor | Dashboard and alerts |
| Auditor | Audit schedule and log |
| Escalation | Routing criteria |
| Collaborator | Iteration protocol |
Pre-Deployment Checklist
Complete this checklist before deploying any HITL pattern in a project.
- HITL pattern selected and justified for each AI-assisted activity
- Confidence thresholds defined, calibrated, and documented
- Escalation levels and recipients identified by name and role
- Escalation response time SLAs defined and communicated
- Human competency requirements documented for each HITL role
- Personnel assigned and qualifications verified
- Audit trail infrastructure operational (logging, storage, retrieval)
- Feedback loop mechanism implemented (correction capture, aggregation)
- Metrics collection automated and dashboard configured
- Safety-specific escalation rules defined for ASIL-rated items
- Regulatory compliance mapping completed (ISO 26262, ASPICE, etc.)
- Pattern-specific tooling deployed (review UI, approval gates, dashboards)
- Dry run completed with representative data before production use
- Team trained on HITL procedures and responsibilities
- Periodic calibration and review schedule established (quarterly minimum)
Summary
Six HITL patterns ensure appropriate human oversight:
- Reviewer: AI generates, human reviews
- Approver: AI recommends, human authorizes
- Monitor: AI operates, human watches
- Auditor: AI continuous, human periodic
- Escalation: AI routes complex to human
- Collaborator: Human-AI iterative
Pattern selection depends on activity type, risk level, ASIL classification, and volume. Effective HITL implementation requires calibrated confidence thresholds, defined escalation protocols, qualified personnel, and continuous measurement. Standards including ISO 26262, ASPICE 4.0, IEC 61508, and DO-178C all mandate human oversight of tool outputs. HITL patterns formalize that mandate into auditable, repeatable workflows.