1.3: Architecture Decision Making

Making and Documenting Architectural Decisions

What is an Architectural Decision?

Definition: A significant design choice that affects:

  • System structure (components, layers, modules)
  • Technology selection (AUTOSAR Classic vs Adaptive, CAN vs Ethernet)
  • Quality attributes (performance, safety, cost)

Examples of Architectural Decisions:

  • [PASS] "Use AUTOSAR Classic R4.4" (affects entire system structure)
  • [PASS] "Implement sensor fusion with Kalman filter" (affects accuracy, complexity)
  • [FAIL] "Name variable distance_m" (coding detail, not architectural)

Architecture Decision Record (ADR) Template

ADR Format (Michael Nygard Template)

# ADR-{NUMBER}: {TITLE}

## Status
{PROPOSED | ACCEPTED | REJECTED | DEPRECATED | SUPERSEDED by ADR-XXX}

Status Meanings:
- PROPOSED: Under review, not yet decided
- ACCEPTED: Approved and being implemented
- REJECTED: Reviewed and declined (keep the record so the reasoning is not lost)
- DEPRECATED: No longer recommended but still in use
- SUPERSEDED by ADR-XXX: Replaced by a newer decision

## Context
{What is the issue we're facing? What constraints exist?}

## Decision
{What is the change we're making? (One sentence)}

## Rationale
{Why this decision? What alternatives were considered?}

### Option 1: {NAME} (SELECTED)
**Pros**:
- {Advantage 1}
- {Advantage 2}

**Cons**:
- {Disadvantage 1}
- {Disadvantage 2}

### Option 2: {NAME} (REJECTED)
**Pros**:
- {Advantage 1}

**Cons**:
- {Disadvantage 1} (critical)

## Consequences
**Positive**: {What we gain}
**Negative**: {What we lose}
**Mitigation**: {How to address negative consequences}

## Alternatives Considered
- {Other options explored but not detailed above}

## Decision Makers
- {Names/roles of people who made decision}

## Date
{YYYY-MM-DD}

## Implemented By
- {Link to pull request, commit, or "Not yet implemented"}

Example ADR: Sensor Fusion Algorithm

# ADR-007: Sensor Fusion Algorithm Selection

## Status
ACCEPTED

## Context
The ACC system must fuse radar and camera data to achieve ≥95% obstacle detection accuracy (requirement [SYS-089]). We need to select an algorithm that meets accuracy, latency, and cost constraints.

Project constraints:
- Target accuracy: ≥95% detection rate
- Latency requirement: ≤50ms
- Budget: €2.5M (ML infrastructure adds €50k)
- Schedule: 18 months to SOP
- Safety class: ASIL-B (requires deterministic, verifiable algorithm)

## Decision
We will use an Extended Kalman Filter (EKF) for sensor fusion.

## Rationale

### Option 1: Simple Averaging (REJECTED)
**Pros**:
- Simple to implement (1 week)
- Fast (5ms latency)
- No additional cost

**Cons**:
- Low accuracy: 85% detection rate (does not meet ≥95% requirement) [FAIL]

**Verdict**: Does not meet requirement [SYS-089] → Rejected

---

### Option 2: Extended Kalman Filter (SELECTED)
**Pros**:
- Meets accuracy: 95% detection rate [PASS]
- Proven technology (used in 100M vehicles)
- Deterministic (predictable behavior, easier ASIL-B verification)
- Fast: 20ms latency (within ≤50ms requirement) [PASS]
- No additional infrastructure cost [PASS]
- Well-understood by team (3 engineers have EKF experience)

**Cons**:
- Moderate complexity (state estimation, covariance matrices)
- Requires tuning (process noise Q, measurement noise R)
- Sensitive to initialization (needs careful startup sequence)

**Verdict**: Meets all requirements at lowest risk/cost → **SELECTED**

---

### Option 3: Machine Learning (CNN-based fusion) (REJECTED)
**Pros**:
- Highest accuracy: 98% detection rate (exceeds requirement)
- Adaptive (learns from data, may improve over time)
- State-of-the-art (competitive advantage)

**Cons**:
- High cost: +€50k for ML infrastructure (GPU server, MLOps tools) [FAIL]
- Schedule risk: ML development unpredictable (data collection, training, tuning)
- ASIL-B verification challenging: Non-deterministic, "black box" (ISO 21448 SOTIF required)
- Team expertise gap: 0 engineers with production ML experience (requires hiring or training)

**Verdict**: Exceeds requirement unnecessarily, high cost/risk → Rejected

---

## Consequences

### Positive
- Meets accuracy requirement (95% detection rate, the [SYS-089] threshold)
- Proven technology reduces risk (no "bleeding edge" unknowns)
- Deterministic behavior simplifies ASIL-B verification (testable, predictable)
- No additional infrastructure cost (fits within budget)

### Negative
- Misses opportunity for 98% accuracy (ML option)
- EKF requires manual tuning (Q, R matrices) - adds 2 weeks to schedule
- Not adaptive (fixed algorithm, no learning from field data)

### Mitigation
- If the customer requests 98% accuracy in the future, upgrade to ML via a new ADR that supersedes ADR-007
- Document EKF tuning process (capture institutional knowledge)
- Consider ML for next-generation product (2–3 years out)

## Alternatives Considered
- Particle filter: Overkill for this problem (high computational cost, no accuracy benefit over EKF)
- Unscented Kalman Filter (UKF): Similar to EKF, but adds complexity with no clear benefit

## Decision Makers
- @system_architect (Alice Johnson) - Lead architect
- @safety_engineer (Bob Smith) - Safety approval
- @project_manager (Carol Lee) - Budget approval
- @oem_customer (Dave Martinez) - Confirmed 95% accuracy sufficient

## Date
2025-12-17

## Implemented By
- Implementation Agent (AI) - Generated EKF code
- Pull Request: #142 (merged 2025-12-18)
- Code: `src/sensor_fusion.c` (lines 45-320)

Decision-Making Process

Step-by-Step Guide

1. Define Decision Scope

What decision needs to be made?
Example: "Which sensor fusion algorithm should we use?"

What are the constraints?
- Functional: Accuracy ≥95%
- Non-functional: Latency ≤50ms, cost ≤€2.5M budget
- Safety: ASIL-B deterministic verification

2. Research Options

Option A: Simple Averaging
Option B: Extended Kalman Filter (EKF)
Option C: Machine Learning (CNN)

Research sources:
- Academic papers (Google Scholar, IEEE Xplore)
- Industry benchmarks (automotive white papers)
- Competitor analysis (reverse engineering, patents)
- Team expertise (who has done this before?)

3. Evaluate Trade-Offs

| Criterion | Weight | Option A | Option B | Option C |
|-----------|--------|----------|----------|----------|
| Accuracy (≥95%) | 40% | [FAIL] 85% | [PASS] 95% | [PASS] 98% |
| Latency (≤50ms) | 20% | [PASS] 5ms | [PASS] 20ms | [WARN] 40ms |
| Cost | 20% | [PASS] €0 | [PASS] €0 | [FAIL] +€50k |
| Verifiability (ASIL-B) | 20% | [PASS] Easy | [PASS] Easy | [FAIL] Hard |
| **Weighted Score** | | 60% | **95%** | 76% |

Recommendation: Option B (EKF) - Highest score, meets all requirements

4. Document Decision (ADR)

Write ADR-007 (see template above)
- Status: PROPOSED
- Review with stakeholders (architect, safety, PM, customer)
- If approved → Status: ACCEPTED
- If rejected → Status: REJECTED (document why)

5. Implement and Monitor

- Implementation: Pull Request #142
- Verification: HIL tests, proving ground validation
- Monitor: Track actual accuracy (does it meet 95%?)
- If fails to meet requirement → Revisit decision (ADR-007 superseded by ADR-XXX)

Common Architecture Decisions in Embedded Systems

Decision 1: RTOS Selection

Options: FreeRTOS (free, simple) vs SafeRTOS (certified, expensive) vs AUTOSAR OS (standard, complex)

Factors:

  • Safety: Does project need certified RTOS? (ASIL-C/D → SafeRTOS, ASIL-B → FreeRTOS acceptable)
  • Cost: SafeRTOS €50k, FreeRTOS €0
  • Standards: OEM mandates AUTOSAR? (many do)

Typical Decision: AUTOSAR OS if OEM requires, FreeRTOS otherwise (cost-effective)


Decision 2: Communication Protocol

Options: CAN 2.0B (legacy, 1 Mbps) vs CAN FD (2016+, 8 Mbps) vs Ethernet (100 Mbps+)

Factors:

  • Bandwidth: How much data? (Camera: Ethernet, Radar: CAN sufficient)
  • Latency: Real-time requirements? (CAN: 1–10 ms typical; switched Ethernet is less deterministic on its own and needs TSN for bounded latency)
  • Compatibility: Existing vehicle architecture? (Legacy vehicles: CAN only)

Typical Decision: CAN for sensor data (low bandwidth), Ethernet for camera/diagnostics (high bandwidth)


Decision 3: Software Partitioning

Options: Monolithic (one big binary) vs Modular (separate binaries per ECU) vs Microservices (AUTOSAR Adaptive)

Factors:

  • Maintainability: Monolithic easier for small projects, modular better for large teams
  • Testability: Modular easier to test (mock interfaces)
  • Deployment: Microservices allow independent updates (OTA)

Typical Decision: Modular for ASIL-B (easier verification), Microservices only if OTA required


Anti-Patterns to Avoid

Anti-Pattern 1: Analysis Paralysis

Problem: Spending 6 weeks evaluating 10 options, delaying schedule

Solution:

  • Set time limit: 1–2 weeks for major decisions
  • Limit options: 3 options maximum (fewer is better)
  • Make reversible decisions: If wrong, document lesson learned in ADR

Anti-Pattern 2: Resume-Driven Development

Problem: Choosing trendy technology (ML, blockchain) without clear benefit

Example:

Engineer: "Let's use blockchain for secure firmware updates!"
Architect: "Why? We have code signing already."
Engineer: "Because blockchain is cool and I want to learn it."
Architect: "That's not a valid technical rationale. Rejected."

Solution: Every technology choice must solve a real problem (not just "cool")


Anti-Pattern 3: Not-Invented-Here Syndrome

Problem: Rejecting proven solutions, insisting on custom development

Example:

Engineer: "Let's write our own RTOS instead of using FreeRTOS."
Architect: "Why? FreeRTOS is proven in 100M devices."
Engineer: "Because I think I can do better."
Architect: "That's 6 months of development + testing. FreeRTOS is free and certified. Use FreeRTOS."

Solution: Prefer proven solutions (open source, COTS) over custom development


Summary

Architecture Decision Making Process: Define scope → Research options → Evaluate trade-offs → Document (ADR) → Implement → Monitor

ADR Template: Status, Context, Decision, Rationale (Pros/Cons per option), Consequences, Alternatives, Decision Makers, Date

Common Decisions: RTOS selection, communication protocol, software partitioning

Anti-Patterns: Analysis paralysis, resume-driven development, not-invented-here syndrome

Next: Traceability in Practice (33.04) — Maintaining end-to-end traceability throughout development