7.2: MAN.5 Risk Management


Process Definition

Purpose

MAN.5 Purpose: To identify, analyze, treat, and monitor risks continuously throughout the project lifecycle.

Outcomes

| Outcome | Description |
|---------|-------------|
| O1 | The sources of risks are identified and regularly updated |
| O2 | Potential undesirable events are identified as they develop |
| O3 | Risks are analyzed and priority for treatment is determined |
| O4 | Risk measures are defined, applied, and assessed |
| O5 | Appropriate treatment is taken to correct or avoid impact |

Base Practices with AI Integration

| BP | Base Practice | AI Level | AI Application |
|----|---------------|----------|----------------|
| BP1 | Identify sources of risks | L2 | Pattern-based identification from project context |
| BP2 | Identify potential undesirable events | L2 | Event pattern matching, historical analysis |
| BP3 | Determine risks (probability and severity) | L2 | AI-assisted scoring and impact analysis |
| BP4 | Define risk treatment options | L1 | Treatment strategy recommendations |
| BP5 | Define and perform risk treatment activities | L1 | Mitigation action generation |
| BP6 | Monitor risks | L2-L3 | Automated monitoring and trend analysis |
| BP7 | Take corrective action | L0-L1 | Action recommendations |

Risk Management Framework

The following diagram illustrates the MAN.5 risk management process, showing the cycle from risk identification and analysis through treatment planning, monitoring, and corrective action.

[Figure: Risk Management Process]


Risk Treatment Strategies

MAN.5 defines four fundamental risk treatment options (BP4):

| Treatment | Definition | When to Use | Example |
|-----------|------------|-------------|---------|
| Accept | Tolerate the risk without action | Cost of mitigation exceeds potential impact; residual risk within acceptable tolerance | Accept risk of minor UI cosmetic defect in non-safety-critical display |
| Mitigate | Reduce probability or impact through preventive/protective actions | Risk level unacceptable but elimination not feasible; cost-effective mitigation exists | Implement software watchdog to detect and recover from a system hang (reduces impact) |
| Avoid | Eliminate the risk source entirely | Risk too severe; alternative approach available | Remove dependency on unqualified third-party library; develop in-house |
| Share/Transfer | Outsource risk to supplier, partner, or insurance | Risk outside direct control; external party better positioned to manage | Transfer hardware reliability risk to qualified automotive Tier-1 supplier |

ASPICE Requirement: Treatment selection must be documented with justification (Work Product 08-27: Risk Mitigation Plan). Residual risk after treatment must be evaluated and accepted by appropriate authority.
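
The documentation requirement above can be sketched as a small record type. This is a minimal sketch, not a prescribed ASPICE schema: `TreatmentDecision`, its field names, and the completeness rule in `validate` are hypothetical and would need to follow the organization's own risk mitigation plan template.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Treatment(Enum):
    ACCEPT = "accept"
    MITIGATE = "mitigate"
    AVOID = "avoid"
    SHARE = "share"  # share/transfer


@dataclass
class TreatmentDecision:
    """Hypothetical record of one treatment selection."""
    risk_id: str
    treatment: Treatment
    justification: str
    residual_risk: Optional[int] = None  # probability * impact after treatment
    accepted_by: Optional[str] = None    # role that accepted the residual risk

    def validate(self) -> List[str]:
        """Return documentation gaps; an empty list means the record is complete."""
        gaps = []
        if not self.justification.strip():
            gaps.append("missing justification")
        if self.residual_risk is None:
            gaps.append("residual risk not evaluated")
        if self.accepted_by is None:
            gaps.append("residual risk not accepted by an authority")
        return gaps


decision = TreatmentDecision(
    risk_id="RSK-001",
    treatment=Treatment.MITIGATE,
    justification="Mitigation cost is below the expected delay cost",
    residual_risk=6,
)
print(decision.validate())
```

In this example the record still lacks an accepting authority, so `validate` reports one gap; setting `accepted_by` clears it.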


Process vs. Product Risks

MAN.5 addresses both process and product risks throughout the development lifecycle:

Process Risks

Risks to project execution capability:

  • Schedule: Milestone delays, resource availability, dependency delays
  • Resources: Key personnel departure, skill gaps, infrastructure constraints
  • Methodology: Tool failures, process maturity, organizational change
  • External: Supplier dependencies, regulatory changes, customer requirement volatility

Example: HIL test equipment shared across multiple projects → integration testing delay → schedule risk.

Product Risks

Risks to system/software quality, safety, and performance:

  • Technical Feasibility: Algorithm complexity, real-time constraints, hardware limitations
  • Safety: Functional safety failures, hazard realization (ISO 26262 ASIL-rated risks)
  • Security: Cybersecurity vulnerabilities, attack surface (ISO/SAE 21434 CAL-rated risks)
  • Quality: Defect introduction, requirements gaps, integration incompatibilities

Example: Real-time constraints not achievable on target MCU → system requirements violation → product risk.

ASPICE Alignment: Both risk types must be managed continuously. Product risks often manifest as process risks (e.g., safety certification failure → schedule delay). Risk register (WP 08-26) captures both categories with clear classification.
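
One way to keep the classification explicit in a register is a fixed mapping from risk category to risk type. The sketch below is illustrative only: the `RiskType` enum, the `CATEGORY_TO_TYPE` mapping, and the helper names are assumptions, and several categories (for example requirements or integration) could reasonably be assigned to either type.

```python
from enum import Enum
from typing import Dict, Iterable, List, Tuple


class RiskType(Enum):
    PROCESS = "process"
    PRODUCT = "product"


# Assumed mapping, loosely following the bullet lists above;
# adjust it to the organization's own taxonomy.
CATEGORY_TO_TYPE: Dict[str, RiskType] = {
    "schedule": RiskType.PROCESS,
    "resource": RiskType.PROCESS,
    "supplier": RiskType.PROCESS,
    "regulatory": RiskType.PROCESS,
    "requirements": RiskType.PROCESS,  # customer requirement volatility
    "technical": RiskType.PRODUCT,
    "safety": RiskType.PRODUCT,
    "security": RiskType.PRODUCT,
    "quality": RiskType.PRODUCT,
    "integration": RiskType.PRODUCT,   # integration incompatibilities
}


def classify(category: str) -> RiskType:
    """Classify a category; unknown categories raise KeyError on purpose."""
    return CATEGORY_TO_TYPE[category.lower()]


def split_register(risks: Iterable[Tuple[str, str]]) -> Dict[RiskType, List[str]]:
    """Group (risk_id, category) pairs by risk type."""
    grouped: Dict[RiskType, List[str]] = {RiskType.PROCESS: [], RiskType.PRODUCT: []}
    for risk_id, category in risks:
        grouped[classify(category)].append(risk_id)
    return grouped


register = [("RSK-001", "resource"), ("RSK-003", "technical"), ("RSK-004", "resource")]
grouped = split_register(register)
print(grouped[RiskType.PROCESS], grouped[RiskType.PRODUCT])
```

Raising on an unknown category is a deliberate choice here: it forces every new category to be classified before it enters the register.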


AI-Powered Risk Identification

Risk Pattern Analyzer

"""
AI-assisted risk identification for embedded software projects.
"""

from dataclasses import dataclass, field
from typing import List, Dict, Set
from enum import Enum
from datetime import datetime

class RiskCategory(Enum):
    TECHNICAL = "technical"
    SCHEDULE = "schedule"
    RESOURCE = "resource"
    REQUIREMENTS = "requirements"
    INTEGRATION = "integration"
    QUALITY = "quality"
    SAFETY = "safety"
    SECURITY = "security"
    SUPPLIER = "supplier"
    REGULATORY = "regulatory"

class Probability(Enum):
    VERY_LOW = 1
    LOW = 2
    MEDIUM = 3
    HIGH = 4
    VERY_HIGH = 5

class Impact(Enum):
    NEGLIGIBLE = 1
    MINOR = 2
    MODERATE = 3
    MAJOR = 4
    SEVERE = 5

@dataclass
class Risk:
    """Project risk definition."""
    id: str
    title: str
    category: RiskCategory
    description: str
    cause: str
    consequence: str
    probability: Probability
    impact: Impact
    risk_level: int  # probability * impact
    mitigation: str
    owner: str
    status: str
    identified_date: datetime
    ai_confidence: float = 0.0


class RiskIdentifier:
    """AI-assisted risk identification.

    Note: Risk patterns should be customized based on organizational
    experience and project types (automotive, industrial, medical, etc.).
    """

    def __init__(self):
        self.risk_patterns = self._load_risk_patterns()
        self.historical_risks = self._load_historical_risks()

    def _load_risk_patterns(self) -> Dict[str, List[Dict]]:
        """Load risk patterns from knowledge base."""
        return {
            'embedded_software': [
                {
                    'pattern': 'new_target_platform',
                    'title': "New target platform learning curve",
                    'category': RiskCategory.TECHNICAL,
                    'triggers': ['new platform', 'first project', 'unfamiliar MCU'],
                    'probability': Probability.MEDIUM,
                    'impact': Impact.MODERATE
                },
                {
                    'pattern': 'real_time_constraints',
                    'title': "Real-time performance not achievable",
                    'category': RiskCategory.TECHNICAL,
                    'triggers': ['hard real-time', 'timing critical', 'deadline'],
                    'probability': Probability.MEDIUM,
                    'impact': Impact.MAJOR
                },
                {
                    'pattern': 'safety_certification',
                    'title': "Safety certification delays",
                    'category': RiskCategory.REGULATORY,
                    'triggers': ['ASIL', 'ISO 26262', 'functional safety'],
                    'probability': Probability.MEDIUM,
                    'impact': Impact.MAJOR
                },
                {
                    'pattern': 'integration_complexity',
                    'title': "Integration issues with external components",
                    'category': RiskCategory.INTEGRATION,
                    'triggers': ['third party', 'COTS', 'supplier'],
                    'probability': Probability.HIGH,
                    'impact': Impact.MODERATE
                },
                {
                    'pattern': 'requirements_volatility',
                    'title': "Requirements changes during development",
                    'category': RiskCategory.REQUIREMENTS,
                    'triggers': ['unclear requirements', 'customer changes', 'scope creep'],
                    'probability': Probability.HIGH,
                    'impact': Impact.MAJOR
                },
                {
                    'pattern': 'resource_availability',
                    'title': "Key resource unavailability",
                    'category': RiskCategory.RESOURCE,
                    'triggers': ['single point', 'expert', 'specialized'],
                    'probability': Probability.MEDIUM,
                    'impact': Impact.MAJOR
                },
                {
                    'pattern': 'test_equipment',
                    'title': "Test equipment availability",
                    'category': RiskCategory.RESOURCE,
                    'triggers': ['HIL', 'test bench', 'hardware'],
                    'probability': Probability.MEDIUM,
                    'impact': Impact.MODERATE
                }
            ]
        }

    def _load_historical_risks(self) -> List[Risk]:
        """Load risks from historical projects."""
        # In production, load from database
        return []

    def identify_risks(self, project_context: Dict) -> List[Risk]:
        """Identify risks based on project context."""

        identified_risks = []

        # Pattern-based identification
        for pattern in self.risk_patterns['embedded_software']:
            if self._pattern_matches(pattern, project_context):
                risk = self._create_risk_from_pattern(pattern, project_context)
                identified_risks.append(risk)

        # Historical pattern matching
        similar_risks = self._find_similar_historical_risks(project_context)
        identified_risks.extend(similar_risks)

        # De-duplicate and rank
        unique_risks = self._deduplicate_risks(identified_risks)
        ranked_risks = self._rank_risks(unique_risks)

        return ranked_risks

    def _pattern_matches(self, pattern: Dict, context: Dict) -> bool:
        """Check if a risk pattern matches project context."""

        context_text = ' '.join(str(v).lower() for v in context.values())

        # Plain substring test: a short trigger such as 'expert' also
        # matches longer words such as 'expertise'.
        return any(trigger.lower() in context_text
                   for trigger in pattern['triggers'])

    def _create_risk_from_pattern(self, pattern: Dict,
                                  context: Dict) -> Risk:
        """Create risk instance from matched pattern."""

        return Risk(
            id=f"RSK-{pattern['pattern'][:8].upper()}-{datetime.now().strftime('%Y%m%d')}",
            title=pattern['title'],
            category=pattern['category'],
            description=self._generate_description(pattern, context),
            cause=self._infer_cause(pattern, context),
            consequence=self._infer_consequence(pattern),
            probability=pattern['probability'],
            impact=pattern['impact'],
            risk_level=pattern['probability'].value * pattern['impact'].value,
            mitigation="TBD - Define mitigation strategy",
            owner="TBD",
            status="identified",
            identified_date=datetime.now(),
            ai_confidence=0.75  # Calibrate based on pattern match quality
        )

    def _generate_description(self, pattern: Dict, context: Dict) -> str:
        """Generate risk description from pattern and context."""
        return f"Risk identified based on pattern '{pattern['pattern']}'. " \
               f"Project characteristics suggest this risk is applicable."

    def _infer_cause(self, pattern: Dict, context: Dict) -> str:
        """Infer risk cause."""
        cause_map = {
            'new_target_platform': "Team lacks experience with target platform",
            'real_time_constraints': "Complex timing requirements may not be achievable",
            'safety_certification': "Safety assessment and documentation requirements",
            'integration_complexity': "Dependencies on external components",
            'requirements_volatility': "Customer requirements not fully defined",
            'resource_availability': "Critical skill dependency on limited resources",
            'test_equipment': "Shared or limited test infrastructure"
        }
        return cause_map.get(pattern['pattern'], "To be determined")

    def _infer_consequence(self, pattern: Dict) -> str:
        """Infer risk consequence."""
        consequence_map = {
            'new_target_platform': "Schedule delay, quality issues",
            'real_time_constraints': "System requirements not met",
            'safety_certification': "Delayed product release",
            'integration_complexity': "Integration issues, rework",
            'requirements_volatility': "Scope changes, rework",
            'resource_availability': "Schedule impact, knowledge loss",
            'test_equipment': "Testing delays"
        }
        return consequence_map.get(pattern['pattern'], "To be determined")

    def _find_similar_historical_risks(self, context: Dict) -> List[Risk]:
        """Find similar risks from historical projects."""
        # Placeholder for similarity matching
        return []

    def _deduplicate_risks(self, risks: List[Risk]) -> List[Risk]:
        """Remove duplicate risks."""
        seen_titles: Set[str] = set()
        unique = []
        for risk in risks:
            if risk.title not in seen_titles:
                seen_titles.add(risk.title)
                unique.append(risk)
        return unique

    def _rank_risks(self, risks: List[Risk]) -> List[Risk]:
        """Rank risks by risk level."""
        return sorted(risks, key=lambda r: r.risk_level, reverse=True)


def generate_risk_register(risks: List[Risk]) -> str:
    """Generate risk register markdown."""

    report = ["# Risk Register\n"]
    report.append(f"**Generated**: {datetime.now().strftime('%Y-%m-%d %H:%M')}\n")

    # Summary
    report.append("## Summary\n")
    report.append("| Total Risks | High | Medium | Low |")
    report.append("|-------------|------|--------|-----|")

    high = len([r for r in risks if r.risk_level >= 15])
    medium = len([r for r in risks if 8 <= r.risk_level < 15])
    low = len([r for r in risks if r.risk_level < 8])
    report.append(f"| {len(risks)} | {high} | {medium} | {low} |\n")

    # Risk table
    report.append("## Risk Details\n")
    report.append("| ID | Title | Category | P | I | Level | Status |")
    report.append("|-----|-------|----------|---|---|-------|--------|")

    for risk in risks:
        report.append(
            f"| {risk.id} | {risk.title} | {risk.category.value} | "
            f"{risk.probability.name} | {risk.impact.name} | "
            f"{risk.risk_level} | {risk.status} |"
        )

    return "\n".join(report)
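
The trigger check in `_pattern_matches` is a plain substring test, so broad triggers can produce false positives. A possible refinement, shown here as a standalone sketch rather than as part of the class above, is to match triggers on word boundaries and return which ones hit. The function name `matched_triggers` and the sample context are hypothetical.

```python
import re
from typing import Dict, List


def matched_triggers(triggers: List[str], context: Dict) -> List[str]:
    """Return the triggers that occur in the context as whole words or phrases."""
    text = " ".join(str(v).lower() for v in context.values())
    hits = []
    for trigger in triggers:
        # \b anchors keep 'expert' from matching inside 'expertise'
        pattern = r"\b" + re.escape(trigger.lower()) + r"\b"
        if re.search(pattern, text):
            hits.append(trigger)
    return hits


context = {
    "description": "Door lock ECU; team has CAN expertise",
    "testing": "HIL bench shared with two other projects",
}
print(matched_triggers(["expert", "HIL", "test bench"], context))
```

Returning the list of hits, rather than a boolean, also leaves room to use the number of matched triggers as one input when calibrating `ai_confidence`.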

Risk Assessment Matrix

The diagram below presents the risk assessment matrix, mapping probability against impact to classify risks into severity zones that drive treatment priority.

[Figure: Risk Assessment Matrix]
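
A 5x5 matrix of this kind can be sketched in a few lines. The zone thresholds used below (High at 15 and above, Medium from 8 to 14, Low below 8) are assumed from `generate_risk_register` above and may differ from an organization's own matrix; the helper names `zone` and `build_matrix` are hypothetical.

```python
from typing import List

PROBABILITY = ["VERY_LOW", "LOW", "MEDIUM", "HIGH", "VERY_HIGH"]  # values 1..5
IMPACT = ["NEGLIGIBLE", "MINOR", "MODERATE", "MAJOR", "SEVERE"]   # values 1..5


def zone(level: int) -> str:
    """Map a risk level (probability * impact) to a zone."""
    if level >= 15:
        return "HIGH"
    if level >= 8:
        return "MEDIUM"
    return "LOW"


def build_matrix() -> List[List[str]]:
    """Rows run from VERY_HIGH probability down to VERY_LOW; columns by impact."""
    rows = []
    for p in range(5, 0, -1):
        # Each cell holds the level and the first letter of its zone, e.g. '12 M'
        rows.append([f"{p * i:>2} {zone(p * i)[0]}" for i in range(1, 6)])
    return rows


# Header row with abbreviated impact names, then one row per probability
print(" " * 11 + " | ".join(f"{name[:4]:<4}" for name in IMPACT))
for name, row in zip(reversed(PROBABILITY), build_matrix()):
    print(f"{name:<10}", " | ".join(row))
```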


Risk Register Template

Note: Dates and owners are illustrative; actual registers use project-specific information.

# Risk Register (template)
risk_register:
  project: (Project Name)
  version: "1.0"
  last_updated: (date)

  risks:
    - id: RSK-001
      title: "HIL test equipment availability"
      category: resource
      description: |
        HIL test bench is shared across multiple projects.
        Availability conflicts may delay integration testing.
      cause: "Limited HIL infrastructure, high utilization"
      consequence: "Integration testing delayed by 2-4 weeks"
      probability: medium
      impact: major
      risk_level: 12
      treatment: mitigate
      mitigation_actions:
        - action: "Reserve HIL time slots 4 weeks in advance"
          owner: Test Lead
          due_date: 2025-01-20
          status: complete
        - action: "Develop SIL simulation as backup"
          owner: SW Architect
          due_date: 2025-02-01
          status: in_progress
      residual_risk: 6
      owner: Project Manager
      status: mitigating
      monitoring_frequency: weekly

    - id: RSK-002
      title: "Requirements changes from customer"
      category: requirements
      description: |
        Customer has indicated potential changes to door lock timing
        requirements based on field feedback from other vehicles.
      cause: "Customer field experience driving specification changes"
      consequence: "Rework in implementation and testing phases"
      probability: high
      impact: moderate
      risk_level: 12
      treatment: mitigate
      mitigation_actions:
        - action: "Weekly sync with customer requirements team"
          owner: Project Manager
          due_date: ongoing
          status: active
        - action: "Implement configurable timing parameters"
          owner: SW Architect
          due_date: 2025-01-25
          status: in_progress
      residual_risk: 6
      owner: Project Manager
      status: mitigating
      monitoring_frequency: weekly

    - id: RSK-003
      title: "Cold temperature qualification failure"
      category: technical
      description: |
        Door lock timing may not meet requirements at extreme
        cold temperatures (-40°C) due to transistor characteristics.
      cause: "Temperature-dependent hardware behavior"
      consequence: "Design change required, schedule impact"
      probability: medium
      impact: major
      risk_level: 12
      treatment: mitigate
      mitigation_actions:
        - action: "Early cold temperature characterization"
          owner: HW/SW Integration Lead
          due_date: 2025-01-15
          status: complete
        - action: "Implement temperature compensation"
          owner: SW Developer
          due_date: 2025-01-30
          status: planned
      residual_risk: 4
      owner: Technical Lead
      status: mitigating
      monitoring_frequency: bi-weekly

    - id: RSK-004
      title: "Key resource departure"
      category: resource
      description: |
        Senior developer with unique CAN stack expertise
        may leave during project execution.
      cause: "Market demand for embedded expertise"
      consequence: "Knowledge loss, schedule delay"
      probability: low
      impact: major
      risk_level: 8
      treatment: mitigate
      mitigation_actions:
        - action: "Document CAN stack architecture and design decisions"
          owner: SW Architect
          due_date: 2025-01-31
          status: in_progress
        - action: "Cross-train second developer"
          owner: Team Lead
          due_date: 2025-02-15
          status: planned
      residual_risk: 4
      owner: Project Manager
      status: mitigating
      monitoring_frequency: monthly
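
Register entries like these can be checked for internal consistency before review. The following is a minimal sketch that assumes the YAML has already been parsed into plain dictionaries and that `risk_level` is probability times impact on the 1 to 5 scales used earlier; `SCALE` and `check_entry` are hypothetical names, and the three checks are examples rather than a complete rule set.

```python
from typing import Dict, List

SCALE = {
    "probability": {"very_low": 1, "low": 2, "medium": 3, "high": 4, "very_high": 5},
    "impact": {"negligible": 1, "minor": 2, "moderate": 3, "major": 4, "severe": 5},
}


def check_entry(entry: Dict) -> List[str]:
    """Return consistency findings for one register entry (empty if none)."""
    findings = []
    expected = (SCALE["probability"][entry["probability"]]
                * SCALE["impact"][entry["impact"]])
    if entry["risk_level"] != expected:
        findings.append(f"{entry['id']}: risk_level {entry['risk_level']} != {expected}")
    if entry.get("residual_risk", 0) > entry["risk_level"]:
        findings.append(f"{entry['id']}: residual_risk exceeds risk_level")
    if entry.get("treatment") == "mitigate" and not entry.get("mitigation_actions"):
        findings.append(f"{entry['id']}: treatment is 'mitigate' but no actions listed")
    return findings


# Two entries from the template above, reduced to the fields being checked
entries = [
    {"id": "RSK-001", "probability": "medium", "impact": "major", "risk_level": 12,
     "treatment": "mitigate", "residual_risk": 6,
     "mitigation_actions": [{"action": "Reserve HIL time slots 4 weeks in advance"}]},
    {"id": "RSK-004", "probability": "low", "impact": "major", "risk_level": 8,
     "treatment": "mitigate", "residual_risk": 4,
     "mitigation_actions": [{"action": "Cross-train second developer"}]},
]
for e in entries:
    print(e["id"], check_entry(e) or "ok")
```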

Risk Monitoring Dashboard

The following diagram shows the risk monitoring dashboard, providing real-time visibility into open risks, mitigation progress, trend indicators, and risk exposure over time.

[Figure: Risk Monitoring Dashboard]
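
The figures behind such a dashboard can be derived from periodic snapshots of the register. The sketch below is one possible approach under stated assumptions: exposure is taken as the sum of `risk_level` over open risks, `closed` is an assumed terminal status, and the two weekly snapshots are invented purely for illustration.

```python
from collections import Counter
from typing import Dict, List


def summarize(risks: List[Dict]) -> Dict:
    """Summarize one register snapshot; 'closed' is an assumed terminal status."""
    open_risks = [r for r in risks if r["status"] != "closed"]
    return {
        "open": len(open_risks),
        "by_status": dict(Counter(r["status"] for r in risks)),
        "exposure": sum(r["risk_level"] for r in open_risks),
    }


def trend(previous: int, current: int) -> str:
    """Compare exposure between two consecutive snapshots."""
    if current > previous:
        return "rising"
    if current < previous:
        return "falling"
    return "stable"


# Illustrative snapshots, not real project data
week_1 = [
    {"id": "RSK-001", "risk_level": 12, "status": "mitigating"},
    {"id": "RSK-002", "risk_level": 12, "status": "mitigating"},
    {"id": "RSK-004", "risk_level": 8, "status": "identified"},
]
week_2 = [
    {"id": "RSK-001", "risk_level": 6, "status": "mitigating"},
    {"id": "RSK-002", "risk_level": 12, "status": "mitigating"},
    {"id": "RSK-004", "risk_level": 8, "status": "closed"},
]

s1, s2 = summarize(week_1), summarize(week_2)
print(s1["exposure"], s2["exposure"], trend(s1["exposure"], s2["exposure"]))
```

Summing risk levels is a coarse exposure measure; a weighted or cost-based measure may be more appropriate where risk levels are not comparable across categories.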


Work Products

| WP ID | Work Product | AI Role |
|-------|--------------|---------|
| 08-25 | Risk management plan | Template generation |
| 08-26 | Risk register | Pattern-based identification |
| 13-22 | Risk status report | Automated generation |
| 08-27 | Risk mitigation plan | Suggestion generation |

Summary

MAN.5 Risk Management:

  • AI Level: L1-L2 for most practices, ranging from L0-L1 (corrective action) to L2-L3 (monitoring); AI identifies and scores, human decides
  • Primary AI Value: Risk identification, scoring, monitoring
  • Human Essential: Risk acceptance, mitigation decisions
  • Key Outputs: Risk register, mitigation plans
  • Continuous: Risk management throughout project lifecycle