
Lagging indicators - AWS Prescriptive Guidance

Lagging indicators

Lagging indicators confirm whether the FMEA effort is paying off. Fewer production incidents, faster recovery times, and reduced customer impact are the outcomes that justify the investment.

Incident reduction metrics

These metrics show whether proactive risk management is translating into fewer and less severe production incidents.

Production incident frequency

  • Definition: Number of production incidents per month

  • Target: 20% reduction within 6 months of FMEA implementation

  • Measurement: Count of production incidents from monitoring systems

  • Frequency: Monthly tracking with quarterly trend analysis
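One way to check progress against the 20% target is to compare recent monthly counts with a pre-FMEA baseline. The Python sketch below assumes you have already exported monthly incident counts from your monitoring system. The helper names (`percent_reduction`, `meets_frequency_target`), the use of the baseline-month average as the reference point, and the sample numbers are illustrative assumptions rather than part of this guidance.

```python
from statistics import mean


def percent_reduction(baseline: float, current: float) -> float:
    """Percentage reduction from baseline to current (negative if incidents increased)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - current) / baseline * 100


def meets_frequency_target(baseline_months: list[int], recent_months: list[int],
                           target_pct: float = 20.0) -> bool:
    """Hypothetical helper: compare average monthly counts before and after FMEA."""
    reduction = percent_reduction(mean(baseline_months), mean(recent_months))
    return reduction >= target_pct


# Illustrative counts only (not real data): six pre-FMEA months, then month 6 after adoption.
before = [25, 22, 24, 26, 23, 24]  # average 24 incidents per month
after = [19]
print(round(percent_reduction(mean(before), mean(after)), 1))  # 20.8
print(meets_frequency_target(before, after))                   # True
```

Averaging several baseline months smooths out a single unusually quiet or noisy month. A rolling average over recent months would do the same for the post-FMEA side.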

Preventable incident rate

  • Definition: Percentage of incidents whose failure modes had already been identified as risks in the FMEA

  • Target: Increase the share of incidents that were pre-identified as risks to 60%

  • Calculation: (Incidents matching FMEA risks / Total incidents) × 100

  • Frequency: Monthly analysis of incident correlation
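The calculation above can be expressed as a short Python sketch. It assumes that each incident record carries the failure-mode identifier assigned during post-incident review and that the FMEA worksheet yields a set of failure-mode identifiers. The `Incident` record shape and the identifiers shown are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Incident:
    incident_id: str
    failure_mode: Optional[str]  # hypothetical field: failure mode assigned in post-incident review


def preventable_incident_rate(incidents: list[Incident], fmea_risks: set[str]) -> float:
    """(Incidents matching FMEA risks / Total incidents) x 100."""
    if not incidents:
        return 0.0  # design choice: report 0 rather than divide by zero
    matched = sum(1 for i in incidents if i.failure_mode in fmea_risks)
    return matched * 100 / len(incidents)


# Hypothetical FMEA risk register and one month of incidents.
fmea_risks = {"db-connection-exhaustion", "cert-expiry", "cache-stampede"}
month = [
    Incident("INC-1", "cert-expiry"),
    Incident("INC-2", "dns-misconfig"),
    Incident("INC-3", "cache-stampede"),
    Incident("INC-4", None),
    Incident("INC-5", "db-connection-exhaustion"),
]
print(preventable_incident_rate(month, fmea_risks))  # 60.0
```

This sketch returns 0.0 for a month with no incidents to avoid division by zero. You might instead report such months as not applicable.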

Critical incident reduction

  • Definition: Number of severity 1 and severity 2 incidents per quarter

  • Target: 30% reduction within 12 months

  • Measurement: Count of high-severity incidents from incident management system

  • Frequency: Quarterly tracking and trending
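A minimal sketch of the quarterly count follows. It assumes incidents are exported from the incident management system as `(date, severity)` pairs, with severity 1 as the highest. The tuple format and the severity numbering are assumptions, so adapt them to your tool's schema.

```python
from collections import Counter
from datetime import date


def quarter_key(d: date) -> str:
    """Label a date with its calendar quarter, for example '2024-Q3'."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def high_severity_by_quarter(incidents: list[tuple[date, int]]) -> Counter:
    """Count severity 1 and severity 2 incidents per calendar quarter."""
    return Counter(quarter_key(d) for d, sev in incidents if sev in (1, 2))


# Illustrative export (assumed format): (date, severity)
incidents = [
    (date(2024, 1, 15), 1),
    (date(2024, 2, 3), 3),
    (date(2024, 3, 28), 2),
    (date(2025, 1, 9), 2),
    (date(2025, 2, 20), 4),
]
counts = high_severity_by_quarter(incidents)
print(counts["2024-Q1"], counts["2025-Q1"])  # 2 1
baseline, current = counts["2024-Q1"], counts["2025-Q1"]
print(f"{(baseline - current) / baseline * 100:.0f}% reduction")  # 50% reduction
```

Comparing the same calendar quarter a year apart, as in the example, can help avoid mistaking seasonal variation for improvement.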

Recovery and response metrics

These metrics measure how quickly your team detects and resolves incidents when they do occur.

Mean time to recovery (MTTR)

  • Definition: Average time to restore service after an incident

  • Target: 15% improvement within 6 months

  • Measurement: Time from incident start to resolution

  • Frequency: Monthly calculation, quarterly trending
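MTTR is the mean of the per-incident durations from start to resolution. The sketch below assumes that each incident's start and resolution timestamps are available as `datetime` objects. The `mean_duration` helper, the 75-minute baseline, and the sample values are illustrative assumptions.

```python
from datetime import datetime, timedelta


def mean_duration(pairs: list[tuple[datetime, datetime]]) -> timedelta:
    """Average (end - start) over the given (start, end) pairs."""
    if not pairs:
        raise ValueError("no incidents to average")
    total = sum((end - start for start, end in pairs), timedelta())
    return total / len(pairs)


# (incident start, resolution) pairs for one month: illustrative values only
incidents = [
    (datetime(2024, 6, 2, 10, 0), datetime(2024, 6, 2, 11, 30)),   # 90 min
    (datetime(2024, 6, 9, 22, 15), datetime(2024, 6, 9, 22, 45)),  # 30 min
    (datetime(2024, 6, 20, 8, 0), datetime(2024, 6, 20, 9, 0)),    # 60 min
]
mttr = mean_duration(incidents)
print(mttr)  # 1:00:00

baseline_mttr = timedelta(minutes=75)  # hypothetical pre-FMEA baseline
improvement = (baseline_mttr - mttr) / baseline_mttr * 100
print(f"{improvement:.0f}% improvement")  # 20% improvement
```

With the hypothetical 75-minute baseline, a 1-hour MTTR is a 20% improvement, which would exceed the 15% target.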

Mean time to detection (MTTD)

  • Definition: Average time between an incident occurring and its detection

  • Target: 25% improvement through monitoring enhancements identified during FMEA

  • Measurement: Time from incident occurrence to detection

  • Frequency: Monthly calculation, quarterly trending
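Because MTTD is calculated monthly and trended quarterly, it helps to group detection delays by month. The following minimal sketch assumes that each incident record holds the time the fault began (often established only during the post-incident review) and the time the first alert or report arrived. The tuple format and the `mttd_by_month` helper are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def mttd_by_month(incidents: list[tuple[datetime, datetime]]) -> dict[str, timedelta]:
    """Mean time to detection per month, keyed 'YYYY-MM' by occurrence time."""
    delays: dict[str, list[timedelta]] = defaultdict(list)
    for occurred, detected in incidents:
        delays[occurred.strftime("%Y-%m")].append(detected - occurred)
    return {
        month: sum(ds, timedelta()) / len(ds)
        for month, ds in sorted(delays.items())
    }


# (occurred, detected) pairs: illustrative values only
incidents = [
    (datetime(2024, 5, 3, 14, 0), datetime(2024, 5, 3, 14, 20)),  # 20 min
    (datetime(2024, 5, 17, 2, 0), datetime(2024, 5, 17, 2, 40)),  # 40 min
    (datetime(2024, 6, 8, 9, 0), datetime(2024, 6, 8, 9, 15)),    # 15 min
]
for month, mttd in mttd_by_month(incidents).items():
    print(month, mttd)
# 2024-05 0:30:00
# 2024-06 0:15:00
```

Grouping by occurrence time rather than detection time is a choice made here. It keeps a slowly detected incident attributed to the month in which the failure actually happened.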

First-time fix rate

  • Definition: Percentage of incidents resolved without recurrence within 30 days

  • Target: 10% improvement through better root cause understanding

  • Calculation: (Incidents with no recurrence / Total resolved incidents) × 100

  • Frequency: Monthly tracking with 30-day lag
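The 30-day lag comes from the recurrence check: an incident cannot be counted until 30 days have passed since its resolution. The sketch below assumes that a recurrence is a later incident with the same root-cause identifier that starts within 30 days of the resolution. That matching rule, the `(root_cause_id, started, resolved)` record layout, and the sample data are assumptions.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(days=30)


def first_time_fix_rate(incidents: list[tuple[str, datetime, datetime]],
                        as_of: datetime) -> float:
    """(Incidents with no recurrence / Total resolved incidents) x 100.

    Each incident is (root_cause_id, started, resolved). Only incidents
    resolved at least 30 days before `as_of` are evaluated (the 30-day lag).
    """
    eligible = [i for i in incidents if i[2] + WINDOW <= as_of]
    if not eligible:
        return 0.0
    fixed = 0
    for cause, _, resolved in eligible:
        # Assumed rule: same root cause starting within 30 days after resolution.
        recurred = any(
            other_cause == cause and resolved < started <= resolved + WINDOW
            for other_cause, started, _ in incidents
        )
        if not recurred:
            fixed += 1
    return fixed * 100 / len(eligible)


# Hypothetical incidents: (root cause, started, resolved)
incidents = [
    ("cert-expiry", datetime(2024, 3, 1, 9), datetime(2024, 3, 1, 10)),
    ("cache-stampede", datetime(2024, 3, 5, 12), datetime(2024, 3, 5, 13)),
    ("cache-stampede", datetime(2024, 3, 20, 8), datetime(2024, 3, 20, 9)),
    ("db-failover", datetime(2024, 3, 25, 7), datetime(2024, 3, 25, 8)),
]
print(first_time_fix_rate(incidents, datetime(2024, 5, 1)))  # 75.0
```

In the sample data, the first cache-stampede incident recurs 15 days after it was resolved, so three of the four eligible incidents count as first-time fixes.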

Customer impact metrics

These metrics capture how effectively FMEA is protecting the end-user experience from service disruptions.

Customer-affecting incidents

  • Definition: Number of incidents that impact customer experience

  • Target: 25% reduction within 9 months

  • Measurement: Count of incidents classified as customer-impacting

  • Frequency: Monthly tracking, quarterly business review

Service availability

  • Definition: Percentage uptime of critical application services

  • Target: Maintain or improve existing SLA commitments

  • Calculation: (Total uptime / Total time) × 100

  • Frequency: Real-time monitoring, monthly reporting
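The following minimal sketch calculates availability over one reporting month. It assumes that downtime is recorded as non-overlapping `(start, end)` outage windows. What counts as "down" for a critical service (for example, failed health checks or customer-facing errors) is an assumption that you should align with your SLA definitions. The 99.9% SLA in the example is hypothetical.

```python
from datetime import datetime, timedelta


def availability_pct(period_start: datetime, period_end: datetime,
                     outages: list[tuple[datetime, datetime]]) -> float:
    """(Total uptime / Total time) x 100, where uptime = total time - outage time.

    Assumes outage windows do not overlap; each window is clipped to the period.
    """
    total = period_end - period_start
    downtime = sum(
        (min(end, period_end) - max(start, period_start)
         for start, end in outages if end > period_start and start < period_end),
        timedelta(),
    )
    return (total - downtime) / total * 100


june = (datetime(2024, 6, 1), datetime(2024, 7, 1))  # 30 days = 43,200 minutes
outages = [  # illustrative outage windows
    (datetime(2024, 6, 12, 3, 0), datetime(2024, 6, 12, 3, 30)),    # 30 min
    (datetime(2024, 6, 25, 18, 0), datetime(2024, 6, 25, 18, 13)),  # 13 min
]
pct = availability_pct(*june, outages)
print(f"{pct:.3f}%")  # 99.900%
print(pct >= 99.9)    # True (hypothetical 99.9% SLA)
```

Overlapping windows, such as two alarms for the same outage, would be double-counted here. If your data can contain them, merge them first.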

Customer satisfaction impact

  • Definition: Customer satisfaction scores related to service reliability

  • Target: Maintain or improve baseline scores

  • Measurement: Customer survey responses and support ticket sentiment

  • Frequency: Quarterly assessment