Lagging indicators

Lagging indicators confirm whether the effort is paying off. Fewer production incidents, faster recovery times, and reduced customer impact are the outcomes that justify the investment.

Incident reduction metrics

These metrics show whether proactive risk management is translating into fewer and less severe production incidents.

Production incident frequency

Definition: Number of production incidents per month
Target: 20% reduction within 6 months of FMEA implementation
Measurement: Count of production incidents from monitoring systems
Frequency: Monthly tracking with quarterly trend analysis
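A reduction target such as the 20% figure above is measured against a baseline. The following is a minimal sketch of that arithmetic; the function name and the incident counts are hypothetical and only illustrate the comparison.

```python
def percent_reduction(baseline: float, current: float) -> float:
    """Return the percentage reduction from baseline to current.

    A negative result means the count went up instead of down.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - current) / baseline * 100


# Hypothetical monthly incident counts: the month before FMEA
# implementation versus the sixth month after it.
baseline_incidents = 25
current_incidents = 19

reduction = percent_reduction(baseline_incidents, current_incidents)
target_met = reduction >= 20  # 20% target from the metric definition
```

The same comparison applies to the other count-based targets in this section, with the baseline period and threshold changed to match each metric.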

Preventable incident rate

Definition: Percentage of incidents whose failure mode had already been identified as a risk in the FMEA
Target: At least 60% of incidents pre-identified as FMEA risks
Calculation: (Incidents matching FMEA risks / Total incidents) × 100
Frequency: Monthly analysis of incident correlation
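The calculation above can be sketched as follows, assuming each incident record carries an optional link to the FMEA risk it matches. The `Incident` class and its `fmea_risk_id` field are hypothetical; your incident management system may record this correlation differently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Incident:
    incident_id: str
    # Hypothetical field: ID of the matching FMEA risk, or None if the
    # failure mode was not identified in advance.
    fmea_risk_id: Optional[str]


def preventable_incident_rate(incidents: List[Incident]) -> float:
    """(Incidents matching FMEA risks / Total incidents) x 100."""
    if not incidents:
        return 0.0
    matched = sum(1 for i in incidents if i.fmea_risk_id is not None)
    return matched / len(incidents) * 100


# Made-up sample month: three of five incidents map to known risks.
month = [
    Incident("INC-1", "RISK-07"),
    Incident("INC-2", None),
    Incident("INC-3", "RISK-12"),
    Incident("INC-4", "RISK-07"),
    Incident("INC-5", None),
]
rate = preventable_incident_rate(month)
```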

Critical incident reduction

Definition: Number of severity 1/2 incidents per quarter
Target: 30% reduction within 12 months
Measurement: Count of high-severity incidents from incident management system
Frequency: Quarterly tracking and trending

Recovery and response metrics

These metrics measure how quickly your team detects and resolves incidents when they do occur.

Mean time to recovery (MTTR)

Definition: Average time to restore service after an incident
Target: 15% improvement within 6 months
Measurement: Time from incident start to resolution
Frequency: Monthly calculation, quarterly trending

Mean time to detection (MTTD)

Definition: Average time to detect an incident after it occurs
Target: 25% improvement through enhanced monitoring driven by FMEA findings
Measurement: Time from incident occurrence to detection
Frequency: Monthly calculation, quarterly trending
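Both MTTR and MTTD are averages over per-incident time spans. The following sketch computes them from timestamps, assuming each incident record has an occurrence, detection, and resolution time. The record layout and the sample values are hypothetical.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records: (occurred_at, detected_at, resolved_at).
incidents = [
    (datetime(2024, 5, 2, 10, 0), datetime(2024, 5, 2, 10, 5), datetime(2024, 5, 2, 10, 45)),
    (datetime(2024, 5, 9, 14, 0), datetime(2024, 5, 9, 14, 15), datetime(2024, 5, 9, 15, 15)),
]


def mean_minutes(pairs):
    """Average elapsed minutes across (start, end) timestamp pairs."""
    return mean((end - start).total_seconds() for start, end in pairs) / 60


# MTTD: time from incident occurrence to detection.
mttd_minutes = mean_minutes((occurred, detected) for occurred, detected, _ in incidents)

# MTTR: time from incident start to resolution.
mttr_minutes = mean_minutes((occurred, resolved) for occurred, _, resolved in incidents)
```

Measuring both metrics from the same start timestamp keeps them comparable: the gap between MTTR and MTTD is then the time spent on response and repair after detection.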

First-time fix rate

Definition: Percentage of incidents resolved without recurrence within 30 days
Target: 10% improvement through better root cause understanding
Calculation: (Incidents with no recurrence / Total resolved incidents) × 100
Frequency: Monthly tracking with 30-day lag
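The 30-day lag exists because an incident cannot be counted as fixed until its recurrence window has closed. The following sketch shows one way to apply that rule, assuming each resolved incident is stored as a resolution date plus an optional recurrence date. The function name and data layout are hypothetical.

```python
from datetime import date, timedelta


def first_time_fix_rate(resolved, as_of, window_days=30):
    """(Incidents with no recurrence / Total resolved incidents) x 100.

    `resolved` is a list of (resolved_on, recurred_on_or_None) pairs.
    Incidents whose recurrence window is still open on `as_of` are
    excluded, which is what produces the 30-day reporting lag.
    Returns None when no incident is eligible yet.
    """
    window = timedelta(days=window_days)
    eligible = [(r, rec) for r, rec in resolved if r + window <= as_of]
    if not eligible:
        return None
    fixed = sum(1 for r, rec in eligible if rec is None or rec - r > window)
    return fixed / len(eligible) * 100


# Made-up sample data.
resolved_incidents = [
    (date(2024, 1, 10), None),               # never recurred
    (date(2024, 1, 20), date(2024, 2, 1)),   # recurred after 12 days
    (date(2024, 2, 5), date(2024, 3, 20)),   # recurred after 44 days
    (date(2024, 3, 15), None),               # window still open on as_of
]
rate = first_time_fix_rate(resolved_incidents, as_of=date(2024, 3, 31))
```

In this sample, three incidents are eligible and two of them count as first-time fixes. The recurrence after 44 days falls outside the 30-day window, so that incident still counts as fixed.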

Customer impact metrics

These metrics capture how effectively FMEA is protecting the end-user experience from service disruptions.

Customer-affecting incidents

Definition: Number of incidents that impact customer experience
Target: 25% reduction within 9 months
Measurement: Count of incidents with customer impact classification
Frequency: Monthly tracking, quarterly business review

Service availability

Definition: Percentage uptime of critical application services
Target: Maintain or improve existing SLA commitments
Calculation: (Total uptime / Total time) × 100
Frequency: Real-time monitoring, monthly reporting
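As a worked example of the availability calculation, the following sketch converts downtime minutes in a reporting period into an uptime percentage and compares it with an SLA threshold. The 99.9% threshold and the downtime figure are hypothetical.

```python
def availability_percent(total_minutes: float, downtime_minutes: float) -> float:
    """(Total uptime / Total time) x 100."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    uptime = total_minutes - downtime_minutes
    return uptime / total_minutes * 100


# A 30-day month has 30 * 24 * 60 = 43,200 minutes.
minutes_in_month = 30 * 24 * 60
downtime = 43  # hypothetical minutes of downtime in the month

availability = availability_percent(minutes_in_month, downtime)
sla_target = 99.9  # hypothetical SLA commitment
meets_sla = availability >= sla_target
```

At a 99.9% commitment, a 30-day month allows roughly 43.2 minutes of downtime, so the 43 minutes in this sample just meets the target.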

Customer satisfaction impact

Definition: Customer satisfaction scores related to service reliability
Target: Maintain or improve baseline scores
Measurement: Customer survey responses and support ticket sentiment
Frequency: Quarterly assessment
