Postmortem Index

Explore incident reports from various companies.

Category: Cascading Failure

A small failure that snowballed, through retries, thundering herds, or thread pool exhaustion, until it took out adjacent services.

Postmortems: 38
Companies: 25
Years covered: 37
Date range: Jan 1990 – Feb 2026
Title | Company | Date
GitHub Actions and Codespaces outage of February 2026 | GitHub | 2026-02-02 – 2026-02-03
incident.io service disruption during AWS us-east-1 outage on October 20, 2025 | incident.io | 2025-10-20
LaunchDarkly service disruption due to AWS us-east-1 outage and internal cascading failures (October 2025) | LaunchDarkly | 2025-10-20 – 2025-10-21
incident.io database outage due to PGAudit | incident.io | 2025-04-09
GitHub DNS infrastructure failure and service degradation on October 11, 2024 | GitHub | 2024-10-11 – 2024-10-12
Cloudflare Control Plane and Analytics Outage due to Flexential Power Failure | Cloudflare | 2023-11-02 – 2023-11-04
Datadog Infrastructure Connectivity Issue March 2023 | Datadog | 2023-03-08 – 2023-03-10
BigQuery Storage WriteAPI elevated error rates in US Multi-Region | Google | 2022-10-13 – 2022-10-14
Honeycomb Ingest System Outage: Shepherd Cache Delays | Honeycomb | 2022-09-08
Slack’s Incident on 2-22-22 | Slack | 2022-02-22
GitHub November 2021 Availability Incident due to MySQL Schema Migration | GitHub | 2021-11-27
CircleCI jobs stuck in "not running" state on November 8, 2021 | CircleCI | 2021-11-08
Cloudflare API and dashboard availability incident on 2020-11-02 | Cloudflare | 2020-11-02
Datadog US region infrastructure connectivity issue | Datadog | 2020-09-24 – 2020-09-25
Flowdock outage and cross-organization data leak | Broadcom (CA Technologies) | 2020-04-21 – 2020-04-22
Cloudflare global outage on July 2, 2019 | Cloudflare | 2019-07-02
Google Cloud internal blob storage disruption March 2019 | Google | 2019-03-13
Elastic Cloud AWS us-east-1 outage of February 2019 | Elastic | 2019-02-04
Authentication Latency on DUO1 Deployment | Duo | 2018-08-29
Unavailable Guilds & Connection Issues | Discord | 2017-10-13
Discord Connectivity Issues (March 2017) | Discord | 2017-03-20
Leap second affected Cloudflare DNS | Cloudflare | 2017-01-01
Buildkite outage of August 22nd, 2016 | Buildkite | 2016-08-22
GitHub January 28th, 2016 datacenter power disruption | GitHub | 2016-01-28
CircleCI DB performance issue | CircleCI | 2015-07-07 – 2015-07-08
NPM Fastly VCL misconfiguration outage on 2014-01-28 | NPM | 2014-01-28
GitHub DNS Outage on January 8, 2014 | GitHub | 2014-01-08
Stackdriver Intelligent Monitoring application outage on October 23, 2013 | Stackdriver | 2013-10-23 – 2013-10-26
India NEW grid blackouts | Indian Electricity Grid (POSOCO / CERC) | 2012-07-30 – 2012-07-31
Windows Azure Service Disruption on Feb 29th, 2012 | Azure | 2012-02-29 – 2012-03-01
Northeast blackout of 2003 | FirstEnergy / General Electric | 2003-08-14 – 2003-08-16
Northeast blackout | U.S.-Canada Power System Outage Task Force | 2003-08-14 – 2003-08-18
Ariane 5 Flight 501 launch failure of June 1996 | European Space Agency | 1996-06-04
1990 AT&T Long Distance Network Collapse | AT&T | 1990-01-15
Untitled postmortem | Elastic |
GitHub availability incidents in February and March 2026 | GitHub |
Multiple Slack service disruptions in October 2014 | Slack |
Supermarket Intermittent Unresponsiveness | Chef.io |