{"UUID":"bb1d9c14-e8d8-4aa4-a806-31f31f3d96fe","URL":"https://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016","ArchiveURL":"https://web.archive.org/web/20160720200842if_/https://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016","Title":"Stack Exchange Network outage of July 20, 2016","StartTime":"2016-07-20T14:44:00Z","EndTime":"2016-07-20T15:18:00Z","Categories":null,"Keywords":["stack exchange","outage","regex","cpu","load balancer","health check","backtracking","stack overflow"],"Company":"Stack Exchange","Product":"Stack Exchange Network","SourcePublishedAt":"2016-07-20T19:47:49-04:00","SourceFetchedAt":"2026-05-04T17:52:30.071812Z","Summary":"The backtracking implementation in the underlying regex engine proved very expensive for one particular post, leading to health-check failures and an eventual outage.","Description":"On July 20, 2016, the Stack Exchange Network experienced a 34-minute outage starting at 14:44 UTC. The response took 10 minutes to identify the cause, 14 minutes to develop a fix, and 10 minutes to deploy it, restoring Stack Overflow's availability.\n\nThe direct cause of the outage was a malformed post that triggered high CPU consumption by a regular expression on the web servers. The post appeared in the homepage list, so the expensive regex ran on every homepage view. As a result, the homepage became unresponsive.\n\nBecause the load balancer uses the homepage for health checks, its unresponsiveness led the load balancer to remove the affected servers from rotation. This made the entire Stack Exchange Network unavailable to users.\n\nThe root cause was a specific regular expression, `^[\\s\\u200c]+|[\\s\\u200c]+$`, designed to trim leading and trailing Unicode whitespace. A malformed post containing approximately 20,000 consecutive whitespace characters triggered worst-case behavior in the backtracking regex engine. Because the run was not at the end of the string, the engine retried the match from each starting position within it, performing O(n²) character checks (on the order of 200 million for 20,000 characters) and consuming excessive CPU.\n\nTo resolve the immediate issue, the problematic regular expression was replaced with a substring function. As follow-up actions, Stack Exchange plans to audit other regular expressions, improve post validation workflows, add controls to the load balancer to disable health checks during outages, and create a comprehensive outage checklist to streamline future incident responses."}