{"UUID":"62dd1eda-63e8-4bf5-a5f8-46a222121474","URL":"https://circleci.statuspage.io/incidents/hr0mm9xmm3x6","ArchiveURL":"","Title":"CircleCI DB performance issue","StartTime":"2015-07-07T09:55:00Z","EndTime":"2015-07-08T15:32:00Z","Categories":["cascading-failure","config-change"],"Keywords":["github","database","build queue","performance","clojure","outage","downtime","capacity"],"Company":"CircleCI","Product":"database","SourcePublishedAt":"0001-01-01T00:00:00Z","SourceFetchedAt":"2026-05-04T17:48:49.315445Z","Summary":"A GitHub outage and recovery caused an unexpectedly large incoming load. For reasons that aren't specified, a large load causes CircleCI's queue system to slow down, in this case to handling one transaction per minute.","Description":"On July 7, 2015, CircleCI experienced a severe and lengthy downtime where its build queue came to a complete standstill. This began after GitHub push hooks resumed with an unprecedented intensity following a GitHub outage, leading to a sustained surge in new build requests.\n\nThe rapid insertion of these new builds caused severe performance degradation in CircleCI's main database, which underpins the complex build queue system. The database quickly became unresponsive, going from normal operation to fully locked within minutes due to resource contention and slow, timing-out queries.\n\nCustomers faced a complete halt in their build processes, with the queue dequeueing only one build per minute instead of many per second. Many queued builds aged significantly, losing their value, and the platform became largely inaccessible.\n\nInitial attempts to salvage the queue and throttle incoming requests were unsuccessful. Engineers utilized the live patching capabilities of Clojure to disable problematic queries and modify code in production. They then cleared the \"usage queue\" and \"run queue\" using scripts, a process that took over an hour.\n\nAfter gaining control of the database and queue, CircleCI initiated a switch to new, upgraded database hardware. This allowed them to restore service, scale capacity, and eventually clean up temporary fixes, bringing the system back to normal operation by July 8, 2015."}