{"UUID":"5b73c1c1-8ec4-4901-91f3-0d993f582dd4","URL":"https://hacks.mozilla.org/2019/07/add-ons-outage-post-mortem-result/","ArchiveURL":"","Title":"Firefox Add-ons Outage due to Certificate Expiration","StartTime":"2019-05-04T00:00:00Z","EndTime":"0001-01-01T00:00:00Z","Categories":["config-change"],"Keywords":["firefox","add-ons","certificate","expiration","outage","normandy","balrog","qa"],"Company":"Mozilla","Product":"Firefox Add-ons","SourcePublishedAt":"2019-07-12T16:08:26Z","SourceFetchedAt":"2026-05-04T17:48:22.850058Z","Summary":"Most Firefox add-ons stopped working around May 4th, 2019, when a certificate used to sign them expired. Firefox requires add-ons to carry a valid certificate chain to protect users from malicious add-ons, so the expiration effectively disabled all add-ons, about 15,000 in total. Roughly nine hours later, Mozilla pushed a privileged add-on that injected a new valid certificate into Firefox's certificate store, restoring a valid chain and unblocking add-ons; the resolution took approximately 15-21 hours to reach most users, and some user data was lost. Mozilla previously [posted](https://hacks.mozilla.org/2019/05/technical-details-on-the-recent-firefox-add-on-outage) about the technical details.",
"Description":"Around May 4th, 2019, most Firefox add-ons stopped functioning when a certificate used for signing add-ons expired, preventing Firefox from validating their authenticity.\n\nThe root cause was identified as a misunderstanding within the team responsible for the signing system, who incorrectly believed that Firefox ignored certificate expiration dates. This misconception was exacerbated by a previous incident in which end-entity certificate checking was disabled, causing confusion about how intermediate certificates were validated. Furthermore, Firefox's QA plan did not cover certificate expiration or future-date behavior, so the problem was not caught early.\n\nTo remediate the issue, a fix was delivered via the Studies system (internally known as Normandy), which injected a valid certificate. This method was chosen for its speed, despite requiring some users to enable Telemetry, leading to temporary over-collection of data. Over the subsequent weeks, numerous fixes and dot releases were deployed to cover additional deployment targets and to address defects in the initial patches.\n\nLessons learned from the incident include the need for improved communication and documentation regarding system components, better integration of this information into engineering and QA processes, and the development of a faster, independent update mechanism not tied to Telemetry or the Studies system. The importance of dedicated QA resources during incident response was also highlighted."}