Postmortem Index

Cloudflare service token incident on January 24, 2023

Cloudflare · service tokens

On January 24, 2023, Cloudflare experienced an incident lasting 121 minutes, affecting several services. The issue began at 16:55 UTC when an Access engineering team initiated a code release. Although the release was rolled back at 17:05 UTC, the damage had already occurred, leading to a staggered impact across the network. The incident was declared at 18:12 UTC, and full resolution was achieved by 19:51 UTC after restoring service tokens from a database backup.

The incident stemmed from a new feature release for service tokens, intended to add a ‘Last seen at’ field. During a read-write transaction to update this field, the system inadvertently overwrote the ‘client_secret’ value of service tokens with an empty string. The read operation redacted the client secret for security, and the subsequent write persisted that incomplete record. The database did not catch the error because its only check on the field was a ‘not null’ constraint, which an empty string satisfies.
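The failure mode described above can be illustrated with a minimal Python sketch. It is not Cloudflare's code: the `ServiceToken` and `TokenStore` types, the `buggy_touch` and `safe_touch` functions, and the choice to model redaction as an empty string are all hypothetical, chosen only to show how writing back a redacted read can erase a secret, and how an update limited to the one changed field avoids it.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import Dict, Optional

REDACTED = ""  # assumption: the read path blanks the secret before returning it


@dataclass(frozen=True)
class ServiceToken:
    token_id: str
    client_secret: str
    last_seen_at: Optional[datetime] = None


class TokenStore:
    """Hypothetical in-memory stand-in for the service token table."""

    def __init__(self) -> None:
        self._rows: Dict[str, ServiceToken] = {}

    def put(self, token: ServiceToken) -> None:
        # Full-row write: every field of the stored row is replaced.
        self._rows[token.token_id] = token

    def read_redacted(self, token_id: str) -> ServiceToken:
        # Read path that hides the secret from callers.
        return replace(self._rows[token_id], client_secret=REDACTED)

    def raw_secret(self, token_id: str) -> str:
        # Inspection helper for this demo only.
        return self._rows[token_id].client_secret

    def update_last_seen(self, token_id: str, seen_at: datetime) -> None:
        # Narrow write: only the last_seen_at field changes.
        self._rows[token_id] = replace(self._rows[token_id], last_seen_at=seen_at)


def buggy_touch(store: TokenStore, token_id: str, seen_at: datetime) -> None:
    # Reads the redacted record, then writes the whole record back,
    # so the blanked secret overwrites the real one.
    token = store.read_redacted(token_id)
    store.put(replace(token, last_seen_at=seen_at))


def safe_touch(store: TokenStore, token_id: str, seen_at: datetime) -> None:
    # Updates only the field that changed; the secret is never rewritten.
    store.update_last_seen(token_id, seen_at)


store = TokenStore()
store.put(ServiceToken("t1", "s3cr3t"))
now = datetime(2023, 1, 24, 16, 55, tzinfo=timezone.utc)

safe_touch(store, "t1", now)
secret_after_safe = store.raw_secret("t1")   # still "s3cr3t"

buggy_touch(store, "t1", now)
secret_after_buggy = store.raw_secret("t1")  # now ""
```

In this sketch the narrow update is the safer design because the write path never handles the secret at all, so a redacted read cannot leak into storage.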

This error rendered service tokens invalid for impacted accounts, which included critical internal Cloudflare accounts powering multiple services. Consequently, a wide range of Cloudflare products suffered degradation, including the Workers platform, Zero Trust solution, CDN control plane functions, Cloudflare WARP, Cloudflare API, Cache Purge, Cache Reserve, Images, and R2. Customers experienced issues such as failed authentication, inability to enroll new devices, reauthentication failures, blocked traffic, and elevated request failure rates.

To resolve the issue, Cloudflare manually restored correct service token values for affected accounts and then performed a full restoration from an older database backup. As preventative measures, the Access engineering team plans to implement new unit tests to catch similar overwrites, add automatic alerts for authentication failures, improve rollback processes for specific database tables, and update database fields to explicitly check for empty strings in addition to existing ‘not null’ checks.
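The last preventative measure, checking for empty strings in addition to ‘not null’, can be sketched with Python's built-in `sqlite3` module. SQLite is used here only as a convenient stand-in; the report does not say which database Cloudflare uses, and the table names and the `try_insert` helper are hypothetical.

```python
import sqlite3


def try_insert(conn: sqlite3.Connection, table: str, secret) -> bool:
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        # Table names come from this script only, never from user input.
        conn.execute(f"INSERT INTO {table} (client_secret) VALUES (?)", (secret,))
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")

# Column protected by NOT NULL only.
conn.execute("CREATE TABLE tokens_not_null (client_secret TEXT NOT NULL)")

# Column protected by NOT NULL plus an explicit non-empty check.
conn.execute(
    "CREATE TABLE tokens_checked ("
    "client_secret TEXT NOT NULL CHECK (length(client_secret) > 0))"
)

results = {
    "not_null_accepts_none": try_insert(conn, "tokens_not_null", None),
    "not_null_accepts_empty": try_insert(conn, "tokens_not_null", ""),
    "checked_accepts_empty": try_insert(conn, "tokens_checked", ""),
    "checked_accepts_value": try_insert(conn, "tokens_checked", "s3cr3t"),
}
print(results)
```

In this sketch, the NOT NULL-only table rejects `None` but accepts the empty string, while the table with the added `CHECK` rejects the empty string and still accepts a real value.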

Keywords

service tokens · cloudflare · warp · zero trust · api · r2 · cdn · cache purge