NPM Fastly VCL misconfiguration outage on 2014-01-28
NPM · npm registry
On January 28, 2014, npm experienced an outage due to a configuration change made to the Varnish VCL on Fastly. This change inadvertently misrouted all incoming requests, including those intended for CouchDB, to Manta.
Manta could not serve these requests, so users received 403 Forbidden responses. Because the Fastly configuration did not cache error responses, every failed request was passed through to the backend again, producing a “thundering herd” effect that prolonged the incident.
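The effect of not caching errors can be illustrated with a toy model. This is only a sketch: `ToyCache`, `healthy_origin`, `misrouted_origin`, and the URL are hypothetical names, and the model assumes a cache that stores only 200 responses. It is not how Fastly is implemented.

```python
class ToyCache:
    """Toy cache that stores only successful (200) responses.

    Assumption for illustration: errors are never stored, so every
    request for a failing URL goes back to the origin.
    """

    def __init__(self, origin):
        self.origin = origin
        self.store = {}
        self.origin_hits = 0

    def get(self, url):
        if url in self.store:
            return self.store[url]
        self.origin_hits += 1
        status, body = self.origin(url)
        if status == 200:
            self.store[url] = (status, body)
        return status, body


def healthy_origin(url):
    return 200, f"metadata for {url}"


def misrouted_origin(url):
    # Stand-in for a backend rejecting requests it was never meant to serve.
    return 403, "Forbidden"


def origin_hits(origin, requests=1000, url="/some-package"):
    """Count how many of `requests` identical requests reach the origin."""
    cache = ToyCache(origin)
    for _ in range(requests):
        cache.get(url)
    return cache.origin_hits


print(origin_hits(healthy_origin))    # 1
print(origin_hits(misrouted_origin))  # 1000
```

With a healthy origin the cache absorbs all but the first request; once every response is an error, the cache absorbs nothing and the full request volume lands on the backend.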
The root cause was a bug in the VCL configuration: req.backend was set inside the vcl_fetch subroutine, and a subsequent restart reset req.backend to the default, which is the first backend in the list (Manta), so the intended CouchDB routing was lost.
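The mechanism can be traced with a small simulation. This is a toy model under stated assumptions, not npm's actual VCL: the backend order, the `handle` loop, and the `buggy_recv` and `buggy_fetch` functions are hypothetical stand-ins for the request flow described above.

```python
BACKENDS = ["manta", "couchdb"]  # assumed order: the first entry is the default


def buggy_recv(req):
    # Modelled bug: the receive step does not choose a backend.
    pass


def buggy_fetch(req):
    # Modelled bug: the backend is chosen in the fetch step, followed by a
    # restart so the rules are evaluated again.
    if req["restarts"] == 0:
        req["backend"] = "couchdb"
        return "restart"
    return "deliver"


def handle(url, recv, fetch, max_restarts=4):
    """Return the name of the backend that finally serves the request."""
    req = {"url": url, "backend": BACKENDS[0], "restarts": 0}
    while True:
        recv(req)
        served_by = req["backend"]
        if fetch(req) == "restart" and req["restarts"] < max_restarts:
            req["restarts"] += 1
            req["backend"] = BACKENDS[0]  # restart resets to the default backend
            continue
        return served_by


print(handle("/registry/some-package", buggy_recv, buggy_fetch))  # manta
```

The assignment to CouchDB does happen, but it is discarded by the reset on restart before any request is sent, so the request is served by the default backend.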
To prevent recurrence, future Fastly VCL changes will set req.backend explicitly in the vcl_recv subroutine, so that it is set again after any restart. npm also plans to run a separate Varnish instance for staging VCL changes, to catch errors before deployment.
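A pre-deployment check in the spirit of both remediation items might look like the following sketch. It is a toy model, not a real staging setup: the URL prefixes, the `EXPECTED` table, and the function names are all hypothetical, and the reset on restart is simulated rather than exercised against Varnish.

```python
DEFAULT_BACKEND = "manta"  # assumed to be first in the backend list


def recv(req):
    # Fixed rule: choose the backend on every pass, including after a restart.
    if req["url"].startswith("/registry/"):  # hypothetical URL prefix
        req["backend"] = "couchdb"
    else:
        req["backend"] = "manta"


def backend_after_restarts(url, recv_fn, restarts):
    """Return the backend selected on the final pass after `restarts` restarts."""
    req = {"url": url, "backend": DEFAULT_BACKEND, "restarts": 0}
    recv_fn(req)
    for _ in range(restarts):
        req["restarts"] += 1
        req["backend"] = DEFAULT_BACKEND  # modelled reset on restart
        recv_fn(req)
    return req["backend"]


# Hypothetical routing expectations to verify before a change goes live.
EXPECTED = {
    "/registry/some-package": "couchdb",
    "/other/path": "manta",
}


def check_routing(recv_fn, max_restarts=3):
    """Return a list of (url, restarts, got, want) for every wrong route."""
    failures = []
    for url, want in EXPECTED.items():
        for n in range(max_restarts + 1):
            got = backend_after_restarts(url, recv_fn, n)
            if got != want:
                failures.append((url, n, got, want))
    return failures


print(check_routing(recv))  # []
```

Because the backend is chosen in the receive step, the reset on restart is harmless: the next pass selects the correct backend again. Running the same check against a rule set that skips this step reports a failure for every registry request.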