{"UUID":"f824c623-7f26-4bf7-9c02-7c39475af100","URL":"https://gist.github.com/jomo/2bae3821acb433d0446d","ArchiveURL":"","Title":"Google Code Jam 2014 Repeated Email Incident","StartTime":"0001-01-01T00:00:00Z","EndTime":"0001-01-01T00:00:00Z","Categories":["automation","cloud"],"Keywords":["google","code jam","email","repeated emails","cron job","app engine","datastore","mail system"],"Company":"Google","Product":"Code Jam email system","SourcePublishedAt":"0001-01-01T00:00:00Z","SourceFetchedAt":"2026-05-04T17:55:30.773046Z","Summary":"A mail system emailed people more than 20 times. This happened because mail was sent with a batch cron job that sent mail to everyone who was marked as waiting for mail. This was a non-atomic operation and the batch job didn't mark people as not waiting until all messages were sent.","Description":"On an unspecified date, the automated email system for Google Code Jam experienced an incident where it repeatedly sent \"Registration Now Open for Google Code Jam 2014!\" emails. A large number of registrants from 2013 received more than 20 copies of this email. The issue was identified and the system was manually stopped after a contestant alerted Google.\n\nThe incident stemmed from a bug introduced during a refactoring of the mail system. The system used an App Engine datastore with \"notification\" objects having a \"status\" property (\"Waiting\", \"Sending\", \"Sent\"). A cron job, MailCheckWorker, was designed to find \"Waiting\" notifications and initiate email sending, marking them \"Sent\" only after all emails were dispatched.\n\nThe core problem was a non-atomic operation. The MailCheckWorker did not transition the notification status to \"Sending\" *before* starting the email dispatch. Consequently, while the first batch of emails was still being sent, the MailCheckWorker would re-evaluate the notification, find it still in \"Waiting\" status, and initiate another round of email sending. This cycle repeated multiple times.\n\nThis flaw led to a significant customer impact, with many users receiving numerous identical emails. Smaller-scale tests of the mail system had not detected this bug because they completed within a minute, before the MailCheckWorker could re-process the \"Waiting\" notification. The Code Jam team issued an apology for the inconvenience caused."}