{"UUID":"cc87682c-1689-4b23-82e1-aaf3dff8728f","URL":"https://status.cloud.google.com/incidents/mREMLwZFe3FuLLn3zfTw","ArchiveURL":"","Title":"BigQuery Storage WriteAPI elevated error rates in US Multi-Region","StartTime":"2022-10-13T23:30:00Z","EndTime":"2022-10-14T04:30:00Z","Categories":["automation","cascading-failure","cloud","time"],"Keywords":["bigquery","storage writeapi","streaming api","us multi-region","rpc","deadlock","connection failures"],"Company":"Google","Product":"BigQuery Storage WriteAPI","SourcePublishedAt":"0001-01-01T00:00:00Z","SourceFetchedAt":"2026-05-04T17:53:25.543179Z","Summary":"On 13 October 2022 23:30 US/Pacific, there was an unexpected increase of incoming and logging traffic combined with a bug in Google’s internal streaming RPC library that triggered a deadlock and caused the Write API Streaming frontend to be overloaded. And BigQuery Storage WriteAPI observed elevated error rates in the US Multi-Region for a period of 5 hours.","Description":"On October 13, 2022, at 23:30 US/Pacific, Google BigQuery's Storage WriteAPI experienced elevated error rates in the US Multi-Region, affecting customers for approximately 5 hours until 04:30 US/Pacific on October 14. The issue manifested as increased connection failures for users making calls to the Write API and a slight increase in failures for the InsertAll API.\n\nThe incident was triggered by an unexpected increase in incoming and logging traffic to the BigQuery Storage Write API. This traffic surge, combined with a bug in Google's internal streaming RPC library, caused a deadlock and overloaded the Write API Streaming frontend. Although automated systems attempted to scale up instances, a separate bug in the Write API prevented existing instances from recovering, leading to continued elevated error rates due to load balancing.\n\nCustomers utilizing the Write API in the US Multi-Region observed increased levels of connection failures. Additionally, customers using the InsertAll API in the US region may have experienced a slight increase in failures due to the subsequent traffic increase.\n\nGoogle engineers were alerted at 23:47 US/Pacific and began investigation. Initial automated and manual scaling attempts did not resolve the issue. The fix involved manually restarting the stuck instances, an action that commenced between 03:20 and 04:30 US/Pacific on October 14, leading to full mitigation by 04:30 US/Pacific.\n\nTo prevent recurrence, Google has completed fixing the bug in its internal RPC library and the bug in the Write API that caused the cascading deadlock. They are also deploying additional automation in the Write API backend for automatic load balancing based on concurrent connections and improved error handling."}