From 15:04 to 21:48 UTC on 2025-09-22, a third-party provider experienced delayed data writes across all regions. This affected a subset of their customers, including Workiro, and caused lengthy delays when our application read and wrote document and thread data. Users were unable to determine whether documents had been uploaded or threads created successfully. While the third party's systems were recovering and data was replicating geographically, documents and threads were displayed intermittently.
This incident was triggered by another customer of the third party accidentally initiating a resync of their full production dataset. These operations did not use the recommended method for large updates, and write latency increased dramatically from the usual 160ms. Once the third party's on-call team was made aware of the issue, they identified the customer responsible for the large volume of expensive writes and worked with them to identify and stop the operations that had introduced these requests (had the operations not been identifiable, the team was prepared to enforce hard write limits on that environment to stabilize the platform). To preserve data integrity, the team waited for multiple regions to catch up and remain caught up. Once they had validated sustained low write delays in those regions, they initiated a manual operation to bring the remaining delayed regions into sync with the recovered ones.
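The "hard write limits" the third party was prepared to enforce are typically implemented as a per-tenant rate limiter. As a minimal sketch only (we have no visibility into the provider's actual implementation; the class and parameter names below are hypothetical), a token-bucket limiter caps sustained write throughput while still permitting short bursts:

```python
import time

class WriteRateLimiter:
    """Illustrative token-bucket limiter for a single tenant.

    Allows bursts up to `capacity` writes, refilling at `rate`
    tokens per second. Hypothetical sketch, not the provider's code.
    """

    def __init__(self, rate: float, capacity: float):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity      # start with a full bucket
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Return True if a write may proceed, consuming one token."""
        now = time.monotonic()
        # Refill tokens in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With a limiter like this scoped to the offending environment, a runaway resync would be throttled rather than degrading write latency for every other tenant in the region.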
Throughout the incident, the Workiro product and engineering team worked with the third party to track their understanding of the situation and the timeline for a fix, while testing our own application to understand the impact on user experience.
Near term
Medium term