On October 20, AWS users faced significant disruptions when a DNS issue impacted the DynamoDB service in the US-EAST-1 region. The outage began just after midnight Pacific Time, causing high error rates that affected not only DynamoDB but also other Amazon services relying on it.
Among the various services impacted, the AI search company Perplexity confirmed it was "experiencing an outage related to an AWS operational issue." Similarly, Canva reported problems connected to its underlying cloud provider, resulting in increased error rates for its users.
Real-time monitoring by Downdetector indicated that other platforms, such as Venmo, Roku, Lyft, Zoom, and the McDonald’s app, might also have been experiencing issues tied to AWS.
At 12:11 a.m. Pacific Time, AWS acknowledged the problems through its service health status page, stating that it was investigating increased error rates and latency across several services. As further details emerged, AWS identified the issue as linked to the DNS resolution of the DynamoDB API endpoint and confirmed it was impacting various services.
The repercussions were not limited to the East Coast: there were indications that global services and features relying on US-EAST-1 endpoints, such as IAM updates and DynamoDB global tables, were also affected. AWS pursued multiple strategies to speed up resolution and, by 2:27 a.m. Pacific Time, reported that some initial mitigations had been applied. Users were advised to retry failed requests, although some services were still working through backlogs.
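For readers wondering what retrying failed requests can look like in practice, here is a minimal sketch of a common client-side pattern: exponential backoff with random jitter. It is an illustration only, not AWS's guidance or any SDK's implementation. The names `TransientError` and `retry_with_backoff`, along with the default delays, are hypothetical and chosen for this example.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for a retryable failure (e.g. a timeout or 5xx)."""


def retry_with_backoff(call, max_attempts=5, base_delay=0.5, max_delay=8.0,
                       sleep=time.sleep, rand=random.random):
    """Run call(), retrying on TransientError with jittered exponential backoff.

    The wait before retry k is a random fraction of
    min(max_delay, base_delay * 2**k). The sleep and rand parameters are
    injectable so the behaviour can be tested without real waiting.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientError:
            # Out of attempts: surface the last error to the caller.
            if attempt == max_attempts - 1:
                raise
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(rand() * cap)
```

Randomizing the wait spreads retries from many clients over time, so a recovering service is less likely to be hit by a synchronized wave of repeated requests.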
By 3:11 a.m. Pacific Time, AWS indicated that most global services reliant on US-EAST-1 had recovered and promised further updates as more information became available.
This incident underscores vulnerabilities within cloud infrastructure, showing that a failure affecting even a single API endpoint can have widespread implications. In recent months, similar incidents have affected other cloud service providers, including Microsoft Azure and IBM Cloud, raising concerns among customers about their reliance on these platforms.