Helping our customers through the CrowdStrike outage - The Official Microsoft Blog
Resource type: web
Credibility Rating
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: Microsoft
This Microsoft blog post documents the July 2024 incident in which a faulty CrowdStrike software update disrupted 8.5 million Windows devices globally. It illustrates systemic risks from software deployment failures in critical infrastructure and offers a concrete case study for AI safety discussions about deployment safety and cascading failures.
Summary
Microsoft's official response to CrowdStrike's faulty July 2024 update, which affected 8.5 million Windows devices globally. The post details remediation steps, including deploying Microsoft engineers to work with customers, collaborating with AWS and GCP, and publishing technical workarounds. It highlights the interconnected nature of the tech ecosystem and the importance of safe deployment practices and disaster recovery mechanisms.
Key Points
- CrowdStrike's faulty software update affected approximately 8.5 million Windows devices (<1% of all Windows machines) but caused broad economic and societal disruption, because enterprises running many critical services use CrowdStrike.
- Microsoft deployed hundreds of engineers to work directly with customers, collaborated with AWS and GCP on the most effective remediation approaches, and worked with CrowdStrike on a scalable solution to accelerate the fix across Azure infrastructure.
- The incident underscores systemic risks from software update failures in critical enterprise infrastructure.
- Microsoft emphasized the importance of safe deployment practices and disaster recovery mechanisms across the tech ecosystem.
- Cross-industry collaboration was highlighted as essential for effective incident response and recovery.
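The "<1%" figure in the first key point also bounds the size of the Windows installed base, even though the post gives no total. A minimal sketch of that arithmetic, using only the two figures quoted in the post (the variable names are illustrative, not from the source):

```python
# Figures quoted in the post: 8.5 million affected devices, described as
# "less than one percent of all Windows machines".
affected_devices = 8_500_000
share_upper_bound_percent = 1  # the affected share is below 1%

# If affected / total < 1/100, then total > affected * 100.
# Integer arithmetic avoids floating-point rounding on 0.01.
implied_min_total = affected_devices * 100 // share_upper_bound_percent

print(f"Windows installed base: more than {implied_min_total:,} devices")
```

This yields only a lower bound (strictly more than 850 million machines), not an exact count; the post does not state the total.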
Cited by 2 pages
| Page | Type | Quality |
|---|---|---|
| AI Cyber Damage: Bounding the Tail | Analysis | -- |
| Catastrophic Cyber Tail Risk | Risk | -- |
Cached Content Preview
On July 18, CrowdStrike, an independent cybersecurity company, released a software update that began impacting IT systems globally. Although this was not a Microsoft incident, given it impacts our ecosystem, we want to provide an update on the steps we’ve taken with CrowdStrike and others to remediate and support our customers.
Since this event began, we’ve maintained ongoing communication with our customers, CrowdStrike and external developers to collect information and expedite solutions. We recognize the disruption this problem has caused for businesses and in the daily routines of many individuals. Our focus is providing customers with technical guidance and support to safely bring disrupted systems back online. Steps taken have included:
- Engaging with CrowdStrike to automate their work on developing a solution. CrowdStrike has recommended a workaround to address this issue and has also issued a public statement. Instructions to remedy the situation on Windows endpoints were posted on the Windows Message Center.
- Deploying hundreds of Microsoft engineers and experts to work directly with customers to restore services.
- Collaborating with other cloud providers and stakeholders, including Google Cloud Platform (GCP) and Amazon Web Services (AWS), to share awareness on the state of impact we are each seeing across the industry and inform ongoing conversations with CrowdStrike and customers.
- Quickly posting manual remediation documentation and scripts found here.
- Keeping customers informed of the latest status on the incident through the Azure Status Dashboard here.
We’re working around the clock and providing ongoing updates and support. Additionally, CrowdStrike has helped us develop a scalable solution that will help Microsoft’s Azure infrastructure accelerate a fix for CrowdStrike’s faulty update. We have also worked with both AWS and GCP to collaborate on the most effective approaches.
While software updates may occasionally cause disturbances, significant incidents like the CrowdStrike event are infrequent. We currently estimate that CrowdStrike’s update affected 8.5 million Windows devices, or less than one percent of all Windows machines. While the percentage was small, the broad economic and societal impacts reflect the use of CrowdStrike by enterprises that run many critical services.
This incident demonstrates the interconnected nature of our broad ecosystem — global cloud providers, software platforms, security vendors and other software vendors, and customers. It’s also a reminder of how important it is for all of us across the tech ecosystem to prioritize operating with safe deployment and disaster recovery using the mechanisms that exist. As we’ve seen over the last two days, we learn, recover and move forward most effectively when we collaborate and work together. We appreciate the cooperation and collaboration of our entire sector, and we will conti
... (truncated, 3 KB total)
da77a02884b55432 | Stable ID: sid_HxxkNqawSA