On July 19th, 2024 at 12:09am EDT, security technology provider CrowdStrike released an update that triggered crashes worldwide, causing the “blue screen of death” (BSOD) and disrupting travel, healthcare, and more for millions around the globe. Silverchair’s Incident Response Team jumped into action to address this issue for the clients on our platform (and several of our own colleagues staring at the BSOD), keeping them informed every step of the way.

The Incident

Beginning at 1:18am, Silverchair systems received the CrowdStrike Falcon Sensor update and were affected by the crashes. The crashes were due to a defect in the CrowdStrike Sensor Rapid Response Content, which went undetected during CrowdStrike’s validation checks. The defect crashed many of our Windows servers that had CrowdStrike Falcon Sensor installed.

As soon as the crashes occurred, the Silverchair Incident Response Team was alerted and immediately began to investigate the cause of the Windows server crashes. On-call teams were brought in and quickly coordinated in a dedicated Teams channel, with representatives from TechOps, Incident Response, Client Services, Leadership, and more.  

Tech Solutions

Thanks to quick, collaborative troubleshooting, additional “cold-spare” servers were brought online, and initial disaster recovery (DR) protocols were started in case they were needed later in the incident. As these temporary fixes were rolled out, news began to spread online that the incident was caused by a major CrowdStrike defect that affected more than 8.5 million devices globally. The fix announced by CrowdStrike was systematically applied to each of our servers, restoring connectivity and bringing the impacted services back online. Public platform sites were restored first, followed soon after by Silverchair’s self-serve publisher tools and client portal, then our internal workflow tools (Jira and Confluence).

As the rest of our team members began their workdays and encountered the BSOD on their work laptops, the tech teams worked to identify all affected Silverchair devices and apply the fix remotely. The company’s handful of Mac users had a very uneventful morning.  

Communications

Throughout the incident, Silverchair’s Client Services team worked closely with the Incident Response team to keep clients informed, sending seven client-wide emails with updates. The first message alerted clients to the incident, and updates sent every 1-2 hours told clients what we knew about the cause and what our teams were doing to restore stability. As sites and tools became available, Client Services immediately notified Silverchair’s publisher community, and the incident was resolved with minimal disruption.

Though unfortunate, this incident is a good example of Silverchair’s sophisticated security, alerting, and technical processes as well as our client-centric approach to communication. 

Emilie Delquié, SVP of Product & Customer Success, said: “Seeing the Silverchair Incident Response Team jump into action so quickly, calmly and effectively – let alone in the middle of the night for some of the team members – was amazing! What was a complex issue was resolved as swiftly as it could have been and I’m very grateful for our team’s dedication and quick thinking that day.”

Read CrowdStrike’s full post-incident report here.
