Saturday 3 August 2024

The day Microsoft stood still for CrowdStrike

It's unclear exactly when the term "BSOD" entered the public consciousness. More than a decade ago, when I was working in desktop support, I knew the term to mean "Blue Screen Of Death", alluding to Microsoft's infamous default screen which warns users of a system crash. Thus, it was a little strange to see the term "BSOD" being bandied around these last couple of weeks.


Suffice to say, the term "BSOD" has found its way into several conversations over the course of the past few weeks as Microsoft systems around the world crashed in a cacophony of azure-colored screens. The outage was caused by cybersecurity vendor CrowdStrike, as a consequence of their security program Falcon pushing a faulty update to the Microsoft branch. Once the update was downloaded and installed by roughly 8 million machines worldwide, it resulted in those systems crashing.

The fix, according to CrowdStrike, was to reboot the machines in Safe Mode, remove the file, and then reboot the machines in Normal Mode. Nothing fancy if you've ever worked in desktop support, but perhaps a little out of the grasp of the average end-user. The desktop support personnel of these affected companies would have been on high alert, especially during the initial weekend.
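
For the curious, the "remove the file" step can be sketched in a few lines of Python. This is only an illustration, not an official tool. The directory and file pattern below are my assumptions taken from CrowdStrike's public workaround guidance, and the function name remove_faulty_channel_files is made up for this post. It defaults to a dry run so that nothing is deleted by accident.

```python
from pathlib import Path

# Assumption: directory and file pattern as given in CrowdStrike's public
# workaround guidance; neither is stated in this post.
DEFAULT_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")
PATTERN = "C-00000291*.sys"


def remove_faulty_channel_files(directory=DEFAULT_DIR, pattern=PATTERN, dry_run=True):
    """Return the files matching pattern; delete them unless dry_run is True."""
    directory = Path(directory)
    if not directory.is_dir():
        # Falcon is not installed here (or this is not a Windows machine).
        return []
    matches = sorted(directory.glob(pattern))
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches


if __name__ == "__main__":
    # Dry run: only list what would be removed.
    for path in remove_faulty_channel_files(dry_run=True):
        print(f"would remove {path}")
```

Most support staff would presumably have done this by hand from the Safe Mode command prompt; the point of the sketch is only that the fix itself is a simple file deletion followed by a reboot.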

Damage

It's all very well to say "8 million machines crashed", but it would be far more illuminating to know what those machines were doing and where they were deployed. For the most part, home users, who had no compelling reason to install CrowdStrike's Falcon program, were spared the outage. It should also be noted that China, having long moved on from the days where a significant portion of the country was using Windows, was unaffected (to my knowledge thus far) by this outage.

For significantly larger corporations in the USA and Canada, large swathes of Europe, and India, however, this was a different story. Airline queues were jam-packed as systems went down. Healthcare and emergency services were similarly impacted as hospital IT systems using Microsoft 365 crashed. And banks - easily the biggest item on the list. Minutes of lost service could potentially translate to millions of dollars in lost transactions, and with these outages, we're talking days.

This was seen a lot.

The irony here, of course, is that CrowdStrike was employed by these companies for the express purpose of guarding against malicious attacks that would cause disruption much like this outage. This, though, is one case where we should attribute the cause to incompetence rather than malice.

CrowdStrike's error was a null pointer exception, if the Internet is to be believed. These are fairly common; not really high-level stuff. From testing procedures to deployment practices to just good old due diligence, a shortfall in any one of these areas could have resulted in the faulty update being pushed.
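
To illustrate this class of bug, here is a minimal sketch in Python. Python has no raw pointers, so None stands in for null, and everything here (Rule, load_rule, apply_unchecked, apply_checked) is hypothetical and has nothing to do with CrowdStrike's actual code. The unchecked version blows up when an update is missing a field; the checked version rejects the bad update and carries on.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    name: str
    pattern: str


def load_rule(raw: dict) -> Optional[Rule]:
    """Return a Rule, or None when the update lacks a required field."""
    if "name" not in raw or "pattern" not in raw:
        return None
    return Rule(raw["name"], raw["pattern"])


def apply_unchecked(raw: dict) -> str:
    rule = load_rule(raw)
    # Bug: rule may be None, in which case this raises AttributeError.
    return rule.pattern.lower()


def apply_checked(raw: dict) -> str:
    rule = load_rule(raw)
    if rule is None:
        # Reject the malformed update instead of crashing.
        return ""
    return rule.pattern.lower()
```

In user-space code, a slip like this kills one program. In software running at the kernel level, as a security product typically does, the equivalent mistake can take the whole machine down with it.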

Silly comments

Social media has been rife with glee from Apple fans gloating that we should all stop using Microsoft products because this was the greatest outage in history, and that Apple has never been affected on this scale. That's ridiculous when you consider that the conditions that led to these outages - cybersecurity threats, third-party cybersecurity vendors, faulty software updates - would be present in all modern operating systems regardless.

Use Apple products instead!

Convert to macOS to avoid outages? Absurd.

Such deranged comments are ultimately the province of mere fanboys allowing emotion rather than logic to guide the words that come out of their mouths. I'm inclined to be forgiving here; I'd be less understanding if those words came from actual tech professionals.

Finally...

It's early days yet. The fallout is still being felt around the world, though much of the initial damage has been, hopefully, resolved. Still, this is not something we'll all be forgetting anytime soon.

Over and outage,
T___T
