A risky trade-off made CrowdStrike’s outage so devastating – cybersecurity leaders say there’s no easy fix


The flaw essentially caused computers running Microsoft Windows to freeze up and display the dreaded ‘blue screen of death’. Affected systems needed to be brought back to life, one by one. — Getty Images/The New York Times

When Michael Armer’s phone started blowing up at 4am on Friday, he “freaked out”.

Armer, the chief information security officer at RingCentral, was receiving notifications about a stunning computer outage that was knocking down airport, bank, and hospital tech systems like dominoes.

The scope of the chaos raised fears of a major cybersecurity breach or a state-sponsored attack. “That’s enough to get your blood flowing really quickly,” Armer said.

It turned out that the massive computer outage was not the work of nefarious hackers. It was the result of a glitch in a routine software update by security company CrowdStrike. “We were all very fortunate that this was related to one of their standardised and automated software deployments,” Armer said of the CrowdStrike update snafu.

But along with the relief that the disruption was not a cyberattack, the incident has highlighted the fragility and frightening interconnectedness of the technology modern society depends on – and the extent of the danger posed by today’s convoluted system of software updates, which security experts say stretches staff thin at even the largest organisations and forces a constant balancing act of risky trade-offs.

The problem with patches

Security software vendors like CrowdStrike provide “patches”, or software updates, when threats are detected. Given the number of hackers probing companies' systems and devising new lines of attack, the need for patches is constant – sometimes several times a day. Organisations move quickly and often automate these updates to ensure that there are no holes in their protective shields.

The problem is that new software is like an untested pharmaceutical drug – each new line of code could have a bug or defect that causes problems, unexpected side effects, and dangerous interactions with other software. In an ideal situation, a company would take the time to test each software update before deploying it to all their computers.

“It’s a really difficult conundrum, you cannot keep up with the number,” said a CISO at a top law firm in New York City. “Sometimes you have to put out a security patch because it’s critical and you’ve got vendors breathing down your neck and there’s no way to [test] it,” he said. “Sometimes there are several updates within a 24-hour period so you’d be caught in a recursive circle of testing where you would just never be done.”

For many in-house security teams, that means striking a balance between speed and risk. “The antivirus products are pushing up multiple updates per day because in some ways we've pushed them into a corner,” said Paul Davis, field CISO at software supply chain platform JFrog. “The faster that they can respond to detect a piece of software or malicious activity, the better they are. So that being the case, then the requirement to test multiple times a day becomes onerous.”

The real challenge, he said, is how to protect an organisation against cybersecurity threats that can spread in hours, or even minutes, while at the same time making sure those software updates are tested. “We have to test the basic functionality of the software, but we rely on these automated updates to be safe, and it’s almost like a calculated risk.”

Hands-on CPR for each affected computer

The New York City law firm uses more than 30 separate security tools from a variety of vendors that run on laptops, desktops or servers. Normally, if an update causes problems, the software vendor will deploy a fix that an organisation can quickly push to thousands of computers within the same day.

Because of the nature of the CrowdStrike flaw, however, that wasn’t possible. The flaw essentially caused computers running Microsoft Windows to freeze up and display the dreaded “blue screen of death”. Affected systems needed to be brought back to life, one by one.

“You have to physically walk over to every computer and power it down and then bring it up, and when the screen comes up, you have to hit F3 to go into what they call Safe Mode and then go and delete a file somewhere,” the New York law firm CISO explained. “It’s just a nightmare.”

Some CISOs, however, put the bulk of the blame on Microsoft, not on CrowdStrike – and even avoid Windows altogether if they can. “In Silicon Valley, tech companies tend to avoid Windows,” said the CISO of a medium-sized AI company, who requested anonymity due to the sensitivity of discussing security mitigations.

He said that the design of Windows' core architecture is what leads to malware, spyware and the kind of driver instability that occurred today as a result of the flawed CrowdStrike update.

“CrowdStrike has clear process improvements to make, obviously, but it should not be possible in 2024 to have a kernel (core architecture) which is destabilised by a third party,” he said. “Microsoft has had a bad year, from a security perspective, and they have to win the trust of the ecosystem back.” Microsoft did not respond to a request for comment other than pointing to its existing statement about the outage.

In a statement posted online July 19, CrowdStrike CEO George Kurtz apologised for the incident, which he said involved a “content update for Windows hosts”, noting that Mac and Linux hosts were not affected. “All of CrowdStrike understands the gravity and impact of the situation. We quickly identified the issue and deployed a fix, allowing us to focus diligently on restoring customer systems as our highest priority.”

Post-game analysis

JFrog’s Davis pushed back on the idea that a typical organisation could get away with not using Windows. “Windows is still the predominant operating system,” he said. “When you join a company, you’re [usually] offered either a Windows machine or a Mac machine.”

John Paul Cunningham, CISO at identity security company Silverfort, said that Friday's outage should be a wake-up call for organisations, and make companies more leery of automated software updates. In Cunningham's view, not all threats are created equal, and companies can exercise more discretion by not always defaulting to automated updates.

“Companies like CrowdStrike often suggest doing auto updates with this premise that staying on the most current release of the product is more secure,” he said. But companies can take more time to test it before pushing it out, he said, even if it takes a little more work. “As long as the security team knows there is an update, they can push it out manually – the update itself is still automatic.”

The bottom line is that for most cybersecurity leaders, figuring out how to strike a balance – between risk and speed, and between operating systems – will require some post-game analysis and decision-making, said RingCentral's Armer.

And while getting a grip on software updates is important, he noted that companies should also be thankful Friday's outage was not even worse. “I personally am thankful that it wasn't a state-sponsored attack,” he said. – Fortune.com/The New York Times
