CyberVoices

Canadian cybersecurity news and thought leadership


A Research Perspective On Those Unavoidable IT Outages

The recent incident involving a faulty update to CrowdStrike's security software has already been discussed extensively, so we aim to shift the focus of the conversation. Outages and disruptions are increasingly common and are expected to become more frequent over time. Rather than dwelling on the company's specific errors, this blog post looks at the research on this phenomenon and at the lessons such incidents offer. By examining those broader implications, we can better prepare for and mitigate future challenges in cybersecurity.

Summary of the incident, in case you missed it

On July 19, 2024, CrowdStrike, a prominent American cybersecurity company, inadvertently released a faulty update to its security software. This update led to severe disruptions for computers running Microsoft Windows, causing an unprecedented outage. Approximately 8.5 million systems crashed and failed to restart properly.

The impact was felt globally, affecting daily life, businesses, and government operations. Critical industries such as airlines, airports, banks, hotels, hospitals, manufacturing, stock markets, broadcasting, gas stations, and retail stores were all disrupted. Additionally, government services, including emergency response and official websites, were severely impacted. The financial damage from this worldwide outage is estimated to exceed US$10 billion.

Not An Isolated Case: The prevalence of outages

Researchers tracked publicly reported outages from 2016 to 2022 and documented a total of 656 incidents. In 2022 alone, 60% of organizations reported experiencing at least one outage, with 14% of these incidents being classified as serious. The primary causes of these outages were network or connectivity issues, accounting for 31% of the cases, followed closely by power outages. Additionally, 18% of outages stemmed from IT systems failures.

A noteworthy finding is that 42% of all outages were attributed to problems with third-party services. This statistic underscores the pervasive interdependence within modern technological ecosystems. While the specialization of services and reliance on third parties can enhance operational efficiency, it also introduces vulnerabilities and risks. The benefits of specialization come with the challenge of managing and mitigating the potential impact of disruptions from external service providers. This highlights the importance of not only advancing our technological capabilities but also fortifying our strategies to address and adapt to these interconnected vulnerabilities.

 

Imagining the worst-case scenario: The non-fictional case of Estonia

The CrowdStrike outage was attributed to an IT system failure. However, the news is also full of cyberattacks that lead to similar disruptions. To understand how dramatic the consequences of information technology outages can be, it's instructive to examine the historic example of Estonia.

In April 2007, Estonia, renowned for its advanced digital infrastructure, faced a severe crisis when it was hit by a series of cyberattacks that significantly disrupted its digital landscape. Commencing on April 27 and persisting for several weeks, these attacks primarily involved Distributed Denial of Service (DDoS) assaults, which inundated websites and online services with overwhelming volumes of traffic, rendering them inaccessible.

The attacks targeted critical infrastructure, including government websites, banking systems, and media outlets, plunging many Estonians into confusion and anxiety. The public grappled with uncertainty about the nature and origin of the attacks, leading to heightened stress and fear. The lack of clear communication exacerbated the situation, as people struggled to access information, conduct transactions, and connect with government services. Some individuals mistakenly believed they were under a more traditional form of aggression rather than a digital assault. Reports emerged of increased tensions and sporadic violence in the streets, illustrating how the disruption of essential services can escalate into broader social unrest. The situation underscored the tragic consequences that can arise when a population is left in the dark during a critical crisis.

How people perceive outages

Researchers have investigated how people would perceive and respond to an extended Internet outage. The central question is whether such an event would be seen as a minor inconvenience or as a catastrophic disruption. While most respondents recognize that a long-term Internet outage is a plausible scenario, their perceptions of its impact vary widely. The majority approach the potential disruption from an egocentric perspective, contemplating how their personal relationships and daily activities might be affected. They envision a range of outcomes, including both positive aspects such as increased face-to-face interactions and time for offline pursuits, as well as negative consequences like interruptions to communication and personal convenience. In contrast, a smaller segment of respondents adopts a societal perspective, considering the broader ramifications of such an outage. They foresee severe disruptions to infrastructure and widespread societal problems, reflecting concerns about systemic failures and the potential breakdown of essential services that depend on the Internet. This dichotomy in perspectives underscores the diverse ways people anticipate and prepare for the impacts of significant digital disruptions.

We should expect more outages from natural causes

We've discussed IT system failures and cyberattacks, but outages can also be caused by natural events. Regardless of the cause, the disruptive consequences for people are similar. Outages are expected to become more frequent due to factors such as climate change, an aging electrical grid, and rising energy demand. Research in the United States examined outage patterns from 2018 to 2020 and revealed some concerning trends. On average, there were 520 million customer-hours without power annually across 2,447 counties, which together cover 73.7% of the U.S. population. This period saw 17,484 outages lasting over eight hours, a duration with potential health implications, and 231,174 outages lasting over one hour.

The data highlights a greater frequency of outages in the Northeastern, Southern, and Appalachian regions. Specifically, counties in Arkansas, Louisiana, and Michigan face a dual challenge of frequent extended outages and high social vulnerability, particularly affecting those reliant on electricity-dependent medical equipment.

Moreover, 62.1% of outages lasting more than eight hours were associated with extreme weather events, including heavy precipitation, anomalous heat, and tropical cyclones. These findings underscore the growing intersection of infrastructure resilience and climate-related challenges, emphasizing the need for enhanced strategies to address both immediate and long-term impacts on energy reliability.
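For readers unfamiliar with the customer-hours figure quoted above, here is a minimal sketch of how such a number is typically computed: the number of customers without power multiplied by the hours they were without it, summed over all outages. The record layout, field names, and sample values below are hypothetical and are not taken from the study, which may define and count outage events differently.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    # Hypothetical record layout, for illustration only.
    county: str
    customers_out: int
    duration_hours: float


def customer_hours(outages):
    """Sum of (customers without power x hours without power)."""
    return sum(o.customers_out * o.duration_hours for o in outages)


def count_longer_than(outages, threshold_hours):
    """Number of outages lasting strictly longer than the threshold."""
    return sum(1 for o in outages if o.duration_hours > threshold_hours)


# Made-up sample data, not real outage records.
sample = [
    Outage("County A", 1200, 0.5),
    Outage("County B", 300, 3.0),
    Outage("County C", 5000, 9.5),
]

print(customer_hours(sample))        # total customer-hours in the sample
print(count_longer_than(sample, 1))  # outages lasting over one hour
print(count_longer_than(sample, 8))  # outages lasting over eight hours
```

The one-hour and eight-hour thresholds mirror the two durations reported in the research. A single long outage affecting many customers dominates the total, which is why the count of extended outages is reported alongside the aggregate figure.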

Building on resilience

While long-duration power outages can cause considerable economic damage to local communities, the potential human catastrophe resulting from the failure of essential services often far exceeds the financial impact. The effects of power outages are not uniformly distributed among individuals; access to resources and support can greatly influence how people cope with extended disruptions. Literature highlights that socioeconomic and demographic factors correlate with increased health risks, varying levels of power outage preparedness, and differing capacities for evacuation if needed. This underscores the importance of identifying socially vulnerable groups and communities to ensure that during such events, information, assistance, and resources are delivered in a more targeted and effective manner.

Researchers have developed a three-dimensional metric of social vulnerability to assess the degree to which individuals' lives or livelihoods are at risk during a prolonged power outage. This metric encompasses dimensions of health, preparedness, and evacuation, providing a comprehensive framework for understanding and addressing the varying impacts of power outages on different segments of the population.

Researchers have identified six U.S. states that exhibit the highest levels of power resilience. Notably, power resilience against natural hazards has seen significant improvements in the South and Northeast regions of the United States. This enhanced resilience reflects advancements in infrastructure and response strategies that have bolstered these areas' ability to withstand and recover from extreme weather events.

However, the analysis also reveals a concerning trend: outages caused by human attacks are disproportionately prevalent in the Western Electricity Coordinating Council (WECC) region. This region, which covers the western part of the United States, experiences a higher frequency of outages resulting from deliberate sabotage or cyberattacks compared to other regions. This disparity highlights the need for targeted security measures and improved defenses to protect critical infrastructure in areas particularly vulnerable to human-caused disruptions.

Conclusion

It is important to recognize that IT outages, whether caused by system failures, cyberattacks, or natural events that cut power, are profoundly disruptive and are likely to become more frequent over time. Our growing reliance on technology heightens our vulnerability to these disruptions. No organization, regardless of its size or expertise, is immune to such risks. Building resilience is crucial for mitigating the impact of these events. Establishing a robust Computer Emergency Response Team (CERT) is essential for enhancing an organization's capacity to respond effectively to future cyber incidents. Moreover, it is increasingly necessary to consider the development of a nationwide CERT to provide a coordinated and comprehensive approach to managing and mitigating these pervasive threats.