Aug 2024
CrowdStrike chronicles: five essential lessons for tech resilience
Written by Ali Negus
Cyber threats are sophisticated and pervasive, so organisations must treat cybersecurity and operational resilience as top priorities.
The recent incident involving CrowdStrike, a leading cybersecurity firm, serves as a powerful reminder of the vulnerabilities that even the most robust systems can face. While the issue was ultimately attributed to a quality control problem in the firm’s own software update process, it showed that serious failures can arise even within the cybersecurity industry itself.
As organisations continue to rely heavily on cloud-based solutions and interconnected technologies, the lessons learned from this incident are invaluable. From the importance of offline system preparedness to the necessity of effective communication strategies, the fallout from the CrowdStrike situation offers critical insights to help organisations bolster their defences and ensure continuity in the face of adversity.
Here are five key takeaways from the CrowdStrike incident that can guide organisations in enhancing their cybersecurity posture and operational readiness.
Offline preparedness is a safety net
One of the understated lessons from the CrowdStrike incident is the importance of having offline systems ready to deploy in case of a critical failure. Many organisations rely heavily on cloud-based solutions, which are vulnerable to outages or cyberattacks. By maintaining offline systems that are regularly updated, organisations can ensure continuity of operations even when primary systems are compromised. These systems should be part of a broader disaster recovery plan, ensuring that they can be quickly activated and integrated into the existing infrastructure when needed.
Communication is always key
The incident underscores the necessity of having robust communication strategies in place, especially during IT outages. When traditional communication channels like email are compromised, organisations need alternative methods to reach critical staff. Implementing an SMS-based alert system or using secure messaging apps can ensure that key personnel are informed and can coordinate responses effectively. Regular drills and simulations can help staff become familiar with these alternative communication methods, ensuring a swift and coordinated response during actual incidents.
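As a rough illustration of the fallback idea, the sketch below tries a list of channels in order and stops at the first one that succeeds. The function name `send_alert` and the channel callables are hypothetical placeholders, not part of any real messaging product; in practice each callable would wrap whatever email, SMS, or secure messaging service the organisation actually uses.

```python
from typing import Callable, Optional, Sequence, Tuple

# A channel is a (name, send function) pair. The send function is a
# hypothetical wrapper that returns True on successful delivery.
Channel = Tuple[str, Callable[[str, str], bool]]


def send_alert(recipient: str, message: str,
               channels: Sequence[Channel]) -> Optional[str]:
    """Try each channel in order; return the name of the first that works."""
    for name, send in channels:
        try:
            if send(recipient, message):
                return name
        except Exception:
            # A channel that is down (for example, email during an
            # outage) should not stop us from trying the next one.
            continue
    # Every channel failed; the caller should escalate manually.
    return None
```

Ordering the list from the primary channel to the last resort keeps the policy in one place, which also makes it easy to exercise during the drills mentioned above.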
Don’t underestimate testing
CrowdStrike’s experience highlights the critical importance of comprehensive testing and deployment practices. Organisations should implement multi-layered testing procedures that include automated tests, manual reviews, and staged rollouts. This approach helps catch potential issues before they reach production environments. Additionally, organisations should establish a rollback plan for updates, allowing them to quickly revert changes if problems arise. By adopting these practices, organisations can minimise the risk of deploying faulty updates and ensure smoother operations.
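To make staged rollouts and rollback plans concrete, here is a minimal sketch under stated assumptions. The function `staged_rollout` and the callables `apply_update`, `is_healthy`, and `rollback` are hypothetical names introduced for illustration; they stand in for whatever deployment and monitoring tooling an organisation already has, and this is not a description of how CrowdStrike or any particular vendor ships updates.

```python
from typing import Callable, List, Sequence


def staged_rollout(
    hosts: Sequence[str],
    stages: Sequence[float],
    apply_update: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> bool:
    """Roll an update out in widening stages.

    `stages` holds cumulative fractions of the fleet, e.g. (0.1, 0.5, 1.0).
    If any updated host fails its health check after a stage, every host
    updated so far is reverted and the rollout stops.
    """
    updated: List[str] = []
    for fraction in stages:
        # Number of hosts that should be updated by the end of this stage.
        target = max(1, int(len(hosts) * fraction))
        for host in hosts[len(updated):target]:
            apply_update(host)
            updated.append(host)
        if not all(is_healthy(host) for host in updated):
            # Revert in reverse order of deployment, then abort.
            for host in reversed(updated):
                rollback(host)
            return False
    return True
```

With ten hosts and stages of `(0.1, 0.5, 1.0)`, a fault that only shows up on the fourth host is caught in the second stage, so five hosts are reverted and the other five are never touched.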
Managing third parties
The CrowdStrike incident is also a lesson in managing third-party vendor relationships effectively. The widespread disruption caused by a faulty update underscores the need for organisations to regularly assess the risk management and disaster recovery capabilities of their key vendors, ensuring they have robust plans in place to handle outages and provide timely support during incidents. It also highlights the importance of reviewing and adjusting how third-party updates are rolled out, for example by using capabilities such as throttling to limit how quickly an update reaches all systems. Safeguarding contracts with clauses that address service-level agreements (SLAs), response times, and penalties for prolonged outages is also crucial. These strategies help mitigate risks and ensure accountability, ultimately enhancing an organisation’s resilience against disruptions.
Be ready for action
The CrowdStrike incident emphasises the necessity of having a well-defined incident response and recovery plan. Organisations should establish clear protocols for identifying, responding to, and recovering from cybersecurity incidents. This includes defining roles and responsibilities, creating communication plans, and conducting regular training exercises to ensure that staff are prepared to act swiftly in the event of a breach or outage. A robust incident response plan not only helps minimise damage but also helps restore normal operations more quickly.
By learning from these lessons—whether it’s having offline systems ready, ensuring effective communication, or planning for incidents—businesses can better protect themselves against future challenges. It’s all about being prepared and proactive, so when the unexpected happens, it can be handled without missing a beat.