Over time, application owners continuously refine their applications and the underlying infrastructure to improve the products they deliver, whether to internal or external customers. These modifications inevitably change the configuration of both applications and infrastructure. While some of these changes are benign, others can unintentionally steer systems away from their securely configured state, a phenomenon commonly referred to as "configuration drift." Left unaddressed, this drift can introduce substantial risk to the organization.
Traditionally, agent-based configuration management tools have been favored as the primary solution for tackling configuration drift.
However, is this approach genuinely the most effective strategy?
The AWS Well-Architected Framework describes the concept of a Fault Isolation Zone (FIZ), which is defined by isolation boundaries such as Availability Zones (AZs), Regions, control planes, and data planes. While the concept originates in a cloud context, the principles behind it remain relevant in traditional data centers and at the network edge. The core idea is to contain the impact of errors, particularly human misconfigurations, so that they cannot propagate beyond a defined Fault Isolation Zone.
Are misconfigurations resulting from human error still a matter of concern?
According to a 2023 report by CrowdStrike, "The number of observed cloud exploitation cases grew by 95% year-over-year in 2022, and adversaries are using a broad array of Tactics, Techniques, and Procedures (TTPs) (e.g., misconfigurations, credential theft, etc.) to compromise critical business data and applications in the cloud. Stopping cloud breaches requires agentless capabilities to protect against misconfiguration, control plane and identity-based attacks, combined with runtime security that protects cloud workloads." Even as developers increasingly adopt strategies such as containers or immutable VMs, the risk of operational human errors remains a looming threat.
The question then arises: Are classical agent-based configuration management tools the answer to detect and remediate configuration drift until we achieve complete immutability?
The challenge with many agent-based tools is that they require agents running with administrative privileges to be installed on every endpoint. Unfortunately, this means that a single inadvertent setting can be applied and propagated to all endpoints at once, cutting straight across your fault isolation zones.
In today's dynamic and ever-evolving threat landscape, does it make sense to deploy agent-based tools across all your critical IT assets, which drive your business's revenue, solely for the purpose of detecting and remediating drift?
I have witnessed the potentially costly chaos that agent-based tools can incite when not properly configured with essential safeguards in place. But what if there were a more refined approach? Imagine being able to detect drift without the need for agents requiring administrative privileges.
Picture a solution that allows you to restrict the impact to only the affected endpoint where drift has been detected. With this solution, inadvertent settings cannot be applied and propagated across all endpoints. Consider the prospect of an agentless solution that can identify drifted files without requiring administrative access across your entire network of endpoints, significantly reducing the blast radius within your fault isolation zones.
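To make the idea of identifying drifted files concrete, here is a minimal sketch of per-endpoint drift detection. It is not the tooling discussed in this post; it only assumes that a known-good SHA-256 digest was recorded for each watched file, and the function names `sha256_of` and `find_drifted_files` are hypothetical. Note that reading a file still requires read permission on that file, so the privileges needed depend on what is being watched.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_drifted_files(baseline: dict[str, str]) -> list[str]:
    """Return the watched paths whose contents no longer match the baseline.

    `baseline` (a hypothetical structure) maps a file path to the digest
    recorded while the file was in its known-good state. A file that has
    been removed is reported as drifted as well.
    """
    drifted = []
    for name, expected in sorted(baseline.items()):
        path = Path(name)
        if not path.is_file() or sha256_of(path) != expected:
            drifted.append(name)
    return drifted
```

Because the check runs against one endpoint's files and returns only that endpoint's drifted paths, any follow-up action can be scoped to the same host.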
A recent blog post shows how systemd path units and a provisioning callback let automation identify configuration drift and act on it while limiting the impact to the affected endpoint. This approach helps make sure that misconfigured settings cannot spill over predefined boundaries. But what about Windows systems?
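The callback step can be sketched roughly as follows. This is an illustration under assumptions, not the implementation from that blog post: it assumes the provisioning callback endpoint of automation controller has the form `/api/v2/job_templates/<id>/callback/` and accepts a `host_config_key`, and the controller URL, template ID, and key shown in the usage note are hypothetical placeholders. The function only builds the request; sending it is left to the caller.

```python
import json
import urllib.request


def build_callback_request(controller_url: str, template_id: int,
                           host_config_key: str) -> urllib.request.Request:
    """Build the POST request an endpoint would send when it detects drift.

    Assumption: the controller exposes a provisioning callback at
    /api/v2/job_templates/<template_id>/callback/ and runs the job only
    against the host that made the request.
    """
    url = (f"{controller_url.rstrip('/')}"
           f"/api/v2/job_templates/{template_id}/callback/")
    body = json.dumps({"host_config_key": host_config_key}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

An endpoint could call `build_callback_request("https://controller.example.com", 42, "example-key")` from the service that its path unit triggers, then send the result with `urllib.request.urlopen`. The design point is that the endpoint asks for remediation of itself, so a bad setting is not pushed to every host.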
I encourage you to explore my colleague's latest video, which introduces a more risk-averse strategy for drift detection that includes Windows systems using Event-Driven Ansible. This method lets your automation identify drift promptly and respond to unforeseen drift events immediately, all without agents, and it limits the impact to only the Windows or Red Hat Enterprise Linux endpoint where the drift occurred.
In conclusion, let's automate drift detection and remediation when it makes sense, but let's do so with a cautious and calculated approach. Your operations and information security teams will undoubtedly appreciate this alternative method. Don't miss this compelling opportunity! Discover a safer approach to drift detection and remediation, the Ansible way, by watching the latest video from my colleague Alexander Dworjan.