Nelson, the director of operations of a large manufacturing company, told me that he has a highly leveraged staff. That is, most of the work on the company's critical cloud project is being done by consultants and vendors, not by his own team members. Unfortunately, much of the staff’s time is spent making sure that the infrastructure as code (IaC) implementation complies with the company's standards and policies for cloud resources. Nelson said, “Oftentimes the code works, but doesn’t conform to naming conventions, or is missing labels, or has security issues, which impacts downstream workflows and applications. We need the environment to conform to our policies, and I don’t want my staff burning cycles to ensure that this is the case.” This is why he brought me in to run a proof of concept (POC). The POC would validate what would become a Policy as Code solution based on one of the common IaC products.
When the technical team and I reviewed the proof points for the POC, which amounted to a standard demonstration of Policy as Code capabilities, we determined that Red Hat Ansible Automation Platform would satisfy all of the requirements and, in some cases, would satisfy them much better than the product being reviewed in the POC.
What is “Policy as Code”?
Policy as Code aligns technical environments, processes and resources to agreed standards. Many of the policies are applied by a policy engine that validates the IaC using pattern matching or boolean logic. For example, a policy might check that none of the computing resources has a direct route to the Internet (which would violate a security policy), or limit the service ports to just HTTPS and SSH. The policy engine stores the policies and uses them to ensure that resources are created in compliance. Most solutions can deny, warn or report on a violation, depending on the business requirements for the specific attribute. This is a simple use case; real policies can be far more varied and complex, and may need to be enforced more frequently and at greater scale.
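To make the idea concrete, here is a minimal sketch in Python of what a policy engine does with boolean checks and deny, warn or report outcomes. It is an illustration only, not how OPA or any specific product is implemented. The resource fields (`routes`, `open_ports`, `labels`), the policy names and the label keys are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Callable

ALLOWED_PORTS = {22, 443}  # SSH and HTTPS, as in the example above


@dataclass(frozen=True)
class Policy:
    name: str
    action: str                       # "deny", "warn" or "report"
    violated: Callable[[dict], bool]  # returns True when the resource breaks the policy


# Hypothetical policies; the resource schema is an assumption for this sketch.
POLICIES = [
    Policy("no-direct-internet-route", "deny",
           lambda r: any(rt.get("next_hop") == "Internet" for rt in r.get("routes", []))),
    Policy("https-and-ssh-only", "deny",
           lambda r: not set(r.get("open_ports", [])) <= ALLOWED_PORTS),
    Policy("owner-label-present", "warn",
           lambda r: "owner" not in r.get("labels", {})),
    Policy("cost-center-label-present", "report",
           lambda r: "cost_center" not in r.get("labels", {})),
]


def evaluate(resource: dict, policies=POLICIES) -> dict:
    """Group the names of violated policies by the action each policy calls for."""
    findings = {"deny": [], "warn": [], "report": []}
    for policy in policies:
        if policy.violated(resource):
            findings[policy.action].append(policy.name)
    return findings


def is_allowed(findings: dict) -> bool:
    """Only 'deny' findings block the resource from being created."""
    return not findings["deny"]


web_vm = {
    "routes": [{"prefix": "0.0.0.0/0", "next_hop": "Internet"}],
    "open_ports": [443, 8080],
    "labels": {},
}
compliant_vm = {
    "routes": [{"prefix": "10.0.0.0/16", "next_hop": "VirtualNetwork"}],
    "open_ports": [22, 443],
    "labels": {"owner": "ops", "cost_center": "1234"},
}

print(evaluate(web_vm), is_allowed(evaluate(web_vm)))
print(evaluate(compliant_vm), is_allowed(evaluate(compliant_vm)))
```

Keeping the action separate from the check means the same rule can start out as "report" and be tightened to "deny" later without rewriting the logic.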
The proof points set out for Nelson’s company reflect the most common non-compliance issues that had been identified. These include:
- Demonstrate the ability to verify that the right naming convention is being followed.
- Demonstrate the ability to verify that the standard labels are added to all resources.
- Validate that the network security group (NSG) and routes are set correctly based on the virtual private cloud (VPC).
- Audit existing environments and identify resources that are not in compliance.
- Integrate with the Azure repository. The new cloud marketplace was in Azure.
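The first two proof points come down to pattern matching and set comparison. The Python sketch below shows one way such checks could look. The naming pattern (`<environment>-<application>-<type>-<nn>`) and the required label keys are invented for illustration; the customer's actual conventions are not given here.

```python
import re

# Assumed convention for this sketch: <env>-<app>-<type>-<two digits>, e.g. prod-shop-vm-01
NAME_PATTERN = re.compile(r"(dev|test|prod)-[a-z0-9]+-(vm|nsg|sql|vnet)-\d{2}")

# Assumed set of standard labels
REQUIRED_LABELS = frozenset({"owner", "cost_center", "environment"})


def name_is_valid(name: str) -> bool:
    """True when the whole name matches the assumed naming convention."""
    return NAME_PATTERN.fullmatch(name) is not None


def missing_labels(labels: dict) -> list:
    """Return the required label keys that are absent, in sorted order."""
    return sorted(REQUIRED_LABELS - set(labels))


def check_resources(resources: list) -> dict:
    """Map each non-compliant resource name to a list of its problems."""
    report = {}
    for res in resources:
        problems = []
        if not name_is_valid(res["name"]):
            problems.append("name does not follow the naming convention")
        problems += [f"missing label: {key}" for key in missing_labels(res.get("labels", {}))]
        if problems:
            report[res["name"]] = problems
    return report


resources = [
    {"name": "prod-shop-vm-01",
     "labels": {"owner": "web", "cost_center": "42", "environment": "prod"}},
    {"name": "WebServer1", "labels": {"owner": "web"}},
]

for name, problems in check_resources(resources).items():
    print(name, problems)
```

Only `WebServer1` is reported: its name fails the pattern and two of the three assumed labels are missing.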
The team took a full implementation of a vendor’s IaC for building out an environment. The vendor had submitted Terraform code for evaluation, and the team needed to review the code to verify that it was in compliance. This covered the first three proof points listed above. The vendor’s environment included a web front end, application services and a highly available SQL server implementation. It was not overly complex, but it is a good example of where Policy as Code is needed: an environment like this would normally take the staff a significant amount of time to review, update and approve as a new service.
Note: Ansible Automation Platform, Azure Resource Manager templates and Terraform already provided IaC for this critical cloud service. If the Policy as Code solution couldn’t evaluate multiple IaC solutions, much of an environment that already had significant time and money invested in it would need to be reworked.
The POC was launched and everyone signed off on the proof points.
In the product review, three main challenges emerged:
- All of the code for evaluation needed to be shipped to the cloud. This was perceived as increasing the security risk.
- The solution only worked with one IaC solution. Much of the existing IaC would therefore need to be reworked to use it, and adopting it would drive lock-in going forward, limiting flexibility and scalability.
- The solution couldn’t easily audit the existing environment, so there was no way to review and correct resources that were already deployed. This is especially critical because, when a new policy is added or an existing policy is changed, there is no easy way to apply it to the existing environment.
Now, let’s discuss how to automate Policy as Code with Ansible Automation Platform.
Applying Policy as Code to Ansible Automation Platform
Shortly after the POC was organized, IBM Client Engineering reached out to see if they could help with the effort. IBM Client Engineering specializes in doing proof points to help customers succeed. I enlisted them to run a parallel effort using Ansible Automation Platform instead of the other product. This portion of the blog covers the combination of tasks and components that generated a successful result.
High level details of the tasks
This is a high level overview of the tasks that were launched and accomplished; there are many more details behind each one. The project took two months to complete because the people involved were splitting their time with other work.
- Launch the project - Define the goals and acquire the resources, including the demo environment, people, timeline and tools. This also included setting the meeting cadence.
- Build out the components - Install the products and determine roles.
- Build out the proof points - Validate the viability of the solution.
- Build out the demo - Practice and validate.
- Plan for production - Support, training and processes.
- Turnover to production - Put in place the security, processes and operational components required to meet production requirements and start servicing customers.
Resources and tools
Ansible Automation Platform was the central product and it leveraged Open Policy Agent (OPA). OPA is one of many offerings that enable a Policy as Code solution. Even though the environment was set up in Azure, nothing limits this effort to Azure. The solution would work the same for any datacenter, cloud service provider or hybrid cloud model. Similarly, the code and work were done in GitHub, but any GitOps service would provide the same functionality. Building this from scratch with Ansible Automation Platform and open source would take mid- to senior-level technical experience. The beauty of this solution is that it combines Policy as Code with automation to provide capabilities that go well beyond the customer’s expectations.
Overview of the solution
All of the proof points can be demonstrated using Ansible Automation Platform to automate Policy as Code. For more information or a demonstration, reach out to the author of this blog. The beauty of using Ansible Automation Platform is that it works with any IaC solution and can be used by most consumers of IaC. The images below are based on the results of building out and satisfying the proof points.
The next illustration provides a little more detail on OPA. Ansible Automation Platform is currently being used in the environment and its capabilities are well understood. Like Ansible Automation Platform, OPA is a mature technology: it is a graduated open source project, and it provides the Policy as Code specific functionality. You can learn more about OPA here.
First, let’s address the proof points that were required for the full application stack. This diagram shows that Terraform is being used for building out the environment. The beauty of Ansible Automation Platform is that it works with any IaC stack, so no refactoring of the code into another IaC solution is necessary. Ansible Automation Platform and OPA were used to validate that the code conforms to the established policies and standards. Both cases were validated: the negative, where the check fails if the policy isn’t met, and the positive, where it succeeds if the policy is met. Ansible Automation Platform can chain actions together, so it also provides a way to notify on failure, mitigate failures and so on. In this scenario, the proof point was successful.
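The negative and positive cases can be sketched in a few lines of Python. This is a stand-in for the kind of check the policy engine performs, not the code used in the POC. It assumes the Terraform plan has been exported to JSON (for example with `terraform show -json`), which lists planned changes under `resource_changes`, each with an `address` and a `change` object holding `actions` and the planned `after` values. The required tag keys and the sample resources are assumptions for illustration.

```python
import json

REQUIRED_TAGS = {"owner", "environment"}  # assumed standard tags for this sketch


def violations(plan: dict) -> list:
    """Return one message per created or updated resource that lacks a required tag."""
    found = []
    for rc in plan.get("resource_changes", []):
        change = rc.get("change", {})
        # Only resources that will exist after the apply are relevant.
        if not {"create", "update"} & set(change.get("actions", [])):
            continue
        after = change.get("after") or {}
        tags = after.get("tags") or {}
        missing = sorted(REQUIRED_TAGS - set(tags))
        if missing:
            found.append(f"{rc['address']}: missing tags {', '.join(missing)}")
    return found


# Negative case: one resource is missing the 'environment' tag.
BAD_PLAN = json.loads("""
{"resource_changes": [
  {"address": "azurerm_linux_virtual_machine.web",
   "change": {"actions": ["create"],
              "after": {"name": "web", "tags": {"owner": "web-team"}}}},
  {"address": "azurerm_linux_virtual_machine.old",
   "change": {"actions": ["delete"], "after": null}}
]}
""")

# Positive case: every created resource carries both tags.
GOOD_PLAN = json.loads("""
{"resource_changes": [
  {"address": "azurerm_linux_virtual_machine.web",
   "change": {"actions": ["create"],
              "after": {"name": "web",
                        "tags": {"owner": "web-team", "environment": "prod"}}}}
]}
""")

print("bad plan:", violations(BAD_PLAN))
print("good plan:", violations(GOOD_PLAN))
```

A non-empty result is the failure signal that a workflow could act on, for example by sending a notification or blocking the merge.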
The next demonstration shows how to test and validate new policies. The Policy as Code solution can be used to vet new policies as they are built, verifying that they work and perform as expected before they are enforced.
Also, all of the policies and code are maintained in Azure Repos. But again, any other common repository can be used. Finally, OPA is lightweight enough that a developer can easily use it to validate a new policy on their laptop before checking it in as code.
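As a rough sketch of that local workflow, a developer could start OPA in server mode on their laptop (`opa run --server`, which listens on port 8181 by default) and query a policy through OPA's Data API, which accepts a POST to `/v1/data/<path>` with the document to check under an `input` key. The Python below uses only the standard library. The policy path `iac/deny`, the input document and the assumption that the rule returns a list of messages are hypothetical.

```python
import json
import urllib.request


def build_query(base_url: str, policy_path: str, input_doc: dict) -> urllib.request.Request:
    """Build a POST request for OPA's Data API: /v1/data/<policy_path>."""
    url = f"{base_url.rstrip('/')}/v1/data/{policy_path.strip('/')}"
    body = json.dumps({"input": input_doc}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_result(raw: bytes) -> list:
    """Extract the rule's value; OPA omits 'result' when the rule is undefined."""
    return json.loads(raw).get("result", [])


def query_opa(base_url: str, policy_path: str, input_doc: dict) -> list:
    """Send the query to a running OPA server and return the rule's value."""
    with urllib.request.urlopen(build_query(base_url, policy_path, input_doc), timeout=10) as resp:
        return parse_result(resp.read())


# Hypothetical usage against a local server with a policy loaded at iac/deny:
#   messages = query_opa("http://localhost:8181", "iac/deny", {"name": "WebServer1"})
#   if messages:
#       raise SystemExit("\n".join(messages))
request = build_query("http://localhost:8181", "iac/deny", {"name": "WebServer1"})
print(request.get_method(), request.full_url)
```

Because the request never leaves the laptop, this kind of check also avoids shipping code offsite for evaluation.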
The final illustration shows that Ansible Automation Platform can audit an existing environment, either on a schedule or when a new policy is introduced and needs to be retrofitted to an existing environment. It is also possible to automatically generate, and even run, playbooks that remediate the areas that don’t comply with the policies. This went well beyond the customer’s expectations for the POC. Ansible Automation Platform and the team hit it out of the park with the solution. In addition to meeting all of the proof points, the solution automates checking the policies, automates building the remediation playbooks and supports checking all of the IaC solutions.
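The audit-then-remediate flow can be sketched as follows. This Python example is illustrative only and is not the tooling built in the POC: the inventory records, the two policy functions and the shape of the findings are assumptions. The generated play uses `ansible.builtin.debug` tasks purely as placeholders; a real remediation would call whichever cloud modules apply to the resource in question.

```python
import json


def missing_owner_tag(res: dict) -> list:
    """Existing policy (assumed): every resource needs an 'owner' tag."""
    return [] if "owner" in res.get("tags", {}) else ["missing tag: owner"]


def only_https_and_ssh(res: dict) -> list:
    """New policy (assumed): only ports 22 and 443 may be open."""
    extra = sorted(set(res.get("open_ports", [])) - {22, 443})
    return [f"port {port} is not allowed" for port in extra]


def audit(inventory: list, policies: list) -> list:
    """Run every policy against every existing resource and collect findings."""
    return [
        {"resource": res["id"], "policy": policy.__name__, "problem": problem}
        for res in inventory
        for policy in policies
        for problem in policy(res)
    ]


def remediation_play(findings: list) -> list:
    """Build a playbook-shaped structure with one placeholder task per finding."""
    tasks = [
        {
            "name": f"Remediate {f['resource']}: {f['problem']}",
            # Placeholder only; swap in the real module for the resource type.
            "ansible.builtin.debug": {"msg": f"fix {f['policy']} on {f['resource']}"},
        }
        for f in findings
    ]
    return [{"name": "Remediate policy findings", "hosts": "localhost",
             "gather_facts": False, "tasks": tasks}]


INVENTORY = [
    {"id": "vm-web-01", "tags": {"owner": "web-team"}, "open_ports": [443, 8080]},
    {"id": "vm-app-01", "tags": {}, "open_ports": [22]},
]

before = audit(INVENTORY, [missing_owner_tag])
after = audit(INVENTORY, [missing_owner_tag, only_https_and_ssh])
print(len(before), "finding(s) before the new policy,", len(after), "after")
print(json.dumps(remediation_play(after), indent=2))
```

Re-running the same audit with the extra policy is what surfaces `vm-web-01`, which was compliant until the new rule was introduced.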
In conclusion, we recommend Ansible Automation Platform with OPA as the top choice for a Policy as Code solution because of the following:
- It works with all IaC choices.
- Developers can pretest the code before submitting it into the repository and can validate the code without moving it offsite.
- It provides the ability to check existing environments, or remediate environments to align them with new or updated policies.
- It can automatically generate playbooks to fix non-compliance, or even adjust the code to maintain compliance.
- It provides a central management and workflow engine to develop and deploy the policies, and also takes steps based on the IaC alignment to the policies.
- Using the role-based access control capabilities of Ansible Automation Platform, it is easy to assign the right tasks to the right roles. For example, if one portion of the code deals with the network and another portion deals with the Kubernetes cluster, the remediation is routed to the correct team.
- It leverages all of the other advantages that Ansible Automation Platform provides, and becomes part of the large ecosystem of playbooks, modules, knowledge and existing implementations.
I would like to acknowledge the individuals who led the technical effort:
- John E Martin - IBM - Architect and technical lead
- Jennifer Nguyen - IBM - Automation Engineer
- Ryan DeCoster - IBM - Automation Engineer