Creating custom Event-Driven Ansible source plugins

We're surrounded! Our modern systems and applications are constantly generating events. These events could be generated by service requests, application events, health checks, etc. With the wealth of information from event traffic surrounding everything we do, Event-Driven Ansible allows for automated responses to incoming events.

But not only are we completely engulfed in event data, we're also enveloped by event sources. Think about your organization or even your household for a minute and consider how many pieces of equipment or applications are generating data that could be put to use if only you were able to easily collect it.

Event source plugins within Event-Driven Ansible act as a bridge between Ansible and event-generating applications and services. Event-Driven Ansible already has a handful of event source plugins to consume events from a variety of sources. But what if the source you care about isn't covered by one of those plugins? Or what if you're a Red Hat partner who wants to connect Event-Driven Ansible to your own solution? The good news is that developing event source plugins for Event-Driven Ansible can be a relatively painless endeavor.

What is a source plugin?

Event-Driven Ansible leverages rulebooks to codify the response to an event. Rulebooks combine sources, conditions and actions. An action is executed based on one or more conditions of an event coming from a source. Event source plugins allow rulebooks to receive events from things like cloud services, applications and brokers. Without an event source, events aren't received and actions aren't taken.

Event sources are Python scripts contained within an Ansible Content Collection. Within a rulebook, an event source is called by name, and the parameters included in the rulebook's source configuration are passed into the event source plugin. Within the plugin, routines should be written asynchronously to prevent blocking, so that events can be received and handled as efficiently as possible across multiple event sources. For this reason, you'll notice that all of the initial source plugins, like Kafka and webhook, take advantage of the asynchronous I/O paradigm.

Source plugin guidelines

There aren't many requirements for a source plugin, so scoping a new one should be straightforward. To get started with plugin development, here are some guidelines for source plugins:

  1. The source plugin must contain a specific entry point.
  2. Each source must have nested keys which match arguments expected by the main function.
  3. Source plugins should be documented with intended purpose, expected arguments, and a rulebook example.
  4. Event source plugins should be distributed within Collections.
  5. Python routines should be written as non-blocking or asynchronous.
  6. Source plugins should include a way to test the plugin outside of Event-Driven Ansible.

To demonstrate some of these guidelines, I'll use an example source plugin that I created. My source plugin is called new_records, and it watches a table within ServiceNow for newly created records (e.g. new incidents, problems and change requests). If you'd like to test this source plugin for yourself, you'll need a ServiceNow instance, which you can provision as part of the ServiceNow developer program.

Before you go out and test my example plugin, please know that this plugin comes from a sub-par Python person, is meant only as an example, and is not endorsed or recommended for production use. ServiceNow instances also have rate limit rules for REST resources that you may hit by polling too often. Because the push model is preferred for Event-Driven Ansible source plugins, a better implementation of this source plugin might be to create a ServiceNow web service that pushes event details to an event aggregator! In this scenario, our integrated application (ServiceNow) would PUSH event details to something like JetStream or Kafka (for which there is already an event source plugin!).

The source plugin must contain a specific entry point.

A source plugin requires a pretty specific entry point. The entry point is a function within the Python script that is called by ansible-rulebook, the component of Event-Driven Ansible responsible for executing rulebooks. Let's take a look at the very beginning of my custom source plugin for ServiceNow:

import asyncio
import time
import os
from typing import Any, Dict
import aiohttp

# Entrypoint from ansible-rulebook
async def main(queue: asyncio.Queue, args: Dict[str, Any]):
    ...  # the plugin body continues from here

After all of the import statements at the beginning of my plugin, you can see that the entry point is an asynchronous function called main, which accepts two arguments. The first argument is an asyncio queue that ansible-rulebook consumes as this source is used within a rulebook. The second argument is a dictionary of the arguments my particular source plugin requires to connect to my ServiceNow instance, such as the username, password and URL. That's really all that's expected as far as the entry point is concerned.
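
To make the entry point contract concrete, here is a minimal sketch of a complete source plugin. It assumes only what is described above: ansible-rulebook calls main with a queue and an argument dictionary. The limit and delay arguments are hypothetical, and the plugin simply emits a few counter events instead of talking to ServiceNow.

```python
import asyncio
from typing import Any, Dict


# Hypothetical minimal source plugin: emits a counter event at a fixed interval.
# It relies only on the main(queue, args) contract described above.
async def main(queue: asyncio.Queue, args: Dict[str, Any]) -> None:
    limit = int(args.get("limit", 3))       # hypothetical argument
    delay = float(args.get("delay", 0.1))   # hypothetical argument
    for i in range(limit):
        # Each event is a plain dict placed on the queue for the rules engine.
        await queue.put({"counter": i})
        await asyncio.sleep(delay)  # yield to the event loop between events


if __name__ == "__main__":
    async def demo() -> None:
        q: asyncio.Queue = asyncio.Queue()
        await main(q, {"limit": 3, "delay": 0})
        while not q.empty():
            print(q.get_nowait())

    asyncio.run(demo())
```

Running the file directly prints three small event dictionaries. The plugin's only job is putting plain dictionaries on the queue; deciding which events matter is left to the rulebook's conditions.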

Each source must have nested keys which match arguments expected by the main function.

This is a slightly more complicated way of saying that the arguments I require within my custom ServiceNow event plugin should also be keys within the rulebook used to configure the source plugin. To demonstrate this, look at the source configuration for my custom plugin within a rulebook and then look at the arguments expected by the main function that ansible-rulebook executes:

Rulebook example:

- name: Watch for new records
  hosts: localhost
  sources:
    - cloin.servicenow.new_records:
        instance: https://dev-012345.service-now.com
        username: ansible
        password: ansible
        table: incident
        interval: 1

Plugin code:

# Entrypoint from ansible-rulebook
async def main(queue: asyncio.Queue, args: Dict[str, Any]):

    instance = args.get("instance")
    username = args.get("username")
    password = args.get("password")
    table = args.get("table")
    query = args.get(
        "query",
        "sys_created_onONToday@javascript:gs.beginningOfToday()@javascript:gs.endOfToday()",
    )
    interval = int(args.get("interval", 5))
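
Because rulebook keys arrive in main as plain dictionary entries, a missing or misspelled key simply shows up as None. One way to catch that early is a small validation step, sketched below. The parse_source_args helper and its error message are hypothetical; the required keys and the defaults for query and interval mirror the snippet above.

```python
from typing import Any, Dict

# Defaults mirroring the new_records snippet above.
DEFAULT_QUERY = (
    "sys_created_onONToday@javascript:gs.beginningOfToday()"
    "@javascript:gs.endOfToday()"
)
DEFAULT_INTERVAL = 5
REQUIRED_KEYS = ("instance", "username", "password", "table")


def parse_source_args(args: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: validate and normalize the rulebook source keys."""
    missing = [key for key in REQUIRED_KEYS if not args.get(key)]
    if missing:
        # Fail fast with a clear message instead of an opaque HTTP error later.
        raise ValueError("missing required source arguments: " + ", ".join(missing))
    return {
        # Strip a trailing slash so f"{instance}/api/..." doesn't produce "//".
        "instance": str(args["instance"]).rstrip("/"),
        "username": args["username"],
        "password": args["password"],
        "table": args["table"],
        "query": args.get("query", DEFAULT_QUERY),
        "interval": int(args.get("interval", DEFAULT_INTERVAL)),
    }
```

For example, the rulebook source configuration shown earlier would arrive as a dictionary with the instance, username, password, table and interval keys, and parse_source_args would return it with the default query filled in.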

As a note, if you're worried about distributing rulebooks that contain credentials or other sensitive arguments, ansible-rulebook also accepts variables from vars files or from environment variables, using --vars or --env-vars respectively. This means your rulebook source configuration could look more like:

- name: Watch for new records
  hosts: localhost
  sources:
    - cloin.servicenow.new_records:
        instance: "{{ SN_HOST }}"
        username: "{{ SN_USERNAME }}"
        password: "{{ SN_PASSWORD }}"
        table: incident
        interval: 1

Source plugins should be documented with purpose, expected arguments, and a rulebook example.

This is sort of a no-brainer that even I, an incredibly sub-par Python developer, can get on board with. In fact, this is actually one of my New Year's resolutions for 2023. Take a look at the top of my source plugin as an example:

"""
new_records.py

Description:
event-driven-ansible source plugin example
Poll ServiceNow API for new records in a table
Only retrieves records created after the script began executing
This script can be tested outside of ansible-rulebook by specifying
environment variables for SN_HOST, SN_USERNAME, SN_PASSWORD, SN_TABLE

Arguments:
  - instance: ServiceNow instance (e.g. https://dev-012345.service-now.com)
  - username: ServiceNow username
  - password: ServiceNow password
  - table:  Table to watch for new records
  - query:  (optional) Records to query. Defaults to records created today
  - interval: (optional) How often to poll for new records. Defaults to 5 seconds

Usage in a rulebook:
- name: Watch for new records
  hosts: localhost
  sources:
    - cloin.servicenow.new_records:
        instance: https://dev-012345.service-now.com
        username: ansible
        password: ansible
        table: incident
        interval: 1
  rules:
    - name: New record created
      condition: event.sys_id is defined
      action:
        debug:
"""

Fair enough of a guideline, right? The documentation pretty clearly lays out that this is an Event-Driven Ansible plugin, what the plugin can be expected to do, the arguments the plugin accepts and how to use this plugin within a rulebook. 

Event source plugins should be distributed within Collections.

Ansible Content Collections are the model by which Ansible content is easily distributed. These Collections typically contain plugins, roles, playbooks and documentation, and they demonstrate Ansible's extensibility. Event source plugins and rulebooks are simply additional content types that can be distributed in Ansible Content Collections. You can see this in the rulebook example from my plugin documentation, which references the source plugin by its fully qualified collection name, cloin.servicenow.new_records:

- name: Watch for new records
  hosts: localhost
  sources:
    - cloin.servicenow.new_records:
        instance: https://dev-012345.service-now.com
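
Distribution inside a collection is also what makes the fully qualified name resolvable. The sketch below maps a source name like cloin.servicenow.new_records to the file it would live in, assuming the extensions/eda/plugins/event_source/ directory layout used by Event-Driven Ansible collections such as ansible.eda; check the current collection documentation before relying on that path. source_plugin_path is a hypothetical helper, not part of ansible-rulebook.

```python
from pathlib import PurePosixPath


def source_plugin_path(fqcn: str) -> PurePosixPath:
    """Map 'namespace.collection.plugin' to the plugin file inside a collection.

    Assumes the extensions/eda/plugins/event_source/ layout used by EDA
    collections; verify against the current documentation.
    """
    parts = fqcn.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected namespace.collection.plugin, got {fqcn!r}")
    namespace, collection, plugin = parts
    return PurePosixPath(
        "ansible_collections", namespace, collection,
        "extensions", "eda", "plugins", "event_source", f"{plugin}.py",
    )


if __name__ == "__main__":
    print(source_plugin_path("cloin.servicenow.new_records"))
```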

Python routines should be written as non-blocking or asynchronous.

In the asynchronous model, requests against the ServiceNow API by the new_records source plugin, for example, shouldn't block or slow down requests to another API made by another source plugin. By using asyncio along with async and await within the plugin, we simply pause that one routine while it awaits a result, instead of blocking other routines from executing. If you combine two source plugins that use only synchronous routines in the same rulebook, you could find that your rulebook executes slowly or reacts to events long after they occurred. Here's an example from my source plugin:

            # Passing the query via params lets aiohttp URL-encode it
            async with session.get(
                f'{instance}/api/now/table/{table}',
                params={'sysparm_query': query},
                auth=auth,
            ) as resp:
                if resp.status == 200:
                    records = await resp.json()
                    for record in records['result']:
                        …
                        await queue.put(record)

Note the async and await keywords. The async keyword marks code that runs as a coroutine within asyncio's event loop; here, async with opens the HTTP request as an asynchronous context manager. The await keyword suspends the coroutine until the awaited result, in this case the response from the ServiceNow API call, is ready. While it waits, the event loop is free to run other coroutines, such as other source plugins.
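
To see why this matters, the sketch below runs two pairs of hypothetical sources, each simulating a 0.2-second API call. The non-blocking pair awaits asyncio.sleep, so the event loop can run both at once; the blocking pair calls time.sleep, which stalls the entire loop, much as a synchronous HTTP library would.

```python
import asyncio
import time
from typing import Awaitable, Callable

Source = Callable[[str, asyncio.Queue], Awaitable[None]]


async def non_blocking_source(name: str, queue: asyncio.Queue) -> None:
    await asyncio.sleep(0.2)   # suspends only this coroutine
    await queue.put({"source": name})


async def blocking_source(name: str, queue: asyncio.Queue) -> None:
    time.sleep(0.2)            # blocks the whole event loop
    await queue.put({"source": name})


async def run_pair(source: Source) -> float:
    """Run two copies of a source concurrently and return elapsed seconds."""
    queue: asyncio.Queue = asyncio.Queue()
    start = time.perf_counter()
    await asyncio.gather(source("a", queue), source("b", queue))
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"non-blocking: {asyncio.run(run_pair(non_blocking_source)):.2f}s")
    print(f"blocking:     {asyncio.run(run_pair(blocking_source)):.2f}s")
```

On a typical machine the non-blocking pair finishes in roughly the time of a single call, while the blocking pair takes about twice as long, and the gap grows with every additional synchronous source.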

Another line worth mentioning is the final await in the new_records snippet: queue.put(record). This line is essential because it is how the record reaches the rulebook engine. By putting the record returned by the ServiceNow API onto the queue, we make it available as an event, so that actions defined in the rulebook can run based on that record.
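
The new_records snippet elides what happens between reading the results and putting a record on the queue, and the plugin's docstring says it only emits records created after it starts. The sketch below is not the author's elided code; it is one hypothetical way to get that behavior, assuming each record carries a unique sys_id (the field the rulebook condition checks). The fetch callable stands in for the ServiceNow request, and max_polls exists only so the loop can be tested.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Optional, Set

# Hypothetical stand-in for the ServiceNow request: returns the current records.
FetchFn = Callable[[], Awaitable[List[Dict[str, Any]]]]


async def poll_new_records(
    queue: asyncio.Queue,
    fetch: FetchFn,
    interval: float,
    max_polls: Optional[int] = None,
) -> None:
    """Emit only records whose sys_id has not been seen before.

    Records returned by the first poll are treated as pre-existing: they are
    remembered but not emitted. max_polls is for testing; None polls forever.
    """
    seen: Set[Any] = set()
    first_poll = True
    polls = 0
    while max_polls is None or polls < max_polls:
        for record in await fetch():
            sys_id = record.get("sys_id")
            if sys_id in seen:
                continue
            seen.add(sys_id)
            if not first_poll:
                await queue.put(record)  # hand the new record to the rules engine
        first_poll = False
        polls += 1
        await asyncio.sleep(interval)  # wait without blocking other sources
```

In a long-running plugin, the seen set grows without bound, so a real implementation would also want some way to prune it.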

Source plugins should include a way to test the plugin outside of Event-Driven Ansible.

This one really isn't a hard and fast rule for creating source plugins; it's more of a best practice or general tip that helps during plugin development. By including a block that only runs when the script is called directly (for example, with python new_records.py), you can quickly test changes to the script without first setting up a rulebook and starting ansible-rulebook. For my sample plugin, I use the following:

# this is only called when testing plugin directly, without ansible-rulebook
if __name__ == "__main__":
    instance = os.environ.get('SN_HOST')
    username = os.environ.get('SN_USERNAME')
    password = os.environ.get('SN_PASSWORD')
    table = os.environ.get('SN_TABLE')

    # Stand-in for the queue ansible-rulebook provides: print each event instead
    class MockQueue:
        async def put(self, event):
            print(event)

    asyncio.run(main(MockQueue(), {
        "instance": instance,
        "username": username,
        "password": password,
        "table": table,
    }))

As the comment in that example notes, this block is only for testing the Python script directly. To test this code yourself, define the four environment variables (e.g. export SN_TABLE=incident, and so on) and then run the script. Then open your ServiceNow instance, create a new record in the table you're watching (with SN_TABLE=incident, you'd create a new incident), and watch the script print out the newly created record.
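
The MockQueue approach is well suited to manual testing. For automated tests, a helper that runs a plugin until it has produced a given number of events can be handy. The sketch below uses a real asyncio.Queue, a per-event timeout, and task cancellation so that even an endless polling loop terminates. collect_events and ticking_source are hypothetical names for illustration.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

# Signature shared by source plugin entry points: main(queue, args).
SourceMain = Callable[[asyncio.Queue, Dict[str, Any]], Awaitable[None]]


async def collect_events(
    main: SourceMain, args: Dict[str, Any], count: int, timeout: float = 5.0
) -> List[Any]:
    """Run a source plugin's main() and return its first `count` events.

    The plugin runs as a background task and is cancelled once enough events
    arrive. Each wait is bounded by `timeout` seconds.
    """
    queue: asyncio.Queue = asyncio.Queue()
    task = asyncio.create_task(main(queue, args))
    try:
        return [await asyncio.wait_for(queue.get(), timeout) for _ in range(count)]
    finally:
        task.cancel()
        try:
            await task
        except asyncio.CancelledError:
            pass


# Hypothetical plugin that never finishes, like a real poller.
async def ticking_source(queue: asyncio.Queue, args: Dict[str, Any]) -> None:
    tick = 0
    while True:
        await queue.put({"tick": tick})
        tick += 1
        await asyncio.sleep(float(args.get("interval", 0.01)))


if __name__ == "__main__":
    print(asyncio.run(collect_events(ticking_source, {"interval": 0}, count=3)))
```

The same helper could wrap the new_records main function with the argument dictionary built from the SN_* environment variables, waiting for a single record with a generous timeout.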

The Zen of Ansible

This blog post is based on my presentation at AnsibleFest 2022, delivered in Chicago and virtually.

Recently, a suggestion was made to adopt Tim Peters' "The Zen of Python" as an overall guiding principle for designing good automation content. That gave me pause because it didn't seem like the right thing to me. While there is definitely some very good advice in "The Zen of Python" that can be applied to Ansible content, adopting it in its entirety would not provide the best user experience that Ansible is capable of and known for. Its presence as a guiding principle for content design gives the wrong impression and reinforces a mindset we don't want to recommend.

This got me thinking, what is "the zen" of Ansible?

I considered the spirit of "The Zen of Python" and then I returned to the Ansible best practices talk that I first co-presented back in 2016 at Red Hat Summit. In that talk, I said that Ansible was designed with a philosophy of sorts from the very beginning.

"The Ansible way" is to provide an automation tool that is simple, powerful and agentless. Ansible enables users with no special coding skills to do powerful things across multiple IT domains. Its human readable automation can be utilized and shared by every IT team so they can get productive quickly and contribute their expertise. Its agentless architecture provides the flexibility to be applied across all IT infrastructure domains.

Everything in this post relates back to this design thinking in one way or another.

Besides "The Zen of Python" and my Ansible best practices talk, I also considered what I've heard talking to hundreds of you in my many years within the Ansible ecosystem. What I came up with are these 20 aphorisms for Ansible.

  1. Ansible is not Python.
  2. YAML sucks for coding.
  3. Playbooks are not for programming.
  4. Ansible users are (most likely) not programmers.
  5. Clear is better than cluttered.
  6. Concise is better than verbose.
  7. Simple is better than complex.
  8. Readability counts.
  9. Helping users get things done matters most.
  10. User experience beats ideological purity.
  11. "Magic" conquers the manual.
  12. When giving users options, use convention over configuration.
  13. Declarative is better than imperative -- most of the time.
  14. Focus avoids complexity.
  15. Complexity kills productivity.
  16. If the implementation is hard to explain, it's a bad idea.
  17. Every shell command and UI interaction is an opportunity to automate.
  18. Just because something works, doesn't mean it can't be improved.
  19. Friction should be eliminated whenever possible.
  20. Automation is a journey that never ends.

Your Ansible automation content doesn't necessarily have to follow this guidance, but these are good ideas to keep in mind. These aphorisms are opinions that can be debated and are sometimes contradictory. What matters is that they communicate a mindset for getting the most from Ansible and your automation.

Let me take you deeper into each of the aphorisms and explain what they mean to your automation practice.

Ansible is not Python. YAML sucks for coding. Playbooks are not for programming. Ansible users are (most likely) not programmers.

These aphorisms are at the heart of why applying guidelines for a programming language to good Ansible automation content didn't seem right to me. As I said, it would give the wrong impression and would reinforce a mindset we don't recommend -- that Ansible is a programming language for coding your playbooks. 

These aphorisms are all saying the same thing in different ways -- certainly the first three. If you're trying to "write code" in your plays and roles, you're setting yourself up for failure. Ansible's YAML-based playbooks were never meant to be for programming.

So it bothers me when I see Python-isms bleeding into what Ansible users see and do. It may be natural and make sense if you write code in Python, but most Ansible users are not Pythonistas. So, it can be challenging and confusing when these isms are incorporated, thereby introducing friction that degrades their user experience and the value that Ansible provides. 

Because Ansible is not a programming language, all parts of your organization can contribute to automating your entire IT stack, rather than relying on skilled programmers to understand your operations and write and maintain code for it.

If you are a programmer creating Ansible modules and plugins, assume you are not the target audience for what you are developing and your target audience won't have the same skills and resources you possess.

Clear is better than cluttered. Concise is better than verbose. Simple is better than complex. Readability counts.

These are really just interpretations of aphorisms in "The Zen of Python". The last one is taken directly from it because you can't improve on perfection.

In the original Ansible best practices talk, we recommended users optimize for readability. This holds true even more so today. If done properly, your content can be the documentation of your workflow automation. Take the time to make your automation as clear and concise as possible. Iterate over what you create and always look for opportunities to simplify and clarify.

These aphorisms don't just apply to those writing playbooks and creating roles. If you are a module developer, think about how your work can assist users, be clear and concise, do things simply and just get things done.

Helping users get things done matters most. User experience beats ideological purity.

Whether you are creating modules, plugins and collections, writing playbooks, or designing a cross-domain hybrid automation workflow -- Ansible is for helping you get things done. Always consider and look to maximize the user experience. Don't get caught up in, or beholden to, some strict interpretation of standards or ideological purity that shifts the burden onto the end user.

"Magic" conquers the manual.

Arthur C. Clarke wrote, "Any sufficiently advanced technology is indistinguishable from magic."

The "magic" in Ansible is its playbook engine and module system. It is how Ansible provides powerful and flexible capabilities in a straightforward and accessible way, by abstracting users from all of the complex implementation details that lie beneath. This frees users from time-consuming and error-prone manual operations and from writing brittle one-off scripts and code, giving them time to put their valuable expertise to use where it is needed.

Design automation that amazes users by making difficult or tedious tasks easy and almost effortless. Look to provide powerful, time-saving capabilities that are quick to deploy and easy to use for getting things done.

When giving users options, use convention over configuration.

I am a big proponent of convention over configuration and don't think it gets enough consideration in the Ansible community. Convention over configuration is a design paradigm that attempts to decrease the number of decisions a developer is required to make, without necessarily losing flexibility, so they don't have to repeat themselves. It was popularized by Ruby on Rails.

A playbook developer utilizing your work should only need to specify unique and unconventional aspects of their automation tasks and workflows and no more. Look to reduce the number of decisions and implementation details a user needs to make. Take the time to handle the most common use cases for them. Look to provide as many sensible defaults with modules, plugins and roles as possible. Optimize for users to get things done quickly. 
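
In code, that often comes down to a small table of sensible defaults that user input is layered over. The sketch below is a hypothetical illustration, not an Ansible API: the option names and default values are invented for an imaginary web server role or module, and resolve_options simply merges user-supplied values over the conventional ones.

```python
from typing import Any, Dict, Optional

# Hypothetical conventional defaults, decided once by the content author.
CONVENTIONAL_DEFAULTS: Dict[str, Any] = {
    "port": 80,
    "state": "present",
    "document_root": "/var/www/html",
    "enabled": True,
}


def resolve_options(user_options: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Merge user-supplied options over the conventional defaults.

    Unknown keys are rejected so a typo doesn't silently fall back to a default.
    """
    user_options = user_options or {}
    unknown = set(user_options) - set(CONVENTIONAL_DEFAULTS)
    if unknown:
        raise ValueError("unknown options: " + ", ".join(sorted(unknown)))
    return {**CONVENTIONAL_DEFAULTS, **user_options}


if __name__ == "__main__":
    print(resolve_options({"port": 8080}))
```

A user who only needs a different port passes just that one value and inherits every other convention, which is the "specify only what is unique" idea above.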

Declarative is better than imperative -- most of the time.

This aphorism is particularly for Ansible Content Collection developers. Ansible is a desired state engine by design. Think declaratively first. If there truly is no way to design something declaratively, then use imperative (procedural) means. 

Declarative means that configuration is guaranteed by a set of facts instead of by a set of instructions, for example, "there should be 10 RHEL servers", rather than "depending on how many RHEL servers are running, start/stop servers until you have 10, and tell me if it worked or not". 
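
The server example can be sketched as a tiny reconcile function. The caller states the desired count; the function works out what to start or stop and reports whether anything changed, in the spirit of the changed status Ansible modules return. Running it again once the desired state is reached produces no actions, which is what idempotence means in practice. This is a hypothetical illustration, not how any particular Ansible module is implemented.

```python
from typing import Any, Dict


def reconcile(desired: int, running: int) -> Dict[str, Any]:
    """Work out how to reach `desired` servers from `running` servers.

    The caller declares the end state; this function decides the steps.
    """
    delta = desired - running
    return {
        "start": max(delta, 0),   # servers to start if we're short
        "stop": max(-delta, 0),   # servers to stop if we have too many
        "changed": delta != 0,    # nothing to do once the state already matches
    }


if __name__ == "__main__":
    print(reconcile(10, 7))
    print(reconcile(10, 10))
```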

This aphorism is an example of the "user experience beats ideological purity" aphorism in practice. Rather than strictly adhering to a declarative approach to automation, Ansible incorporates declarative and imperative means. This mix offers you the flexibility to focus on what you need to do, rather than strictly adhere to one paradigm.

Focus avoids complexity. Complexity kills productivity.

Remember that complexity kills productivity. The Ansible team at Red Hat really means it and believes that. That's not just a marketing slogan. Automation can crush complexity and give you the one thing you can't get enough of: time.

Follow Linux principles of doing one thing, and one thing well. Keep roles and playbooks focused on a specific purpose. Multiple simple ones are better than having a huge single playbook full of conditionals and "programming" that Ansible is not well suited for.

We strive to reduce complexity in how we've designed Ansible and encourage you to do the same. Strive for simplification in what you automate. 

If the implementation is hard to explain, it's a bad idea.

This aphorism, like "readability counts", is also taken directly from "The Zen of Python" because you cannot improve upon perfection.

In his essay on Literate Programming, Donald Knuth wrote, "Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do." So it goes that if you cannot explain or document your implementation easily, then it's a bad idea that needs to be rethought or scrapped. If it is hard to explain, what chance do others have of understanding it, using it and debugging it? Kernighan's Law says "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it."

Ansible is designed for how real people think and work. Recall that earlier I said Ansible Playbooks are human-readable automation with no special coding skills needed. Take advantage of that. Then, if you are having trouble explaining what you are trying to do, pause and reconsider your implementation and the process you are trying to automate. How can I make it easier to explain? Can my process be improved or streamlined? How can I simplify and clarify? Can I break it down into smaller, more focused parts and iterate over this?

This will help you identify a bad idea sooner and avoid the types of friction that slow you and your organization down over time.

Every shell command and UI interaction is an opportunity to automate.

This aphorism comes from my personal experience talking about Ansible and automation for many years. Sometimes people ask me what they should automate. Other times, people challenge me that an automation tool like Ansible is unnecessary or does not apply to what they are doing. Whether we were talking about RHEL, Windows, networking infrastructure, security, edge devices, or cloud services, my response has essentially been the same over the years. I have repeated it so often that I have jokingly formulated the point into my own theorem on automation. So call it "Appnel's Theorem on Automation" if you will.

If you are wondering what should be automated, look for anything anyone is typing into a Linux shell or clicking through in a user interface. Then ask yourself, "Is this something that can be automated?" Then ask, "What is the value of automating this?" Most Ansible modules wrap command line tools or use the same APIs that sit behind UIs.

Once you have identified enough things to automate, start with those that cause the most pain and those that you can get done quickly. Remember that you want to create a virtuous cycle of delivering reliability, gathering feedback and building trust across your organization. Showing progress and business value quickly will help do that.

Just because something works, doesn't mean it can't be improved. Friction should be eliminated whenever possible.

This first aphorism just so happens to be a quote from the movie Black Panther, and it elegantly expresses some important wisdom when it comes to Ansible automation.

Always iterate and adapt to real-world feedback from your operations. Optimize for readability. Continue to find ways to simplify and reduce friction in your organization and its processes. As changes are introduced into your environments and IT policies over time, they will create new friction and pain points. They will also create new opportunities to apply your automation practices to eliminate them.

Automation is a journey that never ends.

Heraclitus, a Greek philosopher, said "change is the only constant in life. Nothing endures but change." 

Anyone who has been around the IT industry for any length of time knows there is constant change. This is why it is so vital to be agile and prepared to respond to ongoing change, innovation and business demands quickly and reliably. 

Automation is not a destination. It is a practice. It is a culture, a mindset and an attitude. Automation is a continuous process of feedback and learning and adapting to change and improving upon what you did before. 

Automation creates opportunities and we at Red Hat see opportunities for automation everywhere. 

So the question I pose to you is: Where will your automation journey lead you?

Further Reading

If you want to dive more deeply into the application of the zen of Ansible and its origins, I recommend these resources.

The Ansible Community of Practice (CoP) has assembled a comprehensive repository of "good practices" for Ansible content development. The Ansible Lint tool has now been added to the Red Hat Ansible Automation Platform and codifies many of these practices in rules and profiles, helping you quickly identify deviations and apply these practices consistently in your work.

If you are interested in understanding more about "The Zen of Python", I recommend starting with Al Sweigart's explanation of those aphorisms.