Thoughts on Restructuring the Ansible Project


Ansible became popular largely because we adopted some key principles early, and stuck to them.

The first key principle was simplicity: simple to install, simple to use, simple to find documentation and examples, simple to write playbooks, and simple to make contributions.

The second key principle was modularity: Ansible functionality could be easily extended by writing modules, and anyone could write a module and contribute it back to Ansible.

The third key principle was "batteries included": all of the modules for Ansible would be built-in, so you wouldn't have to figure out where to get them. They'd just be there.

We've come a long way by following these principles, and we intend to stick to them.

Recently though, we've been reevaluating how we might better structure Ansible to support these principles. We now find ourselves dealing with problems of scale that are becoming more challenging to solve. Jan-Piet Mens, who has continued to be a close friend to Ansible since our very earliest days, recently described those problems quite succinctly from his perspective as a long-time contributor -- and I think his analysis of the problems we face is quite accurate. Simply, we've become victims of our own success.

Success means growth, and growth means more users, more customers, more contributors, and more responsibilities -- which bring increases in complexity. We've continued to build tools like Ansibot to help us manage that complexity, but as we continue towards hyperscale, even as we merge more and more community code, we're seeing more pull requests and issues fall through the cracks.

Consider the following visual representation of the evolution of contributions to the Ansible project:

[Chart: the evolution of contributions to the Ansible project]

Most of our current challenges stem from increased complexity that our simple model was not built to handle. If we want to break through our current constraints, we're going to need to build a new organizational model to do it.

That's exactly what we've been working on -- and it's taking some time, because it's a complex set of challenges -- but we're getting there.

So let's discuss some of our key challenges.

First, there's the growing support challenge.

Originally, Ansible had a simple policy: if we shipped it, we supported it. In the very beginning, this policy made perfect sense; we had comparatively few modules, and we also had comparatively few customers. The Ansible support team knew all of the modules well enough to provide support for all of them, to anyone who was willing to pay for that support.

In truth, though, supporting modules ourselves can be a tricky proposition, and the larger we grow, the trickier it becomes. The majority of our modules are community maintained. We obviously know Ansible itself very well, but we don't know the community maintained modules as well as our contributors do. In some cases, we may not even have access to the underlying software or hardware with which the modules interface; in such cases we are completely reliant upon our community to keep the modules working.

Some of our community maintainers are exceptionally responsive. Some are less responsive. That's the nature of community developed software. But because all of the modules live in the same place, and are a part of our "batteries included" model, many people -- including paying customers -- don't realize that such a distinction exists.

It's unfair for us to place an enterprise support burden on volunteer contributors. It's also important that we're as clear as possible with our customers about what is fully supported as part of their subscription, and what is not.

Next, there's the lifecycle challenge.

As Ansible itself becomes more mature and is used by more enterprise customers, the lifecycle of Ansible is slowing down. Until fairly recently, we would cut a major release of Ansible every four months, but our most recent release cycle was eight months, and that slower release cycle will become the rule.

This is a challenge because it means that over time it will take longer for new code to reach users. This will be especially constraining for our partners; under the current structure, they can only update their modules and plugins on our schedule. We've already received feedback from many partners that they want the ability to release their own modules and plugins independently of our release cycle, and as our release cycle continues to slow down, we expect these calls to grow louder.

Then, there's the challenge of the rising bar.

Everyone, partners and community alike, wants modules to be ever better: better written, better tested, more secure. With every release, we try to raise the quality bar.

For the upcoming Ansible 2.9 release, for instance, we expect to begin asking contributors to provide basic integration tests for every module.

That rising bar comes with its own challenges. How do we handle contributions that have previously been good enough, but no longer meet the new standards? How do we deal with contributors who are not necessarily able or willing to do the work necessary to reach these standards? What do we do about existing modules that don't keep up with our rising quality standard -- do we mark them in some way, or do we kick them out of Ansible entirely, even if they're relatively stable modules that a lot of people depend upon? We continue to grapple with these questions.

Which brings us to the new module contributor challenge.

[Chart: survival curves for Ansible PRs over the past year -- average merge time for new-module PRs (blue) vs. everything else (red). Over the past year, 80% of non-new-module PRs were merged within 22.4 days on average.]

As the quality bar goes up, our ability to bring new contributors onboard goes down -- or at least slows down. It just takes contributors more time and effort to get their new modules accepted than it once did.

It's comparatively easy to bring in PRs to extant modules, because those modules generally have maintainers that have earned our collective trust. Our PR merge numbers for extant modules are actually quite good (we can always improve, of course).

But new modules require a higher degree of vetting, because we're not only vetting the code, we're also implicitly vetting the contributors of that code for their interest and ability to maintain that code.

Given our current structure, this is an unfortunate but necessary barrier. Our support challenge makes us more reluctant to merge new modules without strong assurances that the maintainer will be willing and able to maintain those modules to increasingly stringent requirements.

At the heart of all of these challenges is the fact that we've got one code base that's supporting two categories of participants that have different primary interests.

Enterprise users and partners need, more than anything, a stable and well supported platform that they can trust to automate their IT infrastructure.

But our community users and contributors need something else, and that's what Ansible has always delivered in the past: an easy way to install Ansible, and easy ways to contribute to Ansible.

To those of us who lived through the old days at Red Hat, these problems are eerily similar to the problems we experienced around the original Red Hat Linux product -- problems that led to the creation of the Fedora Project and Red Hat Enterprise Linux. Our problems aren't identical, but similar.

Which is why we believe that the solutions should be not identical, but similar.

So let's talk about our proposal to solve some of these challenges.

From a development perspective, Ansible would be broken out into different components:

  • The core engine, which would essentially be the platform to run everything else. Keeping this engine stable, more secure, and well-tested will be critically important for everyone. The Core Team would be responsible for maintaining this engine. Community contribution policies would be the same as present policies.

  • The core modules and plugins, which are the modules and plugins that the Ansible team would support directly. These would be the most used modules and plugins (think template, copy, lineinfile, and so on.) Community contribution policies would be the same as present policies, though no new modules would be introduced.

  • The community modules and plugins, which would be where most non-core modules and plugins would live. Community contribution policies would be relaxed to some degree, to help onboard new content and new contributors, but we would still maintain a bar of quality to help ensure that community content would be functional, documented, and usable. The separate structure would allow the community to be much more effectively involved in the curation process.

  • Various supported partner modules and plugins, which would be broken out and managed more directly by partners. Community contribution policies would be up to the discretion of the individual partners.

All of these different components would be built in the form of Ansible Content Collections, which we first introduced in Ansible 2.8.

From a deployment perspective, Ansible would be delivered in one of two fundamental ways:

  • A batteries-included method, which would be very similar to how Ansible is delivered currently: a bundling of the core engine, all of the core modules and plugins, all of the community modules and plugins, and select partner modules and plugins, all via collections. There would be no official Red Hat support offered for this method.

  • A supported enterprise method, which would be only the fully supported subset of that content: the core modules and plugins, and select partner modules and plugins, all via collections. This would be the method that would be supported by Red Hat as part of the Red Hat Ansible Automation product. Customers would retain the ability to install and use any additional content at their discretion, but the separation between Red Hat supported content and non-Red Hat supported content would be much more explicit.

Both of these methods would depend heavily on Ansible Galaxy as the de facto delivery mechanism, which we would plan to improve substantially to handle the increased traffic load.

Some may note that there are similarities between this new proposed structure and the Ansible Extras structure that we moved to, and then moved away from, a few years back. It's true; there are definite similarities, and many of the advantages and potential disadvantages are the same. It's our hope, and intention, to learn the lessons from that previous attempt to gain the advantages while also mitigating the potential disadvantages.

We believe that these structural changes will help Ansible keep our strong community focus, while also providing the structure necessary to support our growing base of partners and customers. We recognize that these are significant changes, which is why we plan to move very carefully towards them. We want to make sure that we understand the implications of these changes before we make them. None of these changes are imminent, but we believe that we've come to a point at which we are prepared to discuss the possibilities.

There are many questions yet to be answered: infrastructure questions, licensing questions, release policy questions, and others. We will be discussing some of those questions in an upcoming webinar. 

We will also be digging deeply into these questions at our community contributor conference at AnsibleFest Atlanta in September. We hope to see our contributors there in person, but we strive for full remote participation as well, as always. Please join us however you can.

In the early days of Ansible, we could only have dreamt of this kind of success. In our seven years of existence, we have built one of the top open source projects in the world, with a dedicated community pushing us and supporting us from the very beginning. Had we imagined the kinds of challenges we face today, we would surely have put them in the category of "good problems to have."

But "good problems" are still problems, and if we fail to solve them, they won't stay "good problems" for long. It's time for us to take the next step, so that we can continue to be a reliable partner for all of our users, customers, and contributors. Without all of you, we would never have made it nearly so far.




Ansible and ServiceNow Part 2

Parsing facts from network devices using PyATS/Genie

This blog is part two in a series covering how Red Hat Ansible Automation can integrate with ticket automation. This time we'll cover dynamically adding a set of network facts from your switches and routers into your ServiceNow tickets.

Suppose there was a certain network operating system software version that contained an issue you knew was always causing problems and making your uptime SLA suffer. How could you convince your management to finance an upgrade project? How could you justify to them that the fix would be well worth the cost? Better yet, how would you even know?

A great start would be having metrics that you could track. The ability to data mine against your tickets would prove just how many tickets were tied to hardware running that buggy software version. In this blog, I'll show you how to automate adding a set of facts to all of your tickets going forward. Indisputable facts can then be pulled directly from the device, with no chance of typos or of the information being overlooked and never recorded.

This blog post will demonstrate returning structured data in JSON using Ansible in conjunction with Cisco pyATS and Cisco Genie. This allows us to retrieve the output of operational show commands and convert it into any format we want, in this case pushing it into ServiceNow.

There are many ways to parse facts from network devices with Ansible. The following example could also be done with the open source Network Engine Ansible Role, but here we are using Cisco's sponsored pyATS/Genie implementation to parse the show version command. As you can see from the raw output below, it is not very friendly to interact with programmatically:

[Screenshot: raw show version output from a Cisco IOS device]

Step 1: Create a Python3 virtual environment in Red Hat Ansible Tower

With the release of Ansible Tower 3.5, we can now use Python 3 virtual environments (virtualenvs) for added playbook flexibility and compatibility across Python versions. This is great news because Python 3 is required to use the pyATS and Genie packages. We need to create a new virtualenv that runs Python 3 and install all of the dependencies.

su -
yum -y install rh-python36
yum -y install python36-devel gcc
scl enable rh-python36 bash
python3.6 -m venv /var/lib/awx/venv/pyats-sandbox
source /var/lib/awx/venv/pyats-sandbox/bin/activate
umask 0022
pip install pyats genie python-memcached psutil pysnow paramiko
pip install -U "ansible == 2.8

Once a custom virtualenv is created, a new field appears in the Job Templates section in Ansible Tower, and you can select your newly created venv from its dropdown menu.

Cisco has released two Python 3 packages that are very useful for network automation: pyATS and Genie. The first one, pyATS, functions as a Python framework, while Genie builds on top of it and can be used to parse, learn, and diff. We use Genie by installing the Ansible Galaxy role named parse_genie and calling it in our playbook.

Step 2: Create a requirements.yml file in your roles directory

roles/requirements.yml

---
- name: parse_genie
  src: https://github.com/clay584/parse_genie
  scm: git
  version: master

By default, Ansible Tower has a system-wide setting that allows roles to be dynamically downloaded via a requirements.yml file in your Git repo. So there is no need to run the ansible-galaxy install -r roles/requirements.yml command like you might do if using Ansible Engine on the CLI.

For more information about Projects in Ansible Tower, refer to the documentation.

Step 3: Call the parse_genie Ansible Role

Now that you have a Python 3 virtualenv in Tower and a roles/requirements.yml file, you can write and test a playbook. In the first play of the playbook, define the play name, the hosts for Ansible to run against, and the connection plugin, and disable gather_facts for network devices. Next, create a roles: section and invoke the parse_genie role:

---
- name: parser example
  hosts: ios
  gather_facts: no
  connection: network_cli
  roles:
    - parse_genie

Then create the tasks: section and add a show version task. This will execute the show version command via the ios_command module and store the output in a variable named version.

tasks:
- name: show version
  ios_command:
    commands:
      - show version
  register: version

The next tasks will apply the parse_genie filter plugin to create structured data out of the show version output we captured, set that structured data as a fact, and debug it.

- name: Set Fact Genie Filter
  set_fact:
    pyats_version: "{{ version['stdout'][0] | parse_genie(command='show version', os='ios') }}"

- name: Debug Genie Filter
  debug:
    var: pyats_version

Step 4: Run the Ansible Playbook

At this point the playbook is largely complete, and you can execute and test it:

---
- name: parser example
  hosts: ios
  gather_facts: no
  connection: network_cli
  roles:
    - parse_genie

  tasks:
  - name: show version
    ios_command:
      commands:
        - show version
    register: version

  - name: Set Fact Genie Filter
    set_fact:
      pyats_version: "{{ version['stdout'][0] | parse_genie(command='show version', os='ios') }}"

  - name: Debug Genie Filter
    debug:
      var: pyats_version

The parser takes the command output and creates structured data in JSON format. The facts that you want to use later in your playbook are now easily accessible.

Step 5: Validate the Ansible Playbook run

After running the playbook (we did it via Ansible Tower), the Debug Genie Filter task produces the following output:

TASK [Debug Genie Filter] ******************************************************

ok: [192.168.161.9] => {
    "msg": {
        "version": {
            "chassis": "WS-C3550-24",
            "chassis_sn": "CAT0651Z1E8",
            "curr_config_register": "0x10F",
            "hostname": "nco-rtr-9",
            "image_id": "C3550-IPSERVICESK9-M",
            "image_type": "developer image",
            "last_reload_reason": "warm-reset",
            "main_mem": "65526",
            "number_of_intfs": {
                "FastEthernet": "24",
                "Gigabit Ethernet": "2"
            },
            "os": "C3550 boot loader",
            "platform": "C3550",
            "processor_type": "PowerPC",
            "rom": "Bootstrap program is C3550 boot loader",
            "rtr_type": "WS-C3550-24",
            "system_image": "flash:c3550-ipservicesk9-mz.122-44.SE3/c3550-ipservicesk9-mz.122-44.SE3.bin",
            "uptime": "44 minutes",
            "version": "12.2(44)SE3",
            "version_short": "12.2"
        }
       }
}

Step 6: Integrate parsed content into ServiceNow tickets

What I would like to do now is add some new fields in the ServiceNow incident layout. Let's add the version, uptime, hostname, platform, device type, serial number, and last reload reason facts to every incident ticket Ansible creates.

In the ServiceNow Web dashboard, add these new fields in Configure > Form Layout.


Now run your playbook from part one of this blog with the table parameter set to incident. When you debug the record dictionary that the ServiceNow API sends back, it should include the new fields you just created, such as u_device_up_time, u_ios_version, etc.

We can use these new fields in the data section of our Ansible Playbook that uses the snow_record module. The following is the complete playbook that runs the show version command, parses the output and adds the parameters into the new fields:

---
- name: create ticket with notes
  hosts: ios
  gather_facts: no
  connection: network_cli
  roles:
    - parse_genie

  tasks:
  - name: include vars
    include_vars: incident_vars.yml

  - name: show version
    ios_command:
      commands:
        - show version
    register: version

  - name: Set Fact Genie Filter
    set_fact:
      pyats_version: "{{ version['stdout'][0] | parse_genie(command='show version', os='ios') }}"

# Example 1 showing version information
  - name: Debug Pyats facts
    debug:
      var: pyats_version.version.version

# Example 2 showing uptime
  - name: Debug Pyats facts
    debug:
      var: pyats_version.version.uptime

  - name: Create an incident
    snow_record:
      state: present
      table: incident
      username: "{{ sn_username }}"
      password: "{{ sn_password }}"
      instance: "{{ sn_instance }}"
      data:
        priority: "{{ sn_priority}}"
        u_device_up_time: "{{ pyats_version.version.uptime }}"
        u_ios_version: "{{ pyats_version.version.version }}"
        u_hostname: "{{ pyats_version.version.hostname }}"
        u_platform: "{{ pyats_version.version.platform }}"
        u_device_type: "{{ pyats_version.version.rtr_type }}"
        u_serial_number: "{{ pyats_version.version.chassis_sn }}"
        u_last_reload_reason: "{{ pyats_version.version.last_reload_reason }}"
        short_description: "This ticket was created by Ansible"

  - debug: var=new_incident.record.number

Two additional debug examples are provided above to show how to work with the pyATS dictionary that was returned. With structured output it is much easier to grab the specific information you want using the key (e.g. pyats_version.version.uptime is the key that returns the value for the uptime of the system). The full dictionary is provided above in step 5.

After the playbook runs in Red Hat Ansible Tower, the new fields are populated in our ServiceNow incident ticket.

During an outage things can become chaotic. We have all seen how, on certain days in the network field, tickets can become a very low priority. Automating ticket creation and the collection of dynamic facts solves this and allows engineers to remain focused on the outage.

Final thoughts

Something like this may help your organization adopt automation in steps. These Ansible Playbooks are low risk because they do not modify any configurations; they are read-only. This can be a great first step for network engineers, without having to dive into holistic automation or even configuration management. You may consider replacing the ios entry in the filter plugin with the ansible_network_os variable that was introduced with the network_cli connection plugin, as sketched below. That way you could run against nxos, ios, junos, etc. all in the same inventory and playbook run. In this blog we left it as ios so it is easier to grasp if this is your first time seeing it.
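Here is a minimal sketch of that idea. It assumes the platform-agnostic cli_command module (so the same tasks can run against ios, nxos, junos, and so on over network_cli) and that the value of ansible_network_os matches an OS name that parse_genie understands:

- name: run show version on any supported platform
  cli_command:
    command: show version
  register: version

- name: parse the output, taking the OS from the inventory
  set_fact:
    pyats_version: "{{ version.stdout | parse_genie(command='show version', os=ansible_network_os) }}"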

Stay tuned for part 3 of this series, where we will cover integration from ServiceNow to Ansible Tower's API, so you can automatically have ServiceNow execute Ansible Playbooks.




The Song Remains The Same


Now that Red Hat is a part of IBM, some people may wonder about the future of the Ansible project.

Here is the good news: the Ansible community strategy has not changed.

As always, we want to make it as easy as possible to work with any projects and communities who want to work with Ansible. With the resources of IBM behind us, we plan to accelerate these efforts. We want to do more integrations with more open source communities and more technologies.

One of the reasons we are excited for the merger is that IBM understands the importance of a broad and diverse community. Search for "Ansible" plus the name of almost any open source project, and you can find Ansible information -- playbooks, modules, blog posts, videos, slide decks -- intended to make working with that project easier. We have thousands of people attending Ansible meetups and events all over the world. We have millions of downloads. We have had this momentum because we provide users flexibility and freedom. IBM is committed to our independence as a community so that we can continue this work.

We've worked hard to be good open source citizens. We value the trust that we've built with our users and our contributors, and we intend to continue to live up to the trust that our community has placed in us. IBM is committed to the same ideals and will be supportive of our ongoing efforts to build a strong, diverse community. The song remains the same.

If you have questions or would like to learn more about the IBM acquisition, we encourage you to review the list of materials below. Red Hat CTO Chris Wright will host an online Q&A session on July 23 where you can ask questions about what the acquisition means for Red Hat and our involvement in open source communities. Details will be announced on the Red Hat blog.

Additional resources:




Configure Network Cards by PCI Address with Ansible Facts


In this post, you will learn advanced applications of Ansible facts to configure Linux networking. Instead of hard-coding device names, you will find out how to specify network devices by PCI addresses. This prepares your configuration to work on different Red Hat Enterprise Linux releases with different network naming schemes.

Red Hat Enterprise Linux System Roles

The RHEL System Roles provide a uniform configuration interface across multiple RHEL releases. However, the names of network devices in modern Linux distributions are often not stable across releases. In the past, the kernel named the devices after their order of appearance: the first device got the name eth0, the next eth1, and so on.

To make the device names more reliable, developers introduced other naming methods, but these interfere with creating a release-independent network configuration based on interface names. An initial solution to this problem is to address network cards by MAC address, but that requires an up-to-date inventory with the MAC addresses of all network cards, and the inventory must be updated whenever broken hardware is replaced, which results in extra work. To avoid this effort, it would be great to be able to specify network cards by their PCI address. With a uniform hardware setup (same model, same slot, same motherboard), the PCI address should be stable, because it is defined by the PCI bus, device and function.

Ansible facts

Ansible facts already expose the PCI address for network cards as pciid. The following playbook shows how to obtain the PCI address for the network card enp0s31f6:

---
- hosts: localhost
  vars:
    nic: enp0s31f6
  tasks:
    - name: Show PCI address (pciid) for a network card
      debug:
        msg: "The PCI address for {{ nic }} is {{ ansible_facts[nic]['pciid'] }}."

When running the playbook, it shows that the PCI address in this case is 0000:00:1f.6:

ansible-playbook show_pciid.yml
[...]

TASK [Show PCI address (pciid) for a network card] **************************
ok: [localhost] => {
    "msg": "The PCI address for enp0s31f6 is 0000:00:1f.6."
}

[...]

Transforming the facts

Selecting a network card by PCI address is not always straightforward. Ansible facts can't query devices by their attributes directly. Luckily, there are filters in Ansible that make it possible to reorganize the facts. Among them, the json_query filter allows users to reorganize and filter data using the JMESPath query language for JSON. To be able to use it, you might need to install the python2-jmespath or python3-jmespath package.

Ansible organizes the network facts in a dictionary with the device names as keys, but we need the key to be the PCI address. To do this, we will use a JMESPath expression that extracts all values of the Ansible facts dictionary (@.*) and then selects only the values that contain a pciid key ([?pciid]). Then we will use the expression {key: pciid, value: device} to create a new dictionary with an item named key for the PCI ID and one named value for the interface name. This structure allows us to use the items2dict filter (introduced in Ansible 2.7) to build the final dictionary.

Example

The following playbook shows how to create the dictionary device_by_pci_address this way. It will contain a mapping from PCI address to device name:

---
- hosts: localhost
  vars:
    pci_address: "0000:00:1f.6"
    device_by_pci_address: "{{
        ansible_facts | json_query('@.* | [?pciid].{key: pciid, value: device}') | items2dict
    }}"

The following tasks show the structure of this dictionary and how to use it:

tasks:
  - name: Show devices by PCI address
    debug:
      var: device_by_pci_address
  - name: "Show device with PCI address {{ pci_address }}"
    debug:
      msg: "The device {{ device_by_pci_address[pci_address] }} is at the
         PCI address {{ pci_address }}"

When running these tasks, Ansible outputs the following:

TASK [Show devices by PCI address] *****************************************
ok: [localhost] => {
    "device_by_pci_address": {
        "0000:00:1f.6": "enp0s31f6",
        "0000:3a:00.0": "wlp58s0",
        "6-1:1.0": "enp8s0u1"
    }
}

TASK [Show device with PCI address 0000:00:1f.6] ***************************
ok: [localhost] => {
    "msg": "The device enp0s31f6 is at the PCI address 0000:00:1f.6"
}

If you look carefully, you will notice one device has a different PCI address format (6-1:1.0). This is actually a USB device. On virtual machines you might encounter other types of addresses; Virtio devices have addresses like virtio0, virtio1 and so on. Using the device name in the configuration still ties it to specific releases. With a small change it is also possible to look up MAC addresses:

---
- hosts: localhost
  vars:
    pci_address: "0000:00:1f.6"
    macaddress_by_pci_address: "{{
        ansible_facts | json_query('@.* | [?pciid].{key: pciid, value: macaddress}') | items2dict
    }}"

[...]

Note that we changed value: device to value: macaddress here.
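For example, the looked-up MAC address could be printed with a task like this, a minimal sketch that reuses the pci_address and macaddress_by_pci_address variables defined above:

tasks:
  - name: Show the MAC address for the device at the given PCI address
    debug:
      msg: "The device at PCI address {{ pci_address }} has MAC address {{ macaddress_by_pci_address[pci_address] }}"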

Combining with the network role

To put this all together, here is an example about how to use these variables with the Network RHEL System Role:

---
- hosts: localhost
  vars:
    pciid: "0000:00:1f.6"
    macaddress_by_pci_address: "{{
        ansible_facts | json_query('@.* | [?pciid].{key: pciid, value: macaddress}') | items2dict
    }}"
    network_connections:
      - name: internal_network
        mac: "{{ macaddress_by_pci_address[pciid] }}"
        type: ethernet
        state: up
        ip:
          address:
            - 192.0.2.73/31

  tasks:
    - name: Import network role
      import_role:
        name: rhel-system-roles.network

This will configure the connection profile internal_network. It limits the profile to the device at the PCI address 0000:00:1f.6 using the device's MAC address.

Outlook

Since the on-disk configuration still uses the MAC address, changing a network card will require running the playbook again. To avoid this, NetworkManager would need to allow specifying the PCI address in the configuration directly. I filed an RFE proposal for NetworkManager to support this in the future. Depending on the installed version of the Jinja2 templating engine, the dict() constructor allows creating the dictionary without items2dict:

vars:
  macaddress_by_pci_address: "{{
      dict(ansible_facts | json_query('@.* | [?pciid].[pciid, macaddress]'))
  }}"

This works on RHEL 8 and recent versions of Fedora, but RHEL 7 does not support it yet.

Conclusion

In this post, we've learned about network interface naming in modern versions of Linux. The ability to identify the PCI address for network cards becomes useful in larger environments to maintain consistency. Being able to transform facts in Ansible Automation allows for many possibilities, including using facts to identify which device to configure when used with RHEL System Roles or any other role for that matter.

If you are interested in learning more about certified networking modules approved by the Ansible community and Red Hat, check out Ansible Automation Certified Content today! Or, you can learn more about Ansible network automation solutions.




Ansible and ServiceNow Part 1, Opening and Closing Tickets


As a Network Engineer, I hated filling out tickets. Anytime a router would reboot or a power outage took place at a remote site, the resulting ticket generation took up about 50% of my day. If there had been a way to automate ticket creation, I would have saved a lot of time. The only unique information I would have needed to provide was the case-specific comments with additional details about the issue.

While ticket creation was a vital activity, automating this was not an option at the time. This is surprising because my management was always asking me to include more information in my tickets. Tickets were often reviewed months later and sometimes never got created or did not have much relevant information included.

Fast forward to today: companies are now data mining from tickets with a standard set of facts that are pulled directly from the device during ticket creation, such as network platform, software version, uptime, etc. Network operations (NetOps) teams now use massive amounts of ticket data to make budget decisions.

For example, if there were 400 network outages due to power issues, NetOps could then make a case to spend $40,000 on battery backups, with proof that doing so would prevent around 400 outages a year. Having access to these metrics is extremely valuable for making informed business decisions.

This first blog in the series covers how Ansible automates change requests in ServiceNow, a popular cloud-based SaaS provider. For convenience, ServiceNow provides developers a test instance for use with Ansible Playbooks, which is what we use for this and future blog posts. You can sign up for your own free developer instance at the ServiceNow Developer portal.

Creating a ServiceNow ticket

The Ansible distribution includes the snow_record module that makes it easy to open and close ServiceNow tickets. The pysnow Python library will first need to be installed to use this module.
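If you prefer to handle that prerequisite with Ansible itself rather than installing it by hand, a minimal sketch using the pip module could look like this, run against the machine that will execute the playbooks:

---
- name: Install the pysnow dependency for snow_record
  hosts: localhost
  connection: local
  gather_facts: no
  tasks:
    - name: Ensure the pysnow library is present
      pip:
        name: pysnow
        state: present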

The next requirement is getting the username, password and instance name needed to authenticate to your recently created cloud-based ServiceNow developer instance.

NOTE: The instance parameter should be just the instance name, e.g. instance: dev99999, not the full URL such as http://dev99999.service-now.com. These values are stored in change_request_vars.yml:

---
#snow_record variables

sn_username: admin
sn_password: my_password
sn_instance: dev99999

#data variables

sn_severity: 2
sn_priority: 2

The following is the Ansible Playbook to create a ServiceNow ticket:

---
- name: Create ticket with notes
  hosts: localhost
  gather_facts: no
  connection: local

  tasks:
  - name: include vars
    include_vars: change_request_vars.yml

  - name: Create a change request
    snow_record:
      state: present
      table: change_request
      username: "{{ sn_username }}"
      password: "{{ sn_password }}"
      instance: "{{ sn_instance }}"
      data:
        severity: "{{ sn_severity }}"
        priority: "{{ sn_priority }}"
        short_description: "This is a test opened by Ansible"
    register: new_incident

  - debug:
      var: new_incident.record

Leveraging the ServiceNow API

The table parameter determines what type of ticket will be opened. A great way to determine the other parameters available is to view the JSON dictionary the ServiceNow API sends back after you have created your ticket. I am using register to give a variable name to that dictionary and then using debug to view it in the terminal. The following is just a portion of the full dictionary for the sake of brevity:

[Screenshot: portion of the record dictionary returned by the ServiceNow API]

This is very handy in spelling out the parameters you can add under the data section of your task. If you want to see just one parameter of the dictionary, for example the ticket number, you can simply modify your debug to look like the following:

- debug: var=new_incident.record.number

This variable (var) pulls from the stored register new_incident to show the dictionary named record and the parameter of that dictionary called number.

You could do the same thing with any parameter of the record dictionary such as close_code, state, comments, and many others.
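For instance, to print the new ticket's number and state together, here is a minimal sketch reusing the register from the task above:

- name: Show the new ticket number and state
  debug:
    msg: "Ticket {{ new_incident.record.number }} is in state {{ new_incident.record.state }}"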

Validating changes in ServiceNow web portal

Next, log into your developers instance of ServiceNow and view the Change->all section in the left menu bar. You should see your change request in the list.


Notice that the short description has been filled out by our Ansible Playbook task ("This is a test opened by Ansible"), as well as the priority (2 - High).

Closing a ServiceNow ticket

Now that we've demonstrated the opening of ServiceNow tickets, we should demonstrate closing or resolving the ticket as well. This is done by specifying the state parameter in another Ansible task. This is where it can get tricky because state is a parameter of the record dictionary as well as a parameter of the snow_record module. Please be mindful of this multi-purpose parameter used in Ansible.

When we created our ticket, the state parameter in the record dictionary came back as -5. The Ansible task below changes it to -3, which moves the ticket state from New to Authorize.

---
  - name: Modify a change request
    snow_record:
      state: present
      table: change_request
      username: "{{ sn_username }}"
      password: "{{ sn_password }}"
      instance: "{{ sn_instance }}"
      number: CHG0030002
      data:
        state: -3
    register: incident

  - debug: 
      var: incident.record.state

In ServiceNow, a change_request needs to be walked through a few different states before it can be closed. The numeric values for the different states can be found in the ServiceNow documentation. I recommend you have five separate Ansible tasks that each change the state in this order: -3, -2, -1, 0, 3 (see the sketch after this paragraph). Please note that these values are for the ServiceNow Kingston release and that other releases may use different state numbers. Your organization may have other steps required along the way, but hopefully this article was enough to get you started. At this point you've learned how to open and close tickets with specific labels via Ansible Playbooks.
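If your process allows stepping through those states back to back, one way to express it is a single looped task. This is only a sketch -- the change number and the Kingston-release state values are the examples used above, and your own workflow may require approvals between steps:

  - name: Walk the change request through its states
    snow_record:
      state: present
      table: change_request
      username: "{{ sn_username }}"
      password: "{{ sn_password }}"
      instance: "{{ sn_instance }}"
      number: CHG0030002
      data:
        state: "{{ item }}"
    loop: [-3, -2, -1, 0, 3]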

Stay tuned for part 2 - I'll describe adding a set of parsed facts to your tickets.




Using Infoblox as a dynamic inventory in Red Hat Ansible Tower


Do you still use spreadsheets to keep track of all your device inventory? Do you have Infoblox Appliances deployed in your infrastructure? Do you want to start automating without the burden of maintaining a static register of devices? If you answered yes to any of these questions, this blog is for you.

Operations teams often struggle to keep their Configuration Management Databases (CMDBs) up-to-date, primarily because they were not involved in the specification process to define which pieces of information are relevant to them -- or, even if they were, because once the CMDB is put in place:

Teams are not allowed to change any of their Configuration Items (CI) because they have only read-only access!

The reality is that a lot of the time when we talk about a CMDB, we are talking about tables in a database without any version control mechanism; therefore, only read access is provided to end users.

The impact is that in order to perform lifecycle management (create/update/decommission) of their configuration items, teams must go through a tedious and manual process, until they give up changing CIs (Configuration Items) in the CMDB and just leave everything as it is. What happens next? Different teams start to rely on their own CMDBs (a.k.a. spreadsheets) to track subnets, IP allocations, DNS records, zones, views, etc. What's the end result? End users request their machines and still need to wait at least a week before someone from the NetOps team consults their own CMDB (yes, the spreadsheet) to provide them with DNS records and IP addresses.

Dynamic Inventory

Dynamic Inventory is one of the most powerful features in Red Hat Ansible Tower. Dynamic Inventory allows Ansible to query external systems and use the response data to construct its inventory. Red Hat Ansible Tower provides some out-of-the-box integrations through dynamic inventory scripts, and also allows users to extend these capabilities by providing their own custom dynamic inventory script.

Red Hat Ansible Tower and Infoblox

Let's take a look at the steps required to configure a custom dynamic inventory script to query Infoblox and rely on it as our inventory source of truth.

Install infoblox-client

First we need to install the infoblox-client Python library in Red Hat Ansible Tower's venv on each node of the cluster, and create the configuration file required by the Infoblox inventory script:

# source /var/lib/awx/venv/awx/bin/activate
# pip install infoblox-client

NOTE: You could also create a playbook to do this, using the Ansible pip_module.
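A minimal sketch of such a playbook, assuming an inventory group named tower that contains every node of the cluster:

---
- name: Install infoblox-client into the Ansible Tower virtualenv
  hosts: tower        # assumed group containing all Tower cluster nodes
  become: yes
  tasks:
    - name: Ensure infoblox-client is present in the awx venv
      pip:
        name: infoblox-client
        virtualenv: /var/lib/awx/venv/awx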

Create the infoblox configuration file in /etc/ansible/infoblox.yaml:

---
filters:
  extattrs: {}
  view: null

NOTE: Follow this Ansible GitHub Issue where I suggest taking configuration items from an environment variable or a file for added flexibility.

Credential Type

After the installation in the previous step completes successfully on all the nodes of the cluster, we need to specify in Ansible Tower the credential and hostname used to establish communication with the Infoblox Appliances. As of today we don't have a specific Ansible Tower credential type for Infoblox, so let's create a custom credential type. We can then provide the information required to communicate with Infoblox, have the password protected by Ansible Tower, and apply RBAC (Role-Based Access Control).

As Administrator, go to Credential Types in the left menu.

Create a new credential type: INFOBLOX_INVENTORY (Green + sign)


Define the inputs required in the INPUT CONFIGURATION field:

fields:
  - type: string
    id: hostname
    label: Hostname
  - type: string
    id: username
    label: Username
  - secret: true
    type: string
    id: password
    label: Password
required:
  - username
  - password

Define the injection of inputs as environment variables in INJECTOR CONFIGURATION field:

env:
  INFOBLOX_HOST: '{{ hostname }}'
  INFOBLOX_PASSWORD: '{{ password }}'
  INFOBLOX_USERNAME: '{{ username }}'

Credential

After the creation of the credential type INFOBLOX_INVENTORY in Ansible Tower, we can use it to create a new credential, specifying the information to communicate with the Infoblox Appliance.

Create a credential to communicate with Infoblox Appliance: infoblox-ip.ip.ip.ip


NOTE: In the example, the name includes the IP or FQDN, so we can know what appliance this particular credential refers to.

Inventory Script

Create a custom inventory script to query the Infoblox Appliances and parse the output into the format expected by Ansible inventory.

Create a new custom inventory script: _infoblox-inventory-script.py

Get the infoblox.py script from Ansible's GitHub repository and paste it into the CUSTOM SCRIPT field.

Inventory Source

Create an inventory that uses the Infoblox dynamic script as its source, and sync it to populate the inventory with the entries returned by the Infoblox Appliance.

Go to Inventories and create a new Inventory: netops


Add a source that uses the custom inventory script you just created, then sync the inventory source and check that the sync completes successfully.

Inventory Entries

Verify that the hosts, groups and variables are being populated correctly in the inventory, based on the existing entries in the Infoblox Appliance:

Check host entries in the inventory: netops -> hosts

Check the variables associated with a host entry: netops -> hosts -> rtr01.acme.com
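As a quick sanity check, a playbook like the following only prints what the dynamic inventory returned, without connecting to any device (a minimal sketch):

---
- name: Show what the Infoblox dynamic inventory returned
  hosts: all
  gather_facts: no
  tasks:
    - name: Display each entry and the address Ansible will use for it
      debug:
        msg: "{{ inventory_hostname }} -> {{ ansible_host | default(inventory_hostname) }}"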

At this point we have servers and routers in our dynamic inventory, so from now on we can execute any Ansible Playbooks against them. In the next section we'll cover what the configuration looks like on the Infoblox side.

Infoblox

At this point you may be wondering: how are these variables in Ansible Tower's inventory specified in my Infoblox Appliance? The answer is that we are using Extensible Attributes in Infoblox to populate the ansible_* variables, so they are automatically picked up in Ansible Tower's inventory. For example, the variable ansible_host is defined as an Extensible Attribute on each host record in the Infoblox web UI.

Why are we using Extensible Attributes?

The answer is simple. It is common to have entries in the DNS that refer to the production interface of the server or the service being provided, while management access is only available via a dedicated out-of-band management interface. The ansible_host extensible attribute tells Ansible that, for this particular entry, it should use that value to establish communication with the server, via the management interface.

Additionally, we could rely on an Extensible Attribute to specify whether an entry is managed by Ansible Tower (e.g. ansible_managed: true/false), and update our dynamic inventory configuration file accordingly to use this attribute as a filter, as sketched below. The result is that Ansible Tower's inventory will only be populated with entries that we want to automate (ansible_managed: true).
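For illustration, the /etc/ansible/infoblox.yaml from earlier might then look like this -- a sketch that assumes the inventory script filters on extensible attribute key/value pairs in this form and that the attribute is literally named ansible_managed:

---
filters:
  extattrs:
    ansible_managed: "true"
  view: null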




Summary of Authentication Methods For Red Hat Ansible Tower


Red Hat Ansible Tower 3.4.0 has added token authentication as a new method for authentication so I wanted to use this post to summarize the numerous enterprise authentication methods and the best use case for each. Ansible Tower is designed for organizations to centralize and control their automation with a visual dashboard for out-of-the box control while providing a REST API to integrate with your other tooling on a deeper level. We support a number of authentication methods to make it easy to embed Ansible Tower into existing tools and processes to help ensure the right people can access Ansible Tower resources. For this blog post I will go over four of Ansible Tower's authentication methods: Session, Basic, OAuth2 Token, and Single Sign-on (SSO). For each method I will provide some quick examples and links to the relevant supporting documentation, so you can easily integrate Ansible Tower into your environment.

Session Authentication

Session authentication is what's used when logging in directly to Ansible Tower's API or UI. It is used when a user wants to remain logged in for a prolonged period of time, not just for that HTTP request, i.e. when browsing the UI or API in a browser like Chrome or Firefox. When a user logs in, a session cookie is created, which enables the user to remain logged in when navigating to different pages within Ansible Tower.


How does it work?


Using the Curl tool, let's take a deeper look at what happens when you log in to Ansible Tower.

  1. GET to the /api/login/ endpoint to grab the csrftoken cookie:

    curl -k -c - https://<tower-host>/api/login/

    localhost FALSE / FALSE 0 csrftoken AswSFn5p1qQvaX4KoRZN6A5yer0Pq0VG2cXMTzZnzuhaY0L4tiidYqwf5PXZckuj

  2. POST to the /api/login/ endpoint with username, password, and X-CSRFToken=<token-value>:

    curl -X POST -H 'Content-Type: application/x-www-form-urlencoded' \
        --referer https://<tower-host>/api/login/ \
        -H 'X-CSRFToken: K580zVVm0rWX8pmNylz5ygTPamgUJxifrdJY0UDtMMoOis5Q1UOxRmV9918BUBIN' \
        --data 'username=root&password=reverse' \
        --cookie 'csrftoken=K580zVVm0rWX8pmNylz5ygTPamgUJxifrdJY0UDtMMoOis5Q1UOxRmV9918BUBIN' \
        https://<tower-host>/api/login/ -k -D - -o /dev/null

All of this is done by Ansible Tower when you log in to the UI or API in the browser, and should only be used when authenticating in the browser. For programmatic integration with Ansible Tower, you should use OAuth 2 tokens, not the process described above.

Note: The session expiration time can be changed by setting the SESSION_COOKIE_AGE setting.


Basic Authentication

Basic Authentication is stateless, thus the base64-encoded username and password must be sent along with each request via the Authorization header.

Use case: API calls from curl, Python scripts, or individual requests to the API. OAuth 2 authentication is recommended for accessing the API whenever possible.

Example with curl:

curl -X GET -H 'Authorization: Basic dXNlcjpwYXNzd29yZA==' \
    https://<tower-host>/api/v2/credentials -k -L

# the --user flag adds this Authorization header for us
curl -X GET --user 'user:password' https://<tower-host>/api/v2/credentials -k -L
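The same call can also be made from a playbook; here is a minimal sketch using the uri module (the host and credentials are placeholders):

- name: List Ansible Tower credentials using basic authentication
  uri:
    url: "https://<tower-host>/api/v2/credentials/"
    method: GET
    url_username: user
    url_password: password
    force_basic_auth: yes
    validate_certs: no
  register: credential_list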

For more information about the Basic HTTP Authentication scheme, see RFC 7617.

Note: Basic Auth can be disabled for security purposes, see the docs for more info.

OAuth 2 Token Authentication

OAuth (Open Authorization) is an open standard for token-based authentication and authorization. OAuth 2 authentication is commonly used when interacting with the Ansible Tower API programmatically. Like Basic Auth, an OAuth 2 token is supplied with each API request via the Authorization header. Unlike Basic Auth, OAuth 2 tokens have a configurable timeout and are scopable. Tokens have a configurable expiration time and can be easily revoked for one user or for the entire Ansible Tower system by an admin if needed. This can be done with the tower-manage revoke_oauth2_tokens management command. Here is more information on doing that. Additionally, the type of users able to create tokens can be limited to users created in Ansible Tower, as opposed to external users created from an SSO (see SSO section below). For more on how to do this see the note in these docs.

Different methods for obtaining OAuth 2 Access Tokens in Ansible Tower:

  • Personal access tokens (PAT)
  • Application Token: Password grant type
  • Application Token: Implicit grant type
  • Application Token: Authorization Code grant type

First, a user needs to create an OAuth 2 Access Token in the API, or in their User's Token tab in the UI. For the purposes of this article, we will use the personal access token method (PAT) for creating a token. Upon token creation, the user can set the scope. The expiration time of the token can be configured system-wide as well.


Token authentication is best used for any programmatic use of Ansible Tower's API, such as Python scripts or tools like curl. See the example for a personal access token (PAT) below:

Curl Example

First, create an OAuth 2 token without an associated Application; in other words, a personal access token. In this example, we will do so through the API with curl.

curl -u user:password -k -X POST https://<tower-host>/api/v2/tokens/

You can now use that token to perform a GET request for an Ansible Tower resource, e.g., Hosts.

curl -k -X GET \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer <oauth2-token-value>" \
    https://<tower-host>/api/v2/hosts/

Similarly, a job can be launched by making a POST to the job template that you want to launch.

curl -k -X POST \
    -H "Authorization: Bearer <oauth2-token-value>" \
    -H "Content-Type: application/json" \
    --data '{"limit" : "ansible"}' \
    https://<tower>/api/v2/job_templates/14/launch/
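The same launch can also be expressed as an Ansible task; this is only a sketch using the uri module, with the placeholder template ID and token value from the curl example above:

- name: Launch job template 14 with an OAuth 2 bearer token
  uri:
    url: "https://<tower-host>/api/v2/job_templates/14/launch/"
    method: POST
    headers:
      Authorization: "Bearer <oauth2-token-value>"
    body_format: json
    body:
      limit: ansible
    status_code: 201   # Tower returns 201 Created on a successful launch
    validate_certs: no
  register: launch_result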

Python Example

Tower-CLI is an open source tool that makes it easy to use HTTP requests to access Ansible Tower's API. You can have Tower-CLI authenticate to Tower using your OAuth 2 token by setting it in tower-cli config, or have it acquire a PAT on your behalf by using the tower-cli login command. It is easy to use and I would recommend checking it out:

pip install ansible-tower-cli

tower-cli config tower
tower-cli login

For more information on how to use OAuth 2 in Ansible Tower in the context of integrating external applications, check out these docs.

If you need to write custom requests, you can write a Python script using the Python library requests. Here is an example.

import json
import requests

oauth2_token_value = 'y1Q8ye4hPvT61aQq63Da6N1C25jiA'   # your token value from Tower
url = 'https://<tower-host>/api/v2/users/'
payload = {}
headers = {'Authorization': 'Bearer ' + oauth2_token_value,}

# makes request to Tower user endpoint
response = requests.request('GET', url, headers=headers, data=payload,
allow_redirects=False, verify=False)

# prints json returned from Tower with formatting
print(json.dumps(response.json(), indent=4, sort_keys=True))

SSO Authentication

Single sign-on (SSO) authentication methods are fundamentally different because the authentication of the user happens external to Ansible Tower. For example, with GitHub SSO, GitHub is the single source of truth, and it verifies your identity based on the credentials you present to it.

Once you have configured an SSO method in Ansible Tower, a button for that SSO will be present on the login screen. If you click that button, it will redirect you to the Identity Provider, in this case GitHub, where you will present your credentials. If the Identity Provider verifies you successfully, then Ansible Tower will make a user linked to your GitHub user (if this is your first time logging in via this SSO method), and log you in.

  • LDAP - a directory of identities external to Ansible Tower that can be used to check authentication credentials against. Active Directory can be configured via the LDAP SSO in Ansible Tower.
  • SAML - allows Ansible Tower users to authenticate via a single sign-on authentication service, so that authentication is consistent for the user across multiple services used by their team. SAML is particularly useful for maintaining permission groups across services.
  • GitHub - allows Ansible Tower users to authenticate with their GitHub credentials if they are in the GitHub Organization, Team or User that the system admin specified in /api/v2/settings/authentication/. Ansible Tower uses OAuth 2 to verify the user's credentials with GitHub.
  • Azure Active Directory - allows Ansible Tower users to authenticate with their Azure credentials. Ansible Tower uses OAuth 2 to authenticate to Azure to verify your credentials and obtain user group data.
  • RADIUS - is an authentication protocol generally used for network devices. It can minimize network traffic for authentication, as it is lightweight.
  • Google OAuth - allows Ansible Tower users to authenticate with their Google Cloud. Ansible Tower authenticates to Google using the OAuth 2 protocol to check your username and password credentials against the identities in your Google organization.

Which Authentication is right for me?

I've shown you four types of authentication you can use in Ansible Tower. Each method has pros and cons and lends itself to certain use cases.

  • Session Authentication (logging in to the UI or browsable API): I am using Ansible Tower to manually create resources (inventory, project, job template) and launch jobs in the browser.
  • Basic Authentication: I am troubleshooting Ansible Tower with curl, HTTPie, or another similar tool and have not yet set up an OAuth 2 token for my user.
  • OAuth 2 Token Authentication
    • Authorization Code Flow - I am a user of an application interfacing with Ansible Tower.
    • Personal Access Tokens (PAT) - I am automating my usage of Ansible Tower programmatically
  • SSO: I am using Ansible Tower inside a large organization and want to use a central Identity provider or want to allow users to authenticate using external authentication like Google SSO, Azure SSO, LDAP, SAML, or GitHub.

You now have the knowledge needed to choose the most effective authentication methods for your needs! I hope this guide helps to clarify your options for authenticating with Ansible Tower.




Three quick ways to move your Ansible inventory into Red Hat Ansible Tower

Three quick ways to move your Ansible inventory into Red Hat Ansible Tower

If you've been using Ansible at the command line for a while, you probably have a lot of servers, network devices, and other target nodes listed in your inventory. You know that Red Hat Ansible Tower makes it easier for everyone on your team to run your Ansible Playbooks. So you've thought about using Ansible Tower to take your automation to the next level, but you want to retain all the data and variables in your existing inventory file or directory. Are you worried about transferring your inventory from command-line use to Ansible Tower? Let me show you how easy it is to import your existing Ansible inventory into Ansible Tower!

This blog covers three quick and effective ways to connect your existing Ansible inventory into Ansible Tower:

  1. Migrating an inventory file from the Ansible Tower control node (awx-manage)
  2. Migrating an inventory file from anywhere with a playbook
  3. Setting Tower to access a git source-controlled inventory file

If you're using dynamic inventory, you don't need to import your inventory into Ansible Tower. Dynamic inventory retrieves your inventory from an existing source. With dynamic inventory, you don't need to manage an inventory file at all, you just retrieve the latest and most up-to-date listing every time. Ansible Tower seamlessly integrates with popular dynamic inventory sources including Red Hat OpenStack Platform, Red Hat Satellite, public cloud platforms (Amazon Web Services/AWS, Google Compute Engine/GCE, Microsoft Azure), and virtualization solutions like Red Hat Virtualization and VMware vCenter. You can use scripts to integrate Infoblox DDI and ServiceNow CMDB for dynamic inventory in Ansible Tower as well.

NOTE: This blog does not cover the importing of Ansible Playbooks or Ansible Tower workflows into Ansible Tower and is strictly focused on Ansible inventory portability.

Migrating an inventory file from the Ansible Tower control node (awx-manage)

The command line tool awx-manage, which comes with your Ansible Tower installation, is a simple and effective tool to import your inventory. Using awx-manage makes the most sense when your inventory is a flat file in YAML or ini format that already lives on your Ansible control node. You run the command, point it at your existing inventory file, and Ansible Tower will be loaded with all of the hosts.

  1. Using the WebUI, log in to Ansible Tower and create an empty inventory.

    inventory

  2. Log in via SSH to your Ansible Tower control node (this is the Linux machine that has Ansible Tower installed on it).

  3. Locate the flat-file that represents your Ansible inventory.

  4. Run the awx-manage inventory_import command like this:

    sudo awx-manage inventory_import --source=/path/to/hosts --inventory-name="My Inventory"
    

    On the terminal window you will receive some output similar to the following:

    1.387 INFO Updating inventory 3: My Inventory
    1.475 INFO Reading Ansible inventory source: /path/to/hosts
    2.119 INFO Processing JSON output...
    2.120 INFO Loaded 6 groups, 6 hosts
    2.329 INFO Inventory import completed for (My Inventory - 9) in 0.9s
    
  5. Now when you log in via the WebUI, you will see all the hosts under the inventory.

    loaded_inventory

The awx-manage command line tool is simple and fast. It only took a couple of seconds to take my existing inventory and import it into Ansible Tower.

For teams that use Ansible Tower to run playbooks, but manage inventory outside of Ansible Tower, importing with awx-manage is not the best option, since you would need to re-import the flat-file inventory every time a change is made to your inventory file. If your team will continue to manage inventory outside of Ansible Tower, you probably want to use the GitHub option described below.

Migrating an inventory file from anywhere with a playbook

You can use the Ansible Tower modules to automate the transfer of your inventory into Ansible Tower. These modules make it possible to use Ansible Playbooks to automate and manage everything, including inventory, in your Ansible Tower instance. The tower_inventory module lets us create an inventory, and the tower_host module lets us add a host to an existing inventory. Assume we have already created an inventory called "Network Routers"; I will build an Ansible Playbook that uses the tower_host module to add all of the routers in the group routers to that inventory. The Ansible Playbook will look like this:

    - name: NETWORK SETUP
      hosts: routers
      connection: local
      become: yes
      gather_facts: no
      tasks:
        - name: ADD NETWORK HOSTS INTO TOWER
          tower_host:
                name: "{{ inventory_hostname }}"
                inventory: "Network Routers"
                tower_username: admin
                tower_password: ansible
                tower_host: https://localhost
                variables:
                  ansible_network_os: "{{ansible_network_os}}"
                  ansible_host: "{{ansible_host}}"
                  ansible_user: "{{ansible_user}}"
                  ansible_connection: "{{ansible_connection}}"
                  ansible_become: yes
                  ansible_become_method: enable

The Ansible Playbook will add all devices in the group routers simultaneously. The playbook output will look similar to this: 

Ansible-Playbook

The advantage of this method is that you don't have to be on the control node; you can run the Ansible Playbook from anywhere. Like the awx-manage option, transferring your inventory to Ansible Tower with an Ansible Playbook works well only if you will manage your inventory in Ansible Tower going forward. These two methods are migration strategies to Ansible Tower. If you use dynamic inventory or source control to manage inventory, you'd have to re-run the playbook for Ansible Tower every time you changed your inventory.
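If the target inventory does not exist yet, it can also be created from a playbook instead of the WebUI. The following is a minimal sketch using the tower_inventory module; the organization name and the admin credentials are placeholder assumptions matching the example above.

    - name: CREATE THE TARGET INVENTORY
      hosts: localhost
      connection: local
      gather_facts: no
      tasks:
        - name: ENSURE THE NETWORK ROUTERS INVENTORY EXISTS
          tower_inventory:
            name: "Network Routers"
            organization: "Default"          # placeholder organization name
            state: present
            tower_username: admin
            tower_password: ansible
            tower_host: https://localhost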

Setting Tower to access a git source-controlled inventory file

The final method I want to cover in this post is using source control to manage my inventory. I have a flat-file inventory stored in a GitHub repo. I made an example repo to illustrate this concept here:

https://github.com/ipvsean/sample_inventory

Unlike the previous two methods, this is not meant as a migration strategy, but as a more permanent way to manage your Ansible inventory using git and source control. Inventory can be managed in GitHub, and Ansible Tower can simply reflect those changes.
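For illustration only (this is not necessarily the exact content of the sample repository above), a small YAML-format flat-file inventory kept under source control might look like this:

# hosts.yml -- a hypothetical inventory file tracked in git
all:
  children:
    routers:
      hosts:
        rtr1:
          ansible_host: 192.0.2.11
        rtr2:
          ansible_host: 192.0.2.12
    webservers:
      hosts:
        web1:
          ansible_host: 192.0.2.21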

First we need to create an Ansible Tower Project. An Ansible Tower Project is how we sync Ansible Tower to a source code management (SCM) system supported by Ansible Tower, including Git, Subversion, and Mercurial. I will add a Project named Sean's Github, set the SCM Type to Git, and enter the SCM URL listed above.

Tower project ui
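The same Project can also be described in a playbook with the tower_project module. This is a hedged sketch; the organization name and credentials are assumptions, and the SCM URL is the example repository above.

    - name: CREATE THE SOURCE CONTROL PROJECT
      hosts: localhost
      connection: local
      gather_facts: no
      tasks:
        - name: ENSURE THE PROJECT EXISTS
          tower_project:
            name: "Sean's Github"
            organization: "Default"          # placeholder organization name
            scm_type: git
            scm_url: https://github.com/ipvsean/sample_inventory
            state: present
            tower_username: admin
            tower_password: ansible
            tower_host: https://localhost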

Now I need to create an Inventory that will use this Ansible Tower project. I will:

  1. Create an inventory called Sean Github Inventory.
  2. Add a Source called Sean Github Source, and choose the Ansible Tower Project previously created (named Sean's Github).
  3. As soon as the Project is selected, a drop-down menu will appear and allow us to point directly to the hosts flat-file.
  4. Once you create the source you can sync it using the circular arrow sync button. The hosts and groups will automatically show up under the hosts button as shown in the animation below.

github_inventory

Using source control for managing inventory is popular with Ansible Tower users and can scale really well.




Deep Dive on cli_command for Network Automation

Deep Dive on cli_command for Network Automation

In October 2018, Ansible 2.7 was released and brought us two powerful agnostic network modules, cli_command and cli_config. Do you have two or more network vendors within your environment? The goal of agnostic modules is to simplify Ansible Playbooks for network engineers who deal with a variety of network platforms. Rather than having to deal with platform-specific modules (e.g. eos_config, ios_config, junos_config), you can now use cli_command or cli_config to reduce the number of tasks and conditionals within a playbook and make the playbook easier to use. This post will demonstrate how to use these modules and contrast them with platform-specific modules. I'll show some playbook examples and common use cases to help illustrate how you can use these new platform agnostic modules.

Both the cli_command and cli_config modules work only with the network_cli connection plugin. The goal of network_cli is to make playbooks look, feel, and operate on network devices the same way Ansible works on Linux hosts.
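In practice this means each network host (or group) carries a couple of connection variables. A minimal group_vars sketch for the Arista group might look like the following; the file name and the values are assumptions for a lab environment.

# group_vars/arista.yml -- a sketch; the values are assumptions for a lab environment
ansible_connection: network_cli
ansible_network_os: eos
ansible_user: admin
ansible_password: admin        # use Ansible Vault rather than plaintext in real environments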

What can you do with the cli_command?

The cli_command module allows you to run arbitrary commands on network devices. Let's show a simple example using cli_command on an Arista vEOS device.

---
- name: RUN COMMAND AND PRINT TO TERMINAL WINDOW
  hosts: arista
  gather_facts: false

  tasks:

    - name: RUN ARISTA COMMAND
      cli_command:
        command: show ip interface brief
      register: command_output

    - name: PRINT TO TERMINAL WINDOW
      debug:
        msg: "{{command_output.stdout}}"

Previously this would require the eos_command module and would look like this:

---
- name: RUN COMMAND AND PRINT TO TERMINAL WINDOW
  hosts: arista
  gather_facts: false

  tasks:

    - name: RUN ARISTA COMMAND
      eos_command:
        commands: show ip interface brief
      register: command_output

    - name: PRINT TO TERMINAL WINDOW
      debug:
        msg: "{{command_output.stdout}}"

Both Ansible Playbooks are simple and produce identical output. This is what it looks like:

screenshot

Now these two playbooks don't look much different yet, but when you add multiple vendors, playbook complexity can increase quickly without these new agnostic network modules. Previously, if I had a mixed-vendor environment, I would see the playbook evolve in a couple of different ways. Sometimes it would contain numerous conditionals (the when statement), like this:

- name: RUN ARISTA COMMAND
  eos_command:
    commands: show ip int br
  when: ansible_network_os == 'eos'

- name: RUN CISCO NXOS COMMAND
  nxos_command:
    commands: show ip int br
  when: ansible_network_os == 'nxos'

- name: RUN JUNOS COMMAND
  junos_command:
    commands: show interface terse
  when: ansible_network_os == 'junos'

Or somewhat better, network automation playbooks would evolve like this:

- name: RUN PLATFORM SPECIFIC TASKS
  include_tasks: "{{ansible_network_os}}.yml"

This second method is much cleaner. The include_tasks statement pulls in a task file named eos.yml, ios.yml, nxos.yml, etc., and runs the corresponding commands or tasks for that platform. While this is much better, because you can separate Ansible Playbooks based on the network platform, it is still not as succinct or easy as agnostic modules. The underlying functionality is the same, but the Ansible Playbooks become much simpler.
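For reference, one of those per-platform task files (eos.yml, say) might contain nothing more than the platform-specific task; this is a sketch of the pattern rather than the contents of any particular repository:

# eos.yml -- a sketch of a per-platform task file pulled in by include_tasks
- name: RUN ARISTA COMMAND
  eos_command:
    commands: show ip int br
  register: command_output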

The reason I bring up this include_tasks method is that there is still going to be a time and place, even with agnostic modules, to separate out playbook logic. For example, the command shown above for Juniper (show interface terse) is different from the one used for Arista and Cisco (show ip interface brief).

With the cli_command let's look at how we can make this agnostic playbook for Cisco, Juniper and Arista extremely simple:

---
- name: RUN COMMAND AND PRINT TO TERMINAL WINDOW
  hosts: routers
  gather_facts: false

  tasks:
    - name: RUN SHOW COMMAND
      cli_command:
        command: "{{show_interfaces}}"
      register: command_output

    - name: PRINT TO TERMINAL WINDOW
      debug:
        msg: "{{command_output.stdout}}"

Three *os_command tasks are reduced to one task. The show_interfaces variable is stored as a group variable on a per-platform basis. For a full example look at this GitHub repository.
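For example, the per-platform group variable files might look like this sketch (the file layout is an assumption; the repository may organize things differently):

# group_vars/eos.yml
show_interfaces: show ip interface brief

# group_vars/ios.yml
show_interfaces: show ip interface brief

# group_vars/junos.yml
show_interfaces: show interface terse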

Backup example

Let's look at another use-case with the cli_command module. Backing up network configurations is a common network operational task. Ansible Network Automation modules have a backup parameter that helps network engineers automate this mundane, yet critical, task. For example with Arista EOS we can do this:

---
- name: BACKUP NETWORK CONFIGURATIONS
  hosts: arista
  gather_facts: false

  tasks:

    - name: BACKUP CONFIG
      eos_config:
        backup: yes

The cli_command module does not have a backup parameter. Why? Because the backup parameter can be quite inflexible and hard to manipulate. One of the most common feature requests from Ansible users is for every config module to be able to set the backup destination. Rather than recreate an incredible amount of logic and code in each config module, we can reuse an existing module. In this case we can leverage the already widely used copy module!

---
- name: BACKUP CONFIG WITH CLI_COMMAND
  hosts: arista
  gather_facts: false

  tasks:

    - name: RUN ARISTA COMMAND
      cli_command:
        command: show run
      register: backup

    - name: SAVE OUTPUT TO FILE
      copy:
        content: "{{backup.stdout}}"
        dest: "{{inventory_hostname}}.backup"

This makes it easy to control which command output we want to save. In this case it is the running configuration, but we could switch to the startup-config just as easily. It also gives the user control over the backup destination directory and file name. An example of an agnostic backup playbook for Arista EOS, Juniper Junos and Cisco IOS can be found here:

https://github.com/network-automation/agnostic_example
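Because the destination is just a parameter to the copy module, both the command and the file path are easy to vary. Here is a hedged variation that saves the startup configuration into a directory and timestamped file name of your choosing; the path and the date lookup are illustrative choices, not part of the linked example:

---
- name: BACKUP STARTUP CONFIG TO A CHOSEN LOCATION
  hosts: arista
  gather_facts: false

  tasks:

    - name: RUN ARISTA COMMAND
      cli_command:
        command: show startup-config
      register: backup

    - name: SAVE OUTPUT TO A TIMESTAMPED FILE
      copy:
        content: "{{backup.stdout}}"
        dest: "./backups/{{inventory_hostname}}_{{ lookup('pipe', 'date +%Y-%m-%d') }}.cfg"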

There are a lot of incredible things we can do with the agnostic modules that help make our Ansible Network Automation Playbooks much more succinct and simple. The cli_command and cli_config modules have been in the Ansible project since October 2018. Consider upgrading if you have not already. If you are already using the cli_command or cli_config module, please share! I will be highlighting more examples using agnostic modules in subsequent blog posts, so stay tuned.




Ansible Tips and Tricks, Dealing with Unreliable Connections and Services

Ansible Tips and Tricks, Dealing with Unreliable Connections and Services

Red Hat Ansible Automation is widely known for automating and configuring Linux and Windows hosts, as well as network devices such as routers, switches, firewalls and load balancers. In addition, there are a variety of modules that deal with cloud providers and their APIs, such as Microsoft Azure, Amazon Web Services (AWS) and Google Compute Engine, and others that interact with Software as a Service (SaaS) tools like Slack or ServiceNow. Although downtime for these APIs is minimal, it does happen, and it is even more likely that the connection between your Ansible control host (where you are running Ansible from) and the cloud-centric API could be down.

In this blog post, I will cover some tips and tricks for dealing with unreliable connections to cloud-centric APIs and how I build Ansible Playbooks in a reliable manner. As a technical marketing engineer, I consider my customers to be the Red Hat field teams, and often Solutions Architects are running playbooks from unreliable hotel wireless, coffee shops and sometimes even airplanes! I have to make sure playbooks have some extra robustness built in for these odd situations. It is hair-pullingly frustrating to get through a 20-task playbook only for it to fail on the 19th task because your wireless went out for a couple of seconds. This is especially frustrating if you are at the airport just trying to set up a demo or playground to show something to a client.

The Until Loop

Many people who use Ansible are very familiar with the loop construct. A loop (previously known as with_items) is simple and powerful, and allows you to iterate over a list or dictionary in an easy fashion. However, I find that many people are not aware of the until loop. Let's look at how this works.
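As a quick refresher, a standard loop simply repeats a task once per item; a trivial sketch (the item names are arbitrary):

- name: print each item in a list
  debug:
    msg: "{{ item }}"
  loop:
    - server01
    - server02
    - server03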

The module ec2_vpc_net allows us to create an AWS Virtual Private Cloud.

- name: Create AWS VPC sean-vpc
  ec2_vpc_net:
    name: "sean-vpc"
    cidr_block: "192.168.1.0/16"
    region: "us-east-1"
  register: create_vpc
  until: create_vpc is not failed
  retries: 5

The name, cidr_block and region are module parameters for the ec2_vpc_net module. However the register, until and retries are task level parameters, meaning that you can use these on any module. This task will attempt to create the VPC five times before it gives up and fails.

Let's step back a minute to see how this works. Each time we run a task there are some common variables that the task returns to let us know how the task performed:

- name: test local playbook
  hosts: localhost
  gather_facts: false

  tasks:
      - name: dumb easy command
        shell: ls -la
        register: task_variable

      - name: debug the var
        debug:
          var: task_variable

When we run this playbook with ansible-playbook test_output.yml we get some standard output (via the debug module) printed to the terminal window (or browser window when using Ansible Tower).

TASK [debug the var] **************************************************************
ok: [localhost] =>
 task_variable:
      changed: true
      cmd: ls -la
      delta: '0:00:00.011018'
      end: '2018-12-07 09:53:14.595811'
      failed: false
...

One of the key, value pairs we always get returned from any Ansible task is a failed key. If the task completed successfully the task will return a failed: false. If the task failed, the task will return a failed: true. Looking back at the until loop logic for the AWS VPC task:

register: create_vpc
until: create_vpc is not failed
retries: 5

We register the result of the task so we can examine its failed key. The until value is the conditional we are applying: in this case, we keep running the task until create_vpc does not have failed: true. However, we don't want the task to retry forever. The default value for retries is 3, but I have increased it to 5. The until loop provides significant robustness to the task. There is also a delay parameter that can be combined with the until loop. The delay is how much time to wait between retries; its default value is 5 seconds. Check out the documentation for more details and examples of the until loop and the delay parameter.
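For example, a hedged variation of the VPC task that waits ten seconds between attempts just adds the delay keyword:

- name: Create AWS VPC sean-vpc
  ec2_vpc_net:
    name: "sean-vpc"
    cidr_block: "192.168.1.0/16"
    region: "us-east-1"
  register: create_vpc
  until: create_vpc is not failed
  retries: 5
  delay: 10     # wait 10 seconds between retries instead of the default 5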

Changing What A Failure Means

By default, if a task fails, Ansible ends the playbook run at that task for the host it failed on. If I had a playbook running on 10 hosts and it failed on 1 host at task three out of ten, the 7 subsequent tasks would not run for that host. The other hosts would remain unaffected.

With unreliable connections to an outside API, we need to think about what is and isn't required for a playbook run to count as a success. For example, if you have a task that creates a DNS record with AWS's Route53 service, the DNS record can be nice to have, but it isn't required to begin using the instance you created. I can use an until loop to make the Route53 tasks more reliable, but it might be OK if the Route53 service is down and unusable. I can use the IP address to get some work done on my instance until I get a more reliable internet connection to re-run the playbook, or until the Route53 service becomes available again. Some tasks are "nice to have" versus required.

The way to ignore a failed value is to use the ignore_errors parameter, a task-level parameter outlined in the documentation here. There is plenty of content in the docs and various blogs about using ignore_errors, so it is sufficient to summarize that ignore_errors will still show red and report a failed: true key, value pair, but the playbook will continue on.
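As a minimal illustration, marking a "nice to have" task with ignore_errors looks like this (the failing shell command is just a stand-in for a call to an unreliable service):

- name: CREATE A NICE-TO-HAVE DNS RECORD (stand-in task)
  shell: /bin/false            # placeholder for a task that talks to an unreliable API
  register: dns_result
  ignore_errors: yes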

What happens if we want to combine the until loop with an ignore_errors?

- name: failure test playbook
  hosts: localhost
  gather_facts: false
  tasks:

    - name: purposely fail
      shell: /bin/false
      register: task_register_var
      until: task_register_var is not failed
      retries: 5
      ignore_errors: yes

    - name: debug task_register_var
      debug:
        msg: "{{ task_register_var }}"

We actually get the best of both worlds with an unreliable task. We get robustness with the until loop, combined with an ignore_errors which allows the playbook to complete regardless of that task completing successfully. I find myself using this combination of ignore_errors and until loops in conjunction with services like Let's Encrypt where it's not 100% required for me to have an SSL cert to start using the web app (I can rely on a self-signed cert until I can figure out the problem).

The Ansible Playbook output looks like this:

TASK [purposely fail] *************************************************************
FAILED - RETRYING: purposely fail (5 retries left).
FAILED - RETRYING: purposely fail (4 retries left).
FAILED - RETRYING: purposely fail (3 retries left).
FAILED - RETRYING: purposely fail (2 retries left).
FAILED - RETRYING: purposely fail (1 retries left).
fatal: [localhost]: FAILED! => changed=true
  attempts: 5
  cmd: /bin/false
  delta: '0:00:00.007936'
  end: '2018-12-07 13:23:13.277624'
  msg: non-zero return code
  rc: 127
  start: '2018-12-07 13:23:13.269688'
  stderr: '/bin/sh: /bin/false: No such file or directory'
  stderr_lines:
  - '/bin/sh: /bin/false: No such file or directory'
  stdout: ''
  stdout_lines:
...ignoring

TASK [debug task_register_var] ****************************************************
  msg:
    attempts: 5
    changed: true

In the Ansible workshops, I use this combination of error handling for Let's Encrypt to make it easy for Ansible users to troubleshoot the issue. If any tasks fail in a way that can be skipped, I append a note to a variable and print it at the end of the workshop playbook (the playbook responsible for provisioning instances for students to use).

- name: failure test playbook
  hosts: localhost
  gather_facts: false
  vars:
    summary_information: |
      PROVISIONER SUMMARY
      *******************

  tasks:
    - name: ISSUE CERT
      shell: certbot certonly --standalone -d student1.testworkshop.rhdemo.io --email ansible-network@redhat.com --noninteractive --agree-tos
      register: issue_cert
      until: issue_cert is not failed
      retries: 5
      ignore_errors: yes

    - name: set facts for output
      set_fact:
        summary_information: |
          {{summary_information}}
          - The Lets Encrypt certbot failed, please check https://letsencrypt.status.io/ to make sure the service is running
      when: issue_cert is failed

    - name: print out summary information
      debug:
        msg: "{{summary_information}}"

This prints out a very easy to understand message to the terminal window:

Terminal Readout

In conclusion, Ansible is extremely flexible at adding extra logic when it is necessary. The until loop can add robustness, and ignore_errors lets us define the success criteria. In combination, your Ansible Playbooks can be much more user-proof, allowing you to take a proactive rather than reactive approach to troubleshooting issues. Ansible can't control whether an API or service is down, but we can definitely operate more robustly than homemade scripts or DIY API implementations. The playbooks provided are extremely human readable and easy for novice users to understand.