Ansible and Matrix

Building bridges - where the Ansible community needs to go next with chat

This post is long, because I have a lot to cover. But I'm going to lead with the big picture, and if you want details, you can keep reading. Please note that at this stage, this represents my views - as I will repeat at the end, my goal is to have the community ratify this, but first I have to convince you :)

The world turns, and we move with it. The way in which we communicate has evolved, as it always has, and periodically we must catch up if we want to attract new people to our community. Today, that means offering a place where new and existing contributors feel welcomed, safe, and valued, with a rich interface and a low barrier to entry. I want to help build that, because I fear that if we don't, we will fade, and Ansible deserves better than that.

I'm going to lay out how to use the solid base of IRC to become a Matrix-first (but not only, IRC will remain) community, and how I hope to move things forward from the Red Hat side. I know this can be a hot topic for some folks, so (a) please keep the discussion civil, and (b) I want to hear those concerns so that we get this right. That said, "right" has to be for the whole community.

Future Vision - where I'd like to get to in the next 6 months

I'm going to go into the specifics of Matrix & IRC later on, but let me start with this quote from Thibault Martin over at GNOME:

IRC used to be the top notch system for a while, and is still pretty reliable. There are issues here and there, but given how basic the whole protocol is, it is fairly sturdy. The frugality of the protocol that was once its strength has turned into its weakness in the modern world. People’s expectations have shifted. Most of our contributing community is already “working for the computer” when producing the great software we all enjoy. It’s time for instant messaging to get out of the way, and to allow the computer to work for us when it comes to talking to each other. No NickServ spells, no ChanServ incantations, no bouncer to rent or to host on a server. It’s time for us to just enter our credentials, browse the channels, and enjoy the conversation with the ones we like to work with.

I was a long-time IRC user. It was the right choice a decade ago; it is not today, and so I agree with Thibault. For our community to grow, we must attract new contributors, and they will want to talk with us. Whilst IRC was once the go-to for that, it no longer is. However, it's not sufficient to ask ourselves what we like; that would be survivor bias. By asking the people who are already happy with IRC if we should keep it, we'll ignore the folks who didn't join because they struggled with IRC. We want to account for those people - we don't know who they are, of course, so we must intuit how to make their life easier. This is why I've spent the last month talking with communities such as GNOME, KDE, Mozilla, Fedora, and OpenDev about their plans and experiences with this.

I know many of you love IRC, and others are specifically wary of Matrix. I'll address that shortly. But to round out this section, I'll give you my view of where the community needs to be in a moderate amount of time (3-6 months):

  • Matrix becomes the default for new users
    • We (the community) promote it actively, that is:
      • We encourage the use of its richness (emoji reactions, posting images etc.)
      • We add it to our docs as the way to engage with us
      • We make a clear statement that we are a Matrix-first community
    • We (Red Hat) contribute through paying for Element Hosting, being good FOSS citizens and supporting our supply chain
      • This means Matrix rooms can get addresses like #welcome:ansible.com
      • We can potentially give out Matrix accounts to contributors (more on that later)
  • IRC is maintained
    • IRC does not go away! Users who are comfortable with their workflow should not be abandoned
      • Users who express an interest in making their setup work in Matrix should be helped to do so
      • Concerns should be addressed where possible. There are a lot of tuning options we can use to improve the IRC-side experience, but we need to bear in mind that the goal is to modernise our chat, and some change comes along with that.
    • We aim for a gradual osmosis of users to Matrix, aided by giving out Ansible Matrix accounts where suitable
  • We work on making more use of Matrix
    • Use/write Matrix bots rather than IRC ones (there's a GitHub bot already for example, and an RSS bot)
    • Running conferences on Matrix instead of GMeet / Bluejeans (I have a PoC for this)
    • Richer tooling for running online meetings (linkable chat, embedded etherpads, etc.)
    • (There are many more things we could try, but for brevity I'll stop here)

I'm excited. I know from my work as a community lead for Satellite/TheForeman that tiny features like being able to give a "thumbs-up" or a "heart" emoji on a comment from a colleague might seem trivial, but they mean the world for building the social glue that so often is lacking in online interactions. It's just one example of where richer interaction (combined with a lower barrier to entry, as Thibault said) will reap us rewards as a community.

The idea that a future Contributor Summit would be right there in the Community room for all to view the stream and participate makes me very happy - having it in a silo in GMeet means we get far lower participation. KDE are doing that right now for Akademy (using BigBlueButton), and of course the whole of FOSDEM 2021 was on Matrix/Jitsi, so it's possible.

The possibilities for bots and widgets are likewise huge - Matrix has a REST API for writing bots (and an SDK), and Element can embed HTML widgets in the UI (e.g. KDE are working on a Q&A poll app for doing talks). If we make the rooms fully public (rather than just open to anyone to join), then you can get a URL for any line in the chat history, which would mean forgetting to start Zodbot isn't so critical (although it still has features we like, of course). There's so much more we could do here, especially in collaboration with other communities. In particular I'd like to experiment with the GitHub Matrix bot that already exists, but there are also Etherpads, calendars, room conferencing, RSS, Travis bots, and we can always write our own.
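As a small illustration of how little it takes to get a bot-style notification into a room, the playbook below posts a message using the "community.general.matrix" module. This is a sketch, not an existing Ansible community bot: it assumes the community.general collection (and the "matrix-client" Python library the module depends on) is installed on the control node, and the homeserver URL, room ID and "matrix_access_token" variable are placeholders.

```yaml
---
# Sketch: post an announcement into a Matrix room from a playbook.
# Assumptions: community.general is installed; the homeserver URL,
# room ID and token variable below are placeholders.
- name: Announce an event in a Matrix room
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Send the announcement
      community.general.matrix:
        hs_url: "https://matrix.example.org"   # hypothetical homeserver URL
        room_id: "!exampleroom:example.org"    # hypothetical internal room ID
        token: "{{ matrix_access_token }}"     # e.g. from Ansible Vault or --extra-vars
        msg_plain: "The Contributor Summit starts in 15 minutes"
        msg_html: "<b>The Contributor Summit</b> starts in 15 minutes"
```

The same approach could be hooked into CI jobs or meeting reminders without writing a dedicated bot.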

Justifying Matrix as the choice

Before we talk about Matrix in more detail, I think it's helpful to review the issues I see with IRC. I want to stress again that IRC is not going away. It works for many, and I don't dispute that. But I'll refer back to that survivor bias - I no longer believe it works for new contributors joining us today.

What's wrong with IRC?

If you ask Mozilla that question, "everything" is the answer, but I don't agree 100%. Some of those concerns apply to us, some do not. In particular, Mike Hoye talks about the problems with unauthenticated networks and spam/hostile chat, which we can mitigate with channel modes on Libera that require registration. However, the rest of his points stand:

  • Available interfaces haven't kept up with modern expectations
  • Spam / harassment is endemic to the platform (not in our channels, but if our users are in other channels they will be exposed to it)
  • IRC is frequently blocked from within institutions & corporate networks (often because of the IRC spam in general, even if it doesn't affect us)
  • The Freenode drama has shown that there's also an issue of organisational ownership. We lost everything on Freenode because of the drama of others.

However, whether that adds up to "IRC is bad" depends on your use case:

I already use IRC, why should I stop?

As I said, I think for existing IRC users it's fine. You don't need to stop if you're happy. You've already got your spam protection in place, you've already cast the Nickserv spells, and your organisations clearly permit IRC traffic. Carry on! But take a second, and read over that list of things again. How easy is it for someone unfamiliar with the world of IRC to do that?

I'm new, should I use IRC?

No. It's not what people expect of a modern chat system - Nickserv is ... unintuitive (to be polite) compared to signup systems people encounter today, access is hard (and frequently blocked at network level), the rich interface isn't there, and there's no persistence (without a bouncer, which can be even harder to get right than Nickserv). It's an unnecessarily huge barrier to entry, to a platform that doesn't meet expectations of new users anyway. That's an enormous ask for people who aren't deeply embedded in the project. I find it unsurprising that IRC growth in general has been outpaced by other platforms, and we are no exception.

I'm an organisation, should I set up on IRC?

If I were setting up a new project today, no, I would not use IRC. For an organisation, three things come up. Firstly, access & recruitment - as we've discussed already, a FOSS project needs volunteers, and when we put big barriers in the way, we wither. Secondly, keeping members safe - we have a CoC, but with the way IRC works, it isn't easy to apply, and we cannot ask people to agree to the CoC the way we do in other places. Thirdly, ownership - the recent Freenode drama demonstrated that we didn't have the control over our own chat domain that we would have with, say, a forum or email, and instead we watched helplessly as our community was fractured by the actions of others.

I don't think we're yet at the point of declaring IRC a "legacy" system (as GNOME are considering), but I do think it's time to start our journey into something more modern and welcoming to the newest members of the community. That platform is Matrix - and as I said in the section header, we need to justify that...

What does Matrix bring us?

First we need to define what Matrix is, because I see this getting conflated all the time. Matrix is a protocol for real-time communication (be that text, audio, video or even file sharing). It is not a client, any more than Thunderbird is SMTP or HexChat is IRC. Concerns such as "Matrix is bloated and slow" or "I like my Irssi interface" need to be set aside, because that's a client discussion - a valid discussion for sure, but not in scope here. Those interested may want to look at the clients page; there are many options, from a full web UI to a plugin for WeeChat.

Matrix for the old IRC user

I'm assuming here that this user wants to try a Matrix account - those who wish to remain on IRC can do so. If so, well, little changes really. Matrix can be used as IRC, you're just ignoring the richer features like image sharing etc. (there's even an IRC display style in Element). Thanks to bridging you can hop directly into IRC channels on some IRC networks (Libera and OFTC to name two). You'll get persistence for free (no need to run a bouncer), and you can continue largely as normal (this has been my workflow for the last 4 years).

Matrix for the new user

I've already spoken about new-user expectations, and Matrix meets them: it is comparable to the likes of Slack, and all the features you'd expect (reactions, notifications, replies, etc.) are there. The get-started flow is easy too - since Element exists as a webapp, new users can load it in a browser and try it out. If they wish to dive in, they can grab a standalone desktop and/or mobile client for easier access. As such, much of the barrier to entry is removed - no arcane Nickserv process (registration is either email-based or SSO, as you would expect of a modern service), traffic is not blocked so often, and persistence is there by default.

In either case, it's worth reading this comparison of Matrix vs Email as many of the concepts of email carry over.

Matrix for the organisation

OK, this is the big one - what does Matrix mean for the Ansible community as a whole? I'm going to spend a bit more time on this one, because we're all familiar with chatting, but perhaps these organisational points are a bit more nuanced.

Access and recruitment

As we've said, Matrix supports the feature set that users have come to expect from modern chat. It's no surprise that Ansible would like to grow its community of contributors, but it goes deeper. With the incoming "Spaces" feature, Matrix allows us to bring together rooms into a logical hierarchy, and link to the whole space with a single URL. That's an easy link to drop into our docs, into our chats, and into our blogs / social media, encouraging others to join us. It's essentially bringing our IRC docs page into Matrix and making it one click to jump into a Working Group or similar. With Suggested Rooms marked clearly, new members can find their way to good places to ask questions.

(Note Spaces are still experimental and in beta - this is the intended goal, but for now we would link to the Matrix rooms on the docs page).

Going further, we could make our rooms completely public, which allows "peeking" in Matrix terms - that lets guests view the chat (in read-only mode) before they sign up, allowing them a one-click, in-browser view to check if this is indeed the right channel to join and ask their question - it's hard to make the barrier lower than that. I have enabled this in a small working group room if anyone wishes to try it.

Safety and conduct

Matrix has good tooling for moderation and conduct. Unlike IRC, there's no such thing as an unauthenticated Matrix user (there are guest users, but that's not the same thing, and guest access is currently broken anyway), so spam is far less of a problem (in fact, I've only seen a single serious spam attack on Matrix, and it was handled far better than on IRC). If it's harassment rather than spam, then as with any chat platform, it is possible to kick or permanently ban problem users, and it's much more intuitive than on IRC.

However, Matrix brings with it something of a quiet revolution in moderation. Matrix is federated (which we haven't spent much time talking about yet), and that means there are other people operating Matrix servers - and in (yet another) analogy to email, we can share information about bad actors. Just as you can subscribe your own email server to shared blocklists, Matrix servers can share banlists. This is huge - if we decide we trust another group (say, Fedora), then we can subscribe to their banlists, and any bad actors there are immediately removed from our community as well. Forming a group of projects that share values and codes of conduct could make this very strong.
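Under the hood, shared banlists like these are Matrix moderation policy lists: ordinary rooms holding "m.policy.rule.user" (and related) state events, which a moderation bot such as Mjolnir watches and turns into bans in the rooms it protects. As a sketch only, assuming an account allowed to send state events in the policy room and placeholder values for the homeserver, room ID and user, publishing one ban recommendation through the client-server API could look like this:

```yaml
---
# Sketch: publish a ban recommendation into a Matrix policy-list room.
# Assumptions: hs_url, policy_room, bad_actor and matrix_access_token are
# placeholders; the sender may send state events in the policy room.
- name: Add a user ban rule to a shared policy list
  hosts: localhost
  gather_facts: false
  vars:
    hs_url: "https://matrix.example.org"        # hypothetical homeserver
    policy_room: "!examplepolicy:example.org"   # hypothetical policy-list room ID
    bad_actor: "@spammer:example.org"           # hypothetical user ID
  tasks:
    - name: Send an m.policy.rule.user state event
      ansible.builtin.uri:
        # PUT /rooms/{roomId}/state/{eventType}/{stateKey}; IDs are URL-encoded
        url: "{{ hs_url }}/_matrix/client/v3/rooms/{{ policy_room | urlencode }}/state/m.policy.rule.user/{{ ('rule:' ~ bad_actor) | urlencode }}"
        method: PUT
        headers:
          Authorization: "Bearer {{ matrix_access_token }}"
        body_format: json
        body:
          entity: "{{ bad_actor }}"
          recommendation: m.ban    # the recommendation defined by the spec
          reason: "Spam"
        status_code: 200
```

Every community subscribed to that list (via its own moderation bot) would then act on the same rule.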

The tools for enforcing the standards we expect of our community run much deeper, but I don't want to go on forever. I'll link to this piece for some thoughts from Mozilla on Matrix's tooling around CoC enforcement.

Ownership

What does the name of a project mean? When a person emails from a work account, what values do they represent? When you join the chat room of a group, who are you talking to? Ownership, namespacing, domains, sovereignty, it has many names, but it is critical.

The Freenode drama showed that we can lose that quickly if we're not careful; almost overnight, all our rooms were empty, the namespace meaningless. The same risk is true of Libera in fact - although I personally trust the admins of Libera, it's still a 3rd-party network. Also true of Slack, Discord, etc. In fact, only email really nails this (oh look, email again). I don't have an Ansible email address, but I do have a Red Hat one, and only Red Hat can take it away from me. When I email from that, I'm doing Red Hat stuff, and that matters.

Matrix can do this too. In the same way that email uses an MX DNS record to say which server handles mail for a domain, Matrix uses the domain as a namespace, with DNS (or a .well-known file on that domain) pointing at its homeserver. Our rooms can be #devel:ansible.com if we wish them to be (a Matrix room can have many aliases), and the primary room address confers powers to manage the room. So long as we hold the DNS, we also hold the rooms - that's one of the benefits of federation.
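To make the DNS side concrete: other Matrix servers discover which homeserver handles a domain by fetching "https://<domain>/.well-known/matrix/server" (an SRV record is the alternative), and clients look at "/.well-known/matrix/client". The sketch below serves both files from a domain's own web server with Ansible; the "webservers" group, the /var/www/html web root and the delegated host "matrix.example.org" are all hypothetical stand-ins for whatever the hosted homeserver actually uses.

```yaml
---
# Sketch: delegate a domain's Matrix namespace to a hosted homeserver.
# Assumptions: "webservers" serves the domain, /var/www/html is its web
# root, and matrix.example.org is a placeholder for the homeserver host.
- name: Publish Matrix delegation files
  hosts: webservers
  become: true
  tasks:
    - name: Create the .well-known/matrix directory
      ansible.builtin.file:
        path: /var/www/html/.well-known/matrix
        state: directory
        mode: "0755"

    - name: Tell other servers where to send federation traffic
      ansible.builtin.copy:
        dest: /var/www/html/.well-known/matrix/server
        content: |
          {"m.server": "matrix.example.org:443"}
        mode: "0644"

    - name: Tell clients which homeserver to use
      ansible.builtin.copy:
        dest: /var/www/html/.well-known/matrix/client
        content: |
          {"m.homeserver": {"base_url": "https://matrix.example.org"}}
        mode: "0644"
```

Because these files live on our own domain, pointing them at a different homeserver later is a change we control.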

It goes deeper though, because we'll have accounts to give out, on a second DNS name (https://chat.ansible.im). Why a second domain? Well, here are some extensive thoughts on Matrix sovereignty, but one point made is about impersonation and representation. These accounts will be available to the wider Ansible community, and I don't want these accounts to be associated with Red Hat (i.e. ansible.com) - they are for the community, and ownership of an ansible.im MXID should have the same gravitas as a project email address.

There are some decisions to be made about how to hand these accounts out (and how we take them back, if need be) but I'll cover that later.

How do we get there? a.k.a THE PLAN

OK, you have the vision, the concerns, and the ways in which Matrix should improve our situation. How do we get there though? There are a few things to do, some of which are already in motion (because they're non-binding), while others need us to collectively agree to implement them.

Homeservers

My notes on sovereignty and domain ownership rest on having our own homeserver. We could self-host this, however we aren't well set up for doing our own ops, and we want to be supporting our FOSS supply chain. As such we have secured funding from Red Hat to pay for a Matrix homeserver with Element Hosting (ems.element.io). This will give us the necessary domain ownership and user accounts to get the community started on the transition to Matrix.

It's worth noting that all the communities I spoke to praise Element hugely. Working with them has been a pleasure so far, and they are very open to feedback, so I expect that relationship will go well and that we can make improvements as we need to. Read the process that Mozilla went through if you want to see how that looked for them.

We are setting up two domains as noted earlier. The first is a small admin-only instance for controlling the ansible.com namespace - giving us Ansible primary room names and the moderation tools to enforce our CoC. The accounts here will be for the folks maintaining the admin install (right now myself & Gundalow, but we'll want to add more people soon to raise the bus factor), and will not be for day-to-day use. The second is the main homeserver for user accounts (ansible.im). These are both live as of 2021-06-24, although the ansible.com instance is not yet federating while we work out things with the DNS.

User experience

I'm looking to https://chat.mozilla.org for inspiration here - clearly displayed expectations of the user, links to the conduct pages, and so on. If you make an account and log in, the conduct is shown again. Synapse (the homeserver software) can store consent, so we can ask users to agree to the Code of Conduct. The web UI is clear, we can link to it from our docs and media, and people can hop into the Community channel as a first point of contact. We're setting up ansible.im to look somewhat similar at the moment; Gundalow is working on it (but we both suck at page design, so bear with us :P).

The Matrix Foundation (who maintain the Libera bridge) have kindly granted us admin on our Libera/Matrix rooms (by default, libera.chat rooms have only the bridge as admin), so once the ansible.com instance is working, I'll be able to add the ":ansible.com" aliases to each room. I'm already starting to add :ansible.im aliases as well, just for completeness. No Libera.chat IRC channels will change name - these are aliases on the Matrix side only, and we'll use names that make sense (e.g. #community:ansible.com will be an alias to #ansible-community:libera.chat).
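Adding an alias is a single client-server API call to the homeserver that owns the alias's domain. A sketch with "ansible.builtin.uri", assuming an access token for an account on that homeserver; the homeserver URL and the internal room ID are placeholders, and the alias is the example used above:

```yaml
---
# Sketch: point a local Matrix alias at an existing (bridged) room.
# Assumptions: hs_url is the homeserver owning the alias's domain;
# room_id and matrix_access_token are placeholders.
- name: Add a Matrix alias to a bridged room
  hosts: localhost
  gather_facts: false
  vars:
    hs_url: "https://matrix.example.org"     # hypothetical
    room_id: "!exampleroom:libera.chat"      # hypothetical internal room ID
    room_alias: "#community:ansible.com"
  tasks:
    - name: PUT /directory/room/{roomAlias}
      ansible.builtin.uri:
        # The alias is URL-encoded ('#' -> %23, ':' -> %3A)
        url: "{{ hs_url }}/_matrix/client/v3/directory/room/{{ room_alias | urlencode }}"
        method: PUT
        headers:
          Authorization: "Bearer {{ matrix_access_token }}"
        body_format: json
        body:
          room_id: "{{ room_id }}"
        status_code: 200
```

The alias only exists on the Matrix side; the IRC channel name on Libera is untouched.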

These rooms are already organised into a Space (#ansible-space:matrix.org right now) and I'm meeting with Element in the coming weeks to give feedback on Spaces in general, so let me know how you find that if you are in the Spaces beta.

Accounts

This is one area where we need to decide what to do. As we've discussed, a Matrix account is akin to an email, and we don't just give out Ansible email addresses to anyone. We should be similarly careful with Ansible accounts - we would not want someone harassing others or running a scam with an Ansible ID. This means we need two things - a system for handing out accounts, and a protocol for taking them back.

To hand them out, I think it's clear that we don't want open registration. Other groups (Mozilla, Fedora) do this, but they are larger and have other IAM / SSO systems to tie the Matrix IDs to - we do not. GNOME are in a similar position, and are working on a system to tie into Keycloak that allows existing users to "vouch" for new requests; two vouches get an approval. That system is not ready yet, but GNOME are making it public and will welcome collaboration. In the meantime, I suggest we have a manual-vouch process, whereby an admin (likely me, but not only me) receives a request, the request gets the needed vouches from the community, and then the account is created. Exact details of making and handling requests are to be determined (by pull request or similar?) but the principle is clear.

To remove an account is trivial technically, but we need a policy. Clearly, any breach of our Code of Conduct is grounds for account removal - the safety of our community is important. It would be natural to assume people who have drifted away from Ansible should have their accounts reclaimed - but as we are billed monthly per active user, this may not in practice be required (only if the user is still active on Matrix, but not in Ansible, would this be needed). We should reserve the right to do so though, even if in practice we don't use it often. Once again, Thib has some thoughts which I agree with:

If you break policy, the organisation can and probably will take the account back and leave you without a chance to retrieve your data. If you leave the organisation, the account can and should probably be taken back from you too. Those accounts can really be compared to work e-mail accounts: you should limit your activities with this account to what you do for the organisation who provides it, and can’t expect to keep it after you leave.

Sadly, the lack of multi-account clients does tend to push people towards using a single Matrix account for everything, but in an ideal world that wouldn't be true, and users would segregate their activities just as they do between work and private email. For now though, it'll be painful, so while we need to have the right to take back an account, I wouldn't want to use that right unless we had to (either for violations or for cost reasons).

Importantly, Matrix is federated. No one is required to have an ansible.im MXID to participate in our community. However, these accounts are part of the package we bought, so we may as well use them; they may be a useful perk, or a good option for users who don't yet have a Matrix ID.

One day, I'd love to have open registration for our community - but until we have something to tie it to (such as Fedora and Mozilla do) then I think the concerns above outweigh the benefits. We'll get there :)

Interaction with IRC

Again I will reinforce that we have no plans to stop using IRC. Inevitably though, some of the richness of a more modern system will be lost on IRC, and we're going to have to take that on the chin. That said, much will work - embedded etherpads can have a direct link for use outside of Matrix, likewise Jitsi (or other conference call software like BBB) has an external URL. These will always be clearly posted so that IRC users are not left out. Likewise we're open to tuning the behaviour of the bridge in our rooms regarding edits and long lines if that becomes a significant issue, and so on. In short, I do not want IRC users feeling ignored - we should discuss issues as they arise.

Let's roll

This is a step change for us, but a needed one (in my opinion). As a community, I think we need to ratify two things:

  1. That we, the Ansible community, accept Matrix as an equal partner to IRC, and we welcome full use of its feature set.
  2. What our policy for giving and removing ansible.im accounts is.

I think I've covered (1) in my arguments earlier. For (2), my suggestions are as above - a pull-request-based vouching workflow, needing 2 ACKs from existing ansible.im accounts to grant access. For removal, we need some wording here, but I would suggest it's a power that only the steering committee can exercise, and that it's for either CoC violations or inactivity within the Ansible community. I look forward to bringing both these statements to the steering committee for ratifying in the near future.

Beyond that, I also want to see us do regular retrospectives so we can adjust as we go. This is not a once-and-done thing, it's a long project, and we need to listen if it's going to work. I think the first one would be ~1 month after we start officially promoting Matrix.

I'm excited. I hope you are too. Please feel free to reach out to me for a chat any time (at @gwmngilfen:ansible.im) if you have questions. Thanks for reading to the end :)




AIX Patch Management with Ansible


Leading enterprises today use Red Hat Ansible Automation Platform to provision, configure, manage, secure and orchestrate hybrid IT environments. A common misconception is that Ansible only manages the Linux operating system. In fact, Ansible supports Linux, Windows, AIX, IBM i and IBM z/OS environments. This blog will help AIX system administrators get started with Ansible on AIX, and introduce a patching use case.

Ansible Content Collections

When Ansible Automation Platform was released, Ansible Content Collections became the de facto standard for distributing, maintaining and consuming automation content. The shift to Collections increased community participation and has greatly increased the number of stable and supported Ansible modules. Delivering modules via Collections, rather than packaging them with Ansible Core, has also resulted in a faster release cadence for new modules.

Let us explore the IBM-provided Ansible Collection for AIX. It is important to note that many of the Ansible modules for the Linux operating system will also work on AIX (in addition to the IBM-provided AIX modules), making the use cases for Ansible on AIX very broad.

Ansible and AIX, why?

The AIX operating system has been around for 35 years and is used to run business-critical applications. Historically, AIX systems were managed using the tools that ship with AIX, complemented by shell scripts written by AIX system administrators. The problem with this approach is that these scripts can become extremely complex over the years, and often wind up being held together with "duct tape and zip ties".

As enterprises move to a modern, enterprise-wide automation strategy with Ansible Automation Platform, extending automation to AIX is a great way to simplify AIX administration and make it consistent, all while using the same automation tools used across the rest of the enterprise.

Ansible Concepts

First let us cover some basic Ansible concepts that will be used in the example. Further information can be found on the Ansible documentation site.

Playbooks are ordered lists of tasks and variables that are run against an inventory of hosts.

A task is a single unit of action in Ansible; each task calls a module.

Modules are code that Ansible executes. A module can do anything from copying a file to using NIM to trigger an AIX update.

Roles are repeatable bundles of tasks that are contained in a specific directory structure.

Variables within Ansible are referenced using Jinja2 syntax, like this: "{{ variable_name }}".

Task delegation runs a task on a different host in the inventory from the one the play is currently targeting.
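To tie variables and delegation together, here is a small sketch using the hosts from the inventory defined later in this post (nim01, bruce and freddie). The play targets the two NIM clients, but delegates each "lsnim -l" query to the NIM master, and references variables with the "{{ }}" syntax. It assumes each client's NIM object name matches its Ansible inventory hostname.

```yaml
---
# Sketch: query the NIM master on behalf of each NIM client.
# Assumption: each client's NIM object name matches its inventory hostname.
- name: Look up NIM client definitions
  hosts: bruce,freddie
  gather_facts: false
  vars:
    nim_master: nim01
  tasks:
    - name: Ask the NIM master about this client
      ansible.builtin.command: "lsnim -l {{ inventory_hostname }}"
      delegate_to: "{{ nim_master }}"
      register: nim_client
      changed_when: false    # read-only query, so never report a change

    - name: Show what the NIM master knows about this client
      ansible.builtin.debug:
        var: nim_client.stdout_lines
```

The command runs on nim01 twice, once per client, while "inventory_hostname" still refers to the client being processed.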

Getting started with Ansible

In this example, I'm using a Fedora Linux 34 workstation, so I'm going to use the dnf package manager to install Ansible:

$ sudo dnf install -y ansible

Once Ansible is installed, I'm going to install the ibm.power_aix Collection:

$ ansible-galaxy collection install ibm.power_aix

When Ansible is installed from the package, a default inventory file, /etc/ansible/hosts, is created. I'm now going to add the hosts used in this example to that inventory:

  • nim01 is our AIX 7.2 NIM master, which is functional and has an lpp_source defined.
  • bruce is an AIX 7.2 NIM client registered to the nim01 NIM master.
  • freddie is an AIX 7.2 NIM client registered to the nim01 NIM master.
$ cat /etc/ansible/hosts
nim01 ansible_host=10.0.0.5 ansible_user=root
bruce ansible_host=10.0.0.6 ansible_user=root
freddie ansible_host=10.0.0.7 ansible_user=root
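The same inventory can also be written in YAML, which makes it easy to group hosts by their NIM role for later plays. A sketch, saved under a hypothetical file name "inventory.yml" and passed with "-i inventory.yml"; the group names "nim_masters" and "nim_clients" are illustrative, not required by the collection:

```yaml
---
# inventory.yml (hypothetical name): the hosts above, grouped by NIM role.
all:
  vars:
    ansible_user: root
  children:
    nim_masters:        # illustrative group name
      hosts:
        nim01:
          ansible_host: 10.0.0.5
    nim_clients:        # illustrative group name
      hosts:
        bruce:
          ansible_host: 10.0.0.6
        freddie:
          ansible_host: 10.0.0.7
```

With this file, "ansible -i inventory.yml -m ping nim_clients" would target only bruce and freddie.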

I'm now going to connect to all the systems over SSH as "root". The usual practice is to have a service account with "sudo" access, however for this example I will use "root" in our lab environment. Using the ssh-copy-id command, I can distribute my SSH public key to the AIX servers.

$ ssh-copy-id root@nim01
$ ssh-copy-id root@bruce
$ ssh-copy-id root@freddie
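If you follow the usual practice of a service account with sudo instead of root, the connection settings could move into a group variables file next to the inventory (for /etc/ansible/hosts, that is /etc/ansible/group_vars/all.yml). A sketch, assuming a hypothetical account named "ansible" that exists on each AIX host with sudo rights and your SSH key; the per-host "ansible_user=root" entries would need to be removed, since host variables take precedence over group variables:

```yaml
---
# group_vars/all.yml: connect as a service account and escalate with sudo.
ansible_user: ansible          # hypothetical service account
ansible_become: true           # run tasks with privilege escalation
ansible_become_method: sudo
```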

The next step is to use the Ansible ping module to check that I can connect to the three hosts in our inventory.

$ ansible -m ping all

PLAY [Ansible Ad-Hoc] ************************************************************************************************************************************************************************************************************************

TASK [ping] **********************************************************************************************************************************************************************************************************************************
ok: [nim01]
ok: [freddie]
ok: [bruce]

PLAY RECAP ***********************************************************************************************************************************************************************************************************************************
nim01                  : ok=1    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
bruce                  : ok=1    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
freddie                : ok=1    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
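Once the hosts respond, a sensible first step toward patching is to record each host's current technology level and service pack. As a sketch, the playbook below runs the standard AIX "oslevel -s" command through the generic "ansible.builtin.command" module (one of the non-AIX-specific modules that also works on AIX) and prints the result:

```yaml
---
# Sketch: report the current AIX level of every host before patching.
- name: Report AIX levels
  hosts: all
  gather_facts: false
  tasks:
    - name: Get the technology level and service pack
      ansible.builtin.command: oslevel -s
      register: aix_level
      changed_when: false    # read-only query

    - name: Show the level for each host
      ansible.builtin.debug:
        msg: "{{ inventory_hostname }} is at {{ aix_level.stdout }}"
```

Running it again after an update gives a simple before/after comparison.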

Ansible needs "python" to be installed on the AIX systems, and ideally the "yum" package manager should also be configured on AIX. If your AIX systems do not have these packages installed, or are vanilla installations of AIX, IBM provides an Ansible Role to "bootstrap" an AIX system so that Ansible can manage it.

The playbook below uses the IBM provided role to prepare an AIX system for Ansible automation.

$ cat aix_bootstrap.yml
---

- name: Prep AIX for Ansible
  hosts: all
  vars:
    pkgtype: yum
  collections:
    - ibm.power_aix
  roles:
    - power_aix_bootstrap

The following example demonstrates running the playbook. The hosts Ansible is running against already have "python" and "yum" installed, so there is no need for any changes to be made to these hosts (the "changed" status in the output comes from the role's check for yum, not from installing anything).

$ ansible-playbook aix_bootstrap.yml

PLAY [Prep AIX for Ansible] ******************************************************************************************************************************************************************************************************************

TASK [Gathering Facts] ***********************************************************************************************************************************************************************************************************************
ok: [bruce]
ok: [freddie]
ok: [nim01]

TASK [ibm.power_aix.power_aix_bootstrap : Fail if pkgtype not specified] *********************************************************************************************************************************************************************
skipping: [nim01]
skipping: [bruce]
skipping: [freddie]

TASK [ibm.power_aix.power_aix_bootstrap : Fail if download_dir not specified] ****************************************************************************************************************************************************************
skipping: [nim01]
skipping: [bruce]
skipping: [freddie]

TASK [ibm.power_aix.power_aix_bootstrap : Fail if target_dir not specified] ******************************************************************************************************************************************************************
skipping: [nim01]
skipping: [bruce]
skipping: [freddie]

TASK [ibm.power_aix.power_aix_bootstrap : Fail if rpm_src not specified] *********************************************************************************************************************************************************************
skipping: [nim01]
skipping: [bruce]
skipping: [freddie]

TASK [ibm.power_aix.power_aix_bootstrap : Fail if yum_src not specified] *********************************************************************************************************************************************************************
skipping: [nim01]
skipping: [bruce]
skipping: [freddie]

TASK [ibm.power_aix.power_aix_bootstrap : Bootstrap yum] *************************************************************************************************************************************************************************************
included: /home/tholloway/.ansible/collections/ansible_collections/ibm/power_aix/roles/power_aix_bootstrap/tasks/yum_install.yml for nim01, bruce, freddie

TASK [ibm.power_aix.power_aix_bootstrap : Check for existence of yum] ************************************************************************************************************************************************************************
changed: [bruce]
changed: [nim01]
changed: [freddie]

PLAY RECAP ***********************************************************************************************************************************************************************************************************************************
nim01                  : ok=3    changed=1    unreachable=0    failed=0    skipped=5    rescued=0    ignored=0
bruce                  : ok=3    changed=1    unreachable=0    failed=0    skipped=5    rescued=0    ignored=0
freddie                : ok=3    changed=1    unreachable=0    failed=0    skipped=5    rescued=0    ignored=0

Now that the systems meet the minimum requirements, I am ready to automate AIX operations.

Running an AIX Update using NIM and Ansible

First off, I'll use a simple playbook to see what "oslevel" our NIM master and NIM clients are on, before I start.

$ cat aix_oslevel_check.yml
---

- name: AIX oslevel checking playbook
  hosts: all
  tasks:

  - name: Gather LPP Facts
    shell: "oslevel -s"
    register: output_oslevel

  - name: Print the oslevel
    debug:
      msg: "{{ ansible_hostname }} has the AIX oslevel of {{ output_oslevel.stdout }}"

Running that playbook produces the result below. I can see that bruce and freddie are one service pack behind nim01.

$ ansible-playbook aix_oslevel_check.yml

PLAY [AIX oslevel checking playbook ] *****************************************************************************************************************************************************************************************************************

TASK [Gathering Facts] ***********************************************************************************************************************************************************************************************************************
ok: [bruce]
ok: [freddie]
ok: [nim01]

TASK [Gather LPP Facts] **********************************************************************************************************************************************************************************************************************
changed: [freddie]
changed: [bruce]
changed: [nim01]

TASK [Print the oslevel] *********************************************************************************************************************************************************************************************************************
ok: [nim01] =>
  msg: nim01 has the AIX oslevel of 7200-05-02-2114
ok: [bruce] =>
  msg: bruce has the AIX oslevel of 7200-05-01-2038
ok: [freddie] =>
  msg: freddie has the AIX oslevel of 7200-05-01-2038

PLAY RECAP ***********************************************************************************************************************************************************************************************************************************
nim01                  : ok=3    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
bruce                  : ok=3    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
freddie                : ok=3    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
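The "oslevel -s" output packs several fields into one string. Assuming the usual AIX layout of release, technology level, service pack and build date (so 7200-05-01-2038 reads as AIX 7.2, TL5, SP1, built in week 38 of 2020), the hypothetical helper below parses the values reported above and lists the hosts that trail the NIM master. It is a local sketch for reading the output. It is not part of the ibm.power_aix collection.

```python
from typing import Dict, List, NamedTuple


class OsLevel(NamedTuple):
    """Fields of an "oslevel -s" string such as 7200-05-02-2114 (assumed layout)."""
    release: int  # 7200 = AIX 7.2
    tl: int       # technology level
    sp: int       # service pack
    build: str    # build date as YYWW


def parse_oslevel(text: str) -> OsLevel:
    release, tl, sp, build = text.strip().split("-")
    return OsLevel(int(release), int(tl), int(sp), build)


def hosts_behind(levels: Dict[str, str], reference_host: str) -> List[str]:
    """Hosts whose release/TL/SP is lower than the reference host's."""
    ref = parse_oslevel(levels[reference_host])
    ref_key = (ref.release, ref.tl, ref.sp)
    behind = []
    for host, text in levels.items():
        level = parse_oslevel(text)
        if (level.release, level.tl, level.sp) < ref_key:
            behind.append(host)
    return behind


if __name__ == "__main__":
    # Values copied from the playbook output above.
    levels = {
        "nim01": "7200-05-02-2114",
        "bruce": "7200-05-01-2038",
        "freddie": "7200-05-01-2038",
    }
    print(hosts_behind(levels, "nim01"))  # ['bruce', 'freddie']
```

The build date is deliberately left out of the comparison, because the question here is whether a host is behind on technology level or service pack.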

To bring all systems to the same OS level, I need to download the latest service pack and define an "lpp_source" for it on our NIM master. Make sure that the name of the "lpp_source" follows the format in the example below, or the Ansible module will not detect the "oslevel".

$ cat aix_download.yml
---

- name: AIX Patching Playbook
  hosts: nim01
  vars:
    oslevel: 7200-05-02
    nim_lpp_source: 7200-05-02-2114-lpp_source
  collections:
    - ibm.power_aix
  tasks:

  - name: Download AIX Updates
    nim_suma:
      action: download
      download_dir: "/export/nim/lpp_source"
      lpp_source_name: "{{ nim_lpp_source }}"
      oslevel: "{{ oslevel }}"
      targets: 'bruce, freddie'
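A mismatched name means the module will not detect the "oslevel", so a quick check before running the download can help. The helper below is hypothetical. It assumes the naming convention shown in the example: the full "oslevel -s" string followed by "-lpp_source". Confirm the exact rule against the nim_suma module documentation for your collection version.

```python
import re

# Assumed convention, copied from the example: <VVVV-TL-SP-YYWW>-lpp_source
LPP_SOURCE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})-\d{4}-lpp_source$")


def lpp_source_matches(name: str, oslevel: str) -> bool:
    """True if `name` follows the convention and refers to `oslevel` (VVVV-TL-SP)."""
    match = LPP_SOURCE_RE.match(name)
    return bool(match) and match.group(1) == oslevel


if __name__ == "__main__":
    print(lpp_source_matches("7200-05-02-2114-lpp_source", "7200-05-02"))  # True
    print(lpp_source_matches("7200-05-02-lpp_source", "7200-05-02"))       # False
```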

The next step is to run the download playbook. It downloads the required updates from IBM Fix Central and defines an "lpp_source" on the NIM master:

$ ansible-playbook aix_download.yml

PLAY [AIX Patching Playbook] *****************************************************************************************************************************************************************************************************************

TASK [Gathering Facts] ***********************************************************************************************************************************************************************************************************************
ok: [nim01]

TASK [Download AIX Updates] ******************************************************************************************************************************************************************************************************************
changed: [nim01]

PLAY RECAP ***********************************************************************************************************************************************************************************************************************************
nim01                  : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

Now I can run a patching playbook, which will make use of the "alt_disk" and "nim" Ansible modules. The playbook is going to perform the following tasks:

  • Remove any leftover "alt_disk_copy" ("altinst_rootvg") from the AIX system.
  • Create a new "alt_disk_copy" clone of the root volume group to a spare disk as a backup.
  • Run an application stop script.
  • Run the AIX update via task delegation to the NIM master.
  • Reboot.
  • Run an application start script.

$ cat aix_patching.yml
---

- name: AIX Patching Playbook
  hosts: bruce,freddie
  vars:
    nim_lpp_source: 7200-05-02-2114-lpp_source
    nim_master: nim01
  collections:
    - ibm.power_aix
  tasks:

  - name: Cleanup any existing alt_disk_copy
    alt_disk:
      action: clean

  - name: Create an alt_disk_copy for backup
    alt_disk:
      targets: hdisk1

  - name: Stop Application
    shell: /usr/local/bin/stop.sh

  - name: Run AIX Update
    nim:
      action: update
      lpp_source: "{{ nim_lpp_source }}"
      targets: "{{ ansible_hostname }}"
    delegate_to: "{{ nim_master }}"

  - name: Reboot
    reboot:
      post_reboot_delay: 180

  - name: Start Application
    shell: /usr/local/bin/start.sh

Now I will run the playbook and patch the NIM client systems "bruce" and "freddie":

$ ansible-playbook aix_patching.yml

 PLAY [AIX Patching Playbook] *****************************************************************************************************************************************************************************************************************

TASK [Gathering Facts] ***********************************************************************************************************************************************************************************************************************
ok: [bruce]
ok: [freddie]

TASK [Cleanup any existing alt_disk_copy] ****************************************************************************************************************************************************************************************************
changed: [bruce]
changed: [freddie]

TASK [Create an alt_disk_copy for backup] ****************************************************************************************************************************************************************************************************
changed: [bruce]
changed: [freddie]

TASK [Stop Application] **********************************************************************************************************************************************************************************************************************
changed: [bruce]
changed: [freddie]

TASK [Run AIX Update] *************************************************************************************************************************************************************************************************************************
changed: [bruce -> 10.0.0.5]
changed: [freddie -> 10.0.0.5]

TASK [Reboot] ********************************************************************************************************************************************************************************************************************************
changed: [freddie]
changed: [bruce]

TASK [Start Application] *********************************************************************************************************************************************************************************************************************
changed: [bruce]
changed: [freddie]

PLAY RECAP ***********************************************************************************************************************************************************************************************************************************
bruce                 : ok=7    changed=6    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
freddie               : ok=7    changed=6    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

Finally, I run the aix_oslevel_check.yml playbook again to confirm that all systems are now on AIX 7.2 TL5 SP2.

$ ansible-playbook aix_oslevel_check.yml

PLAY [AIX oslevel checking playbook ] *****************************************************************************************************************************************************************************************************************

TASK [Gathering Facts] ***********************************************************************************************************************************************************************************************************************
ok: [bruce]
ok: [freddie]
ok: [nim01]

TASK [Gather LPP Facts] **********************************************************************************************************************************************************************************************************************
changed: [freddie]
changed: [bruce]
changed: [nim01]

TASK [Print the oslevel] *********************************************************************************************************************************************************************************************************************
ok: [nim01] =>
  msg: nim01 has the AIX oslevel of 7200-05-02-2114
ok: [bruce] =>
  msg: bruce has the AIX oslevel of 7200-05-02-2114
ok: [freddie] =>
  msg: freddie has the AIX oslevel of 7200-05-02-2114

PLAY RECAP ***********************************************************************************************************************************************************************************************************************************
nim01                  : ok=3    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
bruce                  : ok=3    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
freddie                : ok=3    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

Conclusion

As you can see from this example, Ansible provides a lot of value in automating AIX operations. For additional information, see the documentation for the supported Collection available from Automation Hub. This Collection is also available to the community from Ansible Galaxy.







Using VMware vCenter Tags in a Red Hat Ansible Tower Dynamic Inventory


VMware vCenter Server tags are labels that can be applied to objects, for example to record a system's environment and usage, which makes them a very useful asset management tool. That also makes tags a perfect fit for organizing systems in an Ansible inventory. Red Hat customers have regularly requested the ability to use vCenter tags in Red Hat Ansible Tower. This is now possible with an Ansible Tower inventory source that uses the vmware_vm_inventory plugin, which supports tags.

Ansible Automation Platform 1.2 brings fully native Ansible inventory plugin support to Ansible Tower 3.8. In previous versions, inventory plugin configurations were modeled on the old inventory scripts, so only a specific set of parameters surfaced in Ansible Tower's user interface. Examples include the cloud region and a specific subset of the variables those scripts accepted, which could be passed to the inventory source. To maintain compatibility with the old inventory scripts, new configuration parameters introduced by Ansible inventory plugins were not supported.

The move to support native inventory plugins allows Red Hat Ansible Automation Platform customers to use all the configuration parameters available through the plugin, as well as supporting any future new plugin features automatically.

As an example, the screenshot below shows how the source configuration panel differs between an older version of Ansible Tower (3.7 in this case) and Ansible Tower 3.8, using an Amazon EC2 source:

vcenter tags blog one

As you can see, the "Instance Filters" and "Regions" configuration options are no longer part of the user interface in Ansible Tower 3.8. That configuration is now done in the "Source Variables" section of the inventory source definition. This Ansible Tower instance was upgraded from 3.7 to 3.8, and during the upgrade the platform installer converted the old inventory sources into a compatible inventory plugin configuration. As a result, upgraded sources contain many entries in this section so that they produce the same outcome as the old inventory scripts (for example, the groups created by default).

Pretty exciting stuff!

Environment Setup

The vmware_vm_inventory plugin supports tags through the with_tags configuration parameter. It defaults to false, so we need to set it to true in our source definition. As the plugin documentation notes, this parameter requires the vSphere Automation SDK library to be installed on the controller machine, which in our case means the Ansible Tower nodes. The documentation also links to the SDK's installation steps.

For this example, we will use six VMs with the following types and tags:

Name       Type      Tags
testvm_1   RHEL7     Dev, TestVM, Linux
testvm_2   RHEL7     Prod, TestVM, Linux
testvm_3   RHEL8     Dev, TestVM, Linux
testvm_4   RHEL8     Prod, TestVM, Linux
testvm_5   Win2019   Dev, TestVM, Windows
testvm_6   Win2019   Prod, TestVM, Windows

The first step is to make sure that our Ansible Tower nodes have the library this feature requires. Because an inventory source can use a custom Python virtual environment, we will create a new virtual environment called vmware-venv under /opt/towervenvs and install the required libraries into it (you can read more about Ansible Tower's virtual environments and how to use them in the documentation).

$ sudo python3 -m venv /opt/towervenvs/vmware-venv
$ sudo /opt/towervenvs/vmware-venv/bin/pip3 install --upgrade pip setuptools
$ sudo /opt/towervenvs/vmware-venv/bin/pip3 install --upgrade git+https://github.com/vmware/vsphere-automation-sdk-python.git
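To confirm that the SDK actually landed in the virtual environment, you can run a short import check with that environment's interpreter, for example /opt/towervenvs/vmware-venv/bin/python3 check_sdk.py. The file name check_sdk.py is hypothetical. The module names are an assumption based on what recent community.vmware releases import for tag support, and they may differ in other versions.

```python
import importlib

# Assumed: modules the vmware_vm_inventory plugin imports when with_tags is enabled.
REQUIRED_MODULES = (
    "com.vmware.vapi.std_client",
    "vmware.vapi.vsphere.client",
)


def check_vsphere_sdk(modules=REQUIRED_MODULES) -> dict:
    """Map each module name to True if it can be imported by this interpreter."""
    results = {}
    for name in modules:
        try:
            importlib.import_module(name)
            results[name] = True
        except ImportError:  # also covers ModuleNotFoundError
            results[name] = False
    return results


if __name__ == "__main__":
    for name, ok in check_vsphere_sdk().items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
```

Run it on every node in the cluster, because each node syncs inventory from its own copy of the virtual environment.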

Make sure that the virtual environment and the required libraries are installed on all nodes in the Ansible Tower cluster, and that Ansible Tower is configured to look for virtual environments in the directory where they were created. This setting is under Settings > System > CUSTOM VIRTUAL ENVIRONMENT PATHS.

vcenter tags blog two

Next, we need to configure a credential for vCenter that Ansible Tower will use when syncing the inventory. 

In Ansible Tower, select "Credentials" under Resources in the left-hand panel, then click the add icon to create a new credential. In the new credential configuration panel, enter a name for your credential, choose "VMware vCenter" as the credential type, and fill in the required information. Here is what the credential definition looks like:

vcenter tags blog three

Creating the dynamic inventory source in Ansible Tower

Now it's time to create the inventory. In Ansible Tower, select "Inventories" under Resources in the left-hand panel, then click the add icon to create a new inventory. Give the inventory a name and select an organization for it. We'll call ours "VMware Inventory" and assign it to Red Hat Organization.

vcenter tags blog four

Click "Save", and the Sources tab becomes available. Go to the Sources tab and click the add icon to add a new source. Give it a name, choose "VMware vCenter" as the source, and choose the credential we created earlier. It may already be filled in if it is the only credential of type "VMware vCenter". Also make sure to select the virtual environment that has the required library installed.

Under "Source Variables", add the following and click "Save":

---
plugin: community.vmware.vmware_vm_inventory
hostnames:
- 'config.name'
properties:
- name
- network
- overallStatus
- value
- capability
- config
- guest
- runtime
- summary
with_nested_properties: true
with_tags: true

vcenter tags blog five

Our new inventory source is now created and appears under Sources. Now click the sync icon to pull in our list of virtual machines (VMs). After the sync job completes and the cloud icon next to the source turns green, we can go into the list of hosts and see all the hosts that are in vCenter. If we click on any of the hosts, we can see the associated tags under the "tags" key. Awesome!

vcenter tags blog six

vcenter tags blog seven

Creating inventory groups based on tags

The previous configuration pulls in all the hosts in vCenter with their associated tags, along with the guest attributes we selected from those listed in the inventory plugin's documentation. But we only want to pull in VMs that have the tag "TestVM", and we want to create groups based on the imported VMs' tags, power state and guest ID. So let's add a filter as well as some keyed group definitions. Go back to the inventory source we defined and replace the definition under source variables with the following:

---
plugin: community.vmware.vmware_vm_inventory
hostnames:
- 'config.name'
properties:
- name
- network
- overallStatus
- value
- capability
- config
- guest
- runtime
- summary
with_nested_properties: true
with_tags: true
keyed_groups:
- key: tags
  prefix: "vm_tag_"
  separator: ""
- key: config.guestId
  prefix: ''
  separator: ''
- key: summary.runtime.powerState
  prefix: ''
  separator: ''
filters:
- "'TestVM' in tags"

Then sync the inventory source again.

And just like that, we have a list of only the hosts that are tagged with TestVM, as well as groups created from their vCenter tags, guest IDs and power states.

vcenter tags blog eight

vcenter tags blog nine
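The sketch below reproduces the filter and keyed_groups settings in plain Python to show how they shape the inventory. A host is kept only if the filter expression holds. Each keyed_groups entry then turns a host value into a group name built from prefix + separator + value, and a list value such as tags produces one group per element. This is a simplified model, not the plugin's code: it skips Jinja2 evaluation of nested keys such as config.guestId and Ansible's group-name sanitization. The guestId and powerState values and the extra "othervm" host are hypothetical.

```python
from typing import Dict, List, Set

# Hosts modeled on the six test VMs above. guestId/powerState values and
# "othervm" (no TestVM tag) are hypothetical examples.
HOSTS: Dict[str, dict] = {
    "testvm_1": {"tags": ["Dev", "TestVM", "Linux"], "guestId": "rhel7_64Guest", "powerState": "poweredOn"},
    "testvm_2": {"tags": ["Prod", "TestVM", "Linux"], "guestId": "rhel7_64Guest", "powerState": "poweredOn"},
    "testvm_3": {"tags": ["Dev", "TestVM", "Linux"], "guestId": "rhel8_64Guest", "powerState": "poweredOn"},
    "testvm_4": {"tags": ["Prod", "TestVM", "Linux"], "guestId": "rhel8_64Guest", "powerState": "poweredOn"},
    "testvm_5": {"tags": ["Dev", "TestVM", "Windows"], "guestId": "windows2019srv_64Guest", "powerState": "poweredOn"},
    "testvm_6": {"tags": ["Prod", "TestVM", "Windows"], "guestId": "windows2019srv_64Guest", "powerState": "poweredOn"},
    "othervm": {"tags": ["Prod"], "guestId": "rhel8_64Guest", "powerState": "poweredOff"},
}

# (key, prefix, separator), mirroring the keyed_groups entries in the source variables.
KEYED_GROUPS = [
    ("tags", "vm_tag_", ""),
    ("guestId", "", ""),
    ("powerState", "", ""),
]


def passes_filters(props: dict) -> bool:
    """Python stand-in for the Jinja2 filter "'TestVM' in tags"."""
    return "TestVM" in props.get("tags", [])


def build_groups(hosts: Dict[str, dict]) -> Dict[str, Set[str]]:
    """Return group name -> host names, for hosts that pass the filter."""
    groups: Dict[str, Set[str]] = {}
    for host, props in hosts.items():
        if not passes_filters(props):
            continue  # filtered hosts never enter the inventory
        for key, prefix, sep in KEYED_GROUPS:
            value = props.get(key)
            if value is None:
                continue
            # A list value (such as tags) yields one group per element.
            values: List[str] = value if isinstance(value, list) else [value]
            for item in values:
                groups.setdefault(f"{prefix}{sep}{item}", set()).add(host)
    return groups


if __name__ == "__main__":
    for group, members in sorted(build_groups(HOSTS).items()):
        print(f"{group}: {sorted(members)}")
```

Because "othervm" fails the filter, it appears in no group, and no "poweredOff" group is created.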

The new native Ansible inventory plugin support may raise the level of difficulty, since you have to know how to configure the inventory plugin you want to use, but it gives users a lot of flexibility.




Automation Savings Planner


Enterprise organizations understand that to be leaders in their industries, they must change the way they deliver applications, improve their relationships with customers and gain competitive advantages.

Positioning those advantages to have a positive return on investment often starts with proper planning and automation. But what does proper planning of your automation even look like?

For some enterprises, proper planning includes reducing automation costs. For others, it means reducing the time spent on work so they can pursue new opportunities.

With this in mind, Red Hat is excited to introduce Automation Savings Planner, a new enhancement that puts automation planning at the forefront of the hosted services on console.redhat.com.

The Automation Savings Planner is designed to provide a one-stop shop to plan, track and analyze potential efficiency improvements and cost savings of your automation initiatives.

How does it work?

Users can create an automation savings plan within Automation Analytics, accessible at cloud.redhat.com. To do so, they define how long the work takes and how often it is done manually, along with a list of the tasks needed to automate the job.

Once a plan is defined, you can link it to an automation controller job template, which helps detect whether the automation is running successfully across your infrastructure. You can also view the projected cost and time savings from automating the job over time.

With these enhancements, you get a detailed overview of how to optimize and prioritize the various automation jobs throughout your organization, based on time and money saved. This helps you decide what is most important to automate first.

Ready to start saving? Let's get started!

The first step is to create an automation savings plan that defines the tasks needed to complete an automation job. 

First, in the Automation Analytics side navigation, select Savings Planner. Then click the blue button labeled Add plan.

ROI blog one

Within the Create new plan section, fill out the details for the task you want to automate. The questions include:

  • What do you want to automate? (e.g., Provision an Apache server)
  • What type of task is it? (e.g., Operating System)
  • A description of your automation plan
  • How long does the process take to complete manually? (e.g., 4 hours)
  • How many hosts do you plan to run the automation on? (e.g., 1)
  • How often do you plan to run the automation? (e.g., weekly)

ROI blog two

Once you've completed the Details section, select the blue Next button on the lower left pane of the window.

Within the Tasks section, list out all the tasks that are needed to complete this plan. Write out each task and select the (+) to add it to your Tasks list. 

For example, if we were looking to successfully install an Apache web server, we'd likely include tasks such as:

  • Install Apache package
  • Start HTTPD service
  • Enable HTTPD service
  • Enable firewall port 80
  • Configure VirtualHost
  • Secure Apache web server

ROI blog three

Once you've completed the Tasks section for your specific plan, select Next.

NOTE: These tasks are for your planning purposes, and do not currently factor into the savings estimates provided by Automation Analytics.

Lastly, within the Link template section, select the appropriate template to link to this plan and click Save.

Once saved, you can view the newly created plan details. 

ROI blog four

In this Details view you will find a summary of all the options created and selected for your plan. 

If you notice something is amiss, you can easily make changes to your plan using the Edit button located at the bottom left corner of the Details section. 

And that's it!

With this newly created plan, Automation Savings Planner can project how much time and money you could save by automating a specific job. Automation Analytics combines the plan details with data from the associated job template to project the cost savings you can expect when you complete this savings plan.

Where can I find these stats?

Simply navigate to your Automation Savings Planner page, click on the name of an existing plan and navigate to the Statistics tab. You can also get to this screen by clicking the "Projected Savings" links in the card-based list of savings plans.

The statistics chart displays a projection of your monetary and time savings based on the information you provided when creating a savings plan. Primarily, the statistics chart subtracts the automated cost from the manual cost of executing the plan to provide the total resources saved through automation. The chart then displays this data by year to show you the cumulative benefits for automating the plan over time.

Click between Money and Time to view the different types of savings for automating the plan. An example is shown below.

ROI blog five

How are the Money and Time values determined?

Risk-adjusted factors are used to create a 3-year model projection of costs and savings related to automation. The objective is to represent cost and savings as accurately as possible, but actual values may differ in practice.

The following information breaks down:

  • where we get the data
  • the risk-adjustment factors we use
  • the assumptions we make
  • the formula used to compute the values as displayed in the chart

The cost portion of the formula includes hours spent on:

  • Implementation
  • Deployment
  • Training
  • Other expenditures for creating, maintaining & running the automation

The hours (the cost of investment) are typically higher at the outset and are greatly reduced once the automation has been created and only maintenance is required.

For the initial period (including the first year), the formula uses the following variables for its calculation.

  • TIME - time for manual run on one host (in hours) multiplied by 10
  • BufferTime - extra time for unforeseen and unaccounted-for delays and for familiarization with requirements
  • RISK - a 40% risk adjustment¹ is applied for unforeseen situations
  • COST - hourly cost² (the ² after COST in the formulas is a footnote marker, not an exponent)

The formula for the initial period and first year is represented as follows:

C1 = TIME + BufferTime

C2 = C1 * RISK

initial cost = (C1 + C2) * COST

year 1 cost = (C1 + C2) * COST²

For the two years after the first year, the formula uses the following variables for its calculation:

  • TIME - time for manual run on one host (in hours) multiplied by 4
  • RISK - a 40% risk adjustment¹ to account for unforeseen situations

The formula for the next two years is represented as follows:

C1 = TIME

C2 = C1 * RISK

year 2 cost = (C1 + C2) * COST²

year 3 cost = (C1 + C2) * COST²
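The cost formulas can be expressed as a small function. This is a minimal sketch with two assumptions. First, BufferTime is passed in directly, because the text does not say how its value is chosen. Second, COST defaults to 1.0, which gives results in hours (the Time view), on the reading that footnote ² means COST applies only when money is displayed. The function name plan_costs is hypothetical and not part of Automation Analytics.

```python
from typing import Dict

COST_RISK = 0.40  # 40% risk adjustment applied to costs


def plan_costs(manual_hours: float, buffer_hours: float,
               hourly_cost: float = 1.0) -> Dict[str, float]:
    """Per-period cost following the formulas above.

    manual_hours: time for one manual run on one host, in hours
    buffer_hours: BufferTime (value chosen by the planner; assumed input)
    hourly_cost:  COST; 1.0 keeps everything in hours
    """
    # Initial period and year 1: TIME = manual hours x 10, plus BufferTime
    c1 = manual_hours * 10 + buffer_hours
    c2 = c1 * COST_RISK
    initial = (c1 + c2) * hourly_cost
    year1 = (c1 + c2) * hourly_cost

    # Years 2 and 3: TIME = manual hours x 4, no buffer
    c1_later = manual_hours * 4
    c2_later = c1_later * COST_RISK
    later = (c1_later + c2_later) * hourly_cost
    return {"initial": initial, "year1": year1, "year2": later, "year3": later}


if __name__ == "__main__":
    # Hypothetical plan: 4-hour manual task, 8 buffer hours, $50 per hour.
    costs = plan_costs(4, 8, 50)
    print({k: round(v, 2) for k, v in costs.items()})
    # {'initial': 3360.0, 'year1': 3360.0, 'year2': 1120.0, 'year3': 1120.0}
```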

With the details on how cost is calculated for the plan, let's talk about savings.

The savings figure indicates the time and money saved as a result of automating the plan.

A 50% productivity recapture rate is applied to account for the productivity that is gained by repeated manual implementation of a task over a period of time. A 5% risk adjustment is also deducted for unforeseen situations that may arise and need to be handled.

A savings growth rate of 15% year over year is used. 

No money is saved during the initial period, so the savings for that period are simply the negative of the initial cost:

Initial period of Savings = $0 - initialCost

The formulas used to compute savings for year one are:

S1 = (HOSTS * (TIME/60) * FREQUENCY)

S2 = S1 * RECAPTURE

S3 = S2 * RISK * COST²

Year One Savings = S2 - S3 - Year One Cost

where:

  • HOSTS - number of hosts
  • TIME - manual time in minutes
  • FREQUENCY - yearly frequency of automation
  • RECAPTURE - 50% productivity recapture
  • RISK - 5% Risk Adjustment

The formulas used to compute savings for year two are:

S1 = Year One Savings * GROWTH

Year Two Savings = Year One Savings + S1 - Year 2 Cost

where GROWTH is the 15% year-over-year growth rate.

The formulas used to compute savings for year three are:

S2 = Year Two Savings * GROWTH

Year Three Savings = Year Two Savings + S2 - Year 3 Cost
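The savings formulas can likewise be transcribed into a sketch that takes the per-period costs as input. It follows the formulas as written, including applying GROWTH to the previous year's savings. With the default COST of 1.0, every term is in hours, which matches the Time view. The function name plan_savings and the example inputs are hypothetical.

```python
from typing import Dict

RECAPTURE = 0.50     # productivity recapture
SAVINGS_RISK = 0.05  # risk adjustment deducted from savings
GROWTH = 0.15        # year-over-year savings growth


def plan_savings(hosts: int, manual_minutes: float, runs_per_year: int,
                 costs: Dict[str, float], hourly_cost: float = 1.0) -> Dict[str, float]:
    """Savings per period, transcribing the formulas above.

    costs: per-period costs with keys initial, year1, year2, year3
    """
    # Initial period: nothing saved yet, only the initial cost.
    initial = 0.0 - costs["initial"]

    # Year one
    s1 = hosts * (manual_minutes / 60) * runs_per_year  # manual hours per year
    s2 = s1 * RECAPTURE
    s3 = s2 * SAVINGS_RISK * hourly_cost
    year1 = s2 - s3 - costs["year1"]

    # Years two and three grow the previous year's savings by GROWTH.
    year2 = year1 + year1 * GROWTH - costs["year2"]
    year3 = year2 + year2 * GROWTH - costs["year3"]
    return {"initial": initial, "year1": year1, "year2": year2, "year3": year3}


if __name__ == "__main__":
    # Hypothetical plan: 10 hosts, 240-minute manual task, run weekly,
    # with costs expressed in hours.
    costs = {"initial": 67.2, "year1": 67.2, "year2": 22.4, "year3": 22.4}
    result = plan_savings(10, 240, 52, costs)
    print({k: round(v, 2) for k, v in result.items()})
```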

And there you have it! Those are the inner workings of how money and time savings are calculated to project the benefit of automating tasks your organization currently does manually.

By using Automation Savings Planner, enterprise organizations can gain competitive advantages and a positive return on their investments by automating key elements of their business. This not only saves time and money, but allows businesses to expand their automation capabilities to deliver applications, meet expectations and improve their relationships with their customers. 

¹ A Forrester Total Economic Impact™ Study

² Cost per hour in USD if applicable, based on display.