AnsibleFest 2021 Network Automation Track

This year, we are adapting our signature automation event, AnsibleFest, into a free virtual experience to connect our communities with a wider audience and to collaborate to solve problems. Seasoned pros and new Ansible enthusiasts alike can find answers and learn more about Red Hat Ansible Automation Platform, the platform for building and operating automation at scale and creating an enterprise automation strategy. Have you already automated some type of server or infrastructure management? Use the network automation track to understand the benefits that come with automating network management the Ansible way. 

Let's take a closer look at this track for AnsibleFest 2021.

Network Automation at AnsibleFest

Gone are the days of hand-typing commands into network devices one by one, because you simply can't keep up. Manage your network infrastructure using Ansible Automation Platform throughout the entire development and production life cycle, and free up time to focus on your top-priority network engineering challenges. This AnsibleFest track focuses on network automation topics for automation content developers as well as network and cloud engineers or operators.

Attendees will learn how network automation can no longer be a point tool, but instead part of a holistic automation strategy that spans NetOps and even IT teams. You will hear about some key use cases in network automation, as well as a new builder experience that makes it easier to create great network automation so your team can move faster.  

You'll hear from our customers who will share their experiences, as well as our network partners who will explain how to automate their solutions using Ansible Automation Platform. We also suggest you take a look at the Telco track, which features additional customers and advanced topics like AIOps.  

Here are a few sessions that you can expect to see in the network automation track: 

  • Journey to Network Automation (great for getting started) by Sean Cavanaugh and Trishna Guha 
  • NetDevOps: The network developer automation experience by Brad Thornton and Michael Ford
  • From Lunch Hobby to Production Ready: The ANZ Bank NZ Network Automation Journey by Roxy Rice, Kyle Claudi, Tony Thistol and Joseph Tejal
  • Everything as code and GitOps with Ansible by customer Emile Zweep and Anton Nesterov of ODC-Noord, a government data center in the Netherlands
  • Ansible wrapper to schedule any workflow while using ServiceNow change management and MS Teams by Rick Walsh of US Bank
  • Use case sessions offered by our engineering team, as well as any of the network partner sessions 

And of course, check out the Telco track to find sessions by customers Cox Communications and Bell Canada. 

We can't wait for AnsibleFest 2021 to kick off and hope to see you there, virtually!







Audit your VMware vCenter Server using Ansible

vCenter has a graphical user interface if you want to interact with it, but what if you manage multiple vCenter servers and want to automate audits or the maintenance of those servers? In this blog, we will see how we can retrieve details about the VMware vCenter Server directly using Ansible. The practices laid out in the blog will help system administrators responsible for managing multiple vCenter servers. In addition, Ansible automation becomes imperative in development environments for testing against multiple instances in your CI/CD pipeline.

The new vmware.vmware_rest Collection has recently been released and published, and it comes with a new set of modules dedicated to vCenter Server (VCSA) management.

VMware vSphere (the product bundle that includes vCenter Server and other features) 7.0.2 (a.k.a. 7.0U2) comes with some new REST endpoints. This REST API does not cover all the features exposed over the SOAP interface. Modules in the vmware.vmware_rest Collection are built on top of this API and face the same limitations.

The vmware.vmware_rest Collection, which is supported by Red Hat and available on Ansible automation hub, contains these modules.

Validate the state of a vCenter Server instance from Ansible

Taking our own dogfooding example (or drinking our own champagne!), our cloud/infrastructure team maintains a CI to validate the VMware Ansible modules. Every time a new change is submitted, the full test suite is run against a freshly deployed VMware lab. The initial deployment takes 15 minutes, so we cannot spawn a new environment before each of the dozens of tests is run. Hence, it becomes important to keep our test environments as clean as possible.

We use these new appliance modules to build an audit report of the vCenter instance before and at the end of the test suite run. This way, it will be easier to spot any inconsistency between test runs.
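As a rough illustration of such a report, the sketch below registers the output of one info module and writes it to a file, so that the snapshot taken before the test suite can be compared with one taken afterwards. The variable name and the destination path are hypothetical, and this is not necessarily how our CI stores its reports.

```yaml
- name: Collect the network configuration of the VCSA
  vmware.vmware_rest.appliance_networking_info:
  register: networking   # hypothetical variable name

- name: Save the snapshot so it can be compared with a later run
  ansible.builtin.copy:
    content: "{{ networking.value | to_nice_json }}"
    dest: /tmp/vcsa_audit_before.json   # hypothetical path
    mode: "0600"
```

Running the same two tasks at the end of the suite with a different destination file gives two JSON documents that can be compared with any diff tool.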

The appliance modules cover the following use cases.

  • Access → manage local accounts; audit and control access to the Console, the Direct Console UI (DCUI), the Shell or even SSH.
  • Health → retrieve information about the state of the system components.
  • Networking → collect information about the network configuration and adjust it.
  • System → manage services, reboot the system, get the storage configuration, get the state of the updates, etc.
  • Time management → configure the NTP server, adjust the timezone.

How to start using these modules

The latest release of vmware_rest Collection available on Ansible automation hub supports vSphere 7.0.2 and greater.

We can pass the authentication details either through environment variables or with the module parameters. The following example uses the first option:

VMWARE_HOST=<vsphere_host>
VMWARE_PASSWORD=<vsphere_password>
VMWARE_USER=<vsphere_username>

Note: The community.vmware Collection uses the same environment variables.
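For comparison, here is a minimal sketch of the second option, where the connection details are passed as module parameters. The vcsa_* variables are hypothetical and would typically come from an inventory or a vault. Please check the parameter names against the documentation of the Collection version you have installed.

```yaml
- name: Get the hostname, passing the credentials as module parameters
  vmware.vmware_rest.appliance_networking_dns_hostname_info:
    vcenter_hostname: "{{ vcsa_hostname }}"   # hypothetical variable
    vcenter_username: "{{ vcsa_username }}"   # hypothetical variable
    vcenter_password: "{{ vcsa_password }}"   # hypothetical variable
    vcenter_validate_certs: false             # assumption: lab with a self-signed certificate
```

Environment variables keep the tasks shorter, while module parameters make it easier to target several vCenter servers from the same playbook.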

We will try to explain some sample use cases below for the readers to understand how you can start using these modules.

Collect information about a VCSA instance

In this first example, we audit an appliance that should be secured by turning off any unneeded user interfaces. The REST interface that the modules use remains available. Here's how you can check the current state using the info modules.

- name: Shell access should be disabled
  vmware.vmware_rest.appliance_access_shell_info:
- name: The Direct Console User Interface should also be disabled
  vmware.vmware_rest.appliance_access_dcui_info:
- name: We need the SSH access
  vmware.vmware_rest.appliance_access_ssh_info:

Response:

{
    "changed": false,
    "value": {
        "enabled": false,
        "timeout": 0
    }
}

{
    "changed": false,
    "value": false
}

{
    "changed": false,
    "value": true
}
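To turn these reads into an actual audit, the results can be registered and checked with ansible.builtin.assert. This is a minimal sketch that assumes the response shapes shown above; the registered variable names are hypothetical.

```yaml
- name: Read the shell access state
  vmware.vmware_rest.appliance_access_shell_info:
  register: shell_access

- name: Read the Direct Console User Interface state
  vmware.vmware_rest.appliance_access_dcui_info:
  register: dcui_access

- name: Read the SSH access state
  vmware.vmware_rest.appliance_access_ssh_info:
  register: ssh_access

- name: Fail the audit if the access configuration is not the expected one
  ansible.builtin.assert:
    that:
      - not shell_access.value.enabled   # shell should be disabled
      - not dcui_access.value            # DCUI should be disabled
      - ssh_access.value                 # SSH should stay enabled
    fail_msg: "Unexpected access configuration on the VCSA"
```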

The health states

We can rely either on the appliance_health modules or the other info modules to audit the state of your VCSA. For instance, here we check that the system load and the database are in a green state.

- name: Ensure the database health status is green
  vmware.vmware_rest.appliance_health_database_info:

- name: Get the system load status
  vmware.vmware_rest.appliance_health_load_info:

- name: Get the overall system health status
  vmware.vmware_rest.appliance_health_system_info:

Response:

{
    "changed": false,
    "value": {
        "messages": [
            {
                "message": {
                    "args": [],
                    "default_message": "DB state is Degraded",
                    "id": "desc"
                },
                "severity": "WARNING"
            }
        ],
        "status": "DEGRADED"
    }
}

{
    "changed": false,
    "value": "gray"
}

{
    "changed": false,
    "value": "gray"
}

In this example, our database is in a degraded state, and both the system load and the overall system health report gray rather than the optimal green state.
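The same checks can be automated. The sketch below registers two of the results and asserts on them. The "green" value follows the color scheme seen in the responses above, while "HEALTHY" as the good database status is an assumption that should be verified against your vCenter version.

```yaml
- name: Read the database health
  vmware.vmware_rest.appliance_health_database_info:
  register: db_health

- name: Read the system load health
  vmware.vmware_rest.appliance_health_load_info:
  register: load_health

- name: Report anything that is not in the expected state
  ansible.builtin.assert:
    that:
      - db_health.value.status == "HEALTHY"   # assumed name of the healthy state
      - load_health.value == "green"
    fail_msg: "The VCSA health is not optimal"
```

With the responses shown above, this assertion would fail on both conditions.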

Network configuration

Ansible is also able to read and set the network configuration of the VCSA. The appliance_networking_info module returns a system-wide overview of the network configuration:

- name: Get network information
  vmware.vmware_rest.appliance_networking_info:

Response: 

{
    "changed": false,
    "value": {
        "dns": {
            "hostname": "vcenter.test",
            "mode": "DHCP",
            "servers": [
                "192.168.123.1"
            ]
        },
        "interfaces": {
            "nic0": {
                "ipv4": {
                    "address": "192.168.123.8",
                    "configurable": true,
                    "default_gateway": "192.168.123.1",
                    "mode": "DHCP",
                    "prefix": 24
                },
                "mac": "52:54:00:c9:06:64",
                "name": "nic0",
                "status": "up"
            }
        },
        "vcenter_base_url": "https://vcenter.test:443"
    }
}

But we can also collect the details of one specific NIC:

- name: Get details about one network interface
  vmware.vmware_rest.appliance_networking_interfaces_info:
    interface_name: nic0

Response:

{
    "changed": false,
    "id": "nic0",
    "value": {
        "ipv4": {
            "address": "192.168.123.8",
            "configurable": true,
            "default_gateway": "192.168.123.1",
            "mode": "DHCP",
            "prefix": 24
        },
        "mac": "52:54:00:c9:06:64",
        "name": "nic0",
        "status": "up"
    }
}
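Since the response is plain structured data, individual fields can be checked directly. A minimal sketch, using the values from the sample response above as the expected state (the registered variable name is hypothetical):

```yaml
- name: Get details about nic0
  vmware.vmware_rest.appliance_networking_interfaces_info:
    interface_name: nic0
  register: nic0_info

- name: Check the interface state and addressing mode
  ansible.builtin.assert:
    that:
      - nic0_info.value.status == "up"
      - nic0_info.value.ipv4.mode == "DHCP"
    fail_msg: "nic0 is not in the expected state"
```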

DNS configuration

The appliance_networking_dns_hostname_info module can be used to retrieve the hostname of the VCSA.

- name: Get the hostname configuration
  vmware.vmware_rest.appliance_networking_dns_hostname_info:

Response:

{
    "changed": false,
    "value": "vcenter.test"
}

Use the appliance_networking_dns_servers_info module to get the DNS servers currently in use:

- name: Get the DNS servers
  vmware.vmware_rest.appliance_networking_dns_servers_info:

Response:

{
    "changed": false,
    "value": {
        "mode": "dhcp",
        "servers": [
            "192.168.123.1"
        ]
    }
}
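As a sketch of how this could feed an audit, the task below checks that an expected resolver is part of the returned list. The expected_dns_server variable is hypothetical, and its value is simply the one from the sample response.

```yaml
- name: Get the DNS servers
  vmware.vmware_rest.appliance_networking_dns_servers_info:
  register: dns_servers

- name: Check that the expected resolver is configured
  ansible.builtin.assert:
    that:
      - expected_dns_server in dns_servers.value.servers
    fail_msg: "{{ expected_dns_server }} is not configured on the VCSA"
  vars:
    expected_dns_server: "192.168.123.1"   # hypothetical expected value
```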

Conclusion and next steps

These new modules are helpful to quickly retrieve information from a running VCSA instance without relying on SSH. The outputs will fit well in a regular Ansible Playbook. Finally, you can also use them to adjust the configuration of the system (network, firewall, etc). An unsupported version of this Collection is also available on Ansible Galaxy.







Matrix & IRC | Size & Stability

Over the last few days since I posted my thoughts on Ansible & Matrix, I've been getting some really good questions and comments - thank you all!

Two themes have come up that go well together for a data-laden blog post, and they are:

  1. How do we know there are users wanting to use Matrix?
  2. Isn't the bridge to IRC frequently broken?

These are solid questions - we want to know we're making the right choice, both for the future of the community, and for the people who wish to remain on IRC. But how can we go about answering them? Gathering data on these things isn't easy, but in this post I'm going to do my best - if you know of more / better data sources, I'd love to hear about it!

Size of user-base

OK, for this we have some decent data sources. Let's tackle IRC first. The best data source I know of is https://netsplit.de/networks/top10.php which gives us an idea of the change in users over time. Here's a sample:

Netsplit.de 2021

Sadly the graphs are premade images/posts, so we don't have the raw data, but roughly I'd say that in July 2021, those ten add up to ~180k users (Libera itself is at ~50-55k). But Netsplit has historical data:

Netsplit.de 2016

That's 2016, and here it adds up to ~250k with Freenode at 90k. Let's do one more:

Netsplit.de 2011

Going back another 5 years to 2011, and the numbers are even higher - 2021 is just a fraction of what 2011 was. I spent some time going through other years, and it's fairly easy to convince myself that IRC is declining. That's one data point, but what about the others?

Well, the Ansible community already vetoed the use of proprietary options (and rightly so, in my view). But for completeness, let's take a quick look. There's a lot of "corporations banging their own drum" in this space, but I did find this graph:

https://slack.com/blog/news/slack-has-10-million-daily-active-users

I'm struggling to get anything newer, but the 10 million figure from 2019 is certainly higher now. Discord claims 150 million users (https://discord.com/company). That's nice, but both require a separate server and login for each community, and anyway the community said no. Let's go back to FOSS...

I'm struggling to find any data at all on the likes of Rocket.Chat, Mattermost, etc, and Gitter has been merged into Matrix anyway. The fact that many of the FOSS solutions are self-hosted makes it hard to get accurate data in any case.

What about Matrix? That also turns out to be a surprisingly tricky thing to answer, but for very different reasons. Because of its distributed nature, you don't have a single source of data to query. This slide from Matt Broberg puts Element at 18 million (and Slack at 44 million, phew), and while I don't know where Matt sourced the numbers, they seem pretty high to me.

Matt Broberg

Maybe we can try to get our own value. There are (to my knowledge) two "traveller bots" in Matrix, whose job is to join any public room they see mentioned, to gather anonymous stats. One is "@voyager:t2bot.io" which has been running for around 3 years, and has seen over 3 million unique Matrix IDs; the other is "@server_stats:nordgedanken.dev" which is much newer, but has already seen around 0.5 million IDs in its lifetime.

Matrix accounts (and usage) get weirder still, though. It is highly likely that many of these unique IDs seen by the traveller bots are in fact bridged users from other networks. Now, you might reasonably argue that this means the native users of Matrix are lower, and you'd be right. However, I don't think it matters, because what we really care about is "addressable IDs" - that is, who I can talk to. The power that Matrix has to build communities across networks has value, and we should allow for it in our stats.

Even if you disagree with me, though, let's do a pessimistic comparison. Start with 50k for Libera, and 500k (the Nordgedanken bot value) for Matrix, and then subtract Libera because of bridging (thus 450k), then Matrix is still 9x bigger. If you agree with me, and we pick a value between the two bots (say, 1 million for simplicity), then Matrix is 20x bigger than Libera, and 3x bigger than IRC as a whole.
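For anyone who wants to check the arithmetic, here are the two Libera comparisons written out, using only the rough figures quoted above (50k for Libera, 500k and 1 million for Matrix):

```latex
\begin{align*}
\text{pessimistic: } \frac{500\text{k} - 50\text{k}}{50\text{k}} &= \frac{450\text{k}}{50\text{k}} = 9 \\
\text{optimistic: } \frac{1000\text{k}}{50\text{k}} &= 20
\end{align*}
```

Both inputs are estimates, so the ratios are best read as orders of magnitude rather than precise measurements.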

Active users

As a footnote to this, I want to address one more thing. So far we've looked at any user ID in a room/channel, which obviously includes idle users. What about active users?

I went and got my logs for the last month (19th June -> 19th July) in two IRC channels, #ansible and #ansible-community (happily, I can use Matrix to do so :P). I'll also do that for the entirely unofficial Matrix room "ansible:matrix.org" (which by the way was created in 2016). Here's what we see:

Room                 Total users (today)   Messages   Active users   Frequent users (> 10 lines)   Msgs per active user
#ansible             686                   5130       273            107                           18.8
#ansible-community   137                   3036       52             33                            58.4
ansible:matrix.org   313                   216        39             1                             5.5

I take away a few things here. Firstly, as we know from so many other places, we're very top-heavy - just a few folk are responsible for much of the chat.

Second, the Matrix room is interesting - it isn't as active, sure, but we don't promote that at all, anywhere. It's entirely organic, and yet it has a sizable number of members and a not trivial number of messages for a single month (that's still ~7 messages a day).

Finally though, think bigger. We have 42k members on Reddit, 59k followers on Twitter, and the "ansible-project" mailing list has 13k subscribers. We have just a few hundred actually talking. That's ... concerning. It's almost a rounding error (273/59k == 0.005); you could argue we have 0 people discussing Ansible. If we truly want to make the community self-supporting, we have to change that.

Stability

On to the second half! Let's talk about stability...

One of Matrix's key strengths is bridges - the ability to chat with users on other platforms. That's a key part of why I'm proposing Matrix for the Ansible community, because it allows us to not hard-drop IRC. However, that relies on the stability of the Libera bridges. This is something I hear a lot about - that the bridge is unstable and it makes the experience for both IRC and Matrix a problem.

That's not unfair - the result of bridge issues is that the community is split. Messages cannot travel between the networks, and may well be lost forever for recipients on the other side. That's unfortunate, but I will now argue that this happens anyway on IRC - via netsplits.

From a user-experience view, netsplits aren't really any different to a bridge problem. Either way, a chunk of the community (usually about 5-15%) is out-of-contact with the rest, and will not receive messages sent during that time. So, how common is that?

I asked my colleagues with long IRC logs to grep for netsplits, and here's what we got:

Disconnect stats

(I've added in a simple guess for total-affected-users here, picking 7% of the room membership (today) as the number of affected users and just multiplying it up. It's entirely made up, but illustrates what the impact of this many splits might be.)

So that's roughly 2 netsplits a week in #ansible. Also interesting is that the netsplits were getting more frequent over time - clearly there's a ceiling to that, one shouldn't forecast that fit too far, but it's a concerning trend nonetheless.

As for the IRC<->Matrix bridge, that's a lot harder. Element told me they don't log bridge issues, and the bridge is largely silent when there is a problem. However, one can look for times when all the Matrix users drop from the IRC side - obviously this won't catch issues where the bridge is slow rather than dead, but it's something. And lo, the reverse is true: the bridge is getting more stable.

Conclusion

We've looked at 3 key things here - overall size, active users, and stability. While much of the data is vague, each part reinforces the story - taken as a whole, I think it's a fair statement that Matrix is already larger than IRC, while IRC is declining, and while bridge issues are rough, they're decreasing with time.

We must also remember that the bridge tries to recover (and Matrix itself is eventually-consistent), so messages frequently arrive later on, rather than not at all as with netsplits (although some bridge issues do cause drops, this is not a 100% guarantee).

My first post on Matrix laid out the "what" and "how" of my thoughts on Ansible - this post adds some strength to the "why". I still think it's the right choice for the future of our community - hopefully I'm convincing you too.