Ansible is a simple but feature-complete way to automate IT workflows on systems without installing any management agents on them.
Many people who have not tried Ansible assume an SSH-based system can't move as fast as Ansible actually does (leading to some occasional FUD until people try it - particularly with tuning settings). But we have spent a lot of time tuning the underlying implementation - leveraging things like ControlPersist and minimizing SSH operations - to make sure Ansible is fast.
Especially important is that Ansible transfers and runs modules, using SSH as a transport, not a pure shell. It doesn't just run and parse shell commands, so the number of SSH operations it has to perform is well optimized and calls can be very efficient. Further, the push architecture can actually help avoid some of the "thundering herd" architectural problems in CPU-heavy pull-based systems (though Ansible can also be run in a pull mode, more on this later).
As deployments grow, it can be helpful to be aware of some tips to keep Ansible operating at maximum efficiency. Many users apply some of the tips below to manage fleets of tens of thousands of systems, addressing many thousands at one time.
Let's dig in to a few of the methods involved in how to automate deployments quickly. These are good tips to follow even if you have a moderately sized infrastructure, as they can pay dividends later as you expand.
Use Mirrors Effectively
Waiting on the network is where most of the time is spent in update processes.
If you have hundreds or thousands of systems, or even dozens, it’s aggressive to have them all fetch web resources from internet sources every time you apply an update. This includes things like requesting remote packages, downloading tarballs, and so on.
It’s better to create a local mirror of the packages you need, which allows for quicker and more reliable deployments, and fewer surprises when upstream versions change. Tools like yum's reposync or apt-cacher-ng make this exceptionally easy.
Also consider using a small Ansible play to download any tarballs you might need to the local control machine and then push them out to the remote nodes using the unarchive module, rather than having each machine in your fleet hit someone’s web server farm. The place you are pulling a tarball from might be a small web server, and it’s just nice to be friendly to the servers you are downloading from to avoid an unintentional DoS attack - which might run dozens of times a day if you are practicing Continuous Deployment.
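As a rough sketch of the download-once, push-to-many approach described above, the playbook below fetches a tarball on the control machine and then unpacks it on the remote nodes. The URL, file paths, and the webservers group are hypothetical placeholders, not anything from a real environment.

```yaml
# Hypothetical URL, paths, and group name; adjust for your environment.
- name: Fetch the release tarball once, on the control machine
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Download the archive a single time
      get_url:
        url: https://downloads.example.com/myapp-1.0.tar.gz
        dest: /tmp/myapp-1.0.tar.gz

- name: Push the cached tarball out to the fleet
  hosts: webservers
  tasks:
    - name: Make sure the target directory exists
      file:
        path: /opt/myapp
        state: directory

    - name: Copy the archive from the control machine and unpack it
      unarchive:
        src: /tmp/myapp-1.0.tar.gz
        dest: /opt/myapp
```

By default the unarchive module treats src as a path on the control machine, so the upstream server sees one request per run no matter how many hosts are in the group.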
Optimize Your Package Installations
Ansible is smart and knows how to group yum and apt transactions to install multiple packages into a single transaction block, so it’s a huge optimization to install as many of your packages as you can in a with_items block.
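A minimal sketch of a grouped install might look like the following. It assumes a reasonably recent Ansible release, where the package list is passed directly to the module's name parameter; on the older releases this post was written for, the same grouping was expressed with with_items. The package names and host group are only examples.

```yaml
# Example package names and host group; substitute your own.
- name: Install packages in a single transaction
  hosts: webservers
  become: true
  tasks:
    - name: Install everything the web tier needs in one yum call
      yum:
        name:
          - httpd
          - git
          - rsync
        state: present
```

The same pattern applies to the apt module on Debian-family hosts.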
Know Your Forks
Ansible works by spinning off forks of itself and talking to many remote systems independently.
The forks parameter controls how many hosts are configured by Ansible in parallel. By default, the forks parameter in Ansible is a very conservative 5. This means that only 5 hosts will be configured at the same time, and it's expected that every user will change this parameter to something more suitable for their environment. A good value might be 25 or even 100.
When using a large number of forks, be advised that any “local_action” steps can fork a python interpreter on your local machine, so you may wish to keep “local_action” or “delegated” steps limited in number or in separate plays.
If you wish to limit the number of machines running in parallel for a specific play to fewer than the global fork count, the serial keyword, primarily intended to control rolling updates, can be leveraged to temporarily constrain parallelism, for instance when talking to a web service with limited capacity.
Regardless of what you set forks to, Ansible is smart - for example, if you have 50 systems and set forks to 500, Ansible will only spin up 50 forks because it knows it doesn't need all 500.
One case where you may actually need fewer forks is if you are doing rolling updates (which Ansible makes very easy), and thereby not talking to all of your systems at once. If you are using Ansible for rolling updates and have, say, 2000 systems, but have decided that you want to update only 100 machines at a time, set "serial" in Ansible to 100, and you'll only need 100 forks, too.
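The rolling-update case above could be sketched as a play like this one. The host group, package name, and playbook filename are hypothetical; the fork count is passed on the command line with the -f option to match the batch size.

```yaml
# Run with a matching fork count, for example:
#   ansible-playbook -f 100 rolling.yml
# (rolling.yml, webservers, and myapp are placeholder names.)
- name: Rolling update, 100 hosts per batch
  hosts: webservers
  become: true
  serial: 100
  tasks:
    - name: Update the application package
      yum:
        name: myapp
        state: latest
```

With 2000 hosts in the group, this play would work through 20 batches of 100, finishing each batch before starting the next.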
OpenSSH connection tips
If running from most operating systems, the default connection for Ansible is native OpenSSH. This supports “ControlPersist”, which keeps Ansible connections open subject to a timeout that is configurable in ansible.cfg. You will probably want to adjust that timeout; 30 minutes, for example, may be a good value. Note that ControlPersist may consume about a megabyte of memory per connection to hold things open. One possible gotcha with ControlPersist is that if your hostnames are very long, due to humorous kernel limitations (Unix socket paths are capped at roughly a hundred characters), you may wish to change the file path ControlPersist sockets are saved to.
If running with OpenSSH, the “pipelining” setting can roughly double the speed of operations by optimizing the way Ansible modules are transferred. This is not enabled by default because it can’t run absolutely everywhere (different tty policies with sudo, etc), but it works almost everywhere, so you should definitely try it out.
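The post describes these settings in terms of ansible.cfg. As an alternative sketch, the variables below express the same tuning in a group_vars file. This assumes a recent Ansible release whose SSH connection plugin reads these variables; the file location and socket directory are hypothetical, and the 30-minute timeout is simply the example value mentioned above.

```yaml
# group_vars/all.yml (hypothetical location)

# Keep SSH control connections open for 30 minutes between tasks and runs.
ansible_ssh_args: "-o ControlMaster=auto -o ControlPersist=30m"

# Use a short directory for control sockets so that long hostnames
# do not push the socket path over the kernel's length limit.
ansible_control_path_dir: /tmp/.ansible-cp

# Enable pipelining; if sudo on the managed hosts requires a tty,
# this may fail until that policy is relaxed.
ansible_ssh_pipelining: true
```

Trying pipelining on a small group of hosts first is a cheap way to find out whether the sudo configuration on your systems is compatible with it.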
UPDATE August 2018: Since this post was originally written the section below has become outdated. We're leaving it here, greyed out, for posterity. Information about Paramiko can be found in the docs pages.
Paramiko connection tips
If running from an Enterprise Linux 6 or earlier host, Ansible will detect that your OpenSSH is probably not new enough, and will use a pure-python SSH client called paramiko. When using paramiko, Ansible will need to reconnect to each host between actions. To eliminate this, use accelerated mode, and you’ll see playbook runs that may be 4-5x faster than before! As described in the docs, this works by having Ansible set up a temporary daemon on the remote nodes that will expire if it doesn't have any activity in 30 minutes.
You shouldn’t need ansible in pull mode if following most of the above steps, but the ansible-pull utility can be used, usually on a crontab, to monitor a git repo for changes and run ansible in local mode when any changes occur. Pull mode scales basically infinitely with bandwidth to your source control server, and can be pretty amazing, but you’ll lose out on centralized logging. If you want to achieve periodic check-in without giving up the central history, Ansible Tower has a nice provisioning callback feature that provides this.
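For readers who do want pull mode, one way to set up the crontab entry described above is with Ansible itself. This is only a sketch: the repository URL and the 15-minute interval are made up, and it assumes ansible-pull is installed on each node and reachable on cron's PATH.

```yaml
# Hypothetical repository URL and schedule.
- name: Schedule ansible-pull on each node
  hosts: all
  become: true
  tasks:
    - name: Check the repo every 15 minutes and run only when it changed
      cron:
        name: ansible-pull
        minute: "*/15"
        job: "ansible-pull -o -U https://git.example.com/infra/config.git local.yml"
```

The -o flag tells ansible-pull to run the playbook only when the repository has changed, and -U names the repository to check out.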
Another approach that some people are interested in is the idea of immutable systems - using Ansible to define an image build and then using native cloud technologies to deploy it. This can result in less time to roll out new software because you don’t have to wait on package installations; however, you will still need to be concerned with security updates. Tools like Packer, Aminator, and Ansible’s ec2_ami module provide some options in this space.
For most infrastructures, most of the time you spend waiting for an update to complete goes to downloading artifacts from network resources. Creating local mirrors, as well as grouping package updates into single package manager transactions, can go a long way toward speeding up deployments.
If you want to further optimize turnaround time, options like accelerated mode (primarily useful for legacy systems) and pipelining will further tune Ansible. Settings like “forks” also play a big role in affecting parallelism.
Using some of the performance tuning tips above will allow you to easily use Ansible to manage thousands of machines, and even if you have fewer, get your updates rolled out faster. Just don't tell the boss so you can have more time for office chair swordfights! Happy Ansibling!