Infrastructure Automation and Configuration Management of Capital One's AWS Cybersecurity Data Lake

Videos  /  AnsibleFest San Francisco 2017  /  Capital One

At Capital One, cybersecurity event data amounts to roughly six terabytes per day. To effectively mitigate the growing threat of cyberattacks, security operations center analysts require low-latency access to this data. To address this need, we created a large-scale distributed data processing platform in Amazon Web Services (AWS). Managing such a large platform introduces many challenges, including large amounts of toil, slow deployment velocity, and an increased risk of incidents. In this talk, I discuss how Ansible allowed us to address these issues. I describe how we used Ansible to improve deployment velocity, production reliability, and incident management while also meeting regulatory compliance requirements. Additionally, I detail the best practices we’ve developed to maintain high team velocity even as our cluster grows in complexity and scale. Finally, I conclude by discussing areas for improvement and future work.

In this session, you will learn:

  • How Ansible helped us solve the challenges of managing a big data platform (300+ EC2 instances in AWS) that stores hundreds of terabytes
  • Our Ansible best practices that enable high team velocity
  • How using Ansible made it easier for us to meet regulatory compliance requirements




Mihai Sirbu, Data Engineer, Capital One