One of our favorite things at Sensu is hearing how our customers are using, repurposing, and even replacing their Nagios setups. The Sensu monitoring event pipeline lets you run your existing Nagios plugins while also preparing you for what’s next. Nagios has been tried and true for many, but Sensu empowers businesses to modernize their infrastructure with a comprehensive, future-proof monitoring solution. So businesses that want to replace Nagios can do so seamlessly, rather than with a rip and replace. David Schroeder, Cloud Engineer at Viasat, told the story of his team’s migration from Nagios to Sensu at the 2017 Sensu Summit. In this post, we’ll recap that story, sharing how they went about the migration and some of the benefits they’ve seen.
Viasat is a $3.7Bn global communications company — for more than 30 years, they’ve helped shape how consumers, businesses, governments, and militaries around the world communicate.
David Schroeder joined Viasat in May 2016, bringing over 10 years of practical Nagios and Icinga experience. As it turned out, the team he was joining was using a Nagios-based product, and his first major project was upgrading the team’s monitoring infrastructure — AKA, “Get rid of Nagios.”
David’s team was looking for a modern, flexible monitoring solution that the whole company could benefit from, while supporting the unique needs of each team. At the 2017 Sensu Summit, David shared details of his team’s seamless upgrade from a Nagios-based solution to Sensu Enterprise.
Inspired after hearing Andy Sykes’s talk, “Please Stop Using Nagios (So it can die peacefully),” David set up a fresh Sensu cluster with an aim toward parity with their existing solution. Since Sensu supports the Nagios plugin specification, the service checks they had been using — including some custom 500-line Perl scripts — worked right out of the box.
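The Nagios plugin convention that makes this kind of reuse possible is small: a check prints a one-line status to standard output and reports its result through the exit code (0 for OK, 1 for warning, 2 for critical, 3 for unknown). Here is a minimal sketch in Python. The disk-usage check, its thresholds, and the function names are hypothetical illustrations, not one of Viasat's checks.

```python
"""Hypothetical disk-usage check following the Nagios plugin exit-code convention."""
import shutil

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(used_percent, warn=80.0, crit=90.0):
    """Map a usage percentage to (exit_code, status_line). Thresholds are illustrative."""
    if used_percent >= crit:
        return CRITICAL, f"DISK CRITICAL - {used_percent:.1f}% used"
    if used_percent >= warn:
        return WARNING, f"DISK WARNING - {used_percent:.1f}% used"
    return OK, f"DISK OK - {used_percent:.1f}% used"


def main(path="/"):
    """Run the check, print the status line, and return the exit code."""
    try:
        usage = shutil.disk_usage(path)
        code, line = evaluate(100.0 * usage.used / usage.total)
    except OSError as exc:
        # Anything that prevents the check from running is reported as UNKNOWN.
        code, line = UNKNOWN, f"DISK UNKNOWN - {exc}"
    print(line)
    return code

# When saved as a script, finish with: import sys; sys.exit(main())
```

Any executable that behaves this way, whether it is a short Python script or a long Perl one, can be scheduled as a check command without modification.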
David got the new Sensu Enterprise cluster configured quickly, and after pushing the agent out across his servers, the transition had begun. With a single cluster of nine VMs, David had built a fault-tolerant monitoring solution for one thousand servers across six teams.
Rolling Sensu out for every team at the company did bring a few challenges: David needed to figure out how to manage access control between teams and environments, deliver unique alerting profiles per team, and ensure that he wasn't the bottleneck going forward.
Managing enterprise access control with Sensu
While some infrastructure was shared across teams, many environments were managed by a single team. David used role-based access control (RBAC) in Sensu Enterprise to restrict access to authorized users according to their role, or job function, as defined by LDAP groups. He then created group-specific API tokens each team could use to pull certain aggregate metrics into team dashboards.
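The idea behind group-based access control can be sketched in a few lines of Python. This is a simplified model, not Sensu Enterprise's implementation or configuration format; the role names, LDAP group names, and subscriptions are all hypothetical.

```python
"""Simplified model of group-based access control (all names are hypothetical)."""
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    name: str
    ldap_groups: frozenset    # LDAP groups whose members receive this role
    subscriptions: frozenset  # subscriptions the role may view; empty means all
    readonly: bool = True


ROLES = [
    Role("network-ops", frozenset({"cn=network"}), frozenset({"network"}), readonly=False),
    Role("guests", frozenset({"cn=everyone"}), frozenset({"shared"})),
]


def roles_for(user_groups):
    """Return every role granted by at least one of the user's LDAP groups."""
    groups = set(user_groups)
    return [role for role in ROLES if role.ldap_groups & groups]


def can_view(user_groups, subscription):
    """A user may view a subscription if any of their roles covers it."""
    return any(
        not role.subscriptions or subscription in role.subscriptions
        for role in roles_for(user_groups)
    )
```

With this model, a member of the hypothetical `cn=network` group can view the `network` subscription, while a user who belongs only to `cn=everyone` is limited to `shared`.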
Managing unique alerting profiles per team
While every host in their environment shipped with a default set of monitoring checks with sensible thresholds and alarms, each team took a different approach to alert routing and notification integrations.
Many teams relied on the PagerDuty integration for Sensu, while some opted to use xMatters. Using contact routing with Sensu Enterprise, David was able to configure unique routing profiles to map each team’s environments to different HipChat rooms.
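Conceptually, contact routing is a lookup: a check names a contact, and that contact's settings override the defaults for each integration. The sketch below models that behavior in Python under stated assumptions. The team names, keys, and room names are hypothetical, and this is a simplified illustration rather than Sensu Enterprise's actual implementation.

```python
"""Simplified model of per-team contact routing (all names are hypothetical)."""

# Fallback settings used when a check names no contact or an unknown one.
DEFAULTS = {
    "pagerduty": {"service_key": "default-key"},
    "hipchat": {"room": "Monitoring"},
}

# Per-team overrides: a team lists only the integrations it wants to customize.
CONTACTS = {
    "network": {"hipchat": {"room": "Network Alerts"}},
    "platform": {"pagerduty": {"service_key": "platform-key"}},
}


def resolve(integration, contact=None):
    """Return the settings an integration should use for the given contact."""
    settings = dict(DEFAULTS[integration])  # copy so the defaults stay untouched
    overrides = CONTACTS.get(contact, {}).get(integration, {})
    settings.update(overrides)
    return settings
```

Here `resolve("hipchat", "network")` picks up the network team's room, while `resolve("pagerduty", "network")` falls back to the default key because that team defined no PagerDuty override.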
Sharing control of a centralized monitoring cluster
For David, this project was just that: a project. He needed to make sure he wasn’t going to be the bottleneck for future changes to the company’s monitoring solution. Since Sensu configuration is stored as JSON, he was able to manage the cluster setup in GitHub alongside their Ansible configuration, including their custom handlers, checks, and subscriptions. Furthermore, teams can now deploy identical configurations across multiple disparate datacenters, improving reliability and keeping with infrastructure-as-code best practices.
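Because the configuration is plain JSON, it can be generated, reviewed, and diffed like any other code. The Python sketch below renders a check definition in the general shape of the JSON check configuration Sensu used at the time (a `checks` object keyed by check name, with `command`, `subscribers`, `interval`, and `handlers`). Treat the attribute names as assumptions to verify against the Sensu documentation; the check name, command path, and values are hypothetical.

```python
import json


def render_check(name, command, subscribers, interval=60, handlers=("default",)):
    """Render one check definition as a JSON document suitable for version control."""
    definition = {
        "checks": {
            name: {
                "command": command,
                "subscribers": list(subscribers),
                "interval": interval,
                "handlers": list(handlers),
            }
        }
    }
    # Sorted keys and fixed indentation keep the output stable,
    # so diffs in review show only real changes.
    return json.dumps(definition, indent=2, sort_keys=True)


# Hypothetical example: a disk check for hosts subscribed to "base".
print(render_check("check_disk", "/opt/checks/check_disk.py", ["base"]))
```

Committing the rendered files, or the templates that produce them, gives every team the same reviewed definitions in each datacenter.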
Ready to learn more about how Sensu + Nagios work together? Here are some resources to get you started.