The following is a guest post from Kris Zentner, Senior Service Engineer at Microsoft AI and Research. Have a story on how you Sensu? Let us know in the comments!
Microsoft Research is the research arm of Microsoft. Founded in 1991, its goal is to advance state-of-the-art computing and solve difficult problems through technological innovation, collaborating with academic, government, and industry researchers.
In a recent tweet, I shared some of the open source tools and technology I use as an engineer at Microsoft. My team helps provide researchers in the Artificial Intelligence and Research Group shared resources to further their research goals. These tend to be both individual and clustered compute resources on both Linux and Windows platforms.
As a @Microsoft engineer, I use:
#Windows 10 + WSL
@code as IDE editing
#Ruby for my @chef installation managing 1000's of #Linux nodes on @Ubuntu, most run @Docker
Monitoring: @sensu @grafana
#Python for misc scripting
#Vivaldi is my primary browser
Kris Zentner (@ktzentner), June 6, 2018
In this post, I’ll share our current Sensu setup, plus a look at what I’m excited about for Sensu 2.0.
When it came time for us to look for a monitoring solution, we needed something simple, lightweight, and multi-platform while being easy to extend if needed. We also needed something that could work with the cloud since much of our footprint is in Azure.
We currently monitor a few thousand hosts, both on-prem and in Azure, running Linux and Windows. Most of the benefit comes from the on-prem nodes, as we do a fair bit of hardware monitoring for these. We were previously using a monitoring solution from one of our hardware suppliers, and Sensu has been much easier to manage by comparison. During my time in the industry, I’ve used many other prominent open and closed source products. In this case, Sensu was the best fit and required the least amount of babysitting for what we're trying to do.
We have our resources grouped into “clusters” using the “datacenters” feature of Sensu. Team members review the checks on the cluster they’re assigned to that day. If there is an ongoing issue, they can silence the check, and there’s accountability since each team member has their own login.
For plugins, we use a mix of Nagios plugins and Ruby-based Sensu plugins. I’ve been able to use some of the older Nagios Perl plugins to provide hardware monitoring. Since these are all open, I’ve been able to update them to modern versions as necessary.
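For readers who haven't written one, Nagios plugins and Sensu checks share a simple contract: print a one-line status message and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). Here is a minimal sketch of that contract in Python. It is not one of our actual plugins: the temperature thresholds and the read_temperature() helper are hypothetical stand-ins for whatever a real hardware check would query.

```python
# Exit codes from the Nagios plugin convention, which Sensu checks also follow.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_temperature(celsius, warn=70.0, crit=85.0):
    """Map a temperature reading to a (status code, message) pair.

    The warn/crit defaults are illustrative assumptions, not vendor values.
    """
    if celsius is None:
        return UNKNOWN, "TEMP UNKNOWN - no reading available"
    if celsius >= crit:
        return CRITICAL, f"TEMP CRITICAL - {celsius:.1f}C (>= {crit:.1f}C)"
    if celsius >= warn:
        return WARNING, f"TEMP WARNING - {celsius:.1f}C (>= {warn:.1f}C)"
    return OK, f"TEMP OK - {celsius:.1f}C"


def read_temperature():
    """Hypothetical sensor read; a real plugin would query IPMI or a vendor tool."""
    return 42.0


def main():
    """Print the status line and return the exit code for the check."""
    status, message = evaluate_temperature(read_temperature())
    print(message)
    return status

# When saved as a check script, finish with: import sys; sys.exit(main())
```

Keeping the threshold logic in its own function, separate from the sensor read, makes it easy to test a check without touching real hardware.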
For the future, I'm looking forward to migrating to Sensu 2.0 and making this a containerized solution, using Kubernetes via AKS. Currently it's deployed on Azure with three servers monitoring several types of deployments, and Uchiwa brings all of these together via its "datacenter" view. I’m also looking forward to the simplification of the server components that 2.0 will bring.