Check configuration upgrades with the Sensu Go sandbox

As a follow-up to my previous post, I’d like to walk you through using the Sensu Go sandbox to test upgrading different parts of the event pipeline from Sensu 1.x to Sensu Go. First up is a Sensu 1.x check configuration using the check_ssh command from Nagios Plugins. This walkthrough can be used as a pattern for converting existing Sensu 1.x check configurations into Sensu Go.


Set up the Sensu Go sandbox

Let’s grab the current Sensu sandbox repository:

git clone https://github.com/sensu/sandbox.git

Bring up the sandbox pre-provisioned with the check-upgrade lesson plan:

cd sandbox/sensu-go
git pull
ENABLE_SENSU_SANDBOX_PORT_FORWARDING=1 SANDBOX_LESSON=check-upgrade vagrant up

The check-upgrade sandbox lesson plan installs the Sensu 1.x configuration files referenced here, to make it easier for you to try the commands as you read this post.

Enter the sandbox:

vagrant ssh

Let’s make sure the Nagios plugin is installed:

[sensu_go_sandbox]$ sudo yum install nagios-plugins-ssh

Start up the Sensu Go agent:

[sensu_go_sandbox]$ sudo systemctl restart sensu-agent

Check to make sure the sandbox is in the entity list:

[sensu_go_sandbox]$ sensuctl entity list 
ID Class OS Subscriptions
sensu-go-sandbox agent linux entity:sensu-go-sandbox

If this is a fresh sandbox, you shouldn’t have any Sensu Go pipeline resources defined: no handlers, checks, mutators, filters, etc. The Sensu Go agent is also pre-configured with no subscriptions beyond its default entity subscription.

Step 1: Translate Sensu 1.x configs

Copy over your Sensu 1.x configuration into the Sensu Go sandbox. I’ve included the full configuration I’ll be starting with in the example below. If you’re using the check-upgrade Sensu Sandbox lesson plan, you’ll find this config in the sandbox at /etc/sensu/conf.d:

[sensu_go_sandbox]$ tree /etc/sensu/conf.d/
/etc/sensu/conf.d/
├── api.json
├── checks
│ ├── check_filtered_ssh_server.json
│ ├── check_http_proxy_request.json
│ ├── check_ssh_server.json
│ └── cpu_percentage.json
├── client.json
├── filter
│ └── workday_filters.json
├── handlers
│ ├── filtered_logevent.json
│ ├── influxdb_tcp.json
│ └── logevent.json
├── influxdb.json
├── logevent.json
├── redis.json
└── transport.json

If this configuration looks familiar, this is the same configuration I used in the Sensu upgrade breakout session at Sensu Summit.

Right now, I’ll focus on just check_ssh_server.json:

[sensu_go_sandbox]$ cat /etc/sensu/conf.d/checks/check_ssh_server.json
{
"checks": {
"nagios_check_ssh_server_localhost": {
"command": "/usr/lib64/nagios/plugins/check_ssh -4 -r :::ssh.version|OpenSSH_7.4::: -P :::ssh.protocol|1.0::: localhost",
"type": "metric",
"handlers": [ "logevent" ],
"interval": 10,
"subscribers": ["localhost"],
"timeout": 15
}
}
}

The Sensu Go sandbox also comes pre-configured with some very useful packages that help transition Sensu 1.x check configurations to Sensu Go, including the Sensu translator utility — an aid to automate conversion from the Sensu 1.x configuration spec into Sensu Go.

Let’s run sensu-translator on this config and see what we get:

[sensu_go_sandbox]$ sensu-translator -d /etc/sensu/conf.d/ -o /tmp/sensu_config_translated
Sensu 1.x filter translation is not yet supported
...
DONE!

Just a note, I'm using '...' to indicate snipped output, and highlighting the important output in red, so I can focus your attention on the most relevant parts of the command output as you follow along in the sandbox. 

As expected, filters were not translated — complicated filter logic can’t be automatically translated — I’ll manually rebuild those filters in a later post. Let’s see what was translated:

[sensu_go_sandbox]$ tree /tmp/sensu_config_translated/
/tmp/sensu_config_translated/
├── checks
│ ├── check-http-proxy-request.json
│ ├── cpu_metrics.json
│ ├── nagios_check_ssh_server_localhost.json
│ ├── nagios_during_office_hours_ssh_server_localhost.json
│ └── nagios_outside_office_hours_ssh_server_localhost.json
├── extensions
├── filters
├── handlers
│ ├── influx-tcp.json
│ ├── logevent.json
│ ├── silence_during_office_hours_logevent.json
│ └── silence_outside_office_hours_logevent.json
└── mutators

Please note: the example Sensu 1.x configuration defines multiple resources in a single file and the translator creates a separate file for each translated Sensu Go resource.

The translator gets us most of the way for checks and handlers, giving us configuration files that can be uploaded into Sensu Go using the sensuctl create -f command. The translator doesn’t yet attempt to translate check token substitution or extended attributes, so we’ll need to visually inspect the output and make some adjustments for correct operation.

The translated file we’re interested in now is the nagios_check_ssh_server_localhost.json check:

[sensu_go_sandbox]$ cd /tmp/sensu_config_translated/checks
[sensu_go_sandbox]$ cat nagios_check_ssh_server_localhost.json
{
"api_version":"core/v2",
"type":"Check",
"metadata":{
"namespace":"default",
"name":"nagios_check_ssh_server_localhost",
"labels":{},
"annotations":{
"sensu.io.json_attributes":"{\"type\":\"metric\"}"
}
},
"spec":{
"command":"/usr/lib64/nagios/plugins/check_ssh -4 -r :::ssh.version|OpenSSH_7.4::: -P :::ssh.protocol|1.0::: localhost",
"subscriptions":[
"localhost"
],
"publish":true,
"interval":10,
"handlers":[
"logevent"
],
"timeout":15
}
}

This translated configuration has a couple of things we need to manually adjust for correct operation — extended attributes and token substitution syntax. I've highlighted these items in the output above.
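If you have more than a handful of translated files, a quick scan can point out which ones still need this kind of manual attention. Here's a minimal sketch, assuming the two markers seen in the file above (the ::: token delimiters and the sensu.io.json_attributes annotation) are enough to spot them. The scan_translated function name and the demo files are my own illustration, not part of the sandbox or the translator:

```shell
#!/bin/sh
# Sketch: flag translated files that still need manual edits.
# scan_translated is a hypothetical helper, not a Sensu tool.
scan_translated() {
  dir=$1
  for f in "$dir"/*.json; do
    [ -e "$f" ] || continue
    # Sensu 1.x token substitution uses ":::" delimiters.
    if grep -qF ':::' "$f"; then
      echo "$(basename "$f"): 1.x token substitution"
    fi
    # The translator stores untranslated attributes in this annotation.
    if grep -qF 'sensu.io.json_attributes' "$f"; then
      echo "$(basename "$f"): extended attributes in annotation"
    fi
  done
}

# Demo against a throwaway directory (file names and contents are illustrative).
demo=$(mktemp -d)
printf '%s\n' '{"spec":{"command":"check_ssh -P :::ssh.protocol|1.0::: localhost"}}' > "$demo/needs_tokens.json"
printf '%s\n' '{"spec":{"command":"check_ssh localhost"}}' > "$demo/clean.json"
scan_translated "$demo"
rm -rf "$demo"
```

In the sandbox you would point it at the translator output, for example scan_translated /tmp/sensu_config_translated/checks. It only reports candidates; you still need to read each file to decide what to change.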

Step 2: Adjust check spec attributes

The translator stores all check extended attributes in the check metadata annotation named sensu.io.json_attributes. In this check, the type attribute is no longer part of the Sensu Go check spec, so we’ll need to adjust it by hand. The original check was configured as type: metric which told Sensu 1.x to always handle the check regardless of the check status output. This allowed Sensu 1.x to process output metrics via a handler even when the check status was not in an alerting state. Sensu Go treats output metrics as first-class objects, allowing you to process check status as well as output metrics via different event pipelines. Let’s edit the nagios_check_ssh_server_localhost.json and update the spec attributes manually.

Here’s the Sensu Go check config snippet with the manually updated metrics configuration:

[sensu_go_sandbox]$ cat nagios_check_ssh_server_localhost.json
...
"spec":{
...
"handlers":[
"logevent"
],
"output_metric_handlers":[
"influxdb"
],
"output_metric_format": "nagios_perfdata",
"timeout":15
...
}

The Sensu Go agent will ingest the plaintext metrics included in the output of the check command using the Nagios perfdata format. The resulting metrics will be handled by a handler named influxdb.

Step 3: Adjust check token substitution syntax

The check command still uses the Sensu 1.x check token substitution syntax, referencing the nested Sensu 1.x client attributes ssh.version and ssh.protocol from the client JSON config. The Sensu Go agent handles extended attributes differently, allowing you to define a flat set of key-value pairs referred to as labels. The Sensu Go check config nagios_check_ssh_server_localhost.json needs to be edited to update the token substitution syntax.

Here's the translated check command for reference:

[sensu_go_sandbox]$ cat nagios_check_ssh_server_localhost.json
...
"spec":{
"command":"/usr/lib64/nagios/plugins/check_ssh -4 -r :::ssh.version|OpenSSH_7.4::: -P :::ssh.protocol|1.0::: localhost",
...

Here’s what the check command looks like after editing.

[sensu_go_sandbox]$ cat nagios_check_ssh_server_localhost.json
...
"spec":{
"command":"/usr/lib64/nagios/plugins/check_ssh -4 -r {{.labels.ssh_version | default \"OpenSSH_7.4\" }} -P {{.labels.ssh_protocol | default \"1.0\" }} localhost",
...

One note: the template engine used in Sensu Go token substitution treats ‘.’ as a special character. To ease the transition, I’ve changed each ‘.’ to an underscore while converting the extended attributes to label keys. I’ve also had to escape the quotes around the default fallback strings, which are used when the agent hasn’t defined the labels.
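If you have many commands to convert, the same hand edit can be roughly scripted. This is a sketch under my own assumptions, not something the translator or sensuctl provides: the convert_tokens function name is hypothetical, it only handles tokens that carry a default value (:::key|default:::), and it assumes every token should become a label lookup with JSON-escaped quotes.

```shell
#!/bin/sh
# Sketch: rewrite Sensu 1.x tokens with defaults into Sensu Go label lookups.
# convert_tokens is a hypothetical helper; it reads stdin and writes stdout.
convert_tokens() {
  # 1st expression: :::key|default::: -> {{.labels.key | default \"default\" }}
  # Loop: replace dots inside the label key with underscores, one per pass.
  sed -e 's/:::\([^|:]*\)|\([^:]*\):::/{{.labels.\1 | default \\"\2\\" }}/g' \
      -e ':dots' \
      -e 's/\({{\.labels\.[A-Za-z0-9_]*\)\./\1_/' \
      -e 't dots'
}

# Demo using the command arguments from the translated check.
printf '%s\n' '-r :::ssh.version|OpenSSH_7.4::: -P :::ssh.protocol|1.0::: localhost' | convert_tokens
```

Tokens without a default are left untouched, so review the result before uploading it. Treat this as a starting point for the edit, not a replacement for reading the converted command.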

Step 4: Upload the check into Sensu Go

This check should now be ready to upload into Sensu Go using sensuctl.

[sensu_go_sandbox]$ sensuctl create -f nagios_check_ssh_server_localhost.json

And should show up in the list of defined checks:

[sensu_go_sandbox]$ sensuctl check list
NAME COMMAND
nagios_check_ssh_server_localhost /.../bin/plugins/check_ssh ...

The check hasn’t been scheduled yet as the Sensu Go agent is not yet subscribed to the localhost subscription.
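Once you've reviewed more than one translated check, the upload in this step can be looped. Here's a small sketch that uses only the sensuctl create -f command shown above. The upload_checks function and the DRY_RUN variable are my own hypothetical additions, and the loop defaults to printing the commands rather than running them:

```shell
#!/bin/sh
# Sketch: upload every reviewed check file in a directory.
# upload_checks and DRY_RUN are hypothetical; DRY_RUN defaults to 1 (print only).
upload_checks() {
  dir=$1
  for f in "$dir"/*.json; do
    [ -e "$f" ] || continue
    if [ "${DRY_RUN:-1}" = 1 ]; then
      # Show what would be run without touching the backend.
      echo "sensuctl create -f $f"
    else
      sensuctl create -f "$f"
    fi
  done
}

# Demo against a throwaway directory (the file name is illustrative).
demo=$(mktemp -d)
printf '%s\n' '{}' > "$demo/example_check.json"
upload_checks "$demo"
rm -rf "$demo"
```

Set DRY_RUN=0 to run the uploads for real, and only after each file has had its attributes and token syntax adjusted as in Steps 2 and 3.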

Step 5: Adjust the Sensu Go agent configuration

Edit /etc/sensu/agent.yml to include the localhost subscription:

##
# agent configuration
##
...
subscriptions:
- "localhost"
...

And restart the agent:

[sensu_go_sandbox]$ sudo systemctl restart sensu-agent

Now the check should be firing at 10-second intervals:

[sensu_go_sandbox]$ sensuctl event info sensu-go-sandbox nagios_check_ssh_server_localhost

=== sensu-go-sandbox - nagios_check_ssh_server_localhost
Entity: sensu-go-sandbox
Check: nagios_check_ssh_server_localhost
Output: SSH CRITICAL - OpenSSH_7.4 (protocol 2.0) protocol version mismatch, expected '1.0'
Status: 2
History: 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2
Silenced: false
...

But there’s a problem: the SSH protocol 2.0 doesn’t match the expected protocol string 1.0. This can be addressed by setting the ssh_protocol label in the agent.yml to the correct expected protocol string.

Edit /etc/sensu/agent.yml to include labels:

##
# agent configuration
##
...
subscriptions:
- "localhost"
labels:
  ssh_protocol: "2.0"
  ssh_version: "OpenSSH_7.4"
...

Restart the sensu-agent service again and wait for the check to run. Now the check should be returning a check status of 0.

[sensu_go_sandbox]$ sensuctl event info sensu-go-sandbox nagios_check_ssh_server_localhost
=== sensu-go-sandbox - nagios_check_ssh_server_localhost
Entity: sensu-go-sandbox
Check: nagios_check_ssh_server_localhost
Output: SSH OK - OpenSSH_7.4 (protocol 2.0) | time=0.024728s;;;0.000000;10.000000
Status: 0
History: 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,0,0
Silenced: false
...

Notice also that the output includes Nagios perfdata. Since we configured the check’s output_metric_format, the Sensu Go agent converts the Nagios perfdata into Sensu’s internal metrics representation and sends it along as part of the Sensu event:

[sensu_go_sandbox]$ sensuctl event info sensu-go-sandbox nagios_check_ssh_server_localhost --format json
...
"metrics": {
"handlers": [
"influxdb"
],
"points": [
{
"name": "time",
"value": 0.024728,
"timestamp": 1545270871,
"tags": []
}
]
},
...
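To make the conversion a little more concrete, here's a rough sketch of how the perfdata portion of the check output maps to a metric name and value. This is not the agent's actual implementation, only an illustration of the Nagios perfdata layout (label=value[unit];warn;crit;min;max after the | character). The parse_perfdata function is hypothetical, and it ignores quoted labels containing spaces:

```shell
#!/bin/sh
# Sketch: pull "name value" pairs out of Nagios perfdata in check output.
# parse_perfdata is a hypothetical helper, not how the Sensu agent does it.
parse_perfdata() {
  output=$1
  case $output in
    *'|'*) perf=${output#*"|"} ;;   # keep everything after the first pipe
    *) return 0 ;;                  # no perfdata in this output
  esac
  for item in $perf; do
    label=${item%%=*}               # text before the first "="
    rest=${item#*=}                 # value, unit, and thresholds
    value=${rest%%;*}               # drop warn/crit/min/max fields
    # Strip a trailing unit of measure such as "s" or "%".
    value=$(printf '%s' "$value" | sed 's/[^0-9.+-]*$//')
    echo "$label $value"
  done
}

parse_perfdata 'SSH OK - OpenSSH_7.4 (protocol 2.0) | time=0.024728s;;;0.000000;10.000000'
```

The name and value it prints line up with the "name" and "value" fields of the point in the event JSON above. The timestamp and tags are added by the agent and are not part of the plugin output.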

It’s a little out of scope for what I wanted to cover in this post, but if you want to set up the InfluxDB handler, you can read the metrics guide, or check out the Sensu Go sandbox introduction lesson.

More fun with the sandbox

This walk-through covers the typical considerations as you upgrade existing Sensu 1.x check definitions. If you want to try something more complicated, take a look at the Sensu 1.x check_http_proxy_request.json provided as part of the Sensu Go check-upgrade sandbox lesson plan and see if you can get the proxy request updated using the Sensu Go reference documentation.

Ready to try your own checks?

Take a crack at it yourself, and give us a report in the Sensu Go section of our Community forum. We’d love to hear about both your successes and what you’ve gotten stuck on — start a new thread and tag it with 1.x-migration.
