Self-monitoring Chef using Sensu

Chef is one of the most popular tools for DevOps awesomeness, and you won’t believe one of the biggest causes of chef-server unavailability: a full disk.

Yep, you read that right. In 2017, there are still a LOT of people who don’t have all their systems carefully monitored by a trustworthy application. That means that if logrotate isn’t really rotating all your logs, or if PostgreSQL is misbehaving and generating a lot of data, sooner or later your system will crash badly. Recovering a database after it crashes for lack of disk space is not fun, and you can’t be 100% sure you won’t lose any data.

My grandpa used to say “it’s better safe than sorry”, and this should be every SysOps/CloudOps/SRE/DevOps engineer’s mantra.

Below, I’ll show how to set up Sensu to monitor the disk space and the overall status of your Chef server.
You can install Sensu on the same server as chef-server, but I strongly discourage it: if that server goes down, its monitoring goes down with it. Come on, CPU and memory are extremely cheap these days, and it’s not worth the risk to skimp in this area. A small AWS instance costs about the same as a couple of Starbucks frappuccinos.

If you don’t have Sensu installed, go ahead and read Sensu’s Five Minute Install document. I won’t cover Sensu’s installation here, but basically you’ll have to add the EPEL repository (on RHEL/CentOS, sudo yum install epel-release -y), add the Sensu repository to yum, and install Redis, Sensu and Uchiwa (optional). There is one thing, though, that the documentation doesn’t make clear: for your clients to be able to reach Redis on the Sensu server, you’ll have to edit /etc/redis.conf, comment out the line that starts with bind, and change the line protected-mode yes to protected-mode no. Since Redis will then accept unauthenticated connections from any host, keep port 6379 firewalled to your own network. Don’t forget to start the services (restart Redis if it was already running) and enable them at boot.
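If you’d rather script the two /etc/redis.conf edits than make them by hand, here is a minimal sketch using sed. It assumes the stock EPEL redis.conf layout, where the active bind and protected-mode settings each start a line; the function name open_redis_for_sensu is made up for illustration, and it keeps a .bak copy of the original file.

```shell
#!/bin/sh
# Hypothetical helper (not part of Sensu or Redis): applies the two redis.conf
# edits described above. Assumes the stock EPEL layout, where "bind" and
# "protected-mode" each start a line, and keeps a .bak copy of the file.
open_redis_for_sensu() {
  conf="${1:-/etc/redis.conf}"
  # 1) Comment out the active "bind ..." line so Redis listens on all interfaces.
  # 2) Turn protected-mode off so remote Sensu clients are accepted.
  sed -i.bak \
    -e 's/^bind /# bind /' \
    -e 's/^protected-mode yes/protected-mode no/' \
    "$conf"
}

# Usage, as root:
#   open_redis_for_sensu /etc/redis.conf && systemctl restart redis
```

Commented-out example lines in the file (the ones already starting with #) are left alone, since both patterns are anchored to the start of the line.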

Once it’s set up, you’ll be able to check the Uchiwa dashboard at http://your-ip:3000. Start sensu-client on the Sensu server as well, so the server itself shows up on the dashboard.

Set up the checks on the Sensu server

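Sensu has a unique way of setting up checks, and that’s what makes it so versatile: each check is defined in JSON on the Sensu server and published to every client subscribed to it, and the client runs the command locally. For this example, you’ll need two checks and one handler that sends you an email when a check fails.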

Below are the two checks you’ll need on your Sensu server:

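  • I saved this one as /etc/sensu/conf.d/checks/check-chef-server.json (Sensu only loads .json files from its configuration directories). Notice that the command has to be run with sudo, because the plugin actually uses chef-server-ctl for the checks, so the sensu user on your Chef server needs a passwordless sudoers entry for that command.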
{
  "checks": {
    "chef-server": {
      "handlers": ["default", "email"],
      "command": "sudo -u root /opt/sensu/embedded/bin/ruby /etc/sensu/plugins/check-chef-server.rb",
      "interval": 60,
      "subscribers": ["chef-server"]
    }
  }
}
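  • And this one as /etc/sensu/conf.d/checks/disk.json. It runs on every client subscribed to all, so each of those clients needs the disk plugin installed: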
{
  "checks": {
    "check_disk": {
      "handlers": ["default", "email"],
      "command": "/opt/sensu/embedded/bin/ruby /etc/sensu/plugins/check-disk-usage.rb -w 93 -c 96 -W 93 -K 96",
      "interval": 60,
      "subscribers": ["all"]
    }
  }
}
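  • For the handler, save the following as /etc/sensu/conf.d/handlers/email.json. It is a pipe handler, so Sensu feeds the event data (as JSON) to the command’s standard input, which mail uses as the message body; this assumes the mail command (from the mailx package) is installed on the Sensu server: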
{
  "handlers": {
    "email": {
      "type": "pipe",
      "command": "mail -s 'sensu alert' thiago@vinhas.org"
    }
  }
}
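Sensu checks follow the Nagios plugin convention: the command prints a short status line, and its exit code sets the severity (0 is OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). To show that contract in isolation, here is a hypothetical, much simplified shell stand-in for the -w 93 -c 96 thresholds used above. It is not the real check-disk-usage.rb, which also covers every filesystem and inodes; disk_check and its arguments are invented for illustration.

```shell
#!/bin/sh
# Hypothetical, simplified stand-in for check-disk-usage.rb: checks ONE mount
# point's block usage (no inodes) and follows the Sensu/Nagios exit-code contract.
# Usage: disk_check MOUNTPOINT [WARN_PCT] [CRIT_PCT]
disk_check() {
  mnt="${1:-/}"
  warn="${2:-93}"
  crit="${3:-96}"
  # df -P prints a header line, then one data line whose 5th field is e.g. "42%".
  # "|| true": if df fails, $used simply stays empty and is reported as UNKNOWN.
  used=$(df -P "$mnt" 2>/dev/null | awk 'NR == 2 { sub(/%/, "", $5); print $5 }') || true
  case "$used" in
    ''|*[!0-9]*)
      echo "DISK UNKNOWN: cannot read usage for $mnt"
      return 3 ;;
  esac
  if [ "$used" -ge "$crit" ]; then
    echo "DISK CRITICAL: $mnt is ${used}% full"
    return 2
  elif [ "$used" -ge "$warn" ]; then
    echo "DISK WARNING: $mnt is ${used}% full"
    return 1
  fi
  echo "DISK OK: $mnt is ${used}% full"
  return 0
}

# As a standalone check script, finish with:  disk_check "$@"; exit $?
```

The check itself never sends anything: it only reports. Deciding who gets told, and how, is the handler’s job, which is why the same email handler can serve both checks.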

Now let’s go to the next step…

Set up sensu-client on your Chef Server

Here, all you have to do is add the Sensu repository by running the command below and then run yum install sensu -y. If you get a 404 error, you’re probably using Amazon Linux, so replace $releasever with 6 and try again.

$ echo '[sensu]
name=sensu
baseurl=https://sensu.global.ssl.fastly.net/yum/$releasever/$basearch/
gpgcheck=0
enabled=1' | sudo tee /etc/yum.repos.d/sensu.repo
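For the Amazon Linux case, one way to do the $releasever swap without opening an editor is a sed substitution on the repo file you just wrote. This is a sketch; the function name pin_releasever is hypothetical, and 6 is simply the value suggested above.

```shell
#!/bin/sh
# Hypothetical helper: pins yum's $releasever to a fixed value in a repo file,
# e.g. 6 for the Amazon Linux case above. $basearch is left for yum to expand.
pin_releasever() {
  repo="$1"
  release="$2"
  # In the sed pattern, \$ matches a literal dollar sign; a .bak copy is kept.
  sed -i.bak 's/\$releasever/'"$release"'/g' "$repo"
}

# Usage, as root:
#   pin_releasever /etc/yum.repos.d/sensu.repo 6 && yum install sensu -y
```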

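You’ll now need an /etc/sensu/config.json with the following content. The client block is what subscribes this machine to the chef-server and all checks defined earlier; its name and address values are placeholders, so use your own hostname or IP:

{
  "client": {
    "name": "chef-server",
    "address": "chef-server.myhost.org",
    "subscriptions": ["all", "chef-server"]
  },
  "redis": {
    "host": "sensu-server.myhost.org",
    "port": 6379
  },
  "transport": {
    "name": "redis",
    "reconnect_on_error": true
  }
}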
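Sensu reads all of this configuration as JSON, so a stray comma in config.json or in a file under conf.d can keep a service from starting. Before restarting anything, a quick syntax pass helps. This sketch assumes python3 is installed (on CentOS 7, python -m json.tool behaves the same way) and the directory layout used in this post; check_sensu_json is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: lists every Sensu JSON config file that fails to parse.
# Assumes python3 is installed and the layout used in this post:
# config.json plus conf.d/ and one level of subdirectories (checks/, handlers/).
check_sensu_json() {
  dir="${1:-/etc/sensu}"
  bad=0
  for f in "$dir"/config.json "$dir"/conf.d/*.json "$dir"/conf.d/*/*.json; do
    [ -f "$f" ] || continue   # skip glob patterns that matched nothing
    if ! python3 -m json.tool "$f" >/dev/null 2>&1; then
      echo "invalid JSON: $f"
      bad=1
    fi
  done
  return "$bad"
}

# Usage:
#   check_sensu_json /etc/sensu && sudo service sensu-client restart
```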

If you start your Sensu client now, you’ll see within a few seconds that your Chef server has been added to the dashboard. At this point, though, keepalive is the only check that will report OK, and we really want to spot a problem before the whole server blows up, right?

I’ll show you how to check disk usage and the results of chef-server-ctl status, so you get an email alert if either of them fails. Sensu can also be integrated with PagerDuty, OpsGenie, ServiceNow, JIRA, Slack, HipChat and many others.

So, on your Chef server, add the Sensu repository as shown above, run yum install sensu -y, and download the two plugins below into /etc/sensu/plugins:

$ cd /etc/sensu/plugins
$ sudo wget https://raw.githubusercontent.com/sensu-plugins/sensu-plugins-disk-checks/master/bin/check-disk-usage.rb
$ sudo wget https://raw.githubusercontent.com/sensu-plugins/sensu-plugins-chef/master/bin/check-chef-server.rb

check-disk-usage.rb depends on the sys-filesystem Ruby gem. Since we don’t want to mess with Chef’s Ruby gems (and even on non-Chef systems you want to monitor, it’s prudent to use Sensu’s embedded Ruby), install it with Sensu’s own gem command:

$ sudo /opt/sensu/embedded/bin/gem install sys-filesystem

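You can now test the plugin by running /opt/sensu/embedded/bin/ruby check-disk-usage.rb (include the full path if you’re not in the plugins/ directory). The Chef check needs sudo, just like in its check definition: sudo /opt/sensu/embedded/bin/ruby check-chef-server.rb.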
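When testing plugins by hand, look at the exit code as well as the output, since the exit code is what Sensu acts on. The wrapper below is a hypothetical convenience, not something Sensu ships; it just prints the severity name next to whatever the plugin said.

```shell
#!/bin/sh
# Hypothetical helper (not part of Sensu): runs a check command and labels its
# exit code with the Nagios-style severity (0 OK, 1 WARNING, 2 CRITICAL,
# anything else UNKNOWN).
run_check() {
  status=0
  output=$("$@" 2>&1) || status=$?
  case "$status" in
    0) name=OK ;;
    1) name=WARNING ;;
    2) name=CRITICAL ;;
    *) name=UNKNOWN ;;
  esac
  printf '[%s/%s] %s\n' "$name" "$status" "$output"
  return "$status"
}

# Usage, from /etc/sensu/plugins:
#   run_check /opt/sensu/embedded/bin/ruby check-disk-usage.rb -w 93 -c 96
#   run_check sudo /opt/sensu/embedded/bin/ruby check-chef-server.rb
```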

  • Make sure the /etc/sensu/config.json file from above is in place, so your Sensu client knows how to reach the Sensu server.

Now start all the services and set them to start on boot, and you’re good to go.

  • On your Sensu server:
$ sudo service redis start
$ sudo service sensu-server start
$ sudo service sensu-api start
$ sudo service sensu-client start
$ sudo service uchiwa start
$ sudo systemctl enable redis sensu-server sensu-api sensu-client uchiwa
  • On your Chef server:
$ sudo service sensu-client start
$ sudo systemctl enable sensu-client

If you skipped Uchiwa earlier, you should install a dashboard like it to see your monitored servers and services and their alerts.

You’re just a sudo yum -y install uchiwa; sudo service uchiwa start away. Then access http://your-server:3000 to check out Uchiwa’s dashboard.

Thiago Vinhas

Former Chef Employee