ChatOps: Hubot Grafana Images in HipChat
30 Sep 2015
As we continue moving toward ChatOps and making our work visible, the next phase of maturing our monitoring systems is to create a queryable interface to our visualization system (Grafana) from HipChat. Grafana is a system I’ve become quite fond of, and I helped author its Chef cookbook. Grafana’s HTTP API has matured, and the time seemed right to build this integration.
High-level Design
Having read about Librato’s ChatOps and seen Etsy’s nagios-herald, I had a rough idea of the user experience I wanted. With a head full of hindsight bias, here are some of the requirements:
- A user-friendly query interface in chat (no magic numbers, server-specific names, etc.)
- Images posted should be available in chat without additional authentication
- Able to utilize our existing Grafana server
I was delighted to find that Stephen Yeargin had already written hubot-grafana, a script that does all the heavy lifting for the first requirement. The Grafana docs site also has a “how to integrate Hubot with Grafana” article. Stephen’s hubot script supports dashboard discovery, per-panel queries, template variables, and time-range queries. It’s really quite fantastic. However, it assumes that S3 will host the images. While that’ll work for most folks (and certainly could work for us), I wanted to use our existing Grafana server to house this integration. To achieve this I had to modify grafana.coffee. More on that below.
The default configuration provided by the chef-grafana cookbook includes Nginx as a proxy for grafana-server. At work we wrap the community cookbook to configure TLS, LDAP, and Grafana’s datasources. It seemed a natural extension of the visualization node’s responsibilities to run a small app on the Grafana node that fetches and saves rendered panel images, and then use Nginx to serve those images. I called that small application grafana-images. More on that below as well.
Modifications to hubot-grafana
As mentioned above, I had to modify the hubot-grafana script to provide an image persistence method other than S3. The CoffeeScript additions are relatively straightforward:
The Grafana API key is provided to the script via an environment variable, and the newly added environment variable HUBOT_USE_GRAFANA_IMAGES determines whether or not to use the customFetchAndUpload code path. The full diff can be found here.
Note that the /grafana-images URI is hard-coded, because the route used by grafana-images is hard-coded. Also note that the necessary authentication token is passed along with the JSON payload. In many ways, this function treats grafana-images as a proxy for Grafana.
Another addition worth noting is the help text I added to the hubot script. You can ask the bot “graf help” and it’ll respond with increasingly complex query samples. Yay for user friendliness!
grafana-images
Following my experience with http-stats-collector, Golang seemed like a good choice for this small application. It acts as a proxy and therefore expects only two things: a valid API token and a JSON payload containing the full Grafana panel render URL. To give more context to what’s happening, here’s an HTTP call diagram:
The customFetchAndUpload function described above is call #4. From there, grafana-images fetches the image (#5 and #6), saves it, and returns a shareable image URL via JSON (#7). Here’s a snippet from grafana-images’ handlers.go:
There are several variables assumed to be set:
- image - the requested imageUrl from the JSON
- token - the contents of the Authorization header
- imagePath - a path on disk to store the saved images
- imageHost - the host used in building the JSON response
If everything is configured correctly, the Grafana dashboard panel will be saved to disk and the JSON response sent back to hubot-grafana. Further detail can be found on GitHub. I tried to make all the error messages helpful and actionable, but if you find an error condition that isn’t well explained, please open a GitHub issue.
Security
You may have noticed that the app simply downloads whatever is specified at imageUrl and saves it as a PNG. This can be dangerous, because nothing checks that the contents are in fact an image and not an exploit. Take care to allow only specific, trusted traffic to make requests of grafana-images. I may add a check via Golang’s png package to ensure proper encoding, but it may be quite some time before that happens (pull requests welcome).
Nginx Config
As mentioned above, I used Nginx to proxy grafana-server. I also use it to proxy grafana-images and serve the saved panel images. Here’s a sample conf that should work for this purpose:
Note that the imageHost passed to grafana-images is the FQDN plus the location of the saved images. The value used will depend on the web server hosting the saved images.
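To illustrate how imageHost and a saved file name might combine into the URL handed back to chat, here is a small Go sketch. The publicImageURL helper and the example paths are hypothetical. The only assumption is that Nginx serves the image directory at the path portion of imageHost, so a missing scheme or host is rejected early rather than producing unusable links.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// publicImageURL joins imageHost (FQDN plus the location Nginx serves saved
// images from) with a saved file name, tolerating a trailing slash.
func publicImageURL(imageHost, fileName string) (string, error) {
	u, err := url.Parse(imageHost)
	if err != nil {
		return "", fmt.Errorf("imageHost %q is not a valid URL: %w", imageHost, err)
	}
	if u.Scheme == "" || u.Host == "" {
		return "", fmt.Errorf("imageHost %q needs a scheme and host, e.g. https://grafana.example.com/saved-images", imageHost)
	}
	u.Path = strings.TrimRight(u.Path, "/") + "/" + fileName
	return u.String(), nil
}

func main() {
	for _, host := range []string{
		"https://grafana.example.com/saved-images",
		"https://grafana.example.com/saved-images/",
		"grafana.example.com/saved-images", // missing scheme: rejected
	} {
		fmt.Println(publicImageURL(host, "0f3a.png"))
	}
}
```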
Other Uses
Because grafana-images exposes its functionality over a simple HTTP API, expanding its purpose should be straightforward. The app expects an "Authorization: Bearer grafana-token-goes-here" header and a JSON payload:
Sensu Notifications
At work we have incorporated Grafana panel image embedding functionality into our Sensu HipChat handler. We started with the Sensu community HipChat handler and modified the message body heavily for our purposes.
The code to add Grafana panel images to Sensu HipChat notifications is roughly:
The @event['check']['graph_image'] value is assumed to be a valid dashboard panel render URL without the from/to times: https://grafana.example.com/render/dashboard-solo/db/sample-dashboard/?panelId=5&var-server=test-server&width=1000&height=500. The panelId can be obtained from the dashboard’s UI.
We manage our infrastructure with Chef, which creates all of our Sensu checks, so we can build the checks programmatically. We add a graph_image attribute to each check containing a panel render URL for the metric(s) that help provide context to the Sensu notification. Chef knows the FQDN of the Grafana node as well as the values for the dashboard’s template variables, so it all comes together quite cleanly.
Other Considerations
One thing grafana-images does not handle is retention of saved images. You’ll need to create a purge policy that works for you. Once I’ve figured out how we’re going to handle that, I’ll add it here. :)