Refreshing a Nagios configuration

I've got a Nagios box that I set up years ago, and now that there are going to be some substantial platform changes I need to revisit the configuration. Unfortunately what I see there is not a sensible hierarchy of monitors and alerts, but a morass of services, hostgroups, servicegroups and other obscure Nagios-language directives.

I haven't got a good reason to swap to some other network monitoring tool, so it looks like I am going to have to spend a few hours re-grokking the documentation for Nagios (which has moved up a version to 3.2.x since I last looked).

Here goes...

The Nagios documentation is the place to start for understanding what each of the objects represents and how to map them onto your environment.
I am going to start with the hosts, applications and services that need to be monitored and progress to the alerts and notifications in some future post.
 
In this example I am looking to create a monitoring configuration for a number of web applications that are hosted on Apache Tomcat 6, with a MySQL backend and a shared HAProxy load balancer front end. There are also a number of shared and dependent services such as SMTP, cloud storage and backup machines that I would like to monitor.

Here is a simplified list of my objects and services, from a quick brain dump:

Production & Test platform

URL:http://www.mywebapp.com/
URL:http://www.test.mywebapp.com/
httpd service: port 80
Production Application Server 1 : tomcat service: port 8080
Production Application Server 2 : tomcat service: port 8080
Test Application Server 1 : tomcat service: port 8080

Master Database : mysql service: port 3306
Slave Database : mysql service: port 3306
Test Database : mysql service: port 3306

The test environment shares the load balancers, but has a distinct tomcat and database server.

Nagios configuration

Nagios provides the following object types to be mapped onto your service platform:

  • Host definitions
  • Host group definitions
  • Service definitions
  • Service group definitions
  • Contact definitions
  • Contact group definitions
  • Time period definitions
  • Command definitions
  • Service dependency definitions
  • Service escalation definitions
  • Host dependency definitions
  • Host escalation definitions
  • Extended host information definitions
  • Extended service information definitions



Initially I will only be paying attention to the functional objects, and will work out the contacts and notifications later, so the list I am actually interested in is the following:


  • Host definitions
  • Host group definitions
  • Service definitions
  • Service group definitions
  • Command definitions
  • Service dependency definitions
  • Host dependency definitions
  • Extended host information definitions
  • Extended service information definitions


So, working through each item in turn:

1) Hosts

I think it's pretty clear what Nagios means by a "Host": some sort of physical (or virtual) device on your network that has an IP address associated with it.

I am not 100% sure how a bonded interface, or a multi-homed device, would be represented at this point, but hopefully it will become clear later.
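As a rough sketch, a host object for one of the database servers might look something like the following. The mapping of "db1" to the master database, the address, the "admins" contact group, the "24x7" time period and the "check-host-alive" command are all assumptions for illustration (the last three are names used in the sample configuration that ships with Nagios, and may not exist in yours).

```cfg
# Sketch of a host object; every value below is illustrative.
define host {
    host_name              db1                 ; short name used by other objects
    alias                  Master Database     ; longer display name
    address                192.0.2.11          ; placeholder documentation address
    check_command          check-host-alive    ; assumed command definition
    max_check_attempts     5
    check_period           24x7                ; assumed time period definition
    contact_groups         admins              ; assumed contact group
    notification_interval  30
    notification_period    24x7
}
```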


2) Hostgroups

Hostgroups allow you to group hosts together for logical or display purposes. Typically you can use one as a shortcut for binding a service "mysqld" to a group of hosts (db1, db2), without explicitly setting the service on each host.

So I would have a hostgroup called "mysql-server" which contains all the hosts running a mysqld service, and indicates to Nagios that these hosts should be checked for mysqld port availability.
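A minimal sketch of such a hostgroup, assuming host objects named "db1" and "db2" are defined elsewhere (the alias text is made up):

```cfg
# Sketch of the "mysql-server" hostgroup described above.
define hostgroup {
    hostgroup_name  mysql-server
    alias           MySQL servers    ; illustrative display name
    members         db1,db2          ; host_name values of existing host objects
}
```

A host can also join the group from its own definition with the "hostgroups" directive, which saves editing the group every time a database server is added.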

3) Services

This seems to be the representation of an actual concrete check of a service (or of a host metric such as CPU) on some particular host or group of hosts.

Presumably you could have a general "mysql-server" service which was associated with all database servers, and further services such as prod-mysqld which only contain the production mysql databases.
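As a sketch, a service bound to the whole "mysql-server" hostgroup might look like this. It assumes a "check_tcp" command that takes the port as its first argument (as in the sample commands file shipped with Nagios), plus the "24x7" time period and "admins" contact group from the sample configuration; the intervals are arbitrary.

```cfg
# Sketch: one service definition applied to every host in the hostgroup.
define service {
    hostgroup_name         mysql-server      ; expands to each member host
    service_description    mysqld
    check_command          check_tcp!3306    ; assumed command; 3306 is passed as $ARG1$
    max_check_attempts     3
    check_interval         5                 ; in time units, one minute each by default
    retry_interval         1
    check_period           24x7              ; assumed time period definition
    notification_interval  30
    notification_period    24x7
    contact_groups         admins            ; assumed contact group
}
```

This only checks that the port accepts connections; a real MySQL health check would need a more specific plugin.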

4) Service groups


(What I am looking to do is represent the important logical structure of my environment in such a way that there is some way of grouping the "production web app" services and applications together, so that I can alert for Production and not for test. Service groups seem to be the top level grouping in Nagios, and hence the most likely place where I can represent my "production web application" object.)


So I think that I am going to define a top-level "myapp-production" service group, and add the various "myapp-production-webapp" and "myapp-production-mysql" groupings to it.
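One way this could be sketched is with nested service groups, using the "servicegroup_members" directive to pull the smaller groups into the top-level one. The host names "app1" and "app2", the service description "tomcat" and the alias text are hypothetical; "db1", "db2" and "mysqld" follow the earlier sections.

```cfg
# Sketch: members are listed as host,service pairs.
define servicegroup {
    servicegroup_name  myapp-production-webapp
    alias              Production web application
    members            app1,tomcat,app2,tomcat      ; hypothetical hosts and service
}

define servicegroup {
    servicegroup_name  myapp-production-mysql
    alias              Production MySQL
    members            db1,mysqld,db2,mysqld
}

# Top-level group that includes the two groups above.
define servicegroup {
    servicegroup_name     myapp-production
    alias                 Production platform
    servicegroup_members  myapp-production-webapp,myapp-production-mysql
}
```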

....draft....TODO....

7) Host dependencies

These are useful when you are monitoring a network that contains devices which have a routing dependency. This can be physical, in that one host server is dependent on its local network router, or logical, in that a virtual server guest instance is dependent on its Xen or other hypervisor host.

I guess that could be generalised to whenever the accessibility of one host "address" is dependent on another.

This could be anything from a VPN connection to an ADSL router; I guess the way to think of it is in terms of the hops in a traceroute.

However, being a server monkey, my remit is mostly servers which are typically peers in one or more VLANs, and in this case are on the same segment as the Nagios box. So I find limited use for host dependencies.
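For the hypervisor case, a host dependency might be sketched as below. The host names "xen1" and "app1" are hypothetical, and the failure criteria are just one plausible choice. Note that for plain network reachability (the router and traceroute cases), the Nagios documentation describes the "parents" directive on the host definition, so that may turn out to be the better fit there.

```cfg
# Sketch: app1 is a guest that depends on the hypervisor host xen1.
define hostdependency {
    host_name                      xen1    ; the host being depended upon
    dependent_host_name            app1    ; the host that depends on it
    notification_failure_criteria  d,u     ; no app1 notifications while xen1 is down/unreachable
    execution_failure_criteria     d,u     ; no app1 active checks while xen1 is down/unreachable
}
```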


....draft....TODO....









Ass bitten;

This one got me for a bit:
https://dev.icinga.org/issues/1502

It seems that you can't assign services to a hostgroup that has no members.
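As far as I can tell, the combination that causes the trouble looks roughly like this sketch. The hostgroup name "mysql-test" is hypothetical, and the "generic-service" template and "check_tcp" command are assumed to exist (both are names from the sample configuration shipped with Nagios).

```cfg
# Hypothetical hostgroup with no members yet.
define hostgroup {
    hostgroup_name  mysql-test
    alias           Test MySQL servers
}

# Assigning a service to the empty group above appears to be what
# the configuration check objects to.
define service {
    use                  generic-service    ; assumed template supplying the other required directives
    hostgroup_name       mysql-test
    service_description  mysqld
    check_command        check_tcp!3306     ; assumed command definition
}
```

Adding at least one host to the group, or leaving the service out until the group is populated, avoids the situation.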