Thursday, 13 February 2014

Nagrestconf Service Sets

Nagrestconf uses service sets to group services, but what are they for and why bother with them?

Rationale

When you first learn to configure Nagios, a number of 'Object Tricks' are suggested. These are great time-saving tips when configuring nagios by hand, editing the text files directly, but as more tricks are learned and used it becomes harder to understand the configuration and to change it with confidence.

After spending time and effort learning all the tricks, it feels natural to organise hosts into hostgroups and then to say, 'whichever host belongs to hostgroup X gets the services assigned to that hostgroup'. Naturally, this then extends to naming hostgroups after roles and assigning each host to many hostgroups. This is great in theory, but in practice it causes many problems. The configuration contains so much redirection, and so many ifs and buts, that it looks more like an SQL database than what should be a simple nagios configuration, and administrators become unwilling to make fine-grained changes.

Simply put, using host groups to configure services can get messy and difficult to manage, and is sometimes even dangerous. As the configuration grows, it becomes increasingly difficult to make localised changes, and mistakes creep in, especially changes that affect more servers than expected. Often the answer to this problem is to restrict changes, only allowing them to be made to a group of servers, which reduces the configurability of the monitoring system.

Additionally, the hostgroups view in the nagios web interface does not show a logical grouping that would be useful to those supporting the servers; instead it shows a grouping that exists only to facilitate the configuration. Using host groups this way means that the hostgroups view will contain many duplicate entries, especially as the configuration becomes more complex, which makes it harder to judge the state of the network at a glance. In a quick scroll down the host groups list, for example, a single problem will be shown many times, making things look worse than they are, and, of course, this happens just as someone important is being shown the monitoring system.
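To make the fan-out concrete, here is a small Python model of the hostgroup approach. It is only a sketch, not how nagios itself is implemented, and every host, hostgroup and command string in it is hypothetical. Because app1 also belongs to 'webservers', anyone tightening the HTTP check for that hostgroup with only web1 in mind changes app1 as well, and any host that belongs to two hostgroups is listed twice in the hostgroups view.

```python
# Toy model of assigning services through hostgroups (not nagios source code).
# All host, hostgroup and command names are hypothetical.

hostgroup_members: dict[str, list[str]] = {
    "linux": ["db1", "web1", "app1"],
    "databases": ["db1"],
    "webservers": ["web1", "app1"],  # app1 also serves a web UI
}

hostgroup_services: dict[str, dict[str, str]] = {
    "linux": {"Load": "check_load!5,4,3!10,8,6"},
    "databases": {"MySQL": "check_mysql"},
    "webservers": {"HTTP": "check_http"},
}


def effective_services(host: str) -> dict[str, str]:
    """Services a host inherits from every hostgroup it belongs to."""
    services: dict[str, str] = {}
    for group, members in hostgroup_members.items():
        if host in members:
            services.update(hostgroup_services.get(group, {}))
    return services


def hosts_affected_by(group: str) -> list[str]:
    """Changing a hostgroup's services changes every member host."""
    return sorted(hostgroup_members.get(group, []))


def hostgroup_view_entries(host: str) -> int:
    """Number of times a host is listed in the hostgroups view."""
    return sum(host in members for members in hostgroup_members.values())


print(effective_services("app1"))       # {'Load': 'check_load!5,4,3!10,8,6', 'HTTP': 'check_http'}
print(hosts_affected_by("webservers"))  # ['app1', 'web1']
print(hostgroup_view_entries("db1"))    # 2
```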

Service sets solve part of the overall configuration problem and make many automation tasks straightforward. The idea is to make the configuration work the way we think, and not the other way around.

I will describe service sets in the next section and, even though it's a small section, you will finish knowing everything there is to know about them.

Service Sets Definition

In nagios, a service is the definition of a monitoring check that nagios will read and execute at regular intervals. This service check can be assigned to a host and will be shown connected to that host in the nagios web interface.

A service set is a named collection of services, and once defined this service set can be assigned to a host when the host is created. Service sets are a nagrestconf feature and are not available in nagios.

More than one service set can be assigned to a host, in which case the services in each listed service set are added to the host. If the same service appears in more than one of the service sets assigned to a host, the copy from the rightmost service set in the list is used.
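The merge rule can be sketched in a few lines of Python. This is an assumed model, not nagrestconf's code: it treats two services as duplicates when they share a service description, and the set names and check commands are hypothetical (the names are borrowed from the csv example further down).

```python
# Hypothetical service sets: each maps a service description to a check command.
SERVICE_SETS: dict[str, dict[str, str]] = {
    "base-lin": {
        "Load": "check_load!5,4,3!10,8,6",
        "Disk /": "check_disk!20%!10%",
    },
    "aa-db": {
        "MySQL": "check_mysql",
        "Load": "check_load!10,8,6!20,16,12",  # duplicates base-lin's "Load"
    },
}


def services_for_host(service_sets: list[str]) -> dict[str, str]:
    """Merge the listed service sets from left to right.

    Assumption: a duplicate is a service with the same description. Each set
    overwrites entries from the sets to its left, so the rightmost one wins.
    Raises KeyError if a listed set is not defined.
    """
    merged: dict[str, str] = {}
    for name in service_sets:
        merged.update(SERVICE_SETS[name])
    return merged


print(services_for_host(["base-lin", "aa-db"])["Load"])  # check_load!10,8,6!20,16,12
print(services_for_host(["aa-db", "base-lin"])["Load"])  # check_load!5,4,3!10,8,6
```

Swapping the order of the two sets is enough to change which "Load" check the host ends up with, which is why the order in the list matters.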

Benefits

Using service sets allows you to make nagios do exactly what you want and no more.

The Bulk Tools plugin allows service sets to be applied to many hosts at once, and, for automation, service sets allow a host to be added with only one REST request.
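As a rough illustration of the single-request idea, the Python below builds such a request without sending it. The `/rest/add/hosts` path, the `json` form field and the field names (`folder`, `name`, `alias`, `ipaddress`, `template`, `hostgroup`, `servicesets`) are assumptions modelled on nagrestconf's REST interface and should be checked against its REST API documentation; the server URL and folder name are made up.

```python
import json
import urllib.parse
import urllib.request


def build_add_host_request(base_url: str, folder: str,
                           host: dict[str, str]) -> urllib.request.Request:
    """Build (but do not send) one POST that adds a host with its service sets.

    ASSUMED, not verified: the '/rest/add/hosts' path, the 'json' form field,
    and the host field names used by the caller.
    """
    payload = dict(host, folder=folder)
    body = urllib.parse.urlencode({"json": json.dumps(payload)}).encode("ascii")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/rest/add/hosts",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = build_add_host_request(
    "http://nagios.example.com",  # hypothetical server
    "local",                      # hypothetical folder name
    {
        "name": "aa-db1",
        "alias": "aa-db1",
        "ipaddress": "10.0.0.1",
        "template": "host-tmpl-linux",
        "hostgroup": "aa-env",
        "servicesets": "base-lin aa-db",  # both service sets in one field
    },
)
# To actually send it: urllib.request.urlopen(req)
```

The point of the sketch is that the whole set of checks travels as one short `servicesets` value instead of one request per service.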

Service sets can be modified and re-applied to one or many hosts using the host edit dialog or the Bulk Tools plugin. Since the administrator chooses the list of servers to re-apply the service sets to, there is a much lower risk of making unexpected changes.

Service sets are natural and simple to think about, and they leave host groups free to be used for grouping hosts into logical groups with no duplicate entries.

In distributed environments with a central console, or 'single pane of glass', service sets can be copied to slave hosts using the Backup and Restore plugin, without any modifications and without having to change any names. Nagrestconf deals with name collisions by name mangling, so identical configurations can be used throughout an organisation.

The Bulk Tools plugin also allows hosts to be created from a 'csv' file, which can be produced with a spreadsheet program. The format of the csv file is short and simple, since it expects service sets to be specified rather than all of the individual checks. Service sets can be named after the role they fulfil, so creating the csv file for an environment should feel quite natural. As an example, an environment called 'aa' might contain 2 database servers, 2 apache web servers, and 2 application servers. The csv format is:

Hostname, IP Address, Host Template, Hostgroup, Service Sets

So the csv file might look like this, with multiple service sets separated by spaces:

aa-db1,10.0.0.1,host-tmpl-linux,aa-env,base-lin aa-db
aa-db2,10.0.0.2,host-tmpl-linux,aa-env,base-lin aa-db
aa-web1,10.0.0.3,host-tmpl-linux,aa-env,base-lin aa-apache
aa-web2,10.0.0.4,host-tmpl-linux,aa-env,base-lin aa-apache
aa-app1,10.0.0.5,host-tmpl-linux,aa-env,base-lin aa-app
aa-app2,10.0.0.6,host-tmpl-linux,aa-env,base-lin aa-app

After a successful import, all the 'aa' hosts will belong to the 'aa-env' hostgroup, which might be displayed as 'AA Environment' in the nagios web interface. Before the file is imported into nagrestconf, the host template, host groups and service sets must already be defined. If they are not, the hosts will not be added, because nagrestconf will detect that the configuration is wrong.
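The kind of consistency check described above can be sketched in Python. This is not nagrestconf's implementation: the sets of defined templates, hostgroups and service sets are hypothetical stand-ins for what would already be configured, and the sketch simply reports every problem it finds rather than modelling exactly how nagrestconf rejects a file.

```python
import csv
import io

# Hypothetical objects assumed to be defined before the import.
DEFINED_TEMPLATES = {"host-tmpl-linux"}
DEFINED_HOSTGROUPS = {"aa-env"}
DEFINED_SERVICE_SETS = {"base-lin", "aa-db", "aa-apache", "aa-app"}

# Column order: Hostname, IP Address, Host Template, Hostgroup, Service Sets
FIELDS = ["hostname", "ipaddress", "template", "hostgroup", "servicesets"]


def check_import(csv_text: str) -> list[str]:
    """Return a list of problems; an empty list means every row references known objects."""
    problems: list[str] = []
    for n, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != len(FIELDS):
            problems.append(f"row {n}: expected {len(FIELDS)} fields, got {len(row)}")
            continue
        host = dict(zip(FIELDS, (field.strip() for field in row)))
        if host["template"] not in DEFINED_TEMPLATES:
            problems.append(f"row {n}: unknown host template {host['template']!r}")
        if host["hostgroup"] not in DEFINED_HOSTGROUPS:
            problems.append(f"row {n}: unknown hostgroup {host['hostgroup']!r}")
        for name in host["servicesets"].split():  # service sets are space-separated
            if name not in DEFINED_SERVICE_SETS:
                problems.append(f"row {n}: unknown service set {name!r}")
    return problems


sample = (
    "aa-db1,10.0.0.1,host-tmpl-linux,aa-env,base-lin aa-db\n"
    "aa-web1,10.0.0.3,host-tmpl-linux,aa-env,base-lin aa-nginx\n"
)
print(check_import(sample))  # ["row 2: unknown service set 'aa-nginx'"]
```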

After adding the hosts, it is still possible to edit values for any host or service. For example, if the host 'aa-app2' always runs with a higher load, the load threshold for that server can be changed directly. It would be better, though, to create another service set with the higher load threshold and apply that service set to the host, which is easy to do since service sets can be duplicated (cloned) using the web interface.
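Cloning can be pictured as copying a service set under a new name and overriding only the checks that differ. The Python below is a hypothetical sketch: the 'aa-app-highload' name, the check commands and the thresholds are made up, and in nagrestconf itself the cloning is done through the web interface. The aa-app2 row in the csv file would then list 'base-lin aa-app-highload' instead of 'base-lin aa-app'.

```python
# Hypothetical service sets keyed by name; each maps service description -> check command.
service_sets: dict[str, dict[str, str]] = {
    "aa-app": {
        "Load": "check_load!5,4,3!10,8,6",
        "App port": "check_tcp!8080",
    },
}


def clone_service_set(sets: dict[str, dict[str, str]], source: str,
                      new_name: str, overrides: dict[str, str]) -> None:
    """Copy an existing service set under a new name, then override some checks."""
    if new_name in sets:
        raise ValueError(f"service set {new_name!r} already exists")
    clone = dict(sets[source])  # values are strings, so a shallow copy is enough
    clone.update(overrides)
    sets[new_name] = clone


clone_service_set(service_sets, "aa-app", "aa-app-highload",
                  {"Load": "check_load!10,8,6!20,16,12"})
print(service_sets["aa-app"]["Load"])           # check_load!5,4,3!10,8,6
print(service_sets["aa-app-highload"]["Load"])  # check_load!10,8,6!20,16,12
```

The original set is left untouched, so the other application server keeps its normal thresholds.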

The Future

I think service sets are pretty useful and would be excellent for sharing. Imagine an online repository of service sets that could be downloaded and used, along with the required plugins. This could save a lot of time and headaches for many monitoring tasks.

It would be great if, when I'm asked to monitor something, say an IIS web server, I could go to the repository, look for IIS, and find a bunch of ready-made service sets that can be downloaded and modified for the local environment. I want this feature, and it's a planned addition for the future!

