Wednesday, 12 February 2014

Synology Diskstation Performance Tuning (3)

Synology Diskstation Performance Tuning (2)
Synology Diskstation Performance Tuning (1)

Performance Notifications (Continued)

Previously, in part 2, I sketched out an algorithm for reliably determining when the diskstation is overloaded. It looks like it should work but it needs to be implemented and tested.

Implementation

The implementation will use the same macro technique as the stock nagios plugin, check_cluster, but will be written as a shell script so that the table rules, shown in part 2 and repeated here, can be encoded in the script itself.

Metric          Status
Load Average    OK    HIGH  HIGH  OK    HIGH  OK    OK    HIGH
CPU Usage       OK    OK    HIGH  HIGH  OK    OK    HIGH  HIGH
Swap Activity   OK    OK    OK    HIGH  HIGH  HIGH  OK    HIGH
RESULT          OK    OK    OK    BAD   BAD   BAD   OK    BAD
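
To make the table concrete, here is a minimal sketch of it as a shell lookup. It is not the published plugin; the function name table_result and the 0/1 encoding (0 for OK, 1 for HIGH, in the order load, cpu, swap) are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: look up the RESULT row for one column of the table.
# Arguments are 0 (OK) or 1 (HIGH) for load, cpu and swap, in that order.
table_result() {
    case "$1,$2,$3" in
        0,1,1|1,0,1|0,0,1|1,1,1) echo "BAD" ;;      # the four BAD columns
        0,0,0|1,0,0|1,1,0|0,1,0) echo "OK" ;;       # the four OK columns
        *)                       echo "UNKNOWN" ;;  # not a column in the table
    esac
}

table_result 1 1 0   # load and cpu high, no swap activity
table_result 0 0 1   # only swap activity high
```

The first call corresponds to the third column of the table and the second call to the sixth.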

The macro technique check_cluster uses is pretty clever because it's simple to use and very efficient. It relies on nagios getting all the required service status values directly from its in-memory data structures and passing them to the plugin as an argument. This means that the plugin can be simple and will exit quickly, using few resources.

To write the nagios plugin I'll start by specifying the interface, how it will be run:
check_cluster_table (-S | -H) \
    -t col1[,col2,..,colN] \
    -c NUM \
    -d col1[,col2,..,colN]
The options are similar to, but not the same as, check_cluster's options. This is what the options will mean:

-S for checking services
-H for checking hosts
-t for one 'BAD' column in the table, given once per 'BAD' column
-c for specifying how many statuses in critical state will cause this plugin to show critical
-d for the service statuses that nagios supplies
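
One way the interface above could be parsed in a shell script is with the getopts builtin. This is only a sketch under assumptions: the function name parse_args and the variables MODE, PATTERNS, CRIT and DATA are hypothetical, and the real plugin may handle its options differently.

```shell
#!/bin/sh
# Hypothetical option parsing for the interface described above.
parse_args() {
    MODE="" PATTERNS="" CRIT="" DATA=""
    OPTIND=1                      # allow the function to be called more than once
    while getopts "SHt:c:d:" opt; do
        case "$opt" in
            S) MODE="service" ;;
            H) MODE="host" ;;
            t) PATTERNS="${PATTERNS:+$PATTERNS }$OPTARG" ;;  # -t may be repeated
            c) CRIT="$OPTARG" ;;
            d) DATA="$OPTARG" ;;
            *) return 3 ;;        # 3 is the nagios UNKNOWN status
        esac
    done
    # A mode, at least one pattern and the data string are required.
    [ -n "$MODE" ] && [ -n "$PATTERNS" ] && [ -n "$DATA" ] || return 3
}

parse_args -S -t 0,1,1 -t 1,0,1 -c 2 -d "1,0,1"
echo "mode=$MODE patterns=$PATTERNS crit=$CRIT data=$DATA"
```

Repeated -t options are collected into one space-separated list, which works because the patterns themselves contain no spaces.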

Using the interface definition above, the command to implement the table, including the macro, would be:
check_cluster_table -S \
    -t 0,1,1 \
    -t 1,0,1 \
    -t 0,0,1 \
    -t 1,1,1 \
    -c 2 \
    -d "$SERVICESTATEID:diskstation:Load$,
        $SERVICESTATEID:diskstation:CPU$,
        $SERVICESTATEID:diskstation:Swap Activity$"
The '-d' option contains the nagios macros. These can only be tested from within nagios in the service check and will expand to three numbers separated by commas, for example "1,0,0", so it should be fairly easy to see how the '-t' options relate to the '-d' option. Refer to the On-demand Macros section in the nagios documentation for a good description of how to use them.
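
The relationship between '-d' and '-t' can be sketched as a simple string comparison. This is not the published script: the function name match_table is hypothetical, the '-c' option is ignored, and the sketch assumes that any non-zero state ID (1 WARNING, 2 CRITICAL, 3 UNKNOWN) counts as HIGH. The exit codes follow the usual nagios plugin convention of 0 for OK and 2 for CRITICAL.

```shell
#!/bin/sh
# Hypothetical matcher: compare expanded state IDs with the BAD patterns.
# ASSUMPTION: any non-zero state ID (WARNING, CRITICAL, UNKNOWN) counts as HIGH.
match_table() {
    data=$1; shift
    # Drop the whitespace left by a multi-line -d argument, then map 1-3 to 1.
    states=$(printf '%s' "$data" | tr -d ' \t\n' | tr '123' '111')
    for pattern in "$@"; do
        if [ "$states" = "$pattern" ]; then
            echo "CRITICAL: states $states match BAD column $pattern"
            return 2
        fi
    done
    echo "OK: states $states match no BAD column"
    return 0
}

# Literal values stand in for the expanded macros.
match_table "2, 0,
        1" 0,1,1 1,0,1 0,0,1 1,1,1 || echo "exit status $?"
```

Using literal values in place of the macros like this is also a convenient way to exercise the table rules outside nagios.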

Next, I wrote the script and it passed all the table tests, so now it's time to copy the script over to synagios and set up the nagios configuration.

Trying it Out

  1. The nagios plugin script needs to be copied to the SyNagios plugin area, and is available for download from nagios exchange.
  2. Using the synagios web interface, nagrestconf:
    1. Add the command in the Commands tab.
    2. Add the check to the diskstation service set in the Service Sets tab.
    3. Re-apply the service set in the Hosts tab.
    4. Apply and Restart.
  3. View the new check in the nagios3 web interface and disable notifications for the clustered service checks.
  4. Adjust thresholds for the clustered checks.
  5. Test.

Copying the plugin.

Until the System Tools plugin is written, the plugin, check_cluster_table, will need to be copied using scp. In DSM 4.3, enable the SSH service in the DSM Control Panel, then copy the file using an scp client such as FileZilla or Cygwin. I use Cygwin on Windows, so for me the command is:

scp check_cluster_table root@diskstationarm:/volume1/@appstore/Synagios/nagios-chroot/usr/local/nagios/plugins/

Then make sure it's executable:

ssh root@diskstationarm chmod 755 /volume1/@appstore/Synagios/nagios-chroot/usr/local/nagios/plugins/check_cluster_table

Add the command, service-set service, and re-apply the service set to the host.

The following video shows the process involved and the text to paste in is shown just below the video. For Firefox users the video can be paused and resumed using the Toggle animated GIFs plugin.

Screenshot video


Command, 'check_svc_cluster_table':

$USER5$/check_cluster_table -S -c $ARG1$ -d "$ARG2$" $ARG3$

Service set service, 'System Performance':

check_svc_cluster_table!2!$SERVICESTATEID:diskstation:Load$, $SERVICESTATEID:diskstation:CPU$, $SERVICESTATEID:diskstation:Swap Activity$!-t 1,1,1 -t 0,1,1 -t 1,0,1 -t 0,0,1

Stop notifications from the clustered service checks.

Now that the System Performance plugin is checking the values from the service checks, load, cpu and swap activity, notifications for those checks should be disabled. Simply disable the notifications in the Nagios Web interface.

Adjust thresholds for the clustered checks. 

Ensure that the clustered checks each have thresholds set. If they don't then check_cluster_table won't work correctly. In my setup the cpu check had no thresholds set so I appended the options -w 75 -c 90 to the cpu check, so the command ended up as:
check_cpu.sh!-i 4 -w 75 -c 90
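
Why thresholds matter can be illustrated with a small sketch. This is not how check_cpu.sh is written; the function state_id is hypothetical and assumes the common nagios convention that a value at or above the warning threshold gives state 1, at or above the critical threshold gives state 2, and that a check with no thresholds always reports 0.

```shell
#!/bin/sh
# Hypothetical illustration only; this is not the code of check_cpu.sh.
# Usage: state_id VALUE [WARN CRIT]
state_id() {
    value=$1 warn=${2:-} crit=${3:-}
    if [ -z "$warn" ] || [ -z "$crit" ]; then
        echo 0        # ASSUMPTION: no thresholds means the check always says OK
    elif [ "$value" -ge "$crit" ]; then
        echo 2        # CRITICAL
    elif [ "$value" -ge "$warn" ]; then
        echo 1        # WARNING
    else
        echo 0        # OK
    fi
}

state_id 95 75 90   # cpu at 95% with -w 75 -c 90
state_id 95         # same load, but no thresholds set
```

Under that assumption, a check without thresholds would always feed a 0 into the '-d' list, so its row of the table could never be HIGH.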

Test by making load.

I used many different packages and tools to create load in different ways and email alerts were only sent when there was a real problem. Success!

So, that's it for accurate performance notifications. I expect to do a bit more tweaking of thresholds and it's looking good so far, but there's still Reliable Media Streaming to get back to, which I'll talk about in part 4.
