This profile is designed for use in enterprise deployments, where it is necessary to track the health and performance of multiple GroundWork servers.
Most of the included services are copies of those in the local-groundwork-server profile, set up to run over ssh. The remote GroundWork servers monitored this way must be set up with ssh keys in the same way as other target servers monitored via ssh. In addition, the nagios user's home directory on the remote GroundWork server must contain either a copy of the local plugins or a link to the Nagios plugins directory, /usr/local/groundwork/nagios/libexec, so that the plugins are accessible as the relative path ./libexec from the nagios user's login point, usually /usr/local/groundwork/users/nagios or /home/nagios.
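The ./libexec requirement above can be checked with a small script run on the remote server. The following is only a sketch: the function name libexec_status and the demonstration directories are hypothetical, and it inspects only the directory or link itself, not the individual plugins inside it.

```python
import os
import tempfile
from pathlib import Path


def libexec_status(home: Path) -> str:
    """Describe how ./libexec resolves from the given login directory."""
    libexec = home / "libexec"
    if libexec.is_symlink():
        # A link is acceptable only if its target is an existing directory.
        return "link" if libexec.is_dir() else "broken link"
    if libexec.is_dir():
        return "directory"
    return "missing"


# Demonstration in a throwaway directory rather than a real nagios home.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    plugins = root / "nagios" / "libexec"   # stand-in for the real plugin directory
    plugins.mkdir(parents=True)
    home = root / "home" / "nagios"         # stand-in for the nagios login point
    home.mkdir(parents=True)
    print(libexec_status(home))             # missing
    os.symlink(plugins, home / "libexec")
    print(libexec_status(home))             # link
```

On a real server the function would be called with the nagios user's login directory, for example Path("/home/nagios").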
- SERVICE - Definitions in Monarch are stored under this name.
- COMMAND LINE - Service command name with arguments to be passed to the plugin.
- PLUGIN COMMAND LINE - Plugin script called by Nagios for this Service.
- EXTENDED INFO - The Extended Service Info definition, typically used for generating graphs.
Command lines displayed below are intended to be single line commands.

| Service | Command Line/Description | Plugin Command Line |
|---|---|---|
| ssh_cpu_httpd | check_by_ssh_cpu_proc!20!30!httpd | $USER1$/check_by_ssh -H $HOSTADDRESS$ -t 60 -l "$USER17$" -C "$USER22$/check_procl.sh --cpu -w $ARG1$ -c $ARG2$ -p $ARG3$" |
| ssh_cpu_java | check_by_ssh_cpu_proc!20!30!java | $USER1$/check_by_ssh -H $HOSTADDRESS$ -t 60 -l "$USER17$" -C "$USER22$/check_procl.sh --cpu -w $ARG1$ -c $ARG2$ -p $ARG3$" |
| ssh_cpu_mysql | check_by_ssh_cpu_proc!20!30!mysql | $USER1$/check_by_ssh -H $HOSTADDRESS$ -t 60 -l "$USER17$" -C "$USER22$/check_procl.sh --cpu -w $ARG1$ -c $ARG2$ -p $ARG3$" |
| ssh_cpu_nagios | check_by_ssh_cpu_proc!20!30!nagios | |
| ssh_cpu_perl | check_by_ssh_cpu_proc!20!30!perl | |
| ssh_cpu_proc | check_by_ssh_cpu_proc!<warn>!<crit>!<procname> | |
| ssh_cpu_snmptrapd | check_by_ssh_cpu_proc!20!30!snmptrapd | |
| ssh_cpu_snmptt | check_by_ssh_cpu_proc!20!30!snmptt | |
| ssh_cpu_syslog-ng | check_by_ssh_cpu_proc!20!30!syslog-ng | |
| ssh_disk_home | check_by_ssh_disk!400!200!/home | |
| ssh_disk_root | check_by_ssh_disk!400!200!/ | |
| ssh_disk_var | check_by_ssh_disk!400!200!/var | |
| ssh_load | check_by_ssh_load!5,4,3!10,8,6 | |
| ssh_mem_httpd | check_by_ssh_mem_proc!20!30!httpd | |
| ssh_mem_java | check_by_ssh_mem_proc!40!50!java | |
| ssh_mem_mysql | check_by_ssh_mem_proc!20!30!mysql | |
| ssh_mem_nagios | check_by_ssh_mem_proc!20!30!nagios | |
| ssh_mem_perl | check_by_ssh_mem_proc!20!30!perl | |
| ssh_mem_proc | check_by_ssh_mem_proc!<warn>!<crit>!<procname> | |
| ssh_mem_snmptrapd | check_by_ssh_mem_proc!20!30!snmptrapd | |
| ssh_mem_snmptt | check_by_ssh_mem_proc!20!30!snmptt | |
| ssh_mem_syslog-ng | check_by_ssh_mem_proc!20!30!syslog-ng | |
| ssh_memory | check_by_ssh_mem!80!90 | |
| ssh_nagios_latency | check_by_ssh_nagios_latency | |
| ssh_process_count | check_by_ssh_process_count!80!100 | |
| ssh_swap | check_by_ssh_swap!20%!10% | |
| ssh_uptime | check_by_ssh_uptime!1800!900 | $USER1$/check_by_ssh -H $HOSTADDRESS$ -t 60 -l "$USER17$" -C "$USER22$/check_system_uptime.pl -w $ARG1$ -c $ARG2$" |
| tcp_gw_listener | check_tcp_gw_listener | $USER1$/check_tcp -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 4913 |
| tcp_http | check_http!3!5 | $USER1$/check_http -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ |
| tcp_nsca | check_tcp_nsca | $USER1$/check_tcp -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5667 |
| tcp_ssh | check_ssh | $USER1$/check_ssh -H $HOSTADDRESS$ -t 60 |
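To show how a service command line relates to its plugin command line, the sketch below splits the ssh_uptime command on "!" into $ARG1$, $ARG2$ and so on, then substitutes the macros into the plugin command line. It only approximates what Nagios does and ignores any escaping rules. The function name expand and all macro values (plugin path, login name, remote plugin directory, host address) are assumptions made for illustration; the real values come from the Nagios resource and host configuration.

```python
import re

# Hypothetical macro values, for illustration only.
MACROS = {
    "USER1": "/usr/local/groundwork/nagios/libexec",  # assumed local plugin directory
    "USER17": "nagios",                               # assumed remote login name
    "USER22": "libexec",                              # assumed remote plugin path
    "HOSTADDRESS": "192.0.2.10",                      # documentation-range address
}


def expand(service_command: str, template: str, macros: dict) -> str:
    """Substitute $ARGn$ and other $NAME$ macros into a plugin command line."""
    _name, *args = service_command.split("!")
    values = dict(macros)
    for i, arg in enumerate(args, start=1):
        values[f"ARG{i}"] = arg
    # Replace each $NAME$ with its value; unknown macros are left untouched.
    return re.sub(r"\$(\w+)\$", lambda m: values.get(m.group(1), m.group(0)), template)


template = ('$USER1$/check_by_ssh -H $HOSTADDRESS$ -t 60 -l "$USER17$" '
            '-C "$USER22$/check_system_uptime.pl -w $ARG1$ -c $ARG2$"')
print(expand("check_by_ssh_uptime!1800!900", template, MACROS))
```

With these assumed values, 1800 and 900 end up as the -w and -c arguments of check_system_uptime.pl on the remote host.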
This package includes the following files:
Performance Graphing Programs
GroundWork Monitor includes many monitoring profiles for a variety of devices, systems and applications. Profiles already imported on a new GroundWork installation include Service Ping, SNMP Network, and SSH UNIX. The GroundWork Monitor Configuration tool is used to import updated Profiles and Profiles that require additional setup. Each import uses the Profile XML file and its companion Performance Configuration definition file. The Profile Importer can also import individual Services in addition to Service Profiles. The import process is documented under GROUNDWORK PROFILES > How to import profiles.
This section contains the detailed settings used by this Profile. These parameters can be altered with the Configuration tool.
Command parameters are in the Configuration Services section with the following names and default values.
- The services named "ssh_cpu_<process name>" measure CPU utilization for a specific process or group of processes. Note that the CPU percentages reported by these services will not necessarily add up to 100% at any one time: Nagios checks them at different times, and some of them check groups of processes (such as ssh_cpu_perl) while others check individual processes, which may be members of one or more groups. For example, ssh_cpu_perl will include the measure of ssh_cpu_snmptt, a perl program.
- This is an example check of the /home partition. If /home is not a separate partition from /, this service should be deleted.
- Check disk utilization of root partition.
- $ARG1$ - A warning alert will be generated if the utilization exceeds this value. The default is 400.
- $ARG2$ - A critical alert will be generated if the utilization exceeds this value. The default is 200.
- $ARG3$ - The partition.
- This is an example check of the /var partition. If /var is not a separate partition from /, this service should be deleted.
- A check of the CPU load averages. The thresholds 5,4,3 (warning) and 10,8,6 (critical) are each compared against the 1, 5 and 15 minute load averages, in that order. It is a good idea to know when the GroundWork server is getting overloaded, so tuning adjustments can be made.
- $ARG1$ - A warning alert will be generated if the 1, 5 or 15 minute load average exceeds the corresponding value. The default is 5,4,3.
- $ARG2$ - A critical alert will be generated if the 1, 5 or 15 minute load average exceeds the corresponding value. The default is 10,8,6.
- Those services named "ssh_mem_<process_name>", similarly, measure the memory utilization of individual processes in the same way cpu is measured. In fact, the same plugin, check_by_ssh_mem_procl.sh, is used for both.
- In contrast to the individual memory measures, this service checks the total memory used on the system. It has been configured to check used memory, and to ignore memory used for buffers, which is potentially available for use by programs. This service, therefore, will warn the administrator when the system is consuming too much memory to operate optimally.
- $ARG1$ - A warning alert will be generated if the memory utilization exceeds this value.
- $ARG2$ - A critical alert will be generated if the memory utilization exceeds this value.
- The check_by_ssh_nagios_latency.pl plugin will check the time between when an active service check is scheduled to be executed and the time it actually executes. This term is called latency and is an indicator of the load on the server. This plugin will not generate an alert, but it will allow the latency to be graphed.
- A measure of the "lateness" of Nagios' service check scheduler, or how far behind the intended scheduled time service checks are actually being launched. This is in contrast to check execution time. A good metric to watch here is the average latency. This number should ideally stay below 10 seconds. Note that this service does not have thresholds, and will not alert administrators.
- This is a check of the total number of active processes on the remote GroundWork server. This is a useful metric to track, especially if there are add-on packages like NMS or large numbers of active checks on a child server. The warning threshold is 80 and the critical threshold is 100.
- This is a check of swap utilization. If swap is heavily used, system performance will be degraded. The thresholds are for remaining swap, so warnings are generated when 20% remains, and critical messages are generated when 10% remains.
- A check that a TCP port is open, in this case the port used to post state and message data into the GroundWork Foundation database. This port should always be open while Foundation is running. Note: In some child server installations, it may be normal for this port to be closed, as Foundation is sometimes disabled on child servers for performance reasons. It should be open on the primary server, however, or the database will be unable to accept data.
- Checks the availability of the web interface at a very low level. Designed to alert the administrator that users cannot use the GroundWork web portal.
- Check for a listening TCP port number 5667 on $HOSTADDRESS$. This is the port used by NSCA to listen for connections from remote hosts. GroundWork uses the NSCA add-on to Nagios (actually in the form of the bronx event broker) to accept passive host and service checks. This port is open by default, and is rarely disabled, and so the status is checked regularly.
- As a good practice, the profile includes a check of the ssh port. This service can be made the root of a notification and execution dependency if desired, so that false alarms are minimized when ssh itself is unreachable.
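The threshold descriptions above point in two directions: ssh_load alerts when a value rises above its threshold, while ssh_swap alerts when the remaining percentage falls to its threshold. The sketch below illustrates the difference using the profile defaults. It assumes that ssh_load follows the usual Nagios load-check convention, in which the three comma-separated values apply to the 1, 5 and 15 minute load averages in turn. The function names and the exact boundary comparisons are assumptions, not the plugins' actual code.

```python
def parse_triplet(text: str) -> tuple:
    """Turn a threshold string such as '5,4,3' into three floats."""
    return tuple(float(part) for part in text.split(","))


def load_state(loads, warn, crit) -> str:
    """Alert when any load average exceeds the threshold for its interval."""
    if any(load > limit for load, limit in zip(loads, crit)):
        return "CRITICAL"
    if any(load > limit for load, limit in zip(loads, warn)):
        return "WARNING"
    return "OK"


def swap_state(free_percent: float, warn: float = 20.0, crit: float = 10.0) -> str:
    """Alert when the remaining swap percentage drops to a threshold."""
    if free_percent <= crit:
        return "CRITICAL"
    if free_percent <= warn:
        return "WARNING"
    return "OK"


warn = parse_triplet("5,4,3")     # ssh_load default warning thresholds
crit = parse_triplet("10,8,6")    # ssh_load default critical thresholds
print(load_state((6.0, 1.0, 1.0), warn, crit))   # WARNING
print(swap_state(15.0))                          # WARNING
```

Note that the load check needs only one of the three averages to cross its threshold, which is why a brief spike in the 1 minute average is enough to raise a warning.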
The following parameters are used to generate performance charts. These parameters are set using the Configuration>Performance tool in GroundWork Monitor.
- Graphs the cpu utilization of the process
- The Nagios service description must contain the string "ssh_cpu_<process name>".
The ssh_disk_home and ssh_disk_var graphs are similar to the ssh_disk_root graph.
- Graphs the disk utilization of the root partition.
- The Nagios service description must contain the string "ssh_disk_root".
- Graphs the 1, 5 and 15 minute load averages.
- The Nagios service description must contain the string "ssh_load".
- Graphs the percentage memory utilization for the process.
- The Nagios service description must contain the string "ssh_mem_<processname>".
- Graphs the percentage memory utilization.
- The Nagios service description must contain the string "ssh_mem".
- Graphs the latency of Nagios service checks.
- The Nagios service description must contain the string "ssh_nagios_latency".
- This is a standard graph of the number of running processes.
- This is a standard graph of swap utilization.
- The Nagios service description must contain the string "tcp_gw_listener".
- This is a standard graph of port response time and amount of data transferred.
- Graphs time taken for NSCA daemon to respond.
- The Nagios service description must contain the string "tcp_nsca".
- As noted above, the monitored GroundWork server must be accessible over ssh as user nagios, with a key exchanged. The plugins needed must be accessible at ~nagios/libexec.
- The Nagios latency graph relies on a Nagios 2.0 binary /usr/local/groundwork/nagios/bin/nagiostats and the included version of the plugin check_nagios_latency.pl. It has been modified to produce performance data in the standard format. All other checks should work for Nagios.
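The performance graph definitions earlier in this section each require that the Nagios service description "must contain" a given string. This can be pictured as a plain substring test, sketched below. The list holds only the literal strings quoted in this document, the function name is hypothetical, and case-sensitive matching is an assumption. The sketch returns every matching string and does not model how GroundWork chooses among several matches.

```python
# Strings quoted in the graph definitions of this profile.
GRAPH_PATTERNS = [
    "ssh_disk_root",
    "ssh_load",
    "ssh_mem",
    "ssh_nagios_latency",
    "tcp_gw_listener",
    "tcp_nsca",
]


def matching_patterns(service_description: str, patterns=GRAPH_PATTERNS) -> list:
    """Return every graph string contained in the service description."""
    return [p for p in patterns if p in service_description]


print(matching_patterns("ssh_load"))        # ['ssh_load']
print(matching_patterns("ssh_mem_httpd"))   # ['ssh_mem']
print(matching_patterns("ssh_uptime"))      # []
```

Because the test is a substring match, a renamed service keeps its graph only if the new description still contains the expected string.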