Local GW Server

Local GroundWork Server Profile

This profile is applied as the default monitoring for the local GroundWork server. It is designed to give the administrator a complete view of the health and performance of all the processes GroundWork needs to work well. The services fall into four categories: CPU measures, memory measures, process status, and specific measures. This profile ensures that the main components of the monitoring server listed below are running:

  • Nagios
  • PostgreSQL database and associated feeder and listener processes
  • NSCA
  • SNMP TRAPD and SNMPTT

Services Configuration
  • Service - Definitions in Monarch are stored under this name.
  • Command Line - Service command name with arguments to be passed to the plugin.
  • Plugin Command Line - Plugin script called by Nagios for this Service.
    Each command line shown below is intended to be entered as a single line.
    Service | Command Line/Description | Plugin Command Line
    local_cpu_httpd | check_local_proc_cpu!40!50!httpd | $USER1$/check_procl.sh --cpu -w "$ARG1$" -c "$ARG2$" -p "$ARG3$"
    local_cpu_java | check_local_proc_cpu!40!50!java | $USER1$/check_procl.sh --cpu -w "$ARG1$" -c "$ARG2$" -p "$ARG3$"
    local_cpu_nagios | check_local_proc_cpu!40!50!nagios | $USER1$/check_procl.sh --cpu -w "$ARG1$" -c "$ARG2$" -p "$ARG3$"
    local_cpu_perl | check_local_proc_cpu!40!50!perl | $USER1$/check_procl.sh --cpu -w "$ARG1$" -c "$ARG2$" -p "$ARG3$"
    local_cpu_snmptrapd | check_local_proc_cpu!40!50!snmptrapd | $USER1$/check_procl.sh --cpu -w "$ARG1$" -c "$ARG2$" -p "$ARG3$"
    local_cpu_snmptt | check_local_proc_cpu!40!50!snmptt | $USER1$/check_procl.sh --cpu -w "$ARG1$" -c "$ARG2$" -p "$ARG3$"
    local_cpu_syslog-ng | check_local_proc_cpu!40!50!syslog-ng | $USER1$/check_procl.sh --cpu -w "$ARG1$" -c "$ARG2$" -p "$ARG3$"
    local_dir_size_snmptt | check_dir_size!/usr/local/groundwork/common/var/spool/snmptt!500!1000 | $USER1$/check_dir_size.sh "$ARG1$" "$ARG2$" "$ARG3$"
    local_disk_root | check_local_disk!15%!10%!/ | $USER1$/check_disk -w "$ARG1$" -c "$ARG2$" -p "$ARG3$"
    local_load | check_local_load!5,4,3!10,8,6 | $USER1$/check_load -w "$ARG1$" -c "$ARG2$"
    local_mem_httpd | check_local_proc_mem!20!30!httpd | $USER1$/check_procl.sh --mem -w "$ARG1$" -c "$ARG2$" -p "$ARG3$"
    local_mem_java | check_local_proc_mem!40!50!java | $USER1$/check_procl.sh --mem -w "$ARG1$" -c "$ARG2$" -p "$ARG3$"
    local_mem_nagios | check_local_proc_mem!20!30!nagios | $USER1$/check_procl.sh --mem -w "$ARG1$" -c "$ARG2$" -p "$ARG3$"
    local_mem_perl | check_local_proc_mem!20!30!perl | $USER1$/check_procl.sh --mem -w "$ARG1$" -c "$ARG2$" -p "$ARG3$"
    local_mem_snmptrapd | check_local_proc_mem!20!30!snmptrapd | $USER1$/check_procl.sh --mem -w "$ARG1$" -c "$ARG2$" -p "$ARG3$"
    local_mem_snmptt | check_local_proc_mem!20!30!snmptt | $USER1$/check_procl.sh --mem -w "$ARG1$" -c "$ARG2$" -p "$ARG3$"
    local_mem_syslog-ng | check_local_proc_mem!20!30!syslog-ng | $USER1$/check_procl.sh --mem -w "$ARG1$" -c "$ARG2$" -p "$ARG3$"
    local_mem | check_local_mem!95!99 | $USER1$/check_mem.pl -U -w "$ARG1$" -c "$ARG2$"
    local_nagios_latency | check_nagios_latency | $USER1$/check_nagios_latency.pl
    local_users | check_local_users!5!20 | $USER1$/check_users -w $ARG1$ -c $ARG2$
    local_process_gw_listener | check_local_procs_arg!1:1!1:1!etc/foundation.xml | $USER1$/check_procs -w "$ARG1$" -c "$ARG2$" -a "$ARG3$"
    local_process_nagios | check_nagios | $USER1$/check_nagios -F /usr/local/groundwork/nagios/var/status.log -e 5 -C bin/.nagios.bin
    local_process_snmptrapd | check_local_procs_arg!1:1!1:1!snmptrapd | $USER1$/check_procs -w "$ARG1$" -c "$ARG2$" -a "$ARG3$"
    local_process_snmptt | check_local_procs_arg!2:2!2:2!sbin/snmptt | $USER1$/check_procs -w "$ARG1$" -c "$ARG2$" -a "$ARG3$"
    local_uptime | check_local_uptime!1800!900 | $USER1$/check_system_uptime.pl -w "$ARG1$" -c "$ARG2$"
    tcp_gw_listener | check_tcp_gw_listener | $USER1$/check_tcp -H $HOSTADDRESS$ -p 4913
    tcp_http_port | check_http_port!3!5!80 | $USER1$/check_http -H $HOSTADDRESS$ -w "$ARG1$" -c "$ARG2$" -p "$ARG3$"
    tcp_nsca | check_tcp_nsca | $USER1$/check_tcp -H $HOSTADDRESS$ -p 5667
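
To make the relationship between the two command columns concrete, the following is a minimal sketch of how the "!"-separated values in a service command line fill the $ARGn$ placeholders of the plugin command line. The expand function is hypothetical, and the USER1 directory is an assumed value for illustration only; check the resource configuration of your own installation for the real one.

```python
# Sketch of Nagios-style macro expansion for a service check command.
# USER1 is an assumed plugin directory, not a value taken from this page.
USER1 = "/usr/local/groundwork/nagios/libexec"

def expand(check_command, plugin_line, user1=USER1, host_address="127.0.0.1"):
    """Substitute $USER1$, $HOSTADDRESS$ and $ARGn$ into a plugin command line."""
    # The first "!"-separated field is the command name; the rest are arguments.
    name, *args = check_command.split("!")
    result = plugin_line.replace("$USER1$", user1)
    result = result.replace("$HOSTADDRESS$", host_address)
    for index, value in enumerate(args, start=1):
        result = result.replace(f"$ARG{index}$", value)
    return name, result

name, line = expand(
    "check_local_proc_cpu!40!50!httpd",
    '$USER1$/check_procl.sh --cpu -w "$ARG1$" -c "$ARG2$" -p "$ARG3$"',
)
print(line)
```

Here 40 becomes the warning threshold, 50 the critical threshold, and httpd the process name passed to the plugin.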
Profile Package

This package includes the following files:

Profile Definitions

  • service-profile-local-groundwork-server.xml
  • perfconfig-local-groundwork-server.xml

Plugin Scripts

  • check_http
  • check_procs
  • check_nagios_latency.pl
  • check_nagios
  • check_tcp
  • check_procl.sh

Installation

GroundWork Monitor includes many monitoring profiles for a variety of devices, systems, and applications. Profiles already imported on a new GroundWork installation include Service Ping, SNMP Network, and SSH UNIX. The GroundWork Monitor Configuration tool is used to import updated Profiles and Profiles that require additional setup. Each import consists of the Profile XML file and its companion Performance Configuration definition file. In the Profile Importer, individual Services can also be imported in addition to Service Profiles. The import process is documented under GROUNDWORK PROFILES > How to import profiles.

You should import this profile and apply it to localhost if you want comprehensive monitoring of the GroundWork system itself.

Implementation

This section contains the detailed settings used by this Profile. These parameters can be altered with the Configuration tool.

Command Parameters

Command parameters are found in the Configuration > Services section, with the following names and default values.

local_cpu_<process name>

  • The services named "local_cpu_<process name>" measure CPU utilization for a specific process or group of processes. Note that the CPU percentages reported across all these services will not necessarily add up to 100% at any one time. Nagios checks them at different times, and some of them check groups of processes (such as local_cpu_perl) while others check individual processes, which may themselves be members of one or more groups. For example, local_cpu_perl includes the CPU measured by local_cpu_snmptt, because snmptt is a Perl program.
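
The overlap between group and individual measures can be illustrated with a small sketch. The process snapshot, its paths, and the CPU figures are made up, and matching by substring on the command line is only an assumption about how a name-based check might select processes; it is not taken from check_procl.sh.

```python
# Hypothetical process snapshot: (command line, percent CPU). Values are made up.
snapshot = [
    ("perl /path/to/sbin/snmptt --daemon", 3.0),
    ("perl /path/to/some_feeder.pl", 2.5),
    ("nagios -d nagios.cfg", 4.0),
]

def cpu_for(pattern, processes):
    """Sum CPU for every process whose command line contains the pattern."""
    return sum(cpu for command, cpu in processes if pattern in command)

perl_cpu = cpu_for("perl", snapshot)      # includes the snmptt process
snmptt_cpu = cpu_for("snmptt", snapshot)  # the same process, counted again
nagios_cpu = cpu_for("nagios", snapshot)

actual_total = sum(cpu for _, cpu in snapshot)
reported_total = perl_cpu + snmptt_cpu + nagios_cpu
print(actual_total, reported_total)
```

Adding up the three services counts the snmptt process twice, so the sum of the service values exceeds what the processes actually used.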

local_dir_size_snmptt

  • Monitors the size of, or the number of files present in, the directory /usr/local/groundwork/common/var/spool/snmptt, which is used as a spooling directory for SNMP traps awaiting processing. If this directory backs up, you may need to determine why, and which device or devices are generating large numbers of traps. An overloaded server may also be too slow to keep up with a normal flow of traps, and this service will detect that condition before it becomes a problem. Note that this metric is normally gathered only on the parent server in GroundWork deployments, so it is not replicated in the SSH version of this profile designed for GroundWork child servers.

local_disk_root

  • Checks disk utilization of the root partition.
  • $ARG1$ - A warning alert will be generated if the free space on the partition falls below this percentage. The default is 15%.
  • $ARG2$ - A critical alert will be generated if the free space on the partition falls below this percentage. The default is 10%.
  • $ARG3$ - The partition to check. The default is /.
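
The check_disk plugin reads percentage values given to -w and -c as minimum free space, so the warning value is the larger of the two. The sketch below mirrors that comparison with the default values from this profile. The disk_state function is hypothetical, and the plugin's own accounting (for example, reserved blocks) may give slightly different percentages.

```python
import shutil

def disk_state(free_percent, warn=15.0, crit=10.0):
    """Classify free space against minimum-free thresholds (15% and 10%)."""
    if free_percent < crit:
        return "CRITICAL"
    if free_percent < warn:
        return "WARNING"
    return "OK"

# Measure the root partition with the standard library.
usage = shutil.disk_usage("/")
free_percent = 100.0 * usage.free / usage.total
print(f"/ has {free_percent:.1f}% free: {disk_state(free_percent)}")
```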

local_load

  • A check of the CPU load average over 1, 5, and 15 minute intervals. It is a good idea to know when the GroundWork server is getting overloaded, so tuning adjustments can be made.
  • $ARG1$ - A warning alert will be generated if the 1, 5, or 15 minute load average exceeds its corresponding value. The default is 5,4,3.
  • $ARG2$ - A critical alert will be generated if the 1, 5, or 15 minute load average exceeds its corresponding value. The default is 10,8,6.
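
As an illustration of how the three-value thresholds line up with the three load averages, here is a minimal sketch using the defaults 5,4,3 and 10,8,6. The load_state function is hypothetical and only approximates the comparison that check_load performs.

```python
import os

def load_state(loads, warn=(5, 4, 3), crit=(10, 8, 6)):
    """Return the worst state across the 1, 5 and 15 minute load averages."""
    # Each average is compared only with the threshold in the same position.
    if any(value > limit for value, limit in zip(loads, crit)):
        return "CRITICAL"
    if any(value > limit for value, limit in zip(loads, warn)):
        return "WARNING"
    return "OK"

# os.getloadavg() returns the 1, 5 and 15 minute averages on Unix systems.
print(load_state(os.getloadavg()))
```

Note that the 15 minute threshold is the lowest: a sustained load is treated as more serious than a short spike of the same size.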

local_mem_<process name>

  • The services named "local_mem_<process name>" measure the memory utilization of individual processes, in the same way that CPU is measured. In fact, the same plugin, check_procl.sh, is used for both.

local_mem

  • In contrast to the individual memory measures, this service checks the total memory used on the system. It has been configured to check used memory, and to ignore memory used for buffers, which is potentially available for use by programs. This service, therefore, will warn the administrator when the system is consuming too much memory to operate optimally.
  • $ARG1$ - A warning alert will be generated if the memory utilization exceeds this percentage. The default is 95.
  • $ARG2$ - A critical alert will be generated if the memory utilization exceeds this percentage. The default is 99.
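
The idea of counting used memory while ignoring buffers can be sketched as follows. This is an assumption-laden illustration: the sample figures are made up, the field names follow the Linux /proc/meminfo layout, and the exact arithmetic inside check_mem.pl is not shown on this page.

```python
# Made-up sample in /proc/meminfo layout; not real measurements.
SAMPLE_MEMINFO = """\
MemTotal:        8000000 kB
MemFree:         1000000 kB
Buffers:          600000 kB
"""

def used_percent(meminfo_text):
    """Percent of memory in use, treating buffer memory as available."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        fields[key] = int(rest.split()[0])
    used = fields["MemTotal"] - fields["MemFree"] - fields["Buffers"]
    return 100.0 * used / fields["MemTotal"]

def mem_state(percent, warn=95, crit=99):
    """Apply the default warning and critical thresholds of this profile."""
    if percent > crit:
        return "CRITICAL"
    if percent > warn:
        return "WARNING"
    return "OK"

percent = used_percent(SAMPLE_MEMINFO)
print(percent, mem_state(percent))
```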

local_nagios_latency

  • The check_nagios_latency.pl plugin checks the time between when an active service check is scheduled to execute and when it actually executes. This delay is called latency and is an indicator of the load on the server. This plugin will not generate an alert, but it allows the latency to be graphed.
  • Put another way, latency measures the "lateness" of the Nagios service check scheduler: how far behind the intended scheduled time service checks are actually being launched. This is distinct from check execution time. A good metric to watch is the average latency, which should ideally stay below 10 seconds. Because this service has no thresholds, it will not alert administrators.
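
The definition of latency can be expressed as a short calculation. The timestamps below are hypothetical and stand in for the scheduled and actual start times of three checks; they are not output from Nagios or from the plugin.

```python
from statistics import mean

# Hypothetical (scheduled, actual start) times in seconds; not real Nagios data.
checks = [(100, 101), (160, 162), (220, 226)]

# Latency is how long after its scheduled time each check really started.
latencies = [actual - scheduled for scheduled, actual in checks]
average_latency = mean(latencies)

print(latencies, average_latency)
print("within guideline" if average_latency < 10 else "scheduler is falling behind")
```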

local_users

  • A measure of the number of users logged into the command line. This is not a measure of the number of users in the GroundWork web portal.

local_process_gw_listener

  • Checks that the GroundWork Foundation listener process is running. If this process fails, the Foundation database will not be updated with current Nagios data. A critical alert will be generated if this process is not running.
  • The services named "local_process_<process name>" are simple checks of whether those processes are running and, if so, whether the correct number of them are running.
  • $ARG1$ - A warning alert will be generated if the number of processes falls outside this min:max range. The default is 1:1.
  • $ARG2$ - A critical alert will be generated if the number of processes falls outside this min:max range. The default is 1:1.
  • $ARG3$ - The string that must appear in the arguments of the processes to be counted. The default is etc/foundation.xml.
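
The 1:1 values passed to check_procs are min:max ranges, so an alert is raised when the process count is either too low or too high. A minimal sketch of that logic follows. The functions are hypothetical and handle only the simple min:max form, not the full threshold range syntax that Nagios plugins accept.

```python
def in_range(count, spec):
    """True if count lies inside a 'min:max' range such as '1:1'."""
    low, _, high = spec.partition(":")
    return int(low) <= count <= int(high)

def procs_state(count, warn="1:1", crit="1:1"):
    """Critical takes precedence over warning when both ranges are violated."""
    if not in_range(count, crit):
        return "CRITICAL"
    if not in_range(count, warn):
        return "WARNING"
    return "OK"

for count in (0, 1, 2):
    print(count, procs_state(count))
```

With identical warning and critical ranges, as in this profile, any count other than the expected one goes straight to critical.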

local_process_nagios

  • Check if the Nagios process is running. The check_nagios plugin will check to make sure the Nagios status log is updating at least every 5 minutes, and the bin/.nagios.bin process is running. A critical alert will be generated if either of these conditions is not met.

local_process_snmptrapd

  • Checks that the SNMPTRAPD daemon is running. If this process fails, SNMP traps will not be received by the GroundWork server. A critical alert will be generated if this process is not running.
  • $ARG1$ - A warning alert will be generated if the number of processes falls outside this min:max range. The default is 1:1.
  • $ARG2$ - A critical alert will be generated if the number of processes falls outside this min:max range. The default is 1:1.
  • $ARG3$ - The string that must appear in the arguments of the processes to be counted. The default is snmptrapd.

local_process_snmptt

  • Checks that the SNMPTT daemon is running. If this process fails, SNMP traps received by SNMPTRAPD will not be processed for Nagios, and GroundWork Foundation will not insert trap events. A critical alert will be generated if this process is not running.
  • $ARG1$ - A warning alert will be generated if the number of processes falls outside this min:max range. The default in the command line for this service is 2:2.
  • $ARG2$ - A critical alert will be generated if the number of processes falls outside this min:max range. The default in the command line for this service is 2:2.
  • $ARG3$ - The string that must appear in the arguments of the processes to be counted. The default is sbin/snmptt.

tcp_gw_listener

  • Checks that a TCP port is open, in this case the port used to post state and message data into the GroundWork Foundation database (port 4913 in the command line for this service). This port should always be open while Foundation is running. Note: In some child server installations, it may be normal for this port to be closed, as Foundation is sometimes disabled on child servers for performance reasons. It should be open on the primary server, however, or the database will be unable to accept data.
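
At its simplest, a check that a TCP port is open amounts to attempting a connection. The sketch below shows that idea using the standard library. The port_open function is hypothetical and is not the check_tcp plugin, which also reports response time and supports thresholds.

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False

# 4913 is the Foundation listener port used by tcp_gw_listener.
print(port_open("127.0.0.1", 4913))
```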

tcp_http_port

  • Checks the availability of the web interface at a very low level. Designed to alert the administrator that users cannot use the GroundWork web portal.

tcp_nsca

  • Checks for a listener on TCP port 5667 on $HOSTADDRESS$. This is the port NSCA uses to listen for connections from remote hosts. GroundWork uses the NSCA add-on to Nagios (actually implemented by the Bronx event broker) to accept passive host and service checks. This port is open by default and is rarely disabled, so its status is checked regularly.

Performance Graphing Parameters

The following parameters are used to generate performance charts. These parameters are set using the Configuration > Performance tool in GroundWork Monitor.

local_cpu_<process name>

  • Graphs the CPU utilization of the process.
  • The Nagios service description must contain the string "local_cpu_<process name>".

local_dir_size_snmptt

  • Graphs the directory size for snmptt.
  • The Nagios service description must contain the string "local_dir_size_snmptt".

local_disk_root

  • Graphs the disk utilization of the root partition.
  • The Nagios service description must contain the string "local_disk_root".

local_load

  • Graphs the 1, 5 and 15 minute load averages.
  • The Nagios service description must contain the string "local_load".

local_mem_<process name>

  • Graphs the percentage memory utilization for the process.
  • The Nagios service description must contain the string "local_mem_<process name>".

local_mem

  • Graphs the percentage memory utilization.
  • The Nagios service description must contain the string "local_mem".

local_nagios_latency

  • Graphs the latency of Nagios service checks.
  • The Nagios service description must contain the string "local_nagios_latency".

local_users

  • Graphs the number of users logged in to the Linux command shell on the GroundWork Monitor System. This is NOT the number of users logged in to the web portal.
  • The Nagios service description must contain the string "local_users".

local_process_gw_listener

  • Graphs the number of gw_listener processes.
  • The Nagios service description must contain the string "local_process".

local_process_nagios

  • Graphs the number of Nagios processes.
  • The Nagios service description must contain the string "local_process".

local_process_snmptrapd

  • Graphs the number of snmptrapd processes.
  • The Nagios service description must contain the string "local_process".

local_process_snmptt

  • Graphs the number of snmptt processes.
  • The Nagios service description must contain the string "local_process".
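
Because the four process services share the single string "local_process", one graphing entry can cover all of them. The following sketch illustrates that "contains the string" rule. The matching_entry function and the list of entries are hypothetical, and the real lookup performed by the Performance tool may differ.

```python
# Entry strings quoted from this section; the lookup itself is an assumption.
GRAPH_ENTRIES = ["local_process", "local_load", "local_users"]

def matching_entry(service_description, entries=GRAPH_ENTRIES):
    """Return the first entry string contained in the service description."""
    for entry in entries:
        if entry in service_description:
            return entry
    return None

for service in ("local_process_gw_listener", "local_process_snmptt", "local_load"):
    print(service, "->", matching_entry(service))
```

A service whose description is renamed so that it no longer contains the entry string would stop matching, which is why the description requirement is stated for every service.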

tcp_gw_listener

  • The Nagios service description must contain the string "tcp_gw_listener".

tcp_http_port

  • Graphs time taken to load web page.
  • The Nagios service description must contain the string "tcp_http_port".

tcp_nsca

  • Graphs time taken for NSCA daemon to respond.
  • The Nagios service description must contain the string "tcp_nsca".

Implementation Notes

The Nagios latency graph relies on the Nagios binary /usr/local/groundwork/nagios/bin/nagiostats and on the version of the check_nagios_latency.pl plugin included in this package. The plugin has been modified to produce performance data in the standard format. All other checks should work for Nagios.
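
For readers unfamiliar with the standard performance data format, plugins append items of the form label=value followed by an optional unit and optional semicolon-separated thresholds. The sketch below parses the simple unquoted case. The sample string and its label names are hypothetical and are not actual output from check_nagios_latency.pl.

```python
import re

# Matches label=value with an optional unit; quoted labels are not handled.
PERF_ITEM = re.compile(r"(?P<label>[^=\s]+)=(?P<value>-?[\d.]+)(?P<uom>[^;\s]*)")

def parse_perfdata(text):
    """Map each label to a (value, unit) pair."""
    items = {}
    for match in PERF_ITEM.finditer(text):
        items[match["label"]] = (float(match["value"]), match["uom"])
    return items

# Hypothetical sample; the label names are made up for illustration.
sample = "min_latency=0.010s avg_latency=0.250s max_latency=1.500s"
print(parse_perfdata(sample))
```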
