This page gives an overview of performance data processing, how to configure it, and how to create performance graphs using RRD and remote RRD. While creating performance graphs with Grafana is largely automatic in GroundWork Monitor, you will still need to have an entry for your metrics in the Performance option (Configuration > Performance). This page describes these requirements as well.
GroundWork can accept performance data from many sources. Nagios is configured to pass service check performance data to GroundWork Monitor via a special processing program, which is also used to process performance data from Cloud Hub and other sources.
If you select RRD as a graphing mechanism in GroundWork Monitor 7.2.1, the performance data processor (process_service_perfdata_file) gets chart parameters from a configuration database, interprets the performance data, then uses RRDtool to create or update RRD (Round Robin Database) data files. Performance data is added to a cache each time the service check is executed. CGI programs are provided to display the graphs using server-side rendering. A default configuration database matching the installed GroundWork service profiles is delivered with the GroundWork Monitor package. This data can be modified by accessing the Performance option in the UI.
Once RRD databases are created, there are several methods for displaying this data. In the Status page you can view performance graphs under a service if an RRD is associated with that host and service. The second figure below displays example performance graphs within Status.
No additional configuration other than the procedure listed in this section is required. You may also show these graphs as links from the Nagios service detail pages. To do this, you must create Nagios extended service information links and install graphing CGI programs. Generic versions of these are included with GroundWork Monitor.
Figure: Performance data processing using RRDs
Figure: Performance RRD Graphs as seen in Status
If you selected InfluxDB and Grafana to process performance data, the same cache and performance processing are used, but the data is sent to InfluxDB for storage and graphed using Grafana.
Figure: Performance data processing using Grafana
In either case, there must be an entry in the Performance table to enable processing of the collected data. By default, data is not sent to the Foundation database, but this can be enabled. See the section Performance Configuration Parameters for InfluxDB/Grafana below for details on enabling the processing for your metrics.
Any checks that are processed by Nagios may return performance data. The link Nagios Plugin Development Guidelines defines the format for plugin performance data.
In the GroundWork Monitor package, Nagios writes all data (plugin output including performance data) to the service-perfdata.dat file. Every 300 seconds, Nagios runs the launch_perfdata_process command, which runs the launch_perf_data_processing script. That script starts the process_service_perfdata_file script if it is not already running, and process_service_perfdata_file then reads a renamed copy of the service-perfdata.dat file.
The service performance data file (/usr/local/groundwork/nagios/var/service-perfdata.dat), the service performance data file processing interval (300), and the service performance data file processing command (launch_perfdata_process) are configurable under Control > Nagios Main Configuration (on page 3).
The launch_perfdata_process command ultimately invokes the process_service_perfdata_file script, which writes performance data to one or more of three places:
- RRD Files - The script creates an RRD file whose name is a concatenation of the host name and the service name. The data in these RRDs is presented graphically in both the Status and Performance applications.
- InfluxDB - The script makes an API call to push the data to Influx, creating the host and service entries if they do not exist, and updating the GroundWork Grafana data source with the contextual information.
- Foundation Database - Optionally, a summary of the performance data is also sent to Foundation which has a listener for performance data. How the data is persisted is described in detail below.
Synchronously with the processing, Nagios reopens the service-perfdata.dat file in append mode, which either continues to collect data in an unprocessed file, or starts a new file if the previous file was renamed for processing by the launch_perf_data_processing script.
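The rename-then-append hand-off described above can be sketched in a few lines. This is a minimal illustration only, not the code used by launch_perf_data_processing: the function names and the ".being_processed" suffix are hypothetical, and the writer here reopens the file on every write, whereas Nagios holds the file open and reopens it around processing.

```python
import os
import tempfile
import time


def rotate_for_processing(perfdata_path):
    """Rename the live perfdata file so a reader can work on a stable copy.

    Returns the path of the renamed copy, or None if there is nothing to
    process. The ".being_processed" suffix is a hypothetical naming choice.
    """
    if not os.path.exists(perfdata_path) or os.path.getsize(perfdata_path) == 0:
        return None
    renamed = f"{perfdata_path}.{int(time.time())}.being_processed"
    os.rename(perfdata_path, renamed)
    return renamed


def append_result(perfdata_path, line):
    """Writer side: append mode creates a new file if the old one was renamed."""
    with open(perfdata_path, "a", encoding="utf-8") as handle:
        handle.write(line + "\n")


def demo():
    """Show that data written after the rename lands in a fresh file."""
    with tempfile.TemporaryDirectory() as workdir:
        live = os.path.join(workdir, "service-perfdata.dat")
        append_result(live, "first result")
        renamed = rotate_for_processing(live)
        append_result(live, "second result")
        with open(renamed, encoding="utf-8") as handle:
            processed = handle.read().splitlines()
        with open(live, encoding="utf-8") as handle:
            pending = handle.read().splitlines()
        return processed, pending
```

Because the rename happens before any reading, the processor never sees a file that is still being appended to, and no check result is lost between processing runs.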
Performance data is stored in RRD files. Format and data aggregation information can be found at http://oss.oetiker.ch/rrdtool/doc/rrdtool.en.html.
Performance data is stored in InfluxDB for further use in graphs or other processing. Refer to the document How to change performance data retention policies in InfluxDB for details on configuring how long data is kept.
Performance data is sent to Foundation as XML in a Web Services call, for efficient bulk-data transfer.
The process_service_perfdata_file script includes the host name, service description, label, timestamp and performance value in the post to Foundation.
By default, the business object in Foundation that handles incoming performance data averages the performance data values for each check over a day. Along with the daily average, the maximum and minimum values for the day are also stored. The averaging interval can be changed from a day to an hour through configuration. For details about changing the interval, refer to GroundWork Foundation in the Developer Reference section.
The performance data values are stored in the LogPerformanceData table. For each service that provides performance data an entry per day is created.
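To make the daily summary concrete, the following sketch groups samples by day and computes the average, minimum, and maximum for each service label. It is an illustration under stated assumptions, not Foundation's implementation: the function name and tuple layout are hypothetical, and grouping by UTC calendar day is an assumption.

```python
from collections import defaultdict
from datetime import datetime, timezone


def daily_summary(samples):
    """Summarize (host, service, label, unix_timestamp, value) samples per day.

    Returns {(host, service, label, "YYYY-MM-DD"): {"avg", "min", "max"}}.
    Days are UTC calendar days, which is an assumption of this sketch.
    """
    buckets = defaultdict(list)
    for host, service, label, timestamp, value in samples:
        day = datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime("%Y-%m-%d")
        buckets[(host, service, label, day)].append(value)
    return {
        key: {"avg": sum(values) / len(values), "min": min(values), "max": max(values)}
        for key, values in buckets.items()
    }
```

Switching to an hourly interval would only change the bucket key, for example by formatting the timestamp with "%Y-%m-%d %H".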
The Reports option in GroundWork Monitor includes reports on performance data stored in Foundation, organized by host or by host group. The reports allow drill-down to performance data by individual services. These reports are located under Reports > BIRT Report Viewer > Performance.
- Performance Report by Host (epr-host): This report shows the performance indicators identified by the Label value across a selected Host and Time Range.
- Performance Report by Host Multi Variable (epr-host multi variable): This report charts up to two individually selected Hosts, units, and performance indicators present in the selected Hosts.
- Performance Report by Hostgroup (epr-hostgroup): This report shows the performance indicators identified by the Label value across a selected Host Group and Time Range.
- Performance Report by Hostgroup Multi Variable (epr-hostgroup multi variable): This report charts long-term performance trends for performance data for a selected Host Group. This report can help identify areas where additional capacity is needed due to steady increases in load or demand.
- Performance Report by Hostgroup Top Five (epr-hostgroup topfive): This report charts a selected performance indicator present in the selected Host Group.
The same process is used to accumulate data from Cloud Hub or any other metrics source in a file cache, and a secondary copy of the processing daemon will pick up this cache and process it asynchronously to the processing done for Nagios data. The data can be directed to the same endpoints: RRD, InfluxDB, and/or Foundation.
If you use InfluxDB and Grafana, all of the default performance configuration entries will work, and you will see some data from services loaded by default, or that you import using profiles that have performance configuration entries. Only a few of the fields are needed, however, and, if all you ever use to create graphs is InfluxDB and Grafana, you may want to make a fairly generic entry active to avoid ever having to deal with the performance configuration. You need only have the first 6 parameters defined, and the entry enabled (see below regarding RRD Create Command). This allows you to control which hosts and services get graphs in Grafana, using the regular expression for host and service. You can also optionally use the status text parsing options to dig metrics out of text results, as is sometimes necessary when dealing with Nagios or Icinga2 plugin results that do not conform to the plugin coding standard.
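The effect of the host and service regular expressions can be illustrated with a small matcher. This is a sketch only: the PerfConfigEntry class and its field names are hypothetical, and the first-match-wins behavior is an assumption rather than a description of how process_service_perfdata_file selects an entry.

```python
import re
from dataclasses import dataclass


@dataclass
class PerfConfigEntry:
    """Hypothetical stand-in for one performance configuration entry."""
    host_pattern: str      # regular expression tested against the host name
    service_pattern: str   # regular expression tested against the service name
    enabled: bool = True


def first_matching_entry(entries, host, service):
    """Return the first enabled entry whose patterns match, or None."""
    for entry in entries:
        if not entry.enabled:
            continue
        if re.search(entry.host_pattern, host) and re.search(entry.service_pattern, service):
            return entry
    return None
```

With a specific entry such as PerfConfigEntry("^db", "^disk") listed ahead of a generic PerfConfigEntry(".*", ".*"), database disk checks pick up the specific entry and everything else falls through to the generic one. If no enabled entry matches, the function returns None, which mirrors the requirement that an entry must exist for data to be processed.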
Most of the time you will not need to define an RRD Create Command, namely when a service check is written to the Nagios Plugin Development Guidelines and emits perfdata at the end of the service-check output in the approved format.
However, sometimes that's not the case. If you need to enable Use Status Text Parsing instead of Performance Data to parse the service-check status text for perfdata values, then an RRD Create Command is required in the perfdata entry, even if you don't intend to have any RRD files created. The reason is, in this situation, we use the RRD file definition DS labels as the labels for the data being parsed from the status text. Those labels are needed to tag the data in InfluxDB, and they would have been provided by perfdata at the end of the service-check results if that format had been followed. Without that format, we have no obvious way to know what labels to use. So we extract them from the DS labels, which will need to be defined in the same order as you extract data from the status text.
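The pairing of DS labels with values extracted from status text can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, the create command uses plain rrdtool syntax rather than a GroundWork template, and the value pattern is an ordinary regular expression with one capture group per value.

```python
import re

# Matches "DS:<name>:" and captures the data source name.
DS_PATTERN = re.compile(r"\bDS:([A-Za-z0-9_]{1,19}):")


def ds_labels(create_command):
    """Return DS names in the order they appear in an rrdtool create command."""
    return DS_PATTERN.findall(create_command)


def parse_status_text(status_text, value_pattern, create_command):
    """Pair each captured value with the DS label in the same position."""
    match = re.search(value_pattern, status_text)
    if match is None:
        return {}
    labels = ds_labels(create_command)
    values = [float(group) for group in match.groups()]
    return dict(zip(labels, values))


# Example inputs; the file name, DS names, and status text are made up.
create_command = (
    "rrdtool create example.rrd --step 300 "
    "DS:used:GAUGE:900:U:U DS:free:GAUGE:900:U:U RRA:AVERAGE:0.5:1:8640"
)
status_text = "DISK OK - 42 GB used, 58 GB free"
value_pattern = r"(\d+) GB used, (\d+) GB free"
parsed = parse_status_text(status_text, value_pattern, create_command)
```

Swapping the two DS definitions in the create command would swap the labels attached to the values, which is why the DS order must follow the order in which values are extracted.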
A simple RRD Create Command, for a single number being extracted from the service-check status text, might look like this:
This will cause the word number to be used as your perfdata label. Of course, that is quite generic; you will probably want to use a more descriptive word for the particular item in question. Remember that DS label names are limited to 19 characters, and must contain only characters in the set:
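A label can be checked against the naming rule before it is saved. The sketch below assumes the rule documented for RRDtool data source names: 1 to 19 characters drawn from letters, digits, and underscore. The helper names and the "value" fallback are hypothetical.

```python
import re

# RRDtool data source names: 1 to 19 characters from [a-zA-Z0-9_].
DS_NAME = re.compile(r"[A-Za-z0-9_]{1,19}")


def is_valid_ds_label(label):
    """Return True if the label is usable as an RRD DS name."""
    return DS_NAME.fullmatch(label) is not None


def to_ds_label(raw):
    """Replace disallowed characters with "_" and truncate to 19 characters."""
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", raw)[:19]
    return cleaned if cleaned else "value"  # hypothetical fallback for empty input
```

For example, is_valid_ds_label("response-time") is False because of the hyphen, and to_ds_label("response-time") yields "response_time".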
Figure: Parameters needed for InfluxDB and Grafana
|Nagios has the ability to process performance data from both hosts and services. Service checks are executed at regular intervals. Host checks, on the other hand, may never be executed at all. Nagios only executes host checks when it is doing dependency calculations. Therefore the sporadic nature of host checks renders Host performance data unsuitable for graphing. This is why, in GroundWork Monitor, we only concern ourselves with service performance data.|
The Configuration page can be used to configure the Nagios Main Configuration file to enable performance data handling. The GroundWork installer should already have set this up, but these are the crucial configuration parameters. The image shows the parameters in the Nagios Main Configuration screen that enable performance data handling.
Most of the plugins in the GroundWork Monitor distribution output formatted performance data. The standard that defines how this data should be formatted is in the Nagios Plugin Development Guidelines.
Figure: Configuring performance data
When Nagios schedules a plugin to execute, the plugin returns two fields on a single line of standard output, separated by the pipe operator "|". Nagios treats everything before the pipe operator as status text and inserts it in the status field of the Nagios (and Status) user interface. The status text is also inserted into the Nagios macro $SERVICEOUTPUT$. The text that follows the pipe operator is inserted into the macro $SERVICEPERFDATA$ and is also written into the service-perfdata.dat file.
A typical plugin output should look something like this:
Everything before the pipe operator is status text and everything after it is formatted performance data.
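The split at the pipe operator, and the label=value layout of the performance data, can be sketched like this. It is a simplified illustration, not the parser used by process_service_perfdata_file: it assumes the 'label'=value[UOM];[warn];[crit];[min];[max] layout from the Nagios Plugin Development Guidelines, ignores the threshold fields, and uses a made-up sample line.

```python
import re

# One perfdata item: 'label'=value[UOM];[warn];[crit];[min];[max]
PERF_ITEM = re.compile(
    r"(?:'(?P<quoted>[^']+)'|(?P<bare>[^\s=']+))"
    r"=(?P<value>-?\d+(?:\.\d+)?)(?P<uom>[A-Za-z%]*)"
    r"(?P<thresholds>(?:;[^;\s]*)*)"
)


def split_plugin_output(line):
    """Split one line of plugin output into (status_text, perfdata)."""
    status, _, perfdata = line.partition("|")
    return status.strip(), perfdata.strip()


def parse_perfdata(perfdata):
    """Return {label: (value, unit)} for each label=value item found."""
    metrics = {}
    for match in PERF_ITEM.finditer(perfdata):
        label = match.group("quoted") or match.group("bare")
        metrics[label] = (float(match.group("value")), match.group("uom"))
    return metrics


# Sample line, invented for illustration.
sample = "PING OK - Packet loss = 0%, RTA = 0.80 ms | rta=0.80ms;100;500;0 pl=0%;20;60;0"
status_text, perfdata = split_plugin_output(sample)
metrics = parse_perfdata(perfdata)
```

Here status_text holds the human-readable part that ends up in $SERVICEOUTPUT$, and perfdata holds the part that ends up in $SERVICEPERFDATA$.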
If we properly configured the service_perfdata configuration directives, Nagios takes this plugin output and records it in a log file:
At 5-minute intervals (adjustable using the service_perfdata_file_processing_interval directive), Nagios runs the performance event handler command launch_perfdata_process. The process_service_perfdata_file script that it eventually launches performs several tasks. It reads the service-perfdata.dat file to extract the performance data Nagios has written there. For each service check result it finds, it looks up the service name in the performanceconfig table in the Monarch database. This table (indexed by service name) contains the unique RRD create commands and RRD update commands appropriate for the data returned by that particular plugin.
The process_service_perfdata_file script uses this information to create the RRDs in the first instance, then to update the data in them on subsequent executions of the service.
Those RRDs are read by the CGI specified in the performanceconfig entry (these can be customized) and then presented for viewing in the Status application.
There is also a graphical user interface for the performanceconfig table, so the operator can adjust RRD create and update strings, or even specify which CGI will be used to graph them.
The process_service_perfdata_file script does more, however. Whenever it has to create a new RRD, it writes the path and filename of that RRD into the datatype table in the Monarch database, and makes a corresponding entry in the host_service table. These tables are used by the Performance application to locate the various RRDs in the system. Performance is able to read in the data from multiple RRDs and consolidate that data into a single graph.
This event handler also does a Web Services post to Foundation, which inserts summary performance data into the GWCollageDB for use by the EPR reports.
Finally, process_service_perfdata_file can generate a debug log file, which is very helpful in diagnosing RRD problems in the system. The file is named process_service_perfdata_file.log, and logging to it can be turned on and off using the debug_level setting in the perfdata.properties file. To increase debug logging, edit perfdata.properties and change this line:
|The logging is quite voluminous and this file can get to be very large in a relatively short period of time. Remember to turn this off (by setting debug_level=1) when you are finished troubleshooting your RRD problem and then kill the process_service_perfdata_file script so it gets restarted and picks up the new value.|
This occurs automatically the next time Nagios is restarted, which happens during a Commit, or you can force it manually with the following command:
Figure: Performance process data flow
Under Configuration > Performance, set up one or more service-host entries for the passive services you defined. You may create these in any manner you like, but ensure that the RRD Create Command entry is of the following form:
Basically, everything between $LISTSTART$ and $LISTEND$ will be replicated for each label=value pair in the performance data. You may, of course, change the DS type from GAUGE to any supported value, or change any of the RRA parameters. Similarly, ensure that the RRD Update Command is of the following form:
The $LABELLIST$ and $VALUELIST$ macros will be expanded to the derived lists of labels and values parsed from the performance data.
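The macro expansion described above can be sketched as plain string substitution. This is a minimal illustration, not GroundWork's implementation: $LABEL$ is a hypothetical per-label placeholder (the real per-label macro name is not given in the text above), the templates use plain rrdtool syntax with a made-up file name, and joining labels and values with colons is an assumption based on the rrdtool update template syntax.

```python
def expand_create_command(template, labels):
    """Replicate the text between $LISTSTART$ and $LISTEND$ once per label.

    "$LABEL$" is a hypothetical placeholder used only in this sketch.
    """
    head, _, rest = template.partition("$LISTSTART$")
    body, _, tail = rest.partition("$LISTEND$")
    repeated = "".join(body.replace("$LABEL$", label) for label in labels)
    return head + repeated + tail


def expand_update_command(template, pairs):
    """Fill $LABELLIST$ and $VALUELIST$ from (label, value) pairs.

    Colon-joined lists are an assumption matching "rrdtool update -t a:b N:1:2".
    """
    labels = ":".join(label for label, _ in pairs)
    values = ":".join(str(value) for _, value in pairs)
    return template.replace("$LABELLIST$", labels).replace("$VALUELIST$", values)


create_template = (
    "rrdtool create example.rrd --step 300 "
    "$LISTSTART$DS:$LABEL$:GAUGE:900:U:U $LISTEND$RRA:AVERAGE:0.5:1:8640"
)
update_template = "rrdtool update example.rrd -t $LABELLIST$ N:$VALUELIST$"

create_command = expand_create_command(create_template, ["rta", "pl"])
update_command = expand_update_command(update_template, [("rta", 0.8), ("pl", 0)])
```

Because the DS definitions are generated from whatever labels arrive, one entry of this shape can serve any check that emits well-formed performance data, regardless of how many label=value pairs it returns.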
Use the following steps to ensure that the performance handler is working as expected. The performance handler log file is /usr/local/groundwork/nagios/var/log/process_service_perfdata_file.log, as configured in the perfdata.properties file. At a high debug_level setting, the following information is entered in the log for each plugin execution:
- Performance eventhandler execution time stamp
- Host name
- Service name
- Last check time in UTC format
- Status Text
- Performance data string
- Parsing results
- Interpreted RRD create command string
- Interpreted RRD update command string
- RRD command results
- Execution time
To debug a performance handler problem, look at the log results for your Service entry and verify the following:
- Service is being parsed properly
- The configuration entry information is correct
- Performance or status information is being parsed correctly
- The correct entry of the performanceconfig database is used
- RRD commands are properly interpreted
- RRD commands are executing without an error message
To debug a chart generation error, check the following:
- Make sure the RRD is being generated for your Host/Service. RRDs are stored in the directory: /usr/local/groundwork/rrd
- Check to make sure you have the correct CGI program referenced in the Service extended information template
- Make sure the browser is opening the referenced CGI program when you click on the graph icon
- Make sure the CGI program references the correct data set names defined in the RRD creation command
Definitions in the performance configuration database may be exported to transfer to another system or for backup purposes. To export the entire performance configuration database, select the Export All button at the top of the Performance Configuration utility page. To export a specific performance configuration entry, select the Export button for that entry. The exported file is placed by default in the /tmp directory. This is an XML file describing each field entry. A sample file is shown below.
To import an exported file, execute the following script which will read the exported XML file and insert the entry into the performance configuration database.