This page reviews the performance graph process, and how to create performance and remote RRD graphs.

{toc:minLevel=4|maxLevel=4|printable=false}

h4. About Performance Graphs

The *Performance* option in the _Configuration_ page enables users to generate performance graphs with data gathered from the _Nagios_ monitoring system.

h5. Performance Graphs Process

!p_status.gif|align=right,border=10,bordercolor=#FFFFFF!

_Nagios_ is configured to pass service check performance data to a special event handler. The event handler gets chart parameters from a configuration database, interprets the performance data, then uses RRDtool to create or update RRD (Round Robin Database) data files each time the service check is executed. CGI programs are provided to display the graphs. A default configuration database that matches the installed _GroundWork_ service profiles is delivered with the _GroundWork Monitor_ package. This data can be modified through the _GroundWork Monitor_ *Configuration > Performance* option.

Once RRD databases are created, there are several methods for displaying this data. In the _Status_ page you can view performance graphs under a service if an RRD is associated with that host and service. The figure shows a performance graph in _Status_ for the *CPU Utilization* performance metric returned by the *local_cpu_mysql* service check. No additional configuration other than the procedure listed in this section is required.

You may also show these graphs as links off the _Nagios_ service detail pages. To do this, you must create _Nagios_ extended service information links and install graphing CGI programs. Generic versions of these are included with _GroundWork Monitor_.

h5. Performance Data Handling in GroundWork Monitor

Any checks that are processed by _Nagios_ may return performance data.
The [Nagios Plugin Development Guidelines|http://nagiosplug.sourceforge.net/developer-guidelines.html] define the format for plugin performance data.

h6. Performance Data Handling Process

In the _GroundWork Monitor_ package, _Nagios_ writes all data (plugin output including performance data) to the {{service-perfdata.dat}} file. Every 300 seconds _Nagios_ runs the {{launch_perfdata_process}} command, which runs the {{launch_perf_data_processing}} script. That script starts the {{process_service_perfdata_file}} script if it is not already running, and {{process_service_perfdata_file}} reads a renamed copy of the {{service-perfdata.dat}} file.

The following settings are configurable in the _Configuration_ page under *Control*, *Nagios Main Configuration* (on page 3):
* The service performance data file: {{/usr/local/groundwork/nagios/var/service-perfdata.dat}}
* The service performance data file processing interval: {{300}}
* The service performance data file processing command: {{launch_perfdata_process}}

The {{launch_perfdata_process}} command ultimately invokes the script {{process_service_perfdata_file}}, which writes performance data into two places:
* *RRD Files* \- The script creates an RRD file whose name is a concatenation of the host name and the service name. The data in these RRDs is presented graphically in both the _Status_ and _Performance_ applications.
* *Foundation Database* \- A summary of the performance data is also sent to _Foundation_, which has a listener for performance data. How the data is persisted is described in detail below.

At the end of processing, _Nagios_ reopens the {{service-perfdata.dat}} file in append mode. This either continues to collect data in an unprocessed file, or starts a new file if the previous file was renamed for processing by the {{launch_perf_data_processing}} script.

h6. RRD Files for Performance Data

Performance data is stored in RRD files.
Format and data aggregation information can be found at [http://oss.oetiker.ch/rrdtool/doc/rrdtool.en.html|http://oss.oetiker.ch/rrdtool/doc/rrdtool.en.html]. h6. Performance Data in Foundation Performance data is sent to _Foundation_ as XML in a Web Services call, for efficient bulk-data transfer. The {{process_service_perfdata_file}} script includes the host name, service description, label, timestamp and performance value in the post to _Foundation_. The business object in _Foundation_ handling the incoming performance data is by default configured to average the performance data values for the check over a day. Along with the daily average, the maximum and minimum values for the day are also stored. Through configuration the range for average can be changed from a day to an hour interval. For details about changing the interval you can refer to [Foundation] in the DEVELOPER REFERENCE section. The performance data values are stored in the LogPerformanceData table. For each service that provides performance data an entry per day is created. h6. Reporting on Performance Data The _Reports_ option in _GroundWork Monitor_ includes two reports with performance data stored in _Foundation_ by host group or by host. The reports allow drilldown to performance data by individual services. These reports are located under *Reports*, *Report Tree, Performance Reports.* * *Performance Report by Host (epr-host)*: This report shows the performance indicators identified by the Label value across a selected Host and Time Range. * *Performance Report by Host Multi Variable (epr-host multi variable)*: Charts a report with up to two individually selected Hosts, units, and performance indicators present in the selected Hosts. * *Performance Report by Hostgroup (epr-hostgroup)*: This report shows the performance indicators identified by the _Label_ value across a selected Host Group and Time Range. 
* *Performance Report by Hostgroup Multi Variable (epr-hostgroup multi variable)*: This report charts long-term performance trends for performance data for a selected Host Group. This report can help identify areas where additional capacity is needed due to steady increases in load or demand.
* *Performance Report by Hostgroup Top Five (epr-hostgroup topfive)*: This report charts a selected performance indicator present in the selected Host Group.

h5. Performance Data Handling Parameters

{Note}Nagios has the ability to process performance data from both hosts and services. Service checks are executed at regular intervals. Host checks, on the other hand, may never be executed at all, because _Nagios_ only executes host checks when it is doing dependency calculations. The sporadic nature of host checks therefore renders host performance data unsuitable for graphing. This is why _GroundWork Monitor_ is concerned only with service performance data.{Note}

{section}{column:width=50%}The _Configuration_ page can be used to configure the _Nagios Main Configuration_ file to enable performance data handling. The _GroundWork_ installer should already have set this up, but these are the crucial configuration parameters. The image (select _Show/Hide_) shows the parameters in the _Nagios Main Configuration_ screen that enable performance data handling. Most of the plugins in the _GroundWork Monitor_ distribution output formatted performance data. The standard that defines how this data should be formatted is in the [Nagios Plugin Development Guidelines|http://nagiosplug.sourceforge.net/developer-guidelines.html].{column}
{column:width=50%}
!c_psystemconfiguration.gif|align=left!{column}{section}

h5. Performance Process Data Flow

When _Nagios_ schedules a plugin to execute, the plugin returns two types of data on standard output. Both fields are on the same line, separated by the pipe operator (\|).
Everything before the pipe operator _Nagios_ considers to be status text, and is inserted in the status field of the _Nagios_ (and _Status_) user interface. The status text is also inserted into the _Nagios_ macro $SERVICEOUTPUT$. The text that follows the pipe operator is inserted into the macro {{$SERVICEPERFDATA$}} and is also written into the {{service-perfdata.dat}} file. A typical plugin output should look something like this: {code}OK - load average: 0.35, 0.29, 0.20 | load1=0.350;5.000;10.000;0; load5=0.290;4.000;6.000;0; load15=0.200;3.000;4.000;0;{code} Everything before the pipe operator is status text and everything after it is formatted performance data. If we properly configured the {{service_perfdata}} configuration directives, _Nagios_ takes this plugin output and records it in a log file: {code}/usr/local/groundwork/nagios/var/service-perfdata.dat{code} At 5 minute intervals (this can be adjusted using the {{service_perfdata_file_processing_interval}}) _Nagios_ runs the performance eventhandler command ({{launch_perfdata_process}}). The {{process_service_perfdata_file}} script that it eventually launches in turn performs several tasks. It reads from the {{service-perfdata.dat}} file to extract the performance data Nagios has written there. For each service check result it finds there, it does a database lookup of the service name in the performanceconfig table in the Monarch database. This table (indexed by service name) contains the unique RRD create commands and RRD update commands appropriate for the data returned by that particular plugin. The {{process_service_perfdata_file}} script uses this information to create the RRDs in the first instance, then to update the data in them on subsequent executions of the service. Those RRDs are read by the CGI specified in the performanceconfig entry (these can be customized) and then presented for viewing in the _Status_ application. 
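As a rough illustration of the plugin output format shown earlier in this section, the following Python sketch splits one output line at the pipe operator into status text and performance data, then breaks the performance data into label/value entries. This is not the parser used by {{process_service_perfdata_file}}. The function name {{split_plugin_output}}, the regular expression, and the dictionary layout are assumptions made for this example, and the sketch handles only simple unquoted labels with numeric values.

```python
import re

# One perfdata token has the form: label=value[UOM];[warn];[crit];[min];[max]
# Simplified pattern (an assumption of this sketch): unquoted labels, numeric values.
PERF_TOKEN = re.compile(
    r"(?P<label>[^\s=]+)=(?P<value>-?\d+(?:\.\d+)?)"
    r"(?P<uom>[^;\s]*)(?P<rest>(?:;[^;\s]*)*)"
)


def split_plugin_output(line):
    """Split one line of plugin output into status text and perfdata entries."""
    # Everything before the first pipe is status text; the rest is perfdata.
    status, _, perf = line.partition("|")
    entries = []
    for match in PERF_TOKEN.finditer(perf):
        # Remaining fields, in order: warn, crit, min, max (any may be empty).
        fields = match.group("rest").split(";")[1:]
        fields += [""] * (4 - len(fields))
        entries.append({
            "label": match.group("label"),
            "value": float(match.group("value")),
            "uom": match.group("uom"),
            "warn": fields[0],
            "crit": fields[1],
            "min": fields[2],
            "max": fields[3],
        })
    return status.strip(), entries


if __name__ == "__main__":
    sample = ("OK - load average: 0.35, 0.29, 0.20 | "
              "load1=0.350;5.000;10.000;0; load5=0.290;4.000;6.000;0; "
              "load15=0.200;3.000;4.000;0;")
    status, entries = split_plugin_output(sample)
    print(status)
    print([entry["label"] for entry in entries])
```

Run against the sample line from this section, the sketch yields the status text and three entries labeled {{load1}}, {{load5}}, and {{load15}}. Tokens that do not match the simplified pattern are skipped rather than reported.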
There is also a graphical user interface on the performanceconfig table (_Configuration>Performance_), so the operator can adjust RRD create and update strings, or even specify which CGI will be used to graph them. The {{process_service_perfdata_file}} script does more, however. Whenever it has to create a new RRD, it writes the path and filename of that RRD into the datatype table in the Monarch database, and makes a corresponding entry in the host_service table. These tables are used by the Performance application to locate the various RRDs in the system. Performance is able to read in the data from multiple RRDs and consolidate that data into a single graph. This event handler also does a Web Services post to Foundation, which inserts summary performance data into the GWCollageDB for use by the EPR Reports. Finally, {{process_service_perfdata_file}} has the ability to generate a debug log file which is very helpful in diagnosing RRD problems in the system. The file is named {{process_service_perfdata_file.log}} and logging to it can be turned on and off using the {{debug_level}} in the {{perfdata.properties}} file. To increase debug logging, edit {{perfdata.properties}} and change this line: {code}debug_level=1{code} to this: {code}debug_level=3{code} {Note}The logging is quite voluminous and this file can get to be very large in a relatively short period of time. *Don’t forget* to turn this off (by setting debug_level=1) when you are finished troubleshooting your RRD problem and then kill the process_service_perfdata_file script so it gets restarted and picks up the new value.{Note} This occurs automatically the next time _Nagios_ is restarted which happens during a _Commit_, or you can force it manually with the following command: {code}service groundwork restart nagios{code} !p_data_flow.gif|align=left|hspace=5px! h5. 
Implementing String Lists in Performance Configuration Under *Performance>Configure*, set up one or more service-host entries for the passive services you defined. You may create these in any manner you like, but ensure that the RRD Create Command entry is of the following form: {code}$RRDTOOL$ create $RRDNAME$ --step 300 --start n-1yr $LISTSTART$DS:$LABEL#$:GAUGE:900:U:U$LISTEND$ RRA:AVERAGE:0.5:1:8640 RRA:AVERAGE:0.5:12:9480{code} Basically, everything between {{$LISTSTART$}} and {{$LISTEND$}} will be replicated for each label=value pair in the performance data. You may, of course, change the DS type from GAUGE to any supported value, or change any of the RRA parameters. Similarly, ensure that the _RRD Update Command_ is of the following form: {code}$RRDTOOL$ update $RRDNAME$ -t $LABELLIST$ $LASTCHECK$:$VALUELIST$ 2>&1{code} The {{$LABELLIST$}} and {{$VALUELIST$}} macros will be expanded to the derived lists of labels and values parsed from the performance data. h5. Performance Testing and Debugging Use the following steps to ensure that the performance handler is working as expected. The performance handler log file is {{/usr/local/groundwork/nagios/var/log/process_service_perfdata_file.log}}, as configured in the {{perfdata.properties}} file. At a high {{debug_level}} setting, the following information is entered in the log for each plugin execution: * Performance eventhandler execution time stamp * Host name * Service name * Last check time in UTC format * Status Text * Performance data string * Parsing results * Interpreted RRD create command string * Interpreted RRD update command string * RRD command results * Execution time h6. Service Entry Log Results To debug a performance handler problem, look at the log results for your Service entry. 
Check the following steps: # Service is being parsed properly # The configuration entry information is correct # Performance or status information is being parsed correctly # The correct entry of the performanceconfig database is used # RRD commands are properly interpreted # RRD commands are executing without an error message h6. Chart Generation Error To debug a chart generation error, check the following: # Make sure the RRD is being generated for your Host/Service. RRDs are stored in the directory: {{/usr/local/groundwork/rrd}} # Check to make sure you have the correct CGI program referenced in the Service extended information template. # Make sure the browser is opening the referenced CGI program when you click on the graph icon. # Make sure the CGI program references the correct data set names defined in the RRD creation command. {Note}The logging is quite voluminous and this file can get to be very large in a relatively short period of time. *Don’t forget* to turn this off (by setting debug_level=1) when you are finished troubleshooting your RRD problem and then kill the process_service_perfdata_file script so it gets restarted and picks up the new value.{Note} This occurs automatically the next time _Nagios_ is restarted which happens during a _Commit_, or you can force it manually with the following command: {code}service groundwork restart nagios{code} h5. Importing and Exporting Performance Configuration Definitions in the performance configuration database may be exported to transfer to another system or for backup purposes. To export the entire performance configuration database, select the _Export All_ button at the top of the Performance Configuration utility page. To export a specific performance configuration entry, select the _Export_ button for that entry. The exported file is placed by default in the {{/tmp}} directory. This is an XML file describing each field entry. A sample file is shown below. 
{code}<groundwork_performance_configuration>
  <service_profile name="gwsp-service_ping">
    <graph name="Ping response time ">
      <host>*</host>
      <service regx="1"><![CDATA[Host Alive]]></service>
      <type>nagios</type>
      <enable>1</enable>
      <label>Ping Response Time</label>
      <rrdname><![CDATA[/usr/local/groundwork/rrd/$HOST$_$SERVICE$.rrd]]></rrdname>
      <rrdcreatestring><![CDATA[$RRDTOOL$ create $RRDNAME$ --step 300 --start n-1yr DS:number:GAUGE:900:U:U RRA:AVERAGE:0.5:1:2880 RRA:AVERAGE:0.5:5:4032 RRA:AVERAGE:0.5:15:5760 RRA:AVERAGE:0.5:60:8640]]></rrdcreatestring>
      <rrdupdatestring><![CDATA[$RRDTOOL$ update $RRDNAME$ $LASTSERVICECHECK$:$VALUE1$ 2>&1]]></rrdupdatestring>
      <graphcgi><![CDATA[/nagios/cgi-bin/number_graph.cgi]]></graphcgi>
      <parseregx first="0"><![CDATA[]]></parseregx>
      <perfidstring></perfidstring>
    </graph>
  </service_profile>
</groundwork_performance_configuration>{code}

h5. Import Exported File

To import an exported file, execute the following script, which reads the exported XML file and inserts the entry into the performance configuration database.
{code}/usr/local/groundwork/tools/profile_scripts/import_perfconfig.pl <xml_file_name>{code}

h4. Creating Performance Graphs

The following text describes the procedure to create your own performance graphs or to modify an existing graph.

h5. Create a Graph

!c_performanceservice.gif|align=right,border=10,bordercolor=#FFFFFF!

# Select *Configuration* and then *Performance*.
# Select *Configure*.
# Select *Create New Entry* or select the *Copy* option to copy an existing service definition that comes close to what you want. Continue with the steps below to enter or edit the service definition properties.
# In most cases, you will want to graph either a number or a percent. The easiest way to do this is to make a copy of an existing configuration entry:
#* Copy the GENERIC_NUMBER or GENERIC_PERCENT entry in the performance configuration database, then rename the service name to your entry.
#* If performance data is already being generated, this is all you need to do to create the RRD.
#* If performance data is not being generated, enter the status parsing regular expression to parse the number or percent from the output text.
#* Specify either {{number_graph.cgi}} or {{percent_graph.cgi}} as the CGI graphing program in the _Configuration_ extended information service template. These graphing programs are installed with _GroundWork Monitor_. If you wish, you can also specify the {{graph.gif}} icon. You will need to commit the changes to _Nagios_ in order for the service CGI to appear on the _Nagios_ interface.

| Graph Label | Enter a *Graph Label* to define the label to appear on the Status graph window. |
| Service | Enter a *Service* to define the Service name for Performance Graphs. Enter the exact Service name for this graph. The entry is case sensitive. |
| Use Service as a Regular Expression | If you want this definition to match multiple Service names (e.g. {color:#008000}snmp_if_interface_1, snmp_if_interface_2{color}), check the *Use Service as a Regular Expression* option. All Service names that match the Service field will use this entry to create RRDs. Be careful with this: if a Service name matches two entries in this database, the system won't know which to use, so an RRD will not be created at all. |
| Host | Enter a *Host* name to match for this entry. To match all Host names, enter \*. |
| Status Text Parsing Regular Expression | The *Status Text Parsing Regular Expression* field is used when you are working with a plugin that does not return properly formatted performance data. This field is used in conjunction with the next field (Use Status Text Parsing Instead of Performance Data) to enable regular expression based parsing of the status text for performance metrics.
\\ For example, entering the regular expression "(\d+) (\d+)" (without the double quotes) will parse the status text looking for two numbers of one or more digits each, separated by a single space. \\ These numbers would be captured as $VALUE1$ and $VALUE2$ and could be passed to the RRD create and update commands using those variable names. \\ The end result would be that numbers were extracted from the status text field of the plugin output and inserted into performance graphs despite the fact that the plugin returned no performance data. \\ *Note:* Parentheses in a regular expression capture the string or value that matches the enclosed expression into a variable. In this case the variables would be $VALUE1$ and $VALUE2$. |
| Use Status Text Parsing instead of Performance Data | *Use Status Text Parsing Instead of Performance Data*: This field is the flag that enables the status text parsing function described for the *Status Text Parsing Regular Expression* field above. A zero in this field disables regular expression based status text parsing, and a 1 enables it. |
| RRD Name | The *RRD Name* option is used to define the RRD file name. The following macros may be used: \\ *$HOST$* Name of Host that called the performance handler \\ *$SERVICE$* Name of Service that called the performance handler \\ For example, the following string will create an RRD with the Host and Service name in the RRD file name: /usr/local/groundwork/rrd/$HOST$_$SERVICE$.rrd |
| RRD Create Command | Enter an *RRD Create Command* to define the RRD creation string. You can reference the RRDtool documentation for RRDtool creation commands and options.
The following macros may be used: \\ *$RRDTOOL$* RRDtool program including file location \\ *$RRDNAME$* Name of the RRD file defined in the configuration tool \\ An example of an RRD creation string: \\ $RRDTOOL$ create $RRDNAME$ \--step 300 \--start n-1yr DS:number:GAUGE:900:U:U RRA:AVERAGE:0.5:1:2880 RRA:AVERAGE:0.5:5:4032 RRA:AVERAGE:0.5:15:5760 RRA:AVERAGE:0.5:60:8640 |
| RRD Update Command | Enter an *RRD Update Command* to define the RRD update string. Reference the RRDtool documentation for RRDtool update commands and options. In addition to the macros mentioned above, the following macro may be used: \\ $LASTSERVICECHECK$ Check time the plugin executed in UTC format \\ The following string is an example of an RRD update. This example updates the RRD file with the first value from the performance data string or status text parse: $RRDTOOL$ update $RRDNAME$ $LASTSERVICECHECK$:$VALUE1$ 2>&1 |
| Custom RRDtool Graph Command | There are 3 Host view options in the _Dashboards Performance Graph_ portlet: _Expanded_, _Consolidated by Host_, and _Consolidated_. The *Custom RRDtool Graph Command* affects the appearance of the RRD graph when using the _Expanded_ view only. The graphs in the Status viewer application are not affected. \\ See the text below this table to change the appearance of the graph. |
| Enable | The *Enable* option, if checked, enables an entry. If disabled (unchecked), the RRD creation and update will not be executed for this entry. |

h6. Custom RRDtool Graph Command

To change the appearance of the graph, paste a command that produces a graph from the command line into the *Custom RRDtool Graph Command* field. Any valid command will work, and any accessible RRD will produce a graph, even an unrelated one. However, this is probably not what is desired, so certain strings are substituted to produce the desired effect.
This process is triggered if the command inserted contains the string: "rrdtool graph" Substitutions: * rrd_source is replaced by the rrd selected by the cgi for the host and service. * ds_source_0 is replaced by the first DS * ds_source_N is replaced by the Nth DS in the RRD, where N is an integer This allows a fair amount of flexibility in specifying what is to be graphed and how. {Note}But there is more; if you place a {{$LISTSTART$ $LISTEND$}} pair in the rrdtool graph command the following values will be substituted: * $DEFLABEL#$ will become the RRD:DS string, repeated in the supplied context as many times as there are DS in the RRD, * $CDEFLABEL#$ will be a short string, to be used to serialize the CDEFS. The strings are taken from the sequence {{a}}, {{b}}, {{c}}, ..., {{z}}, {{aa}}, {{ab}}, {{ac}}, ..., {{az}}, {{ba}}, {{bb}}, {{bc}}, ..., and so forth. * $DSLABEL#$ will be the DS name, repeated in context as above. * $COLORLABEL#$ will be a color selected from the @colors array, the same one used to select colors in the default graphs. 
Use it if you don't know (or don't care) what colors get shown.{Note} What this means is that a custom command like this: {code}/usr/local/groundwork/common/bin/rrdtool graph - \ --imgformat=PNG \ --title="All Disk Partitions" \ --rigid \ --base=1000 \ --height=120 \ --width=500 \ --alt-autoscale-max \ --lower-limit=0 \ --vertical-label="Kilobytes" \ --slope-mode \ $LISTSTART$ \ DEF:$DEFLABEL#$:AVERAGE \ CDEF:cdef$CDEFLABEL#$=$CDEFLABEL#$,8,* \ LINE:cdef$CDEFLABEL#$$COLORLABEL#$:"$DSLABEL#$" \ GPRINT:cdef$CDEFLABEL#$:LAST:" Current\:%8.2lf %s" \ GPRINT:cdef$CDEFLABEL#$:AVERAGE:" Average\:%8.2lf %s" \ GPRINT:cdef$CDEFLABEL#$:MAX:" Maximum\:%8.2lf %s" \ $LISTEND${code} Ends up looking something like this: {code}/usr/local/groundwork/common/bin/rrdtool graph /usr/local/groundwork/apache2/htdocs/performance/rrd_img/view_1193523936_localhost_All-Partitions_h_1.png --imgformat=PNG --title="All Disk Partitions" --rigid --base=1000 --height=120 --width=500 --alt-autoscale-max --lower-limit=0 --vertical-label="Kilobytes" --slope-mode DEF:a=/usr/local/groundwork/rrd/localhost_All_Partitions.rrd:_boot:AVERAGE CDEF:cdefa=a,8,* LINE:cdefa#8DD9E0:"_boot" GPRINT:cdefa:LAST:" Current\:%8.2lf %s" GPRINT:cdefa:AVERAGE:" Average\:%8.2lf %s" GPRINT:cdefa:MAX:" Maximum\:%8.2lf %s" DEF:b=/usr/local/groundwork/rrd/localhost_All_Partitions.rrd:_dev_shm:AVERAGE CDEF:cdefb=b,8,* LINE:cdefb#64A2B8:"_dev_shm" GPRINT:cdefb:LAST:" Current\:%8.2lf %s" GPRINT:cdefb:AVERAGE:" Average\:%8.2lf %s" GPRINT:cdefb:MAX:" Maximum\:%8.2lf %s" DEF:c=/usr/local/groundwork/rrd/localhost_All_Partitions.rrd:root:AVERAGE CDEF:cdefc=c,8,* LINE:cdefc#D3DB00:"root" GPRINT:cdefc:LAST:" Current\:%8.2lf %s" GPRINT:cdefc:AVERAGE:" Average\:%8.2lf %s" GPRINT:cdefc:MAX:" Maximum\:%8.2lf %s" --start 1194999156 --end 1195002756 --height 200 --width 600{code} h4. Creating Remote RRD Graphs h5. Background _GroundWork Monitor_ includes a feature known as the _Remote RRD Graph Web Service_ (_RRGWS_). 
This section describes this web service, its uses, and configuration. {section}{column:width=50%}The _RRGWS_ was developed to support large configurations spanning multiple _GroundWork_ monitoring servers. In large configurations, the volume of data transferred can become an issue. The largest volumes of data are those associated with performance measures, as this information is useful only if collected regularly. Status information, by contrast, is generated only when the state of the object changes. In the context of multiple _GroundWork_ servers, we can leverage this dynamic to greatly reduce the data transfers between child (polling) servers and parent systems.{column} {column:width=50%} !p_remoterrd1.gif|align=left! {column}{section} !p_remoterrd2.gif|align=right,border=10,bordercolor=#FFFFFF! In this scenario, the polling servers are doing all the active checks, and are collecting performance data. When a host or service changes state, this information is forwarded to the _GroundWork Monitor_ server, but the results of individual checks that do not contain a state change (the vast majority) are not forwarded. All the performance graphs are created and hosted on the child server, yet the user primarily (or even exclusively) uses the _GroundWork_ server to interact with the system. The graphs are displayed from the child server on the _GroundWork_ server by means of the _RRGWS_. This approach entails several components. The child must publish the location of the _RRD_ graphs to the _GroundWork_ server for display. This is done by configuring the child server to forward the location of the _RRDs_, and the graph commands to the _GroundWork_ server. The _GroundWork_ server will similarly be configured, for those services whose graphs are hosted on a child server, to not store the local _RRD_ graph commands or path locations. It will store this information only for checks it performs locally, such as the local _GroundWork_ server checks. This is done automatically. 
The child server must be configured to send only changes of state to the _GroundWork_ server. This is done using the state changes detected by the status feeder, and by periodically sending heartbeat messages from the child to the _GroundWork_ server, posting the current state of all objects. The heartbeat portion of this operation is expensive, but the interval and rate of transmission can be tuned, so it is deemed acceptable considering the benefit. No one wants to be looking at stale data, so the system should be able to accommodate the trade-off between performance and data age. In any case, state changes will always be forwarded immediately, so critical data will remain up-to-date. h5. Requirements You must have _GroundWork Monitor_ set up and operating as a child server, accepting configuration files from a _GroundWork_ server. It is helpful to configure the child server to forward results to the _GroundWork_ server, as well, although we will not cover this in detail, as the commands are changed. The child server must be able to contact the _GroundWork_ server on the following network ports: * 5667/tcp (for posting results) * 4913/tcp (for posting RRD graph locations) The _GroundWork_ server must be able to contact the child server on the following network ports: * 22/tcp (ssh for configuration transfers) * 80/tcp or 443/tcp for web services h5. Configuration Steps h6. GroundWork Server There is no special configuration to be done on the _GroundWork_ server. h6. 
Setup Forwarding of RRD Locations The child server is configured with the following procedure: # Edit the file: {code}/usr/local/groundwork/config/perfdata.properties{code} # Un-comment the section: {code}# <foundation_host MYPARENTHOST> # foundation_port = 4913 # child_host = MYHOST # send_RRD_data = true # send_perf_data = false # </foundation_host>{code} to: {code} <foundation_host MYPARENTHOST> foundation_port = 4913 child_host = MYCHILDHOST send_RRD_data = true send_perf_data = true </foundation_host>{code} where: {{MYPARENTHOST}} is the DNS name of the parent server (must be resolveable from the child server). {{MYCHILDHOST}} is the DNS name of the child server. {Note}This CANNOT be localhost or 127.0.0.1. It must be resolveable from the parent server.{Note} Optionally, you may decide not to send the perf_data to the parent. If you send it, this data is posted directly to _Foundation_, and is used in the performance reports under the _Reports_ tab. (These are the EPR reports found under *Reports*>*Report tree*>*Reports*>*Performance Reports*: _epr host_, _epr host multi variable_, _epr hostgroup_, _epr hostgroup multi variable_, and _epr hostgroup topfive_. In contrast, the *Reports*>*Performance View* tool does not support the _Remote RRD_ configuration, as it relies on direct local access to _RRD_ files.) There is a performance load introduced on the parent by sending the detailed perf_data; however, this is not anticipated to be a large load, since the data is bundled. The advantage is that reports run on the parent will have all performance data from the child. You may also choose to keep the performance data on the child, or not to post it at all. The recommended configuration is to send the perf_data to the parent if you have one or two child servers. If you have more than two child servers, you should probably consider another configuration where this data is not sent, as the load on the parent will likely be significant. 
To +not+ send the perf_data to the parent, change the line in the section above: from {code}send_perf_data = true{code} to {code}send_perf_data = false{code} {Note}DO NOT remove the following section as it is required for operation. You can, however, choose to set send_perf_data=false if you do not want the perf data to be sent to _Foundation_ on the server on which that data originates.{Note} {code} <foundation_host localhost> foundation_port = 4913 child_host = "" send_RRD_data = true send_perf_data = true </foundation_host> {code}To have these changes take effect, you can: # Perform a configuration change (build instance for the group for this child server on the _GroundWork_ server) or, # Kill the process for the process_service_perfdata_file program, at the command prompt, type:{code} ps -ef | grep process_service_perfdata_file {code}You will see output similar to the following:{code} nagios 23260 1 0 Jun14 ? 00:00:42 /groundwork/perl/bin/.perl.bin -I/usr/local/groundwork/perl/lib/5.8.8 -I/usr/local/groundwork/perl/lib/site_perl/5.8.8 -I/usr/local/groundwork/nagios/libexec -I/usr/local/groundwork/perl/custom/lib/5.8.8 -I/usr/local/groundwork/perl/custom/lib/site_perl/5.8.8 -w -- /usr/local/groundwork/nagios/eventhandlers/process_service_perfdata_file root 28917 4118 0 14:06 pts/1 00:00:00 grep process_service_perfdata_file {code}In this case, the _PID_ is 23260 for the process. Kill it by typing:{code} kill 23260 {code}The process will automatically restart in a few minutes. After approximately 10 minutes, the \_GroundWork\_ server interface will begin to show you graphs generated on the child server. h6. Set Up Heartbeat Operation The child server must be set up to forward periodic updates for all hosts and services. 
This is done with the following procedure:
# Edit the file: {code}/usr/local/groundwork/config/status-feeder.properties{code}
# Change the following lines: {code}send_state_changes_by_nsca=false{code} to {code}send_state_changes_by_nsca=true{code} and {code}primary_parent=""{code} to {code}primary_parent="MYPARENTHOST"{code} where {{MYPARENTHOST}} is the DNS name of the parent that will be receiving this data. {color:red}{*}DO NOT{*}{color} neglect the quotes.
# Optionally set up any secondary servers by changing the appropriate lines. If you are not using secondary servers, leave these alone. You do not need to change any of the remaining parameters, but you can tune them to fit your installation. The defaults are:
Send heartbeats every hour: {code}nsca_heartbeat_interval = 60 * 60{code}
Send full dumps every 8 hours: {code}nsca_full_dump_interval = 8 * 60 * 60{code}
Send a maximum of 100 messages at a time: {code}max_messages_per_send_nsca = 100{code}
Wait for 2 seconds between batches of results: {code}nsca_batch_delay = 2{code}
You may want these sent less frequently (for large configurations) or more frequently, depending on the bandwidth available and the load on the _GroundWork_ server. You can also elect to send the heartbeat in smaller batches, rather than the default of 100 results, and to open a larger gap between the batches. Be advised that the feeding of results to the database on the child server will be affected if you make the sending of heartbeats too frequent or too long (with many batches and long batch delays). You may not be concerned about this, as child servers are often not accessed at all by users.
# Save the file when you are finished editing.

You will need to restart gwservices on the child server when you are done. This can be done by typing the following command: {code}/etc/init.d/groundwork restart gwservices{code}

h6. Set Up Forwarding of State Changes and Heartbeats Via Spooler - \[Optional\]
The child server can optionally spool results to be forwarded to the parent. This can be useful if, for example, the network link from the child to the parent is intermittent. It also provides a small amount of enhanced reliability for the data transfers. The spooler works in the same way as the _GDMA_ spooler code. It will keep a programmable number of results for a programmable interval, and will transmit the saved results when the parent becomes available after an interval of downtime.
{Note}The spooler is a method of sending results that is separate from the _NSCA_ method. You probably do not want to use both methods to send results to the same server. You can, however, send results to one server with _NSCA_ and to another with the spooler. Generally, though, if you use the spooler, you will be disabling the _NSCA_ method.{Note}
If you monitor the child server from the parent, you should consider setting up a passive service named gdma_spooler on the child host. This will show you the spooler statistics. There is no harm if you do not set it up, but it is sometimes useful to know how much data is flowing in from a given child server.

To configure the spooler option, make the following additional changes to the file: {code}/usr/local/groundwork/config/status-feeder.properties{code}
# Change: {code}send_state_changes_by_gdma=false{code} to: {code}send_state_changes_by_gdma=true{code}
# Optionally change the defaults for {{gdma_heartbeat_interval}}, {{gdma_full_dump_interval}}, and {{max_unspooled_results_to_save}}. These values are explained in the comments in that file, and are analogous to those for the similarly named _NSCA_ settings.
# Next, find the file called {{gwmon_HOSTNAME.cfg}} in {{/usr/local/groundwork/gdma/config}}, where {{HOSTNAME}} is the name of this child server host. Edit this file and change: {code}Spooler_Status=off{code} to: {code}Spooler_Status=on{code} and {code}Target_Server="http://gdma-autohost"{code} to:
{code}Target_Server="http://PARENTHOSTNAME"{code} where {{PARENTHOSTNAME}} is the name of the parent host to send to. You can specify more than one target in a comma-separated list. Each name must be specified as a URL. Be sure to uncomment the {{Target_Server}} line. \\ \\ Adjustment of other parameters is optional, and should be done only if necessary. Refer to the _GDMA_ documentation for explanations of the parameters in this file that control the spooler.
# Save the file.
# Restart _gwservices_ on the child to make this configuration active: {code}service groundwork restart gwservices{code}

h5. Considerations

h6. perfdata.properties
The {{perfdata.properties}} file contains a mixture of scalar-value settings and some XML-like sections. It must be edited manually to preserve this structure. It is listed in the *Administration*>*Foundation*>*Manage Configuration* screen as a file that can be edited there, but attempting to do so will effectively destroy the content of this file. A bug report (GWMON-10097) has been filed to remove this filename from the list of editable files in this screen. A copy of the original file is included in the reference section of this document.

Editing of the {{perfdata.properties}} file should be done on the child server, not on the parent server, as it is the child server that needs to know what data to send and where to send it. To have these changes take effect, you will need to either do a configuration push to the child (build instance for the child group on the _GroundWork_ server) or restart the process_service_perfdata_file process on the child, as described above.

h6. Encryption
Both the _NSCA_ and _GDMA_ spooler methods use the _NSCA_ program and the _Bronx_ event broker. These programs are capable of sending and receiving encrypted data, but are set up by default not to do so.
Also, the _Bronx_ event broker (which processes the data received at the parent) supports what is known as _wide packets_, an enhancement to _NSCA_ that makes it much faster in high-load configurations. Wide packets support is enabled by default. If you set up encryption, you should note that:
* Encryption must be the same on the parent and all child servers, as well as on any system that sends data to the parent via _NSCA_ (for example, a _GDMA_ system)
* Encryption adds overhead, and may slow down the data transfer process
* Encryption is more secure, which can be important in some environments

h5. Maintenance
There is no special maintenance for this feature. However, keeping track of performance on the _GroundWork_ server is a good idea. If things seem to be slow, consider making the heartbeat less frequent. If you do so, ensure that any freshness checks on the _GroundWork_ server hosts and services are synchronized with this interval. Freshness intervals for passive checks should always be kept longer than the update cycles to avoid false positive results.

h5. References

h6. perfdata.properties
This is the default file contents (with comments).
{code}# perfdata.properties
#
# Copyright 2010 GroundWork Open Source, Inc. ("GroundWork")
# All rights reserved. This program is free software; you can
# redistribute it and/or modify it under the terms of the GNU
# General Public License version 2 as published by the Free
# Software Foundation.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
# 02110-1301, USA.

######################################################################
## GroundWork Performance Data Processing Configuration Properties
######################################################################

# The values specified here are used to control the behavior of the
# process_service_perfdata_file script.

# Possible debug_level values:
# 0 = no info of any kind printed, except for startup/shutdown
#     messages and major errors
# 1 = print just error info and summary statistical data
# 2 = also print basic debug info
# 3 = print detailed debug info
debug_level = 1

# Create and update RRD files. [true/false]
process_rrd_updates = true

# Use the newer XML web-service API to post performance data to the
# Foundation databases configured below. Highly recommended, for
# efficiency. [true/false]
post_performance_using_xml = true

# How many performance data updates to bundle together in a single
# message to Foundation, when the XML web-service API is used. This
# is a loose limit; it is only checked after adding all the data for
# a {host, service}, which might contain multiple performance values.
max_performance_xml_bundle_size = 20

# A limit on the number of items sent to Foundation in a single
# packet.
max_bulk_send = 200

# Timeout, specified in seconds, if the older HTTP API is used
# to post performance data to the Foundation database.
foundation_http_submission_timeout = 2

# Timeout, specified in seconds, to address GWMON-7407.
# The usual value is 30; set to 0 to disable.
socket_send_timeout = 30

# Specify whether to use a shared library to implement RRD file
# access, or to fork an external process for such work (the legacy
# implementation). Set to true (recommended) for high performance,
# to false only as an emergency fallback or for special purposes.
# [true/false]
use_shared_rrd_module_for_create = true
use_shared_rrd_module_for_update = true
use_shared_rrd_module_for_info = true

# Where the rrdtool binary lives.
rrdtool = /usr/local/groundwork/common/bin/rrdtool

# What file to read for results to be processed. This file path is
# now defined by the launch_perf_data_processing script, so it cannot
# be changed here arbitrarily.
service_perfdata_file = /usr/local/groundwork/nagios/var/service-perfdata.dat.being_processed

# Where the log file is to be written.
debuglog = /usr/local/groundwork/nagios/var/log/process_service_perfdata_file.log

# The wait time between cycles of the process_service_perfdata_file
# script, which runs as a daemon. Specified in seconds.
loop_wait_time = 15

# Whether to emit a log message to Foundation at the end of every processing
# cycle where errors or warnings were detected. This is disabled by default
# because it can generate a large number of messages when the setup is broken.
# But it can be valuable to provide very visible notice that processing problems
# are occurring, so you know to look in the debug log for details. [true/false]
emit_status_message = false

# Specify whether to log messages that tell exactly what the script is doing
# at the moment that a termination signal is received. We don't enable
# these messages by default because logging i/o routines are not necessarily
# re-entrant, which could cause difficulties. But the messages can be enabled
# during troubleshooting trials to identify which areas of the script need
# improvement in the speed of handling termination signals. [true/false]
spill_current_action = false

# This section contains the configuration for all access to Foundation
# databases. It must include one group of lines for the Foundation
# associated with this server (with the child_host value set to an
# empty string). Additional groups of lines are needed for parent
# servers if you want RRD graphs generated on this child server
# (where the process_service_perfdata_file script is running) to be
# integrated into Status Viewer on a parent server, or if you want
# EPR reports to be created on a server.
#
# The foundation_host value, specified inside the angle-brackets, is
# a qualified or unqualified hostname, or IP address, for a network
# interface on which the Foundation of the respective standalone,
# child, parent, parent-standby, or report server can be accessed.
# Substitute for MYPARENTHOST or MYSTANDBYHOST in the lines below
# as needed. The foundation_port is the port number on that network
# interface through which Foundation can be contacted.
#
# The child_host value is a qualified or unqualified hostname, or
# IP address, of the machine on which the performance data handling
# script (process_service_perfdata_file) is running, as seen by that
# particular Foundation server. The specified value must not be
# 127.0.0.1 or localhost, and it may be different for access from
# different Foundation servers (substitute for MYHOST in the lines
# below as needed). This value must be left empty for the child
# (or standalone) server's own Foundation.
#
# The send_RRD_data value [true/false] specifies whether this
# Foundation should receive information about RRD graphs.
# If child_host is empty, this information will include details
# on RRD filenames and graph commands, so graphs can be directly
# generated as needed. If child_host is non-empty, this information
# will instead include just the child_host value, so this copy of
# Foundation will know where to reach to obtain the graph.
#
# The send_perf_data value [true/false] specifies whether this
# Foundation should receive a copy of the detailed performance data.
# It should be enabled if and only if this Foundation may be used to
# produce EPR reports.
#
# Lines in this section may be commented out with a leading "#"
# character. Uncomment and customize groups of lines here as needed.

<foundation>

    # Local Foundation. It is not a parent server for this data,
    # so the child_host is set to an empty string to distinguish
    # this case. send_RRD_data must be true for this entry.
    <foundation_host localhost>
        foundation_port = 4913
        child_host = ""
        send_RRD_data = true
        send_perf_data = true
    </foundation_host>

    # Parent-server Foundation, if any.
    # <foundation_host MYPARENTHOST>
    #     foundation_port = 4913
    #     child_host = "MYHOST"
    #     send_RRD_data = true
    #     send_perf_data = false
    # </foundation_host>

    # Parent-standby-server Foundation, if any.
    # <foundation_host MYSTANDBYHOST>
    #     foundation_port = 4913
    #     child_host = "MYHOST"
    #     send_RRD_data = true
    #     send_perf_data = false
    # </foundation_host>

</foundation>{code}
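Because this file has to be edited by hand, a quick sanity check after editing can catch the mistakes the comments warn about. The Python sketch below is one possible way to do that. It is a simplified reader written for illustration, not the parser used by process_service_perfdata_file, and the function names {{parse_foundation_hosts}} and {{find_problems}} are hypothetical. It assumes each setting sits on its own line and that commented-out lines start with "#".

```python
import re

# Simplified, hypothetical reader for the <foundation_host NAME> sections.
# The real script may accept syntax that this sketch does not handle.
SECTION_RE = re.compile(
    r"<foundation_host\s+([^\s>]+)>(.*?)</foundation_host>", re.DOTALL
)


def parse_foundation_hosts(text):
    """Return {foundation_host: {setting: value}} for uncommented sections."""
    # Drop lines that are commented out with a leading "#".
    active = "\n".join(
        line for line in text.splitlines() if not line.lstrip().startswith("#")
    )
    hosts = {}
    for name, body in SECTION_RE.findall(active):
        settings = {}
        for line in body.splitlines():
            if "=" in line:
                key, value = line.split("=", 1)
                settings[key.strip()] = value.strip().strip('"')
        hosts[name] = settings
    return hosts


def find_problems(hosts):
    """Check the two rules stated in the file's own comments."""
    problems = []
    for name, settings in hosts.items():
        child = settings.get("child_host", "")
        if child in ("localhost", "127.0.0.1"):
            problems.append(f"{name}: child_host must not be {child}")
        if child == "" and settings.get("send_RRD_data") != "true":
            problems.append(f"{name}: local entry needs send_RRD_data = true")
    return problems


if __name__ == "__main__":
    path = "/usr/local/groundwork/config/perfdata.properties"
    with open(path) as handle:
        for problem in find_problems(parse_foundation_hosts(handle.read())):
            print(problem)
```

The check is read-only and does not modify the file. It does not verify that the host names resolve; that has to be tested separately from the child and parent servers.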