About Status

This page reviews the GroundWork Monitor Status application.

1.0 Viewing Monitoring Status

With the Status interface, users have access to various critical views into their company's IT infrastructure including high level status of all servers, applications, networks, and services as they near or breach thresholds.

The majority of the Status application relies on GroundWork-developed technology that extracts, normalizes, and stores monitoring status data in a separate embedded database and makes the data available via an application programming interface (API). The interface itself is written in PHP, a widely used open source scripting language, resulting in much higher performance than is possible with the current Nagios interface, which is built using Common Gateway Interface (CGI) scripts. Status also uses AJAX-based dynamic interactivity, which allows data updates to occur regularly without reloading the entire page.

2.0 Status Screen Layout

Let's start with an overview of the Status application's screen layout. The page has two main parts: the Tree View portlet on the left side and various monitoring data portlets on the right. Above the portlets are tabs for easy access to previously viewed pages.

Figure: Status screen layout

2.1 Navigating Using Tree View

The Tree View portlet provides three tabs: Hosts, Services, and Search. The Hosts and Services tabs display lists of monitored host groups and service groups, and the Search tab enables quick access to specific monitoring objects based on the entered host, service, alias, or IP address criteria. The listed host and service groups are expandable so you can view detailed host and service status quickly. Each object is preceded by a color-coded status indicator that shows the most critical state of its underlying objects; this is referred to as the bubble up feature.

When a specific component level is selected, the corresponding monitoring data is displayed within portlets on the right side of the screen. These portlets display status information in various formats including summaries, charts, performance graphs, and events, and allow drilling down to detailed data. You can also apply and execute commands such as scheduling downtime. The specific portlets that are displayed and the actions that are available vary depending on the component that is selected.

Additionally, when using the Search tab, the search results provide status indicators, mouse over summaries, and drill-down capability to go directly to that component's monitoring data. Search results can also be sorted by Host, Host Group, Service, Service Group, and alphabetically.

In our example below, you can see that the host group Linux Servers has a status indicator of UP because status information bubbles up from the underlying host localhost, which has several OK services. Along with the bubble up feature, the Tree View portlet offers quick object level information: mouse over an element's name to view its summary. In our example, we mouse over the host group Linux Servers to display the information, Alias: Linux Servers Summary, Hosts: 2, Troubled Hosts: 0, and Troubled Services: 0.

Figure: Tree view portlet

2.2 Host and Service States

The table below summarizes the icons which represent host group/host and service group/service states. Host icons are square and service icons are round. The parent node shows the most critical state (e.g. a Host Group will be displayed as Down if any of the underlying Hosts are in a Down state).

Hosts and Host Groups
Down Unscheduled - A host is in a non-OK state and it has been rechecked the number of times specified by the max_check_attempts option in the host definition.
Down Scheduled - A host is in a non-OK state and it has been rechecked the number of times specified by the max_check_attempts option in the host definition. The host is scheduled for downtime.
Warning - A host or host group that will eventually need attention.
Unreachable - A host is unreachable or is in a non-OK state. This directive is specified by the notification_options argument in the host definition.
Pending - Usually temporary; the state has not yet been determined.
Up - A host is in an OK state and fully operating.

Services and Service Groups
Critical Unscheduled - Services for a host or host group are unavailable or down and need immediate attention.
Critical Scheduled - Services for a host or host group are unavailable or down and need immediate attention. The services are scheduled for downtime.
Warning - Services for a host or host group that will eventually need attention.
Unknown - Services for a host or host group that are unrecognized and cannot be categorized in one of the other states.
Pending - Services for all hosts and host groups whose state has not yet been determined. Pending status usually does not last long.
OK - Services for all hosts and host groups where the monitor is OK or correctly functioning.
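
The bubble up rule described above (a parent node shows the most critical state of its children) can be sketched in a few lines of Python. The severity ordering and the names HostState and bubble_up are assumptions made for illustration; this page does not define how the Status application ranks states internally.

```python
from enum import IntEnum


class HostState(IntEnum):
    # Higher value = more critical. This ordering is an assumption.
    UP = 0
    PENDING = 1
    WARNING = 2
    UNREACHABLE = 3
    DOWN_SCHEDULED = 4
    DOWN_UNSCHEDULED = 5


def bubble_up(child_states):
    """Return the most critical state among the children.

    An empty group is reported as PENDING (also an assumption).
    """
    return max(child_states, default=HostState.PENDING)
```

For example, a host group containing two UP hosts and one DOWN_UNSCHEDULED host would be shown as DOWN_UNSCHEDULED under this ordering.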

3.0 Status Views

3.1 Entire Network Level

Upon launching Status you will see a page showing the first level of monitored components, Entire Network, which displays a one-page, high-level summary of your IT infrastructure status. It provides a quick summary of aggregated monitoring information including breaches, warnings, and OK summaries for the entire network.

Portlet views

This screen displays the Filters portlet, and the various Status Summary portlets, along with a Nagios Monitor Statistics table for the entire network.

  • Filters portlet - This portlet allows specific host and/or service states to be filtered and displayed. Selections made in these drop-downs affect the displayed contents of all other portlets on the Status page.
  • Status Summary portlets - These portlets show monitoring statistics at the Entire Network, Host and Service Groups, and Host levels with drill-down capability for viewing detailed data. The graphs provide an at-a-glance filtered view of your overall host and service status representing segments of host states and service states. The displayed values indicate the number of filtered and total hosts and services in a specific state (e.g. Down, Up). The total columns indicate the total number of monitored hosts or services. By selecting a status summary link you will be presented with a popup window listing associated objects of the selected status. You can then continue to drill-down into one of the listed components for additional detail and you can directly acknowledge a host or service problem.
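
To make the "filtered and total" values in the Status Summary portlets concrete, here is a minimal Python sketch that counts hosts per state with and without a state filter. The function name summarize and the plain dictionary input are hypothetical; the real portlets read their data from the GroundWork API, which this page does not detail.

```python
from collections import Counter


def summarize(host_states, state_filter=None):
    """Count hosts per state.

    host_states: dict mapping host name -> state string (e.g. "UP", "DOWN").
    state_filter: optional set of states selected in the Filters portlet.
    Returns (filtered_counts, total_counts).
    """
    total = Counter(host_states.values())
    if state_filter is None:
        # No filter selected: filtered and total counts are the same.
        return total, total
    filtered = Counter(s for s in host_states.values() if s in state_filter)
    return filtered, total
```

With three hosts of which one is DOWN, filtering on {"DOWN"} would yield a filtered count of 1 for DOWN while the totals still cover all three hosts.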

Nagios monitor statistics table

This table shows the number of services and hosts that have a monitoring feature setting that is enabled or disabled, in this case for the entire network. The corresponding numerical values are displayed in the host and service status summaries mentioned above. The color-coded indicators show the global state of the feature. Feature settings include Active Checks, Passive Checks, Notifications, Flap Detection, and Event Handlers for host and services (described in the table below). This portlet is also viewable at the host group level.

Active Checks - Active checks are host and service checks that are initiated by Nagios.
Passive Checks - Passive checks are host and service checks that are performed by external applications.
Notifications - Each host and service definition contains options that determine whether or not Notifications can be sent out.
Flap Detection - Flapping occurs when a host or service changes state too frequently, resulting in a storm of problem and recovery Notifications. Flapping can be indicative of configuration problems (e.g. thresholds set too low) or real network problems.
Event Handlers - Event Handlers (host and service) are optional commands used to proactively fix problems before anyone is notified. They are executed whenever a host or service state change occurs.
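
As an illustration of what "changes state too frequently" means, the Python sketch below computes the share of consecutive check results that differ and compares it with a threshold. This is a simplified, unweighted version: Nagios itself weights recent changes more heavily and uses separate low and high thresholds, and the 50 percent default here is only an assumed value.

```python
def percent_state_change(history):
    """Percentage of consecutive check results that differ (unweighted)."""
    if len(history) < 2:
        return 0.0
    changes = sum(1 for prev, cur in zip(history, history[1:]) if prev != cur)
    return 100.0 * changes / (len(history) - 1)


def is_flapping(history, high_threshold=50.0):
    """True when the state change percentage reaches the assumed threshold."""
    return percent_state_change(history) >= high_threshold
```

A service that alternates between OK and CRITICAL on every check would score 100 percent, while one that stays OK would score 0 percent.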

Figure: Entire network view

3.2 Host Group Status Level

Portlet views

In this section we'll drill down into the next level, Host Groups (e.g. ESX:morges.groundwork.groundworkopesource.com). In our example below, you can see that this host group is part of two custom groups, VE-VMWare and ESX-CORP, where ESX:morges.groundwork.groundworkopesource.com is the host group with its status displayed. A custom group is a collection of host group or service group objects at the user interface level, which allows more freedom to group business functions, locations, or infrastructure (e.g. workstations, servers, services) and make them more accessible to users.

  • Host Group Health portlet - This portlet provides quick status for the selected host group's host and service availability. A color-coded status indicator shows the host group's most critical state as a parent node (e.g. a Host Group will be displayed as Down if any of the underlying Hosts are in a Down state).
  • Filters portlet - As in the Entire Network level outlined above, the host group level also provides a filters portlet to allow specific host and/or service states to be filtered and displayed. Selections made in these drop-downs affect the displayed contents of all other portlets on the Status page.
  • The Status Summary portlets - These portlets show monitoring statistics with drill-down capability for viewing detailed data.
  • Events, Nagios Statistics, Host List - At the bottom of the page you can expand the view to display Events, Nagios Monitoring Statistics for the selected host group, and a Host List of all hosts that are part of the selected host group being displayed. The Events portlet is an embedded Event Console application which provides an event list for the entire network along with the capability of applying actions (e.g. Accept Log Message, Notify Log Message, Nagios Acknowledge), sorting, and pausing incoming events. The Host List portlet provides a listing and status of the host group's hosts. Each device name (Host Name) can be selected to drill-down into more detail, and each host problem can be directly acknowledged.

Figure: Host group level status

3.3 Host Status Level

Portlet views

Here we take a look at the Host level (e.g. cent-6-64-template). The host level provides the following portlet views:

  • Actions portlet - At the host level the Actions portlet provides the following command categories: Acknowledge, Downtime, Notifications, Settings, Event Handlers, Check Results, and Connections. These settings are reflected in various portlets on this page. The commands available throughout the Status application depend on the current status of the object in question. For example, if a Host is not in a Down state you will not see the action category Acknowledge, and if Event Handlers are already disabled for a host, the Disable Event Handler command for this host will not be listed, but the Enable Event Handler command will be.
  • Host Health portlet - The Host Health portlet provides quick status and information for the selected host. A color-coded status indicator shows the host's most critical state as a parent node (e.g. a Host will be displayed as Down if any of the underlying services are in a Down state). In addition, the time the host has been in the current state and the number of Groups and Parents for this host are listed. The listed Groups and Parents can be selected to show a popup window with status information and further drill-down capability.
  • Filters portlet - The Filters portlet, mentioned in an earlier section, allows specific service states to be filtered and displayed in all other portlets on the Status page.
  • Status Information portlet - The host level also provides a host Status Information portlet showing detailed status and check information. Here you can directly schedule downtime or disable notifications, and schedule or disable checks.
  • Monitoring Statistics portlet - The Monitoring Statistics for host services portlet shows monitoring statistics with drill-down capability for viewing associated detailed data. In this case, service status for the selected host.
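
The state-dependent behavior of the Actions portlet can be pictured with a small Python sketch. It models only the two examples given in the Actions portlet description (Acknowledge appears only for a Down host, and only one of the Enable/Disable Event Handler commands is listed at a time). The function name and the list representation are hypothetical and are not taken from the Status application.

```python
def available_host_actions(state, event_handlers_enabled):
    """Return the action entries shown for a host (illustrative subset)."""
    actions = ["Downtime", "Notifications", "Settings", "Check Results"]
    if state == "DOWN":
        # Acknowledge is only offered while the host is in a Down state.
        actions.insert(0, "Acknowledge")
    # Offer only the event handler command that would change the current setting.
    if event_handlers_enabled:
        actions.append("Disable Event Handler")
    else:
        actions.append("Enable Event Handler")
    return actions
```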

Host Availability & Performance Measurement, Service List, and Events

At the bottom of this host status screen you can expand the view to display Host Availability, Performance Measurement, the host Service List, and Events:

  • Host Availability and Performance Measurement - The Recent State Changes (Host Availability) portlet shows the total number of host services and shows dynamic state changes for any of the host's services for an indicated time (e.g. Today, Last 24 Hours). This area also provides RRD graphs for the various host services. These graphs are arranged in the same order as the Host Availability portlet above and are aligned by their timelines. The integrated performance graphs display time-series data such as network bandwidth, CPU utilization, machine-room temperature, transaction response times, and server load averages. Note that graphs may not be available for a service unless configured by your System Administrator using the Configuration>Performance feature in GroundWork Monitor. See the Performance reference section for more information on configuring performance graphs.
  • The Service List portlet lists the host's services and their status. Each Service Name can be selected to drill down for more detail, and each service problem can be directly acknowledged.
  • The Events portlet displays an integrated console; at this level the console provides an event list for the selected host along with the capability of applying actions.

Figure: Host level status

3.4 Service Status Level

The bottom level is Service status (e.g. syn.vm.cpu.cpuToMax.used). The service level provides the following portlet views:

Portlet views
  • Service Health portlet - As in the previously reviewed Host Health portlet, the Service Health portlet provides quick status and information for the selected service. A color-coded status indicator shows the service state along with the time the service has been in the current state. Listed are associated hosts and number of groups for this service, both of which you can drill down into for more detailed information.
  • Status Information portlet - As we viewed in the host level, similarly the service level provides a service Status Information portlet showing detailed status and check information for the selected service. Here you can directly schedule downtime or disable notifications, and schedule or disable checks.
  • Service Availability & Performance Measurement - This portlet provides host service state changes and performance measurement RRD graphs for the selected service and events.
  • The Events portlet displays an integrated console; at this level the console provides an event list for the selected host along with the capability of applying actions.

Figure: Service level status

4.0 Command Descriptions

4.1 Host Group Commands
Downtime

Schedule Downtime for all Hosts - This command is used to schedule downtime for all hosts in a particular host group. During the specified downtime, Nagios will not send notifications out about the hosts. When the scheduled downtime expires, Nagios will send out notifications for the hosts as it normally would. Scheduled downtimes are preserved across program shutdowns and restarts. Both the start and end times should be specified in the following format: mm/dd/yyyy hh:mm:ss. If you select the fixed option, the downtime will be in effect between the start and end times you specify. If you do not select the fixed option, Nagios will treat this as 'flexible' downtime. Flexible downtime starts when a host goes down or becomes unreachable (sometime between the start and end times you specified) and lasts as long as the duration of time you enter. The duration fields do not apply for fixed downtime.

Schedule Downtime for all Services - This command is used to schedule downtime for all services in a particular host group. During the specified downtime, Nagios will not send notifications out about the services. When the scheduled downtime expires, Nagios will send out notifications for the services as it normally would. Scheduled downtimes are preserved across program shutdowns and restarts. Both the start and end times should be specified in the following format: mm/dd/yyyy hh:mm:ss. If you select the fixed option, the downtime will be in effect between the start and end times you specify. If you do not select the fixed option, Nagios will treat this as 'flexible' downtime. Flexible downtime starts when a service enters a non-OK state (sometime between the start and end times you specified) and lasts as long as the duration of time you enter. The duration fields do not apply for fixed downtime. Note that scheduling downtime for services does not automatically schedule downtime for the hosts those services are associated with. If you want to also schedule downtime for all hosts in the host group, check the 'Schedule downtime for hosts too' option.
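
As a rough sketch of how the start and end times and the fixed/flexible choice described above could be turned into a Nagios external command, the Python snippet below parses the mm/dd/yyyy hh:mm:ss format and builds a SCHEDULE_HOSTGROUP_HOST_DOWNTIME line. The helper names, the decision to interpret times as UTC, and the validation rules are assumptions made for illustration; this page does not describe how the Status application submits the command internally.

```python
from datetime import datetime, timezone

TIME_FORMAT = "%m/%d/%Y %H:%M:%S"  # mm/dd/yyyy hh:mm:ss, as required by the form


def to_epoch(text):
    """Parse a mm/dd/yyyy hh:mm:ss string; treated as UTC here (an assumption)."""
    parsed = datetime.strptime(text, TIME_FORMAT)
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


def hostgroup_downtime_command(hostgroup, start, end, fixed, duration_s,
                               author, comment, now):
    """Build an external command line that schedules downtime for every
    host in a host group (hypothetical helper)."""
    start_ts, end_ts = to_epoch(start), to_epoch(end)
    if end_ts <= start_ts:
        raise ValueError("end time must be after start time")
    if not fixed and duration_s <= 0:
        raise ValueError("flexible downtime needs a positive duration")
    # Field order: hostgroup;start;end;fixed;trigger_id;duration;author;comment
    # A trigger_id of 0 means the downtime is not triggered by another downtime.
    return (f"[{now}] SCHEDULE_HOSTGROUP_HOST_DOWNTIME;{hostgroup};"
            f"{start_ts};{end_ts};{int(fixed)};0;{duration_s};{author};{comment}")
```

For fixed downtime the duration field is ignored by Nagios, which matches the note above that the duration fields do not apply; for flexible downtime the sketch insists on a positive duration.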

Notifications

Disable Notifications for all Hosts - This command is used to prevent notifications from being sent out for all hosts in the specified host group. You will have to re-enable notifications for all hosts in this host group before any alerts can be sent out in the future.

Enable Notifications for all Hosts - This command is used to enable notifications for all hosts in the specified host group. Notifications will only be sent out for the host state types you defined in your host definitions.

Disable Notifications for all Services - This command is used to prevent notifications from being sent out for all services in the specified host group. You will have to re-enable notifications for all services in this host group before any alerts can be sent out in the future. This does not prevent notifications from being sent out about the hosts in this host group unless you check the 'Disable for hosts too' option.

Enable Notifications for all Services - This command is used to enable notifications for all services in the specified host group. Notifications will only be sent out for the service state types you defined in your service definitions. This does not enable notifications for the hosts in this host group unless you check the 'Enable for hosts too' option.

Settings

Disable Active Checks for all Services - This command is used to disable active checks of all services in the specified host group. This does not disable checks of the hosts in the host group unless you check the 'Disable for hosts too' option.

Enable Active Checks for all Services - This command is used to enable active checks of all services in the specified host group. This does not enable active checks of the hosts in the host group unless you check the 'Enable for hosts too' option.

4.2 Host Commands

Acknowledge

Acknowledge Problem - This command is used to acknowledge a host problem. When a host problem is acknowledged, future Notifications about problems are temporarily disabled until the host changes state (e.g. recovers). Contacts for this host will receive a Notification about the Acknowledgment, so they are aware that someone is working on the problem. Additionally, a comment will also be added to the host. Make sure to enter your name and fill in a brief description of what you are doing in the comment field. If you would like the host comment to be retained between restarts of Nagios, check the Persistent check box. If you do not want an Acknowledgment Notification sent out to the appropriate Contacts, uncheck the Send Notification check box.

Remove Acknowledgement of Problem - This command is used to remove an Acknowledgment for a particular host problem. Once the Acknowledgment is removed, Notifications may start being sent out about the host problem. Note: Removing the Acknowledgment does not remove the host comment that was originally associated with the Acknowledgment. You'll have to remove that as well if that's what you want.
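
The Persistent and Send Notification check boxes described above correspond to fields of the Nagios ACKNOWLEDGE_HOST_PROBLEM external command. The following Python sketch shows one way such a command line could be assembled. The helper name and its defaults are hypothetical, and whether the Status application issues exactly this line is an assumption.

```python
def acknowledge_host_command(host, author, comment, now,
                             sticky=False, notify=True, persistent=False):
    """Build an ACKNOWLEDGE_HOST_PROBLEM external command line.

    notify mirrors the Send Notification check box and persistent mirrors
    the Persistent check box. In Nagios Core, a sticky value of 2 keeps the
    acknowledgement until the host recovers; 1 is a normal acknowledgement.
    """
    sticky_flag = 2 if sticky else 1
    # Field order: host;sticky;notify;persistent;author;comment
    return (f"[{now}] ACKNOWLEDGE_HOST_PROBLEM;{host};{sticky_flag};"
            f"{int(notify)};{int(persistent)};{author};{comment}")
```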

Downtime

Schedule Downtime - This command is used to schedule downtime for a particular host. During the specified downtime, Nagios will not send notifications out about the host. When the scheduled downtime expires, Nagios will send out notifications for this host as it normally would. Scheduled downtimes are preserved across program shutdowns and restarts. Both the start and end times should be specified in the following format: mm/dd/yyyy hh:mm:ss. If you select the fixed option, the downtime will be in effect between the start and end times you specify. If you do not select the fixed option, Nagios will treat this as 'flexible' downtime. Flexible downtime starts when the host goes down or becomes unreachable (sometime between the start and end times you specified) and lasts as long as the duration of time you enter. The duration fields do not apply for fixed downtime.

Notifications

Delay Next Notification - This command is used to delay the next problem notification that is sent out for the specified host. The notification delay will be disregarded if the host changes state before the next notification is scheduled to be sent out. This command has no effect if the host is currently UP.

Disable Notifications for All Services on Host - This command is used to prevent notifications from being sent out for all services on the specified host. You will have to re-enable notifications for all services associated with this host before any alerts can be sent out in the future. This does not prevent notifications from being sent out about the host unless you check the 'Disable for host too' option.

Enable Notifications for All Services on Host - This command is used to enable notifications for all services on the specified host. Notifications will only be sent out for the service state types you defined in your service definition. This does not enable notifications for the host unless you check the 'Enable for host too' option.

Disable Notifications - This command is used to prevent notifications from being sent out for the specified host. You will have to re-enable notifications for this host before any alerts can be sent out in the future. Note that this command does not disable notifications for services associated with this host.

Enable Notifications - This command is used to enable notifications for the specified host. Notifications will only be sent out for the host state types you defined in your host definition. Note that this command does not enable notifications for services associated with this host.

Settings

Disable Active Checks on Host - This command is used to temporarily prevent Nagios from actively checking the status of a particular host. If Nagios needs to check the status of this host, it will assume that it is in the same state that it was in before checks were disabled.

Enable Active Checks on Host - This command is used to enable active checks of this host.

Disable Passive Checks on Host - This command is used to stop Nagios from accepting passive host check results that it finds in the external command file for a particular host. All passive check results that are found for this host will be ignored.

Enable Passive Checks on Host - This command is used to allow Nagios to accept passive host check results that it finds in the external command file for a particular host.

Disable Active Checks for All Services on Host - This command is used to disable active checks of all services associated with the specified host. When a service is disabled Nagios will not monitor the service. Doing this will prevent any notifications being sent out for the specified service while it is disabled. In order to have Nagios check the service in the future you will have to re-enable the service. Note that disabling service checks may not necessarily prevent notifications from being sent out about the host which those services are associated with. This does not disable checks of the host unless you check the 'Disable for host too' option.

Enable Active Checks for All Services on Host - This command is used to enable active checks of all services associated with the specified host. This does not enable checks of the host unless you check the 'Enable for host too' option.

Start Obsessing Over this Host - This command is used to have Nagios start obsessing over a particular host.

Stop Obsessing Over this Host - This command is used to stop Nagios from obsessing over a particular host.

Disable Flap Detection - This command is used to disable flap detection for a specific host.

Enable Flap Detection - This command is used to enable flap detection for a specific host. If flap detection is disabled on a program-wide basis, this will have no effect.

Event Handlers

Disable Event Handler - This command is used to temporarily prevent Nagios from running the host event handler for a particular host.

Enable Event Handler - This command is used to allow Nagios to run the host event handler for a particular host when necessary (if one is defined).

Check Results

Re-Schedule the Next Check - This command is used to schedule the next check of a particular host. Nagios will re-queue the host to be checked at the time you specify. If you select the force check option, Nagios will force a check of the host regardless of both what time the scheduled check occurs and whether or not checks are enabled for the host.

Schedule Check for all Services on this Host - This command is used to schedule the next check of all services on the specified host. If you select the force check option, Nagios will force a check of all services on the host regardless of both what time the scheduled checks occur and whether or not checks are enabled for those services.

Submit Passive Check Result - This command is used to submit a passive check result for a particular host.

4.3 Service Group Commands

Downtime

Schedule Downtime for all Services - This command is used to schedule downtime for all services in a particular service group. During the specified downtime, Nagios will not send notifications out about the services. When the scheduled downtime expires, Nagios will send out notifications for the services as it normally would. Scheduled downtimes are preserved across program shutdowns and restarts. Both the start and end times should be specified in the following format: mm/dd/yyyy hh:mm:ss. If you select the fixed option, the downtime will be in effect between the start and end times you specify. If you do not select the fixed option, Nagios will treat this as 'flexible' downtime. Flexible downtime starts when a service enters a non-OK state (sometime between the start and end times you specified) and lasts as long as the duration of time you enter. The duration fields do not apply for fixed downtime.

Notifications

Disable Notifications for all Services - This command is used to prevent notifications from being sent out for all services in the specified service group. You will have to re-enable notifications for all services in this service group before any alerts can be sent out in the future.

Enable Notifications for all Services - This command is used to enable notifications for all services in the specified service group. Notifications will only be sent out for the service state types you defined in your service definitions.

Settings

Disable Active Checks for all Services - This command is used to disable active checks of all services in the specified service group.

Enable Active Checks for all Services - This command is used to enable active checks of all services in the specified service group.

4.4 Service Commands

Acknowledge

Acknowledge Problem - This command is used to acknowledge a service problem. When a service problem is acknowledged, future notifications about problems are temporarily disabled until the service changes state (e.g. recovers). Contacts for this service will receive a notification about the acknowledgement, so they are aware that someone is working on the problem. Additionally, a comment will also be added to the service. Make sure to enter your name and fill in a brief description of what you are doing in the comment field. If you would like the service comment to be retained between restarts of Nagios, check the 'Persistent' checkbox. If you do not want an acknowledgement notification sent out to the appropriate contacts, uncheck the 'Send Notification' checkbox.

Remove Problem Acknowledgement - This command is used to remove an acknowledgement for a particular service problem. Once the acknowledgement is removed, notifications may start being sent out about the service problem. Note: Removing the acknowledgement does not remove the service comment that was originally associated with the acknowledgement. You'll have to remove that as well if that's what you want.

Downtime

Schedule Downtime for this Service - This command is used to schedule downtime for a particular service. During the specified downtime, Nagios will not send notifications out about the service. When the scheduled downtime expires, Nagios will send out notifications for this service as it normally would. Scheduled downtimes are preserved across program shutdowns and restarts. Both the start and end times should be specified in the following format: mm/dd/yyyy hh:mm:ss. If you select the fixed option, the downtime will be in effect between the start and end times you specify. If you do not select the fixed option, Nagios will treat this as 'flexible' downtime. Flexible downtime starts when the service enters a non-OK state (sometime between the start and end times you specified) and lasts as long as the duration of time you enter. The duration fields do not apply for fixed downtime.

Notifications

Disable Notifications - This command is used to prevent notifications from being sent out for the specified service. You will have to re-enable notifications for this service before any alerts can be sent out in the future.

Enable Notifications - This command is used to enable notifications for the specified service. Notifications will only be sent out for the service state types you defined in your service definition.

Delay Next Notification - This command is used to delay the next problem notification that is sent out for the specified service. The notification delay will be disregarded if the service changes state before the next notification is scheduled to be sent out. This command has no effect if the service is currently in an OK state.

Settings

Disable Active Checks on Service - This command is used to disable active checks of a service.

Enable Active Checks on Service - This command is used to enable active checks of a service.

Disable Passive Checks - This command is used to stop Nagios from accepting passive service check results that it finds in the external command file for this particular service. All passive check results that are found for this service will be ignored.

Enable Passive Checks - This command is used to allow Nagios to accept passive service check results that it finds in the external command file for this particular service.

Disable Flap Detection - This command is used to disable flap detection for a specific service.

Enable Flap Detection - This command is used to enable flap detection for a specific service. If flap detection is disabled on a program-wide basis, this will have no effect.

Event Handlers

Disable Event Handler - This command is used to temporarily prevent Nagios from running the service event handler for a particular service.

Enable Event Handler - This command is used to allow Nagios to run the service event handler for a particular service when necessary (if one is defined).

Check Results

Submit Passive Check Result - This command is used to submit a passive check result for a particular service. It is particularly useful for resetting security-related services to OK states once they have been dealt with.

Reschedule Next Check - This command is used to schedule the next check of a particular service. Nagios will re-queue the service to be checked at the time you specify. If you select the force check option, Nagios will force a check of the service regardless of both what time the scheduled check occurs and whether or not checks are enabled for the service.
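
To illustrate the Submit Passive Check Result command described above, the Python sketch below builds a Nagios PROCESS_SERVICE_CHECK_RESULT external command line, for example to reset a service to OK. The helper name and the example host, service, and output text are hypothetical; the return codes are the standard Nagios plugin codes.

```python
# Standard Nagios plugin return codes for services.
SERVICE_RETURN_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def passive_service_result_command(host, service, state, output, now):
    """Build a PROCESS_SERVICE_CHECK_RESULT external command line."""
    code = SERVICE_RETURN_CODES[state]  # raises KeyError for an unknown state
    # Field order: host;service_description;return_code;plugin_output
    return (f"[{now}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{code};{output}")
```

Note that Nagios only processes such a result if passive checks are enabled for the service, as covered by the Enable Passive Checks command.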