Managing Downtimes

This page covers the new configuration tools Downtimes and Recurring Downtimes for managing scheduled downtime. Scheduling downtime is useful during system maintenance: GroundWork Monitor stops sending messages during the downtime period, which also helps produce more accurate data in SLA reports. We start with how to enable these tools, then explain how they work, and continue with setting up regular and recurring downtimes.

Enabling the Downtimes Tool

Use the attached archive to put the additional script into place, and then enable the cron job.

  1. Download the attached file to your GroundWork server:
    TB6.7.0-6.downtime_job.tar.gz
  2. Extract the archive:
    run as nagios user
    tar xf TB6.7.0-6.downtime_job.tar.gz -C /
  3. The crontab entry that runs this script is commented out in the release. Uncomment it to start the job:
    run as nagios user
    crontab -l | sed -e '/downtime/s/^\#//' | crontab -
  4. If you want the job to run more frequently than once an hour, you will need to change the schedule fields of the crontab entry. As shipped, the entry runs at 1 minute past the hour, every hour. The authors suggest running it once an hour or once a day. If you run it less frequently than once a day, you are apt to miss some requests for recurring downtime.
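
To preview what the sed expression in step 3 does before piping the result back into crontab, the same substitution can be sketched in Python. This is only an illustration: the function name uncomment_downtime_lines and the sample crontab lines are hypothetical, and the entry shipped in the archive may look different.

```python
import re

def uncomment_downtime_lines(crontab_text: str) -> str:
    """Mimic sed -e '/downtime/s/^#//': on every line that contains
    'downtime', strip one leading '#'; leave all other lines untouched."""
    out = []
    for line in crontab_text.splitlines():
        if "downtime" in line:
            line = re.sub(r"^#", "", line, count=1)
        out.append(line)
    return "\n".join(out) + "\n"

# Hypothetical crontab content, for illustration only.
sample = (
    "# m h dom mon dow command\n"
    "#1 * * * * /usr/local/groundwork/nagios/bin/downtime_job.pl\n"
    "#30 2 * * * /usr/local/bin/some_other_job\n"
)
print(uncomment_downtime_lines(sample), end="")
```

Note that the pattern matches any crontab line containing the word "downtime", so it is worth checking the output of crontab -l first if the nagios user has other jobs with that word in them.
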
How It Works
Regular Downtime

When you set up regular downtime scheduling, as described in Setting Up Regular Downtimes below, your selections are turned into external commands for the running Nagios. Suppose you choose the following:

  • Downtime of 60 minutes
  • For a hostgroup "Linux Servers"
  • Starting at 03:00 on December 22, 2012

As soon as you make the selection and click the final Add button, you can look at the status.log file and view the addition. At this point Nagios has been notified and will honor the downtime.
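
As a rough illustration of what such an external command could look like for the selection above, the sketch below builds a command string in the same shape as the commands shown in the debug output later on this page. The function name, the author "admin", the comment text, and the UTC-8 time zone are assumptions made for the example; the exact command the application sends is not documented here.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the server's local time is UTC-8. Substitute your own zone.
ASSUMED_TZ = timezone(timedelta(hours=-8))

def hostgroup_host_downtime(hostgroup, start, minutes, author, comment):
    """Build a fixed-downtime external command string for a hostgroup."""
    start_ts = int(start.timestamp())
    end_ts = start_ts + minutes * 60
    # Field order: hostgroup;start;end;fixed;trigger_id;duration;author;comment
    return (f"SCHEDULE_HOSTGROUP_HOST_DOWNTIME;{hostgroup};"
            f"{start_ts};{end_ts};1;0;0;{author};{comment}")

# 60 minutes for "Linux Servers", starting at 03:00 on December 22, 2012.
start = datetime(2012, 12, 22, 3, 0, tzinfo=ASSUMED_TZ)
command = hostgroup_host_downtime("Linux Servers", start, 60, "admin", "maintenance")
print(command)
```

The start and end fields are Unix timestamps in seconds, so the 60-minute downtime appears as a difference of 3600 between the two values.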

Recurring Downtime

The crontab entry and the newly added script handle recurring downtime selections. Choices made in the Recurring Downtimes panel produce entries in the following file, where they persist until they are deleted using the application.

/usr/local/groundwork/nagios/var/downtime_schedule.cfg

That file is read by the regularly scheduled script:

/usr/local/groundwork/nagios/bin/downtime_job.pl

The script examines the entries for jobs that fall within its look-ahead window, described here as the coming 24 hours (the sample run below reports a window of 10080 minutes, or 7 days). Any entries found are converted into external commands and sent to Nagios. As with downtime initiated in any other way, Nagios removes the status.log entry once the designated time has passed. If you remove downtime requests through the application, the corresponding entries in the configuration file and in status.log are removed as well.
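
The look-ahead idea can be sketched as follows. This is a minimal illustration and not the code of downtime_job.pl: the function name next_start, the dictionary layout, and the sample current time are assumptions, while the field names time, days_of_week, and days_of_month are those used in downtime_schedule.cfg.

```python
from datetime import datetime, timedelta

# Day names in datetime.weekday() order (Monday is 0).
DAY_NAMES = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]

def next_start(entry, now, window_minutes):
    """Return the first start time inside the window on which the
    entry is valid, or None if there is no such time."""
    hour, minute = map(int, entry["time"].split(":"))
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate < now:
        candidate += timedelta(days=1)      # today's slot has already passed
    limit = now + timedelta(minutes=window_minutes)
    while candidate <= limit:
        # Both the day of week and the day of month must match.
        if (DAY_NAMES[candidate.weekday()] in entry["days_of_week"]
                and candidate.day in entry["days_of_month"]):
            return candidate
        candidate += timedelta(days=1)
    return None

entry = {
    "time": "23:20",
    "days_of_week": {"mon", "tue", "thu", "fri", "sat", "sun"},
    "days_of_month": set(range(1, 32)) - {19},
}
now = datetime(2012, 12, 19, 12, 0)         # hypothetical current time
print(next_start(entry, now, window_minutes=24 * 60))
print(next_start(entry, now, window_minutes=7 * 24 * 60))
```

A wider window finds occurrences further ahead, which is why running the job too rarely relative to the window risks missing a scheduled downtime.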

Debug comments are produced on the command line. To see them, you must become the nagios user and run the script at a prompt. Here is an example of the configuration file downtime_schedule.cfg:

    define schedule {
    	sid		794192837294efd1cef4ad6b13969b67
    	user		rstools
    	comment		test linux
    	time		23:20
    	duration		60
    	days_of_week		mon,tue,thu,fri,sat,sun
    	days_of_month		1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,20,21,22,23,24,25,26,27,28,29,30,31
    	schedule_type		hostgroup
    	hostgroup_name		Linux Servers
    }

    define schedule {
    	sid		b0ea8267f9557b760c85bcdb745ef81b
    	user		rstools
    	comment		web maint
    	time		01:00
    	duration		600
    	days_of_week		sun
    	days_of_month		1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
    	schedule_type		hostgroup
    	hostgroup_name		Apps-MemberWeb
    }
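
If you want to inspect this file from your own tooling, the format is simple enough to read with a few lines of code. The sketch below is one possible approach, not the parser used by downtime_job.pl; the function name parse_schedules is hypothetical, and it assumes that no value contains a closing brace.

```python
import re

def parse_schedules(text):
    """Parse 'define schedule { ... }' blocks into a list of dicts."""
    schedules = []
    for body in re.findall(r"define\s+schedule\s*\{(.*?)\}", text, flags=re.S):
        entry = {}
        for line in body.splitlines():
            parts = line.split(None, 1)     # key, then the rest of the line
            if len(parts) == 2:
                entry[parts[0]] = parts[1].strip()
        # Turn the comma-separated day lists into Python lists.
        for key in ("days_of_week", "days_of_month"):
            if key in entry:
                entry[key] = entry[key].split(",")
        schedules.append(entry)
    return schedules

sample = """
define schedule {
    sid             794192837294efd1cef4ad6b13969b67
    user            rstools
    comment         test linux
    time            23:20
    duration        60
    days_of_week    mon,tue,thu,fri,sat,sun
    days_of_month   1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,20,21,22,23,24,25,26,27,28,29,30,31
    schedule_type   hostgroup
    hostgroup_name  Linux Servers
}
"""
for schedule in parse_schedules(sample):
    print(schedule["comment"], schedule["time"], schedule["hostgroup_name"])
```

Splitting each line only once keeps values that contain spaces, such as the hostgroup name "Linux Servers", intact.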

Here we run the downtime script and see the debug comments on the command line:

run as nagios user

    -bash-4.1$ /usr/local/groundwork/nagios/bin/downtime_job.pl
    Reading in configuration
    Reading in status log to get list of services
    Reading /usr/local/groundwork/nagios/var/objects.cache...Done.
    Reading in list of already scheduled downtime
    Reading /usr/local/groundwork/nagios/var/nagiosstatus.sav ...Adding 172.28.113.155:1355985900
    Adding 172.28.113.155:trap_unknown:1355985900
    Done.
    Checking for downtime due in next 10080 minutes
    test linux:
    Current candidate: 23:20 on 19/12/2012
    Current candidate: 23:20 on 20/12/2012
    Checking days of week: days (0,1,2,4,5,6) are valid
    Scheduling for day 4 (today is 3, looking at scheds for 4 and later)
    Current candidate: 23:20 on 20/12/2012
    Scheduling hostgroup Linux Servers
    Checking hostgroup representative localhost:1356074400
    Sending command 'SCHEDULE_HOSTGROUP_HOST_DOWNTIME;Linux Servers;1356074400;1356078000;1;0;0;rstools;AUTO: test linux
    '
    SCHEDULE_HOSTGROUP_HOST_DOWNTIME;Linux Servers;1356074400;1356078000;1;0;0;rstools;AUTO: test linux

    Sending command 'SCHEDULE_HOSTGROUP_SERVICE_DOWNTIME;Linux Servers;1356074400;1356078000;1;0;0;rstools;AUTO: test linux
    '
    SCHEDULE_HOSTGROUP_SERVICE_DOWNTIME;Linux Servers;1356074400;1356078000;1;0;0;rstools;AUTO: test linux

    web maint:
    Current candidate: 01:00 on 19/12/2012
    Current candidate: 01:00 on 20/12/2012
    Checking days of week: days (0) are valid
    Advancing a week to day 0
    Current candidate: 01:00 on 23/12/2012
    Scheduling hostgroup Apps-MemberWeb
    Checking hostgroup representative WB_Apps-MemberWeb:1356253200
    Sending command 'SCHEDULE_HOSTGROUP_HOST_DOWNTIME;Apps-MemberWeb;1356253200;1356289200;1;0;0;rstools;AUTO: web maint
    '
    SCHEDULE_HOSTGROUP_HOST_DOWNTIME;Apps-MemberWeb;1356253200;1356289200;1;0;0;rstools;AUTO: web maint

    Sending command 'SCHEDULE_HOSTGROUP_SERVICE_DOWNTIME;Apps-MemberWeb;1356253200;1356289200;1;0;0;rstools;AUTO: web maint
    '
    SCHEDULE_HOSTGROUP_SERVICE_DOWNTIME;Apps-MemberWeb;1356253200;1356289200;1;0;0;rstools;AUTO: web maint
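
To relate the commands in this output back to the schedule entries, it helps to decode the two timestamps. The sketch below splits a command string copied from the output above and converts its fields. The function name is hypothetical, and the UTC-8 offset is an assumption inferred from the fact that it makes the timestamps line up with the "Current candidate" lines; use your server's own time zone.

```python
from datetime import datetime, timedelta, timezone

# Assumption: server local time is UTC-8, inferred from the sample output.
ASSUMED_TZ = timezone(timedelta(hours=-8))

def decode_downtime_command(command):
    """Split a downtime command string and convert its time fields."""
    # At most 8 splits, so a ';' inside the comment stays in the comment.
    fields = command.strip().split(";", 8)
    start, end = int(fields[2]), int(fields[3])
    local = datetime.fromtimestamp(start, ASSUMED_TZ)
    return {
        "command": fields[0],
        "target": fields[1],
        "start_local": local.strftime("%H:%M on %d/%m/%Y"),
        "minutes": (end - start) // 60,
        "author": fields[7],
        "comment": fields[8],
    }

line = ("SCHEDULE_HOSTGROUP_HOST_DOWNTIME;Linux Servers;"
        "1356074400;1356078000;1;0;0;rstools;AUTO: test linux")
print(decode_downtime_command(line))
```

Under that assumption, the first command starts at 23:20 on 20/12/2012 and lasts 60 minutes, matching the time and duration fields of the "test linux" entry.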

Setting Up Regular Downtimes

This section reviews the Configuration>Downtimes feature in GroundWork Monitor, where you can schedule downtimes for a single specified date, time, and duration. We'll cover how to list, add, and delete regular downtimes for hosts, host groups, and service groups.

Listing Downtimes

This command displays all scheduled downtimes and gives you the option to delete selected downtimes.

Using Listing Downtimes
  1. Go to Configuration>Downtimes>List Downtimes. A list of currently scheduled downtimes will be displayed.
    • To remove all scheduled downtimes, click Delete all downtime(s).
    • To remove specific scheduled downtimes, check the box at the end of each corresponding row, and click Delete selected.
    • Deleting downtimes can take some time to register; select Refresh to update the current list.

      Figure: List Downtimes
Adding Downtimes

This command enables you to add downtimes for hosts, host groups, and service groups.

Using Add Host Downtime

Here you can indicate hosts and host services for scheduled downtime.

  1. Go to Configuration>Downtimes>Add host downtime. A list of current system hosts will be displayed.
  2. Using the check boxes and drop-down arrow for each host and service, select at least one host or service to place in downtime. Checking the box in the upper right corner selects all services; clicking the drop-down arrow in the upper right corner exposes all of the hosts' services.
  3. Next, select Add downtime. In the dialog box that is displayed, enter the downtime start time, end time, duration, and comment.
  4. Click Add to add the scheduled host downtime. You can view this downtime using the List downtimes option.

    Figure: Add Downtime by Host
Using Add Host Groups Downtime

Here you can indicate hostgroups, hosts, and services for scheduled downtime.

  1. Go to Configuration>Downtimes>Add hostgroup downtime. A list of current system hostgroups will be displayed, along with their corresponding hosts and services.
  2. Use the check boxes and drop-down arrow to select what to put into downtime. If you check the box for a hostgroup, all of its hosts and their services will be selected for downtime. You may also select individual hosts or services separately.
  3. After you have made your selection(s), select Add downtime and define the time for the downtime in the next screen.

    Figure: Add Downtime by Host Group
Using Add Service Groups Downtime

Here you can indicate servicegroups, hosts, and services for scheduled downtime.

  1. Go to Configuration>Downtimes>Add servicegroup downtime. A list of current system servicegroups will be displayed, along with their corresponding hosts and services.
  2. Enter at least one servicegroup, host, and service to place in downtime.
  3. After you have made your entries, select Add downtime and define the time for the downtime in the next screen.

    Figure: Add Downtime by Service Group

Setting Up Recurring Downtimes

This section reviews the Configuration>Recurring Downtimes feature in GroundWork Monitor. The previous option, Downtimes, lets you schedule downtimes for a single specified date, time, and duration. With this feature you can schedule downtimes for hosts, hostgroups, and servicegroups with a recurring time, duration, days of the week, and days of the month. For example, you can schedule a downtime at 8 PM, for 1 hour, on the second Friday of every month.

Adding a Schedule
  1. Go to Configuration>Recurring Downtimes.
  2. Select to add a host, hostgroup, or servicegroup downtime by selecting the corresponding tab and then selecting Add schedule.
  3. Next, define the recurring downtime as shown in the image below. Enter the name of the host, hostgroup, or servicegroup and any specific service(s). Wildcards can be used to specify multiple matches. Then enter a start time, a duration, and a comment describing the purpose of the downtime. Also enter the valid days for the downtime, keeping in mind that the boxes left checked are the days that become valid:
    Days of Week and Days of Month this schedule is valid:
    • If you specify both Days of Week and Days of Month, then both must match. In our example we specify Days of Week = Friday and Days of Month = 8, 9, 10, 11, 12, 13, and 14, the only dates on which the second Friday of a month can fall.
    • If any Days of Week are selected, the remaining days of the week are the valid ones. In our example below, the gray boxes were selected and the checked boxes are the valid days.
    • If any Days of Month are selected, the remaining days are valid. Again, the checked boxes indicate which days are valid.
  4. Select Create. You will see your recurring downtime listed. You can use the edit, delete, and copy icons for each scheduled downtime.

    Figure: Defining Recurring Downtimes


    Figure: Recurring Downtimes Example
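
The "both must match" rule can be checked with a short calculation. The sketch below lists the days of a given month on which a schedule would be valid, using the second-Friday example from step 3. The function name matching_days and the way the day lists are represented are assumptions made for illustration; this is not how the application itself stores or evaluates the schedule.

```python
import calendar
from datetime import date

# Day names in date.weekday() order (Monday is 0).
DAY_NAMES = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]

def matching_days(year, month, days_of_week, days_of_month):
    """Days of the month on which the schedule is valid.
    A day counts only if it matches BOTH lists."""
    last_day = calendar.monthrange(year, month)[1]
    return [
        day for day in range(1, last_day + 1)
        if DAY_NAMES[date(year, month, day).weekday()] in days_of_week
        and day in days_of_month
    ]

# Days of Week = Friday, Days of Month = 8 through 14.
fridays = {"fri"}
second_week = set(range(8, 15))
print(matching_days(2012, 12, fridays, second_week))
print(matching_days(2013, 1, fridays, second_week))
```

Because exactly one Friday falls between the 8th and the 14th of any month, each call returns a single day: the second Friday of that month.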