GWME-7.1.0-2 - Archive Patch

h1. Problem

The archive database in GroundWork Monitor 7.1.0 is missing schema changes that were made in the runtime database, and regular archiving cannot complete successfully without them. Consequently, archiving is broken in the 7.1.0 release. This affects both fresh installs of 7.1.0 and upgrades to 7.1.0 from a previous release. Because archiving fails, the normal purging of old records from the runtime database is skipped. Records then build up in the runtime database to an unacceptable level, which slows performance and causes other indirect problems.

A secondary issue, unrelated to the archive database structure, is that archiving was overly aggressive: after older data had aged out, it removed records from the runtime database that the Availability Graphs in Status Viewer need in order to display properly.

h1. Solution

To address the issues above, we are providing a patch. It bundles upgraded archiving scripts that perform more selective delete operations in the runtime database, preserving the data points needed for availability graphing even after they would otherwise have aged out.

With this fix, the archiving software no longer removes useful data, but by itself it does not restore the important data points that were previously deleted from the runtime database.

The {{TB7.1.0-2.archive-fixes.tar.gz}} tarball attached to this article provides replacement files for the 7.1.0 release that complete the archive database changes and get daily archiving back in order.

{attachments}

{noformat}
cd /usr/local/groundwork/config

diff log-archive-receive.conf.orig log-archive-receive.conf
diff log-archive-send.conf.orig log-archive-send.conf
{noformat}
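If you want to review both configuration files in one pass before editing them, the comparison above can be wrapped in a small bash function. This is a minimal sketch, not part of the patch: the function name {{compare_archive_configs}} is hypothetical, and it assumes the {{.orig}} copies sit next to the live files in {{/usr/local/groundwork/config}}, as the {{diff}} commands above imply.

```shell
# Hypothetical helper: compare each archiving config file with its .orig copy.
# Returns 0 if both files match, 1 if either differs, 2 if a file is missing.
compare_archive_configs() {
    local config_dir="${1:-/usr/local/groundwork/config}"
    local name status=0
    for name in log-archive-receive.conf log-archive-send.conf; do
        if [ ! -f "$config_dir/$name" ] || [ ! -f "$config_dir/$name.orig" ]; then
            echo "missing $config_dir/$name or $config_dir/$name.orig" >&2
            status=2
            continue
        fi
        # Unified diff: lines starting with - come from .orig, + from the live file.
        if diff -u "$config_dir/$name.orig" "$config_dir/$name"; then
            echo "$name: no differences"
        elif [ "$status" -eq 0 ]; then
            status=1   # differences found; do not mask an earlier "missing" (2)
        fi
    done
    return "$status"
}
```

Run it as {{compare_archive_configs}}, or pass another directory as the first argument. A return value of 0 means both files match their {{.orig}} copies, 1 means at least one differs, and 2 means a file is missing.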
# Edit the configuration files as needed:
{noformat}
vim log-archive-receive.conf
vim log-archive-send.conf
{noformat}

h2. Additional steps for a freshly-installed 7.1.0 release

Follow these steps only if your system was freshly installed with 7.1.0 rather than upgraded from a previous release. These steps re-create the archive database from scratch. Because archiving has never worked in 7.1.0, and a fresh install carries no archive data over from a prior release, following these steps will not destroy any existing archive data.

{warning}If you upgraded to 7.1.0 or migrated a {{gwcollagedb}} and/or {{archive_gwcollagedb}} database from an earlier version, you must instead follow the procedure in the [Additional steps for systems upgraded to 7.1.0 from a previous release|#GWME-7.1.0-2-ArchivePatch-Additionalstepsforsystemsupgradedto7.1.0fromapreviousrelease] section, below.{warning}

Run the following commands in a {{bash}} shell. Each run of {{$psql}} prompts for a password; respond with the administrative password of the PostgreSQL {{postgres}} user.
{noformat}
$psql -W -U postgres -d archive_gwcollagedb -f $scriptdir/GWCollage-Version.sql
{noformat}
Do not worry about the following message that appears when you run the {{Archive_GWCollageDB_extensions.sql}} script:
{quote}
NOTICE: constraint "host_hostname_key" of relation "host" does not exist, skipping
{quote}
This is only a NOTICE, not a WARNING or ERROR; it is normal and expected.
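When the SQL scripts produce a lot of output, it can be easier to let a filter pick out only the lines that matter. The bash function below is a hypothetical helper, not part of the patch: it reads {{psql}} output on standard input and fails only on WARNING, ERROR, or FATAL lines, so NOTICE lines like the one above pass through silently. It assumes the usual {{psql}} message format, in which the severity appears as a word followed by a colon.

```shell
# Hypothetical helper: read psql output on stdin and flag real problems.
# NOTICE lines are ignored; WARNING, ERROR, and FATAL lines are reported.
# Returns 0 if none were found, 1 otherwise.
check_psql_output() {
    local line problems=0
    # The "|| [ -n "$line" ]" also handles a final line with no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            *"WARNING:"*|*"ERROR:"*|*"FATAL:"*)
                echo "problem: $line" >&2
                problems=1
                ;;
        esac
    done
    return "$problems"
}
```

For example, append {{2>&1 | check_psql_output}} to a {{$psql ... -f ...}} command; the {{2>&1}} sends {{psql}}'s error output into the pipe, where NOTICE and ERROR messages are written.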

h2. Additional steps for systems upgraded to 7.1.0 from a previous release
h2. Notes for all types of 7.1.0 installs

{info}Below is background data to help understand what to expect from the updated archiving process.{info}

Archiving is normally scheduled to run at 00:30 each morning via a {{cron}} job belonging to the {{nagios}} user. You can check whether it succeeded or failed by looking at the archiving log files:
{noformat}
/usr/local/groundwork/foundation/container/logs/log-archive-receive.log
tail /usr/local/groundwork/foundation/container/logs/log* | egrep -i 'SUCCEEDED|FAILED'
{noformat}
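The {{tail | egrep}} check above looks only at the last ten lines of each log. A slightly more targeted sketch, again hypothetical and not part of the patch, prints the most recent SUCCEEDED or FAILED line from each archiving log and returns a non-zero status if the latest one is a failure, so it can be called from another script. It assumes the archiving logs match {{log-archive-*.log}} in the directory shown above; only {{log-archive-receive.log}} is named in this article.

```shell
# Hypothetical helper: print the latest SUCCEEDED/FAILED line from each
# archiving log. Assumes the logs are named log-archive-*.log.
# Returns 0 if no latest status line mentions FAILED, 1 if one does,
# and 2 if no matching log files exist.
latest_archive_status() {
    local log_dir="${1:-/usr/local/groundwork/foundation/container/logs}"
    local log last failed=0 found=0
    for log in "$log_dir"/log-archive-*.log; do
        [ -f "$log" ] || continue   # skip the literal pattern if nothing matched
        found=1
        # Case-insensitive match, like the egrep -i check in this article.
        last=$(grep -Ei 'SUCCEEDED|FAILED' "$log" | tail -n 1)
        echo "$(basename "$log"): ${last:-no status line found}"
        if printf '%s\n' "$last" | grep -qi 'FAILED'; then
            failed=1
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "no log-archive-*.log files in $log_dir" >&2
        return 2
    fi
    return "$failed"
}
```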
Archiving cycles cannot be run immediately back-to-back. The {{minimum_additional_hours_to_archive}} parameter in the {{config/log-archive-send.conf}} file enforces a minimum delay between cycles. Do not lower this setting in an attempt to speed up the transfer of data from the runtime database to the archive database; it will not make the archiving process any faster.
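To see the value currently in effect without opening an editor, you could read it straight from the file. The helper below is a hypothetical sketch that assumes each parameter is written on its own line as {{name = value}}, with {{#}} starting a comment; check that against your copy of {{config/log-archive-send.conf}} before relying on it.

```shell
# Hypothetical helper: print a parameter's value from an archiving config file.
# ASSUMPTION: one "name = value" per line, "#" starts a comment.
get_archive_param() {
    local file="$1" name="$2"
    # Keep the last uncommented "name = value" line, strip the name and "=",
    # then drop a trailing comment, trailing whitespace, and surrounding quotes.
    sed -n -E "s/^[[:space:]]*${name}[[:space:]]*=[[:space:]]*//p" "$file" \
        | tail -n 1 \
        | sed -E 's/[[:space:]]*#.*$//; s/[[:space:]]+$//; s/^"(.*)"$/\1/'
}
```

For example, {{get_archive_param /usr/local/groundwork/config/log-archive-send.conf minimum_additional_hours_to_archive}} prints the configured delay, and the same call works for the {{post_archiving_retention_days_for_messages}} and {{post_archiving_retention_days_for_performance_data}} parameters.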

After the steps listed above, daily archiving should run without error. Records will likely have built up in the runtime database since the 7.1.0 release was installed, so if the number of records ready to be archived is large, expect the first few archiving runs to take a fair amount of time.

For safety reasons, records are not deleted from the runtime database until that data has lain in the archive database for a few days. This is controlled by the {{post_archiving_retention_days_for_messages}} and {{post_archiving_retention_days_for_performance_data}} configuration parameters in the {{config/log-archive-send.conf}} file. Normal operation does not require changing these settings; let the delay run its course without interference.

In the normal configuration, both parameters are set to 2 (days). Treat that value as 3, because whether a full "day" has elapsed can depend on the exact timing of the daily archiving runs, which varies somewhat. Once that period has passed, the log files mentioned above should show archiving succeeding, the runtime database will have been pruned back, and your system should be operating more smoothly.
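The rule of thumb of treating the 2-day retention setting as 3 days amounts to simple date arithmetic: expect pruning by the date of the first successful archiving run, plus the retention value, plus one day. A minimal sketch, assuming GNU {{date}} (for its {{-d}} option); the function name is hypothetical:

```shell
# Hypothetical helper: estimate when runtime-database pruning should be visible.
# Adds the retention setting plus one extra day to the date of the first
# successful archiving run. Requires GNU date; dates are YYYY-MM-DD in UTC.
expected_prune_date() {
    local first_run_date="$1" retention_days="${2:-2}"   # 2 is the default setting
    date -u -d "$first_run_date $((retention_days + 1)) days" +%F
}
```

For example, with the default value of 2 and a first successful run on 2015-06-01, {{expected_prune_date 2015-06-01 2}} prints {{2015-06-04}}.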

h3. First archiving run after fixes are installed