Your GroundWork Monitor (like any other application) may be subject to an unexpected failure. This stuff happens, and it always happens at the worst time. Like 5:45pm on a Friday when you are leaving for a three-day weekend. Yes, we speak from sad experience. You might have a hardware failure or, if you are running virtual instances, just an outage at the hypervisor level. Often the cause is human error: an unintended action that deletes important files. No matter what the cause, when you find that GroundWork Monitor is no longer responsive, it may be necessary to put a new server or instance in place with the most recent image you can recover. If the outage is not a complete loss, a less extensive recovery may be sufficient. GroundWork support is here to help either way, but you can give us a lot or a little to work with, and the more you give us, the better the results we can get you.
As usual, it depends! You might just be running GroundWork as a virtual machine or instance, in which case you should snapshot it regularly, and automate that process. We will assume you aren't doing this, or that it is otherwise impractical to do so. Even if you use snapshots, though, don't forget to build a test machine from them in a maintenance window, just to be sure you can. See "Restore" below.
To be able to restore GroundWork Monitor to working condition, you must have a regular, automatic backup process that captures copies of the system. Ideally, these copies should be stored in a safe location: that is, not on the same server or storage that is being backed up, but in a physically separate facility. Often this can be arranged with something as simple as an NFS mount.
In this tech tip we can't define your ideal backup location, of course. Suffice it to say that as soon as the backup is complete, it should be possible to access it and use it even if the GroundWork Server is powered off.
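One way to sanity-check the backup location is to confirm that it sits on a different filesystem than the GroundWork installation. The sketch below assumes GNU "stat", and both paths ("/usr/local/groundwork" and "/mnt/gw-backups") are placeholders rather than values from this tip. A different device number does not prove the storage is physically separate, so treat this as a first check only.

```shell
#!/bin/bash
# Sanity check: is the backup destination on a different filesystem than the data?
# Both paths are assumptions for illustration; adjust them for your site.
GW_HOME="${GW_HOME:-/usr/local/groundwork}"   # assumed install location
BACKUP_DIR="${BACKUP_DIR:-/mnt/gw-backups}"   # hypothetical NFS mount point

# Returns 0 when the two paths are on different devices, 1 when they share
# a device, and 2 when either path cannot be examined.
on_separate_filesystem() {
    local src_dev dest_dev
    src_dev=$(stat -c %d "$1") || return 2
    dest_dev=$(stat -c %d "$2") || return 2
    [ "$src_dev" != "$dest_dev" ]
}

# Example call (left commented out so loading this file has no side effects):
# on_separate_filesystem "$GW_HOME" "$BACKUP_DIR" \
#     || echo "WARNING: backups share a filesystem with GroundWork" >&2
```

Running the check at the top of a backup script gives an early warning if the NFS mount has silently dropped and the backups are landing on the local disk.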
So, this TechTip provides a typical best-practice example. Please put it into place. It's only a matter of time before something breaks, and if you are prepared, you will look like a genius. So, by extension, this is a tech tip on how to look like a genius.
A good place to put the backups (like a mounted drive on the GroundWork Server)
Root access to all of the GroundWork Servers. If you have multiple GroundWork servers in a distributed architecture, you will need to set this up on each one.
We are providing examples of two kinds of backup, here. The first is a dump of the databases and the RRD files, and it can be run without interrupting monitoring. The second demands a maintenance window, because it creates a complete recovery snapshot of GroundWork in a quiescent state.
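Here is an example "cron" job that you can add to the root user's crontab. We assume that you will place the scripts in the "/usr/local" directory, but the location is up to you. Note that a "%" in a crontab command must be escaped with a backslash, because cron otherwise treats it as a newline and the test for the first week of the month would never run as intended:
05 00 * * 0 /usr/local/gw-weekly-db-bkup.sh 2>&1
00 04 * * 7 [ "$(date +\%d)" -le 07 ] && /usr/local/gw-monthly-complete-bkup.sh 2>&1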
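The monthly entry combines a Sunday schedule with a day-of-month test, since cron itself has no "first Sunday" setting. If you prefer to keep the crontab line simple, the same check can live in the script. The function below is a minimal sketch, assuming GNU "date" (for the "-d" option); the name "is_first_sunday" is ours, not part of any GroundWork tool.

```shell
#!/bin/bash
# Succeeds when the given date (default: today) is the first Sunday of its month.
# Requires GNU date for the -d option and the %-d (unpadded day) format.
is_first_sunday() {
    local when="${1:-today}"
    local dow dom
    dow=$(date -d "$when" +%u)    # day of week: 1 = Monday ... 7 = Sunday
    dom=$(date -d "$when" +%-d)   # day of month without zero padding: 1..31
    [ "$dow" -eq 7 ] && [ "$dom" -le 7 ]
}

# Example use at the top of a monthly script:
# is_first_sunday || exit 0
```

With this in the script, the crontab entry can simply run every Sunday and let the script decide whether to proceed.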
Here is the first script (gw-weekly-db-bkup.sh) referenced in the cron job, which runs once a week and backs up the databases and the RRD files:
Place this script in the directory you choose, and make sure it is executable.
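As a rough illustration only, a weekly database and RRD backup could be sketched as follows. It assumes a PostgreSQL back end dumped with "pg_dump", and every path, database name, and user shown is a placeholder to be replaced with the values from your own installation. It is not the script shipped or recommended by GroundWork.

```shell
#!/bin/bash
# Sketch of a weekly database + RRD backup. Every path and name below is an
# assumption for illustration, not a value taken from a GroundWork release.
BACKUP_DIR="${BACKUP_DIR:-/mnt/gw-backups}"        # hypothetical backup mount
RRD_DIR="${RRD_DIR:-/usr/local/groundwork/rrd}"    # assumed RRD location
DATABASES="${DATABASES:-gwcollagedb monarch}"      # assumed database names
PG_DUMP="${PG_DUMP:-pg_dump}"                      # assumes PostgreSQL
PG_USER="${PG_USER:-postgres}"                     # assumed database user
STAMP="$(date +%Y%m%d-%H%M%S)"                     # timestamp used in file names

# Dump each database in PostgreSQL custom format; stop at the first failure.
dump_databases() {
    local db
    for db in $DATABASES; do
        "$PG_DUMP" -U "$PG_USER" -Fc -f "$BACKUP_DIR/$db-$STAMP.dump" "$db" || return 1
    done
}

# Archive the RRD directory so that it unpacks relative to its parent directory.
archive_rrds() {
    tar -czf "$BACKUP_DIR/rrd-$STAMP.tar.gz" \
        -C "$(dirname "$RRD_DIR")" "$(basename "$RRD_DIR")"
}

# When saving this as a real script, uncomment the next line to run both steps:
# mkdir -p "$BACKUP_DIR" && dump_databases && archive_rrds
```

Timestamped file names keep older backups from being overwritten, which matters if the newest dump turns out to contain the problem you are trying to recover from.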
Here is the second script (gw-monthly-complete-bkup.sh) referenced in the cron job, which runs on the first Sunday of each month. It uses the GroundWork backup utility which is described in an extensive KB article:
Here is the How To for the backup utility. If you do not have the latest version of the backup utility, use the link to download and install it on your system. Make sure the script above references it, if the name changes.
Restoring a system may involve using one or the other of the elements generated above. If you were so unlucky as to lose the whole system, you can follow the directions here, using the most recent full backup you created:
Then you may ADDITIONALLY follow directions for restoring one or more of the databases, for which you might have a more recent backup. Look here for the detail on how to do that:
Restoring the RRD files is just an unpacking of the saved archive. Find the most recent copy in your backup directory and run this command, substituting the name of the tar file you find:
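For illustration, the unpacking step could look like the sketch below. It assumes the archives are gzip-compressed tar files named "rrd-*.tar.gz" that were created relative to the parent of the RRD directory; the file naming and both directories are placeholders, so match them to what your backup script actually writes.

```shell
#!/bin/bash
# Sketch of an RRD restore. The archive naming (rrd-*.tar.gz) and both
# directories are assumptions; adjust them to your own backup layout.
BACKUP_DIR="${BACKUP_DIR:-/mnt/gw-backups}"                 # hypothetical backup mount
RESTORE_PARENT="${RESTORE_PARENT:-/usr/local/groundwork}"   # assumed parent of rrd/

# Print the newest RRD archive in the backup directory (by modification time).
latest_rrd_archive() {
    ls -1t "$BACKUP_DIR"/rrd-*.tar.gz 2>/dev/null | head -n 1
}

# Unpack an archive under RESTORE_PARENT, preserving file permissions.
restore_rrds() {
    local archive="$1"
    [ -f "$archive" ] || { echo "no such archive: $archive" >&2; return 1; }
    tar -xzpf "$archive" -C "$RESTORE_PARENT"
}

# Example: restore_rrds "$(latest_rrd_archive)"
```

Stop GroundWork before unpacking, so that the running system does not write to the RRD files while they are being replaced.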
Test the restore, however you do the backups and whatever you choose to save. You will want to have a maintenance window in which you turn off the running GroundWork system(s) and create new servers on which to restore the collected backup files. When the new servers are restored, and have been assigned the same name and IP address as the existing server, bring up the GroundWork Monitor and validate that the system is operational and that monitoring is accurate and reliable.
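Between full restore tests, a quick readability check on each archive can catch truncated or corrupt files early. The function below is a minimal sketch that assumes gzip-compressed tar archives; "verify_archive" is a name we made up for this example. It only confirms that the archive can be listed, so it does not replace an actual restore test.

```shell
#!/bin/bash
# Check that a backup archive exists, is non-empty, and can be read end to end.
verify_archive() {
    local archive="$1"
    [ -s "$archive" ] || { echo "missing or empty: $archive" >&2; return 1; }
    # Listing the contents forces tar and gzip to read the whole file.
    tar -tzf "$archive" > /dev/null 2>&1 \
        || { echo "unreadable: $archive" >&2; return 1; }
    echo "ok: $archive"
}

# Example: check every archive in a (hypothetical) backup directory.
# for f in /mnt/gw-backups/*.tar.gz; do verify_archive "$f"; done
```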
Make sure you can reboot the system and have GroundWork come up and start monitoring without intervention. Power failures can happen too...
In our example we show backups running once a week or once a month. You might like to do it more frequently.
If you need our help, we will of course do whatever it takes to get you running again. If you just put these simple steps in place, though, we will be far faster at doing so. Thanks for reading!