
Heartbeat Monitors

By admin, last modified 2006-05-24 05:26

Add a heartbeat monitor to make sure you keep on receiving data.


It makes sense to monitor the status of a system, but what happens when the monitoring system stops receiving data altogether? MonaLisa has no built-in way to ask "When did I last hear from the headnode?", but a heartbeat filter in the alerts system can track exactly this. A heartbeat filter has the following format:

<heartbeat farm="My Farm" cluster="My Cluster" node="My Node" param="My Param" timeout="300">
  <!-- list actions to take here -->
</heartbeat>
The timeout is given in seconds; if it is omitted, it defaults to 300. The heartbeat is best used to check, from one MonaLisa service, whether another MonaLisa service is still alive (in Nagios terminology, peering). For example, our osg-test2 host uses the following Alert.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<filters>
  <heartbeat farm="red.unl.edu" cluster="MonaLisa" node="localhost-gone" param="Load5" timeout="40">
    <print> Cluster headnode is down! </print>
    <email>
      <from>alerts@osg-test2.unl.edu</from>
      <to>some-address@your-site.edu</to>
      <subject>$FARM is down.</subject>
      <text>This is an automated MonaLisa alert.  To filter it out, simply filter any messages containing "FILTER_MONALISA".

$FARM has stopped sending heartbeat messages.  Please check.
      </text>
    </email>
  </heartbeat>
</filters>

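With this configuration, osg-test2 prints a message and sends an email if no Load5 value from red.unl.edu arrives within 40 seconds, which tells us that our headnode has gone down.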
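MonaLisa's own implementation is not described here, so the following Python is only a hypothetical model of the behaviour described above, meant to make the timeout semantics concrete. The names `Heartbeat`, `load_heartbeats`, `HeartbeatMonitor`, `record` and `overdue` are invented for illustration. The sketch assumes each filter watches exactly one (farm, cluster, node, param) tuple, applies the documented 300-second default when `timeout` is absent, and treats the moment monitoring starts as the first heartbeat. It only reports which filters are overdue; running the `print` or `email` actions is left out.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

DEFAULT_TIMEOUT = 300  # seconds, used when the timeout attribute is absent


@dataclass
class Heartbeat:
    farm: str
    cluster: str
    node: str
    param: str
    timeout: int
    last_seen: float = 0.0  # time of the most recent matching value


def load_heartbeats(xml_text):
    """Read every <heartbeat> element from an Alert.xml-style document."""
    root = ET.fromstring(xml_text)
    return [
        Heartbeat(
            farm=hb.get("farm"),
            cluster=hb.get("cluster"),
            node=hb.get("node"),
            param=hb.get("param"),
            timeout=int(hb.get("timeout", DEFAULT_TIMEOUT)),
        )
        for hb in root.iter("heartbeat")
    ]


class HeartbeatMonitor:
    """Hypothetical model: tracks when each watched series last reported."""

    def __init__(self, beats, start):
        # Assumption: the start time counts as the first heartbeat, so a
        # silent series is only reported once its timeout has elapsed.
        self.beats = {(b.farm, b.cluster, b.node, b.param): b for b in beats}
        for b in beats:
            b.last_seen = start

    def record(self, farm, cluster, node, param, now):
        """Note that a value arrived; values no filter watches are ignored."""
        beat = self.beats.get((farm, cluster, node, param))
        if beat is not None:
            beat.last_seen = now

    def overdue(self, now):
        """Return the filters whose series has been silent past its timeout."""
        return [b for b in self.beats.values() if now - b.last_seen > b.timeout]


# Usage: the osg-test2 filter plus one filter that relies on the default timeout.
config = """<filters>
  <heartbeat farm="red.unl.edu" cluster="MonaLisa" node="localhost-gone"
             param="Load5" timeout="40"/>
  <heartbeat farm="My Farm" cluster="My Cluster" node="My Node" param="My Param"/>
</filters>"""

monitor = HeartbeatMonitor(load_heartbeats(config), start=0)
monitor.record("red.unl.edu", "MonaLisa", "localhost-gone", "Load5", now=30)
print([b.farm for b in monitor.overdue(now=60)])  # []: Load5 was seen 30 s ago
print([b.farm for b in monitor.overdue(now=71)])  # ['red.unl.edu']
```

Keying the filters on the full four-part tuple mirrors the attributes of the `heartbeat` element; a value for the same farm but a different parameter does not reset the timer in this model.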

