Check_logfile: generating OK vs Critical states


#1

I’ve got a command that’s 90% of the way there. I’m trying to parse a log for 2 things:

  1. stuck threads, which should be marked as critical
  2. cleared stuck threads, which should be marked as OK

I can get the command to tell the two apart when it’s pointed at a test log, but I can’t figure out how to make it compare the OK and critical counts it’s getting, and only go critical when the number of critical events is greater than the number of OK events.

Here’s the command:

./check_nrpe -H test.server.com -c check_logfile -a file="c:\stdout-stderr.log" "filter=column1 like 'reported to be stuck' OR column1 like 'has been active for'" "ok=column1 like 'reported to be stuck'" "crit=column1 like 'has been active for'" top-syntax='${status} (${crit_list}): ${ok_count}/${crit_count}/${total}' 'crit=crit_count>ok_count'

Here’s the output:

CRITICAL (has been active, has been active): 3/2/8

So I can see that it recognizes 3 ‘OK’ events and 2 ‘critical’ events in the test log file, but it still flags the whole thing as critical, and would therefore generate an alert even though there are more OKs than crits. Is there a way to tell the command to do a ‘greater than’ comparison, so that it only sends an alert when crit is greater than ok, and sends a clear when ok is equal to crit?
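One way to get the comparison asked for here (critical only when crit > ok, OK otherwise) is to post-process the result line outside NSClient++. The bash sketch below is not a check_logfile option: the function name `state_from_output` is hypothetical, and it assumes the line ends with `ok/crit/total` exactly as produced by the `top-syntax` shown above, optionally followed by `|` performance data.

```bash
#!/bin/bash
# Hypothetical helper (not part of NSClient++): turn a check_logfile result line
# whose top-syntax ends in "ok/crit/total" into a Nagios state and exit code.
state_from_output() {
    local out="${1%%|*}"          # drop any "|perfdata" suffix
    local counts="${out##*: }"    # text after the last ": ", e.g. "3/2/8"
    local ok crit total
    IFS=/ read -r ok crit total <<< "$counts"

    # If the counts are not plain integers, the line was not in the expected format
    if ! [[ $ok =~ ^[0-9]+$ && $crit =~ ^[0-9]+$ ]]; then
        echo "UNKNOWN: could not parse '$1'"
        return 3
    fi

    # Arithmetic comparison: alert only when critical events outnumber OK events
    if (( crit > ok )); then
        echo "CRITICAL: ok=$ok crit=$crit total=$total"
        return 2
    fi

    # Equal counts, or more OKs than crits, are treated as cleared
    echo "OK: ok=$ok crit=$crit total=$total"
    return 0
}

state_from_output "CRITICAL (has been active, has been active): 3/2/8"
# prints: OK: ok=3 crit=2 total=8
echo "exit code: $?"
# prints: exit code: 0
```

The return values follow the usual Nagios plugin convention (0 = OK, 2 = CRITICAL, 3 = UNKNOWN), so a wrapper script could simply `exit` with whatever the function returns.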


#2

Just in case anyone is trying something similar, this is what I got back on the NagiosXI forum:

I wasn’t able to find an option for this in the nsclient module, but I was able to create a simple script to wrap the check in. There’s definitely room for improvement, but it demonstrates the idea:

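#!/bin/bash
# Wrap check_logfile and compare the OK and critical counts numerically.

results=$(/usr/local/nagios/libexec/check_nrpe -H "$1" -c check_logfile -a file="c:\stdout-stderr.log" "filter=column1 like 'reported to be stuck' OR column1 like 'has been active for'" "ok=column1 like 'reported to be stuck'" "crit=column1 like 'has been active for'" top-syntax='${ok_count} ${crit_count}')
results=${results%%|*}   # drop any performance data after '|'

# The top-syntax above prints "<ok_count> <crit_count>"
ok=$(echo "$results" | awk '{ print $1 }')
critical=$(echo "$results" | awk '{ print $2 }')

# Report UNKNOWN if NRPE returned something other than two integers
if ! [[ $ok =~ ^[0-9]+$ && $critical =~ ^[0-9]+$ ]]; then
        echo "UNKNOWN: unexpected output: $results"
        exit 3
fi

# (( )) compares numerically; [[ $a > $b ]] would compare strings ("10" sorts before "9")
if (( critical > ok )); then
        echo "Critical!"
        exit 2
fi

# ok >= critical (including equal counts) is treated as cleared
echo "OK!"
exit 0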

That’s working for me.
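If you adapt a wrapper like this, note that inside bash’s `[[ ]]` the `>` operator compares strings lexicographically (using the current locale), not numbers, so a test written as `[[ $critical > $ok ]]` can give the wrong answer once one count has more digits than the other. The short demonstration below uses made-up counts of 9 and 10; `(( ))` or `-gt` performs the numeric comparison that was intended.

```bash
#!/bin/bash
# Compare the same pair of counts with a string operator and an arithmetic one.
ok=9
critical=10

# [[ a > b ]] is a string comparison: "10" sorts before "9"
if [[ $critical > $ok ]]; then
    echo "string test: critical > ok"
else
    echo "string test: critical <= ok"      # this branch runs
fi

# (( a > b )) is an arithmetic comparison, like [[ a -gt b ]]
if (( critical > ok )); then
    echo "arithmetic test: critical > ok"   # this branch runs
else
    echo "arithmetic test: critical <= ok"
fi
```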