Check_network gives total bits (bytes?), not bps


#1

I’m testing check_network on one of our servers. It kept coming back as critical in Nagios, so I jumped into test mode to see what’s going on. I ran the command manually, and the result is a ridiculously large number. Please see the output:

check_network
L cli CRITICAL: : Intel® 82574L Gigabit Network Connection >2846316439 <2846316439 bps
L cli Performance data: 'Intel® 82574L Gigabit Network Connection_total'=2846316439;10000;100000

check_network "crit= total > 1000000000"
L cli CRITICAL: : Intel® 82574L Gigabit Network Connection >2848405322 <2848405322 bps
L cli Performance data: 'Intel® 82574L Gigabit Network Connection_total'=2848405322;10000;1000000000

check_network "crit= total > 1000000000"
L cli CRITICAL: : Intel® 82574L Gigabit Network Connection >2848952588 <2848952588 bps
L cli Performance data: 'Intel® 82574L Gigabit Network Connection_total'=2848952588;10000;1000000000

check_network "crit= sent > 1000000000"
L cli CRITICAL: : Intel® 82574L Gigabit Network Connection >2849146692 <2849146692 bps
L cli Performance data: 'Intel® 82574L Gigabit Network Connection_sent'=2849146692;0;1000000000 'Intel® 82574L Gigabit Network Connection_total'=2849146692;10000;0

check_network "crit= sent > 1000000000" warn=none
L cli CRITICAL: : Intel® 82574L Gigabit Network Connection >2849769313 <2849769313 bps
L cli Performance data: 'Intel® 82574L Gigabit Network Connection_sent'=2849769313;0;1000000000

check_network "crit= total > 1000000000" warn=none
L cli CRITICAL: : Intel® 82574L Gigabit Network Connection >2859139610 <2859139610 bps
L cli Performance data: 'Intel® 82574L Gigabit Network Connection_total'=2859139610;0;1000000000

The number “2859139610 bps” would translate to roughly 2.8 Gbps, which would be an overwhelming data rate for my gigabit NIC. I think what is really happening is that check_network is actually reporting the total bits (bytes?) sent and received, presumably since the last reboot. The hint is that the number always increments.
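For comparison, here is a minimal sketch of what a true rate measurement looks like, assuming the third-party psutil package and a hypothetical adapter name: sample the OS’s cumulative per-NIC byte counters twice and divide the delta by the interval.

```python
import time
import psutil

NIC = "Ethernet"  # hypothetical adapter name; adjust for your system
INTERVAL = 5.0    # seconds between the two samples

# psutil exposes cumulative per-NIC byte counters (total since boot),
# much like the ever-growing number check_network seems to be returning.
before = psutil.net_io_counters(pernic=True)[NIC]
time.sleep(INTERVAL)
after = psutil.net_io_counters(pernic=True)[NIC]

delta_bytes = (after.bytes_sent - before.bytes_sent) \
            + (after.bytes_recv - before.bytes_recv)
print(f"~{delta_bytes * 8 / INTERVAL:,.0f} bps over the last {INTERVAL:g} s")
```

If check_network were returning a rate, its number should move up and down like this one does, instead of only ever growing.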

Any ideas?


#2

I’m seeing the same issue and would welcome any ideas too.


#3

I was reading more, and I think a workaround might be to use CheckWMI or to monitor a performance counter. Unfortunately, my time today is tied up, but if I come up with something, I’ll post it here.


#4

I was also leaning towards CheckWMI. The following query seems to be what we’re looking for: "SELECT BytesTotalPersec FROM Win32_PerfFormattedData_Tcpip_NetworkInterface". But it returns an instantaneous value. I’ve been trying to see whether the check could average multiple readings and behave like the check_cpu counters, so we could see the network card’s data rate over, say, 5s, 1m, and 5m, and trigger the warning and critical states accordingly. Unfortunately I can’t figure out how this can be done. The documentation mentions counters, some kind of collect mode you can set, stacking, and a “time” option that gives you something like check_cpu, but does this work with CheckWMI?
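In the meantime, here is a rough sketch of the averaging idea done outside NSClient++, assuming the Windows-only third-party wmi package and a hypothetical adapter instance name (perf counter instance names replace parentheses with brackets, so the name may not match what Device Manager shows):

```python
import time
import wmi

NIC = "Intel[R] 82574L Gigabit Network Connection"  # hypothetical instance name
SAMPLES = 5     # number of instantaneous readings
INTERVAL = 1.0  # seconds between readings

c = wmi.WMI()
readings = []
for _ in range(SAMPLES):
    for nic in c.Win32_PerfFormattedData_Tcpip_NetworkInterface():
        if nic.Name == NIC:
            readings.append(int(nic.BytesTotalPersec))
    time.sleep(INTERVAL)

assert readings, "no readings; check the NIC instance name"
# mean of the instantaneous values, converted from bytes/s to bits/s
avg_bps = 8 * sum(readings) / len(readings)
print(f"average over {SAMPLES} samples: {avg_bps:,.0f} bps")
```

This blocks for SAMPLES × INTERVAL seconds per check, which is why having NSClient++ collect in the background (the way check_cpu does) would be the nicer solution.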


#5

Also, after a little more research, it seems better to query the Win32_PerfRawData class instead of Win32_PerfFormattedData. For Windows Server versions before 2012 it’s Win32_PerfRawData_Tcpip_NetworkInterface, while from 2012 onwards it’s Win32_PerfRawData_Tcpip_NetworkAdapter.

The network card to query should be specified by adding "WHERE Name = 'adaptername'" to the query.
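Putting that together, a minimal sketch of the filtered query (again assuming the wmi package; 'adaptername' is a placeholder for your NIC’s instance name, and the class name depends on the Windows version as noted above):

```python
import wmi

c = wmi.WMI()
# Win32_PerfRawData_Tcpip_NetworkInterface before Server 2012,
# Win32_PerfRawData_Tcpip_NetworkAdapter from 2012 onwards.
rows = c.query(
    "SELECT Name, BytesTotalPersec "
    "FROM Win32_PerfRawData_Tcpip_NetworkInterface "
    "WHERE Name = 'adaptername'"  # placeholder instance name
)
for row in rows:
    print(row.Name, row.BytesTotalPersec)
```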


#6

After some testing on Windows 7 and 10 machines, the PerfRawData class seems unreliable at first sight: the BytesTotalPersec value is not a “persec” rate but a plain total, constantly increasing over time (somewhat similar to the issue we had in the first place with check_network). Querying PerfFormattedData gives the real data, but again it’s the real-time value and not an average, so it’s hard to make it usable in a monitoring solution where you want to know whether the network is saturated over a significant period of time. Still looking for advice on how to do this…
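One possibility, if I read Microsoft’s counter-type documentation correctly, is to do the “cook-down” ourselves: for this counter type the rate is (N1 - N0) / ((T1 - T0) / F), with T = Timestamp_PerfTime and F = Frequency_PerfTime from the same raw class, so sampling the raw total twice gives the average over exactly the window you pick. A sketch, assuming the wmi package and the placeholder names from above:

```python
import time
import wmi

WINDOW = 60.0  # seconds: the period you want the average over

def sample(c):
    # Raw uint64 counters may come back as strings, hence the int() calls.
    row = c.query(
        "SELECT BytesTotalPersec, Timestamp_PerfTime, Frequency_PerfTime "
        "FROM Win32_PerfRawData_Tcpip_NetworkInterface "
        "WHERE Name = 'adaptername'"  # placeholder instance name
    )[0]
    return (int(row.BytesTotalPersec),
            int(row.Timestamp_PerfTime),
            int(row.Frequency_PerfTime))

c = wmi.WMI()
n0, t0, f = sample(c)
time.sleep(WINDOW)
n1, t1, _ = sample(c)

# rate = (N1 - N0) / ((T1 - T0) / F): bytes per second averaged over
# the whole window rather than an instantaneous reading.
avg_bytes_per_sec = (n1 - n0) / ((t1 - t0) / f)
print(f"average over {WINDOW:g} s: {avg_bytes_per_sec * 8:,.0f} bps")
```

That still leaves the question of how to wire something like this into NSClient++, though.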


#7

Please open a ticket for the bug, it is easier to track…


#8

A potential fix for this can be found in this build: https://github.com/mickem/nscp/releases/tag/0.5.1.19


#9

Ok, thanks! I’ll try that.


#10

Ok, I did some testing and the values are now changing (an improvement over the original issue of an ever-increasing value), but they seem quite inaccurate.

I’ve opened a new bug as suggested so we can follow up on the discussion over there (hope I did it right; first time here):


#11

Thank you guys. I didn’t mean to put this on the back burner.