NSClient++ issues with many checks


#1

Hi,

I'm hitting an issue at the moment. We use NSClient++ on 4 Windows ‘proxy’ servers, which run all the commands against all our remote hosts (removing the need for an agent on every server).

We currently have around 450 hosts and 2,300 services. The issue we're hitting: with 1 proxy host, every few minutes connections start getting rejected for a little while and then they are allowed again, which of course breaks things. To mostly get around this we have now set up load balancing across 4 proxy hosts, and for the most part it works a lot better… but at times things still start failing.

I've come across the following options and wanted to find out whether you think they could be the cause, and what you would recommend setting them to. All our proxies have 8 CPU cores and 15 GB of memory.

[/settings/default]
socket queue size=0
thread pool=10

[/settings/NRPE/server]
socket queue size=0
thread pool=10

Could it be these settings? And if so, what would you suggest setting them to in order to help with the issue?


#2

Proxied checks are synchronous (each check holds a thread until the remote host answers), so the number of threads you need can easily be calculated…

For example:

I am guessing you have 2,300 service checks? In theory, if you check them every 5 minutes, that is about 8 checks/second (2,300 / 300 s ≈ 7.7), so you would need roughly 10 threads on average if the checks take 1 second. If they take 2 seconds you need 20 threads; if they take 10 seconds, 100 threads… Of course, in a real environment the checks will not be evenly distributed, so your mileage will vary…
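The arithmetic above is Little's law: busy threads = arrival rate × time each check holds a thread. A minimal Python sketch, assuming the checks are spread perfectly evenly over the interval; `threads_needed` is a hypothetical helper name, not anything in NSClient++:

```python
import math


def threads_needed(services: int, interval_s: float, check_duration_s: float) -> int:
    """Average number of threads kept busy by synchronous checks.

    Little's law: concurrency = arrival rate * time each check holds a thread.
    Assumes the checks are spread evenly over the interval (real traffic is burstier).
    """
    rate = services / interval_s  # checks arriving per second
    return math.ceil(rate * check_duration_s)


for duration in (1, 2, 10):
    print(f"{duration:>2} s per check -> {threads_needed(2300, 300, duration)} threads")
```

For 2,300 services every 300 seconds this gives 8, 16 and 77 threads for 1, 2 and 10 second checks. The figures in the post are rounded up from these, and some headroom on top is sensible because real checks bunch up.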

Now, I did some tests locally and ended up with somewhat mixed results. I ran multiple checks via NRPE (which is what I usually benchmark) using the command check_ok (i.e. virtually no delay). I can get through 10,000 checks in ~20 seconds, which is 500 checks/second (with a 5-minute interval that would translate to 150,000 checks). CPU load on NSClient++ here is around 50%, memory ~12 MB.

Doing the same test while tunnelling through to a second NSClient++ instance yielded significantly worse results… I can get through 10,000 checks in ~120 seconds, which is 83 checks/second (with a 5-minute interval that would translate to 25,000 checks). CPU load on NSClient++ here is also around 50%, memory ~12 MB.
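Converting a benchmark run into checks per polling interval is just rate × interval. A small Python sketch, assuming a 300-second (5-minute) interval and that the tunnelled run took about 120 seconds, which is what the 83 checks/second figure implies; `capacity` is a hypothetical helper name:

```python
def capacity(total_checks: int, elapsed_s: float, interval_s: float = 300.0):
    """Return (checks per second, checks that fit into one polling interval)."""
    rate = total_checks / elapsed_s
    # Same as rate * interval_s, but multiplying first avoids a rounding step.
    per_interval = total_checks * interval_s / elapsed_s
    return rate, per_interval


direct_rate, direct_cap = capacity(10_000, 20)   # NRPE straight to check_ok
proxy_rate, proxy_cap = capacity(10_000, 120)    # NRPE tunnelled through a second instance
print(f"direct:    {direct_rate:.0f}/s, {direct_cap:,.0f} checks per 5 minutes")
print(f"tunnelled: {proxy_rate:.0f}/s, {proxy_cap:,.0f} checks per 5 minutes")
print(f"slowdown:  {direct_rate / proxy_rate:.1f}x")
```

Even the slower tunnelled figure of 25,000 checks per 5 minutes is roughly ten times the ~2,300 services in the question, which points at check duration and thread count rather than raw throughput.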

Not sure how scientific this is, and as I said, if you have checks that take long to execute you will obviously need more threads…
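To illustrate why the thread count, and not just the queue, matters once checks take longer, here is a rough Python simulation. It is a simplified model under my own assumptions, not how NSClient++ actually schedules work: checks arrive evenly at `rate` per second (8/s is roughly 2,300 services every 5 minutes), each one ties up a worker thread for `duration` seconds, up to `queue_size` connections may wait for a free thread (standing in for the socket queue), and anything beyond that is rejected. `simulate` and its parameters are hypothetical names.

```python
import heapq
from collections import deque


def simulate(rate: float, duration: float, threads: int,
             queue_size: int, n_checks: int) -> int:
    """Return how many of n_checks are rejected under the simplified model."""
    busy = []          # min-heap of finish times of checks currently holding a thread
    waiting = deque()  # arrival times of checks waiting for a free thread (FIFO)
    rejected = 0
    for i in range(n_checks):
        now = i / rate  # evenly spaced arrivals
        # Retire every check finished by now; each freed thread immediately
        # picks up the oldest waiting check, if there is one.
        while busy and busy[0] <= now:
            freed_at = heapq.heappop(busy)
            if waiting:
                waiting.popleft()
                heapq.heappush(busy, freed_at + duration)
        if len(busy) < threads:
            heapq.heappush(busy, now + duration)  # a thread is free: start now
        elif len(waiting) < queue_size:
            waiting.append(now)                   # all threads busy: queue it
        else:
            rejected += 1                         # no thread and no queue slot
    return rejected


print(simulate(rate=8, duration=1, threads=10, queue_size=0, n_checks=2400))
print(simulate(rate=8, duration=2, threads=10, queue_size=0, n_checks=2400))
print(simulate(rate=8, duration=2, threads=10, queue_size=1000, n_checks=4800))
```

With 10 threads and 1-second checks nothing is rejected; make the checks take 2 seconds and rejections start almost immediately. A bigger queue only absorbs bursts: when `threads / duration` is below the arrival rate the queue eventually fills and rejections come back, so the thread pool has to be sized from rate × duration first. Real traffic is burstier than these evenly spaced arrivals, so expect trouble earlier than the model predicts.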

My test:

nsclient1.ini:

[/modules]
NRPEServer = enabled
NRPEClient = enabled
CheckHelpers = enabled
CheckExternalScripts = enabled

[/settings/NRPE/server]
timeout = 99
insecure = true
allowed hosts = 127.0.0.1,::1,192.168.0.1
allow arguments = true
thread pool=100
socket queue size=1000

[/settings/external scripts/alias]
x = nrpe_query host=127.0.0.1 port=1234 insecure command=check_ok argument message=$ARG1$
y=check_ok message=$ARG1$

[/settings/NRPE/client/targets/default]
insecure = true

nsclient2.ini:

[/modules]
NRPEServer = enabled
CheckHelpers = enabled

[/settings/NRPE/server]
timeout = 99
insecure = true
allowed hosts = 127.0.0.1,::1,192.168.0.1
allow arguments = true
thread pool=100
socket queue size=1000
port = 1234

Command used to generate the load:

time seq 10000 | parallel /usr/lib/nagios/plugins/check_nrpe -H 192.168.0.201 -c x -a {}
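On the question of a better test: a sketch of the same load generator in Python that also counts non-zero exit codes, so refused or timed-out connections show up as a number instead of scrolling past. `run_load` and the `load.py` file name are hypothetical; with check_ok every answer should be OK (exit code 0), so any non-zero exit is treated as a failed check. `workers` plays the role of parallel's job count (GNU parallel defaults to one job per CPU core).

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def run_load(make_cmd, n_checks, workers):
    """Run make_cmd(i) for i = 1..n_checks, keeping `workers` commands in flight.

    Returns (elapsed seconds, checks per second, count of non-zero exit codes).
    """
    def one(i):
        # Discard plugin output; only the exit code matters here.
        done = subprocess.run(make_cmd(i), stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL)
        return done.returncode

    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        codes = list(pool.map(one, range(1, n_checks + 1)))
    elapsed = time.monotonic() - start
    failures = sum(1 for code in codes if code != 0)
    return elapsed, n_checks / elapsed, failures


if __name__ == "__main__" and len(sys.argv) > 3:
    # Usage: load.py N WORKERS COMMAND ARGS...  ("{}" is replaced by the check number)
    n, workers, template = int(sys.argv[1]), int(sys.argv[2]), sys.argv[3:]
    elapsed, rate, failures = run_load(
        lambda i: [arg.replace("{}", str(i)) for arg in template], n, workers)
    print(f"{n} checks in {elapsed:.1f} s ({rate:.0f} checks/s), "
          f"{failures} non-zero exits")
```

For example, `python3 load.py 10000 32 /usr/lib/nagios/plugins/check_nrpe -H 192.168.0.201 -c x -a {}` mirrors the one-liner above with 32 checks in flight. Stepping WORKERS up between runs and watching where the failure count starts to climb is one way to find the practical limit of a proxy.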

Please do let me know if you come up with a better test…