NSClient seemingly random timeouts and SSL connection setup


Hey folks,

I’m researching why a bunch of my Windows 2003 servers give seemingly random nuisance timeouts for different services. Most of the service checks raising these nuisance alerts are event log checks (not via NSClient), but not all.

Some of them can be explained by checking my trending for CPU - if the CPU is pegged at 100% for an hour, yeah, there will probably be a timeout or two. However, many more of them happen when the machine is pretty much sleeping - nothing going on, no spike in system load, memory usage, disk IO or disk queue.

I’m starting to see what I believe is a pattern - checking the nsclient.log file on the hosts themselves, I find:

error:D:\source\NSCP-stable\include\Socket.h:699: Error: Could not complete SSL handshake : [0] 5, attempting to resume…

corresponding to almost every one of these nuisance alerts. All of my hosts use SSL, all the time, so why an almost completely idle machine would suddenly have issues setting up an SSL connection is somewhat confusing.
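For anyone who wants to check the same correlation on their own hosts, here is a minimal sketch that pulls the handshake errors out of a log so they can be lined up against alert times. It only assumes the error text shown above; the function names and the idea of matching on that substring are illustrative, and nothing here relies on a particular timestamp format in nsclient.log.

```python
# Substring taken from the log line quoted above.
HANDSHAKE_MARKER = "Could not complete SSL handshake"


def find_handshake_errors(lines):
    """Return (line_number, text) pairs for lines containing the marker.

    `lines` is any iterable of strings, e.g. an open file object.
    Line numbers start at 1.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        if HANDSHAKE_MARKER in line:
            hits.append((number, line.rstrip("\r\n")))
    return hits


def scan_log(path):
    """Hypothetical helper: scan a log file on disk and return the hits."""
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    with open(path, errors="replace") as handle:
        return find_handshake_errors(handle)
```

Calling `scan_log` with the path to nsclient.log on a host returns the matching lines, which can then be compared by hand with the times of the nuisance alerts.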

I understand that the file path in the error points to the source file on the build machine and not to anything on my own hosts. However, does any of the above point to why these machines do this from time to time? I get these nuisance alerts daily, so something is definitely amiss here.

I’d really appreciate any pointers! Thanks much.



I am having this exact same issue. I was thinking it was performance related, but I’m starting to believe it isn’t (really). I get the first Nagios (Icinga, I should say) alert at 0:44, and the only resource-intensive app that was running, afaik, was the backup job, which finished at 0:40.
My colleagues who run indexing jobs aren’t here atm, but I doubt they run at the same time.

Other than that, I get the same symptoms as Benny, at seemingly random times. The host doesn’t go down; it’s just that all checks with check_nrpe give a socket timeout, even with the timeout set at 30 seconds (which I would think is pretty long).
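
One way to narrow this down might be to time a plain TCP connect to the agent port while the alerts are firing. The sketch below is only that: it measures the TCP connect and does not attempt the SSL handshake, so a fast connect followed by a check_nrpe timeout would suggest the stall is in the SSL setup rather than on the network. The function name is made up for illustration, and port 5666 is assumed because it is the usual NRPE default; adjust it if the agent listens elsewhere.

```python
import socket
import time


def probe_tcp_connect(host, port=5666, timeout=30.0):
    """Try one TCP connect and return (ok, seconds, error).

    ok      -- True if the connection was established
    seconds -- elapsed wall time for the attempt
    error   -- None on success, otherwise the error text
    """
    start = time.monotonic()
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, None
    except OSError as exc:
        return False, time.monotonic() - start, str(exc)
```

Running this in a loop from the monitoring server and logging the results next to the alert times would show whether the connect itself ever gets slow.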


I am having the same issue.