Bind failed: 10048 error


#1

All,

I recently upgraded all our servers (about 200) to the latest 64-bit NSClient++ version. Since then, I’m seeing a lot of problems connecting to clients - about a dozen of them. When I check the nsclient log, I see the following:

bind failed: 10048: Only one usage of each socket address (protocol/network address/port) is normally permitted.
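
For readers who have not met this error before: 10048 is the Winsock code WSAEADDRINUSE, raised when a socket tries to bind to an address and port that another socket already holds. The sketch below reproduces the condition on a throwaway loopback port. It is only an illustration; the function names are made up for this example and none of it is NSClient++ code.

```python
import errno
import socket

WSAEADDRINUSE = 10048  # Winsock code that appears in the NSClient++ log


def second_bind_error(host="127.0.0.1"):
    """Bind two TCP sockets to the same address; return the second bind's error."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((host, 0))  # port 0 lets the OS pick a free port
        first.listen(1)
        port = first.getsockname()[1]
        try:
            second.bind((host, port))  # same address and port, no reuse option
        except OSError as exc:
            return exc
        return None
    finally:
        first.close()
        second.close()


def is_address_in_use(exc):
    """True if exc is the 'address already in use' error on this platform."""
    if exc is None:
        return False
    return exc.errno == errno.EADDRINUSE or getattr(exc, "winerror", None) == WSAEADDRINUSE


if __name__ == "__main__":
    print(is_address_in_use(second_bind_error()))
```

As long as the first socket stays open, in whichever process owns it, the second bind keeps failing. That matches the symptom of a listener on 5666 or 12489 outliving the service.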

When I use a port utility to check the open ports, it shows “System” holding hundreds of connections on ports 5666 and 12489 in CLOSE_WAIT status, even after I stop the nsclientpp service. I can kill those, but there is always at least one socket on each of 5666 and 12489 in the listening state, held open by System, which I can’t kill. Hence the error above when attempting to start NSClient. I confirmed the nsclientpp service was stopped during the checks and during the attempts to close all the 5666 and 12489 ports.
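
The same check can be scripted across many servers. Below is a minimal sketch that tallies socket states for the two ports from the text of `netstat -ano`. It assumes the usual Windows column layout (protocol, local address, foreign address, state, PID). The function name and the sample rows are invented for illustration and are not taken from the affected servers.

```python
from collections import Counter


def count_port_states(netstat_output, ports=(5666, 12489)):
    """Count TCP rows per (local port, state, PID) for the given local ports."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected TCP row: proto, local address, foreign address, state, PID
        if len(fields) < 5 or fields[0].upper() != "TCP":
            continue
        local_port = fields[1].rsplit(":", 1)[-1]
        if local_port.isdigit() and int(local_port) in ports:
            counts[(int(local_port), fields[3], fields[4])] += 1
    return counts


# Hypothetical sample rows, for illustration only.
SAMPLE = """
  Proto  Local Address          Foreign Address        State           PID
  TCP    0.0.0.0:5666           0.0.0.0:0              LISTENING       4
  TCP    0.0.0.0:12489          0.0.0.0:0              LISTENING       4
  TCP    10.0.0.5:5666          10.0.0.9:51234         CLOSE_WAIT      4
  TCP    10.0.0.5:5666          10.0.0.9:51240         CLOSE_WAIT      4
  TCP    10.0.0.5:3389          10.0.0.7:50000         ESTABLISHED     1020
"""

if __name__ == "__main__":
    for key, n in sorted(count_port_states(SAMPLE).items()):
        print(key, n)
```

On a live Windows host the input could come from `subprocess.run(["netstat", "-ano"], capture_output=True, text=True).stdout`. PID 4 is the System process on Windows.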

This originally only affected one server - now at least 12 have this problem. Is there anything I can do to kill that port short of rebooting? Also, is there any way I can tweak nsclient++ so this doesn’t keep happening?

Thanks for any and all assistance,

Mike


#2

Forgot to mention…

We do not have anything else using that port. The port stays open until the server is rebooted. It seems to be left hanging from the last stop of the NSClientpp service, since I usually catch this problem upon restarting the service.

Thanks!

Mike


#3

Hmm, nothing I have encountered, really.

It might be a problem in the NSCA module (discovered recently) that could “hang” socket threads, but I am doubtful that this is the case here…

Anyway, I would need a reproducible scenario to debug such an issue.

// Michael Medin


#4

Mickem,

Meanwhile, can you suggest any way to force the “System” process to close those two ports? We have about 12 critical servers that we can neither launch Nagios on nor reboot at this time, and which we really need to monitor.

Thanks

Mike


#5

I have never experienced this issue, which is why I find this strange.

Usually Windows releases all resources when it closes the program that allocated them.
Thus when the program dies they should have been released.

I assume you have verified that no processes are running?
(nothing under NSClient++ in task manager or some such)

Another option to “get around it” would be to temporarily change the port.

// Michael Medin


#6

Yeah - I’ve never seen the behavior either, so I figured changing the port would be the only option. So, I’ll schedule changing the ports, rolling out a new config to each system, then restarting the service.

Thanks

Mike


#7

Hi!

Try turning off the built-in Windows Firewall. If this helps, try adding NSClient++.exe to the list of trusted programs in the firewall.

P.S.: sorry for my bad english


#8

Having exactly the same issue.

Using TCPView, I found a local process connected as a client to local ports 5666 and 12489.

Restarting the NSClient service leaves these 2 sockets active.

I was able to resolve it by adding 127.0.0.1 to allowed_hosts in NSC.ini …


#9

Also had this issue running the 0.3.9 x64 client. For me the issue was that I had a vbs script testing NTP with the w32tm command, and this had crashed. Upon killing this process I was able to restart nsclient++ without rebooting a production server.

So, a tip for future troubleshooters: use Sysinternals Process Explorer to look up nsclient and any hung child processes.
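
One plausible explanation for why a hung child matters: a child process can inherit its parent's socket handles, so the port stays bound until the child exits too. As a rough sketch of the lookup, the helper below walks a process snapshot of (PID, parent PID, name) tuples and lists everything descended from a given PID, even if that PID itself has already exited. The helper name and the sample table are hypothetical. Windows can reuse PIDs, so treat matches as candidates to inspect rather than proof.

```python
def find_descendants(processes, root_pids):
    """Return (pid, name) for every process descended from any PID in root_pids.

    processes: iterable of (pid, parent_pid, name) tuples from a process snapshot.
    The root PIDs do not need to be present in the snapshot.
    """
    children = {}
    for pid, parent_pid, name in processes:
        children.setdefault(parent_pid, []).append((pid, name))

    found = []
    seen = set(root_pids)
    stack = list(root_pids)
    while stack:
        parent = stack.pop()
        for pid, name in children.get(parent, []):
            if pid not in seen:  # guards against loops caused by PID reuse
                seen.add(pid)
                found.append((pid, name))
                stack.append(pid)
    return found


# Hypothetical snapshot: PID 1200 (the old NSClient++ process) has exited,
# but a script it started is still running.
SNAPSHOT = [
    (4, 0, "System"),
    (1300, 1200, "cscript.exe"),
    (1310, 1300, "w32tm.exe"),
    (2000, 600, "svchost.exe"),
]

if __name__ == "__main__":
    print(find_descendants(SNAPSHOT, [1200]))
```

The root PID would typically be the one netstat or TCPView reports for the stuck sockets.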


#10

An issue in 0.3.9 is the reuse flag; this is fixed in 0.4.0, so the problem should hopefully be gone there (if anyone encounters this on 0.4.0 or later, let me know).
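
The thread does not say which reuse flag is meant. As a general illustration only, and assuming it refers to the standard SO_REUSEADDR socket option, this is how a listener sets it before binding. The function names are made up for the example and are not NSClient++ code. Note that SO_REUSEADDR behaves differently on Windows, where it can let a second socket bind a port that is already in use.

```python
import socket


def open_listener(port=0, host="127.0.0.1", reuse=True):
    """Open a listening TCP socket, optionally with SO_REUSEADDR set before bind."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if reuse:
        # Must be set before bind() to have any effect on the bind.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(5)
    return sock


def restart_listener(host="127.0.0.1"):
    """Simulate a service restart: close a listener, then reopen on the same port."""
    first = open_listener(0, host)  # port 0 lets the OS pick a free port
    port = first.getsockname()[1]
    first.close()
    second = open_listener(port, host)
    try:
        return port, second.getsockname()[1]
    finally:
        second.close()


if __name__ == "__main__":
    print(restart_listener())
```

A reuse option does not help if another live process still owns the listening socket, which is what the later posts in this thread describe.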

// Michael Medin


#11

We recently hit the same issue on nsclient v0.4.1.102. When we update the scripts and nsclient.ini, we need to restart the nsclient service to pick up the new settings. In some cases we get the same error as described here, along with a lot of ports in close_wait status with a PID that is no longer active in the tasklist … binding nsclient to 127.0.0.1 does not fix the issue.