What to do when NC exhausts file descriptors

Most current Linux kernels support up to about 65K file descriptors,
so you can allow a high limit for the NC owner account in
/etc/security/limits.conf.
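
As an illustration, entries of the following form raise the open-file
limit for one account. The account name 'ncowner' and the value 65535
are placeholders for this example, not values taken from an actual NC
installation; substitute your own.

```
# /etc/security/limits.conf
# <domain>  <type>  <item>   <value>
ncowner     soft    nofile   65535
ncowner     hard    nofile   65535
```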

When starting NC, check the number of descriptors reported
in the startup message. It is good practice to set up the NC owner
account so that its shell startup files automatically raise the
descriptor limit as high as possible.
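
A minimal sketch of such a startup-file fragment, written for
sh-compatible shells (for example in ~/.profile). It raises the soft
limit to whatever hard limit the account has been granted. The function
name raise_fd_limit is made up for this example. For a csh or tcsh
account, the 'limit descriptors' built-in serves the same purpose.

```shell
# Raise the soft file-descriptor limit of this shell to its hard limit.
# raise_fd_limit is a hypothetical helper name, not part of NC.
raise_fd_limit() {
    hard=$(ulimit -Hn)      # hard limit, e.g. as granted via limits.conf
    ulimit -Sn "$hard"      # the soft limit may be raised up to the hard limit
}

raise_fd_limit
```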

You do not have to reboot the machine after changing these settings,
but you do need to start a new shell for them to take effect.
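
To confirm that a new shell actually picked up the higher limit,
something like the following can be run before starting NC. It is a
sketch for sh-compatible shells; check_fd_limit is a hypothetical helper
and the target value is only an example. In csh or tcsh,
'limit descriptors' prints the current value.

```shell
# Report whether the soft descriptor limit of this shell meets a target.
# check_fd_limit is a hypothetical helper name, not an NC command.
check_fd_limit() {
    want=$1
    have=$(ulimit -Sn)
    if [ "$have" = "unlimited" ] || [ "$have" -ge "$want" ]; then
        echo "ok: soft limit is $have (wanted at least $want)"
    else
        echo "low: soft limit is $have (wanted at least $want)"
        return 1
    fi
}

check_fd_limit 1024    # example target; use the limit you expect
```

The function returns non-zero when the limit is too low, so it can gate
a start script.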

Review this limit together with the maxNormalClients and maxNotifyClients
vovserver parameters, which guard against accidental or deliberate
denial of service by file descriptor exhaustion.

If the vovserver becomes unresponsive, there is a special case in which
you may still be able to do a 'vovproject save' to minimize data loss.

This involves a client connecting via the loopback interface from
the vovserver machine. The vovserver tries to reserve a small number
of file descriptors for this purpose, but even these can be used up.

+ Get a shell as the NC owner on the NC vovserver machine
+ Change to the vncNNN.swd directory
+ Enter the project context
% ves setup.tcl
+ Set the VOV_HOST_NAME environment variable to 'localhost'
% setenv VOV_HOST_NAME localhost
+ Check that you can run client commands
% vsi
and, optionally, list the clients to find out which ones are causing the problem
% vovshow -clients > some-file
+ Save the project data
% vovproject save
When saving, the vovserver fork-execs a copy of itself to get a consistent
memory snapshot, so be sure there is enough virtual memory available for
2x the current vovserver process size.
+ If you have control and obtained the clients list, you may be able to stop
some clients to free up file descriptors using 'vovclientmgr', e.g.
% vovclientmgr closebyuser [user]

If your vovserver reaches 65K descriptors, something is wrong.

The vovserver should have posted some alerts when the descriptor
limit was approached, and these would be saved in the alerts log.
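
Independently of the alerts, the number of descriptors a process
currently holds can be read from /proc on Linux. The sketch below is for
sh-compatible shells; count_fds is a hypothetical helper, not an NC
command, and it must be run as the owner of the process or as root.

```shell
# Count the open file descriptors of a process via /proc (Linux only).
# count_fds is a hypothetical helper name; argument: <pid>
count_fds() {
    ls "/proc/$1/fd" | wc -l
}

# Example: descriptors held by the current shell
count_fds $$
```

For the vovserver, something like count_fds "$(pgrep -o -u "$USER" vovserver)"
may work, assuming the process is named 'vovserver' on your installation.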

Exhaustion can happen rapidly if someone submits a large batch
of notify jobs (each of these uses two file descriptors).
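
As a rough worked example of how quickly this adds up, the sketch below
estimates how many further notify jobs fit into the remaining
descriptors. notify_headroom is a hypothetical helper, and the estimate
ignores descriptors reserved by the vovserver or used by other clients.

```shell
# Estimate how many more notify jobs fit before descriptors run out,
# assuming two descriptors per notify job (as stated above).
# notify_headroom is a hypothetical helper; arguments: <limit> <in_use>
notify_headroom() {
    echo $(( ($1 - $2) / 2 ))
}

notify_headroom 65536 60000    # prints 2768
```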
