Advisory -- 2016.09u9-u11 vov_diagnostics_no_start
The version of the file
$VOVDIR/scripts/vov_diagnostics_no_start
that is shipped with 2016.09u9 and later contains a stanza that sends a
query to vovserver for other jobs running on the same vovslave.
This requires a CPU-intensive scan of all the jobs in the system.
If vovserver is already under stress, this added load forms a positive-feedback loop:
the extra load can trigger more instances of vov_diagnostics_no_start, which add more load in turn.
Runtime R&D is working on a patch.
As a workaround, you can comment out the stanza, or duplicate the 'exit' line
so that it comes before the stanza, as shown below:
#!/bin/sh
# Script to grab information for subslave startup issues
# For use with VOV_DEBUG_NO_START
# This expects to receive 2 parameters
# 1 jobid 2 subslave pid
thisprog=vov_diagnostics_no_start
usage () {
cat <<End-Usage-Info
Usage: $thisprog nc-jobid subslave-pid
Gather info for troubleshooting NC subslave startup problems
Info is printed to the vovslave log
End-Usage-Info
exit 2
}
# body of script omitted
# Workaround: duplicated 'exit' line, so the script stops before the stanza below
exit $nerr
#if [ "x$VOV_SLAVE_NAME" != "x" ]; then
# $ECHO "# jobs running on this slave: "
# vovselect id,user,isinteractive,duration,prop.ALLPIDS from jobs where #slavename==$VOV_SLAVE_NAME
#fi
# Skip this for now
# $ECHO "# environment of the affected job"
# nc info -e $jobid
exit $nerr
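To confirm that the workaround is in place, you can check whether the installed script still contains an uncommented vovselect call. The following is a minimal sketch, not part of the shipped product: the function name has_active_vovselect is hypothetical, and it only detects the comment-out variant of the workaround. A script that is protected only by the duplicated 'exit' line still has the vovselect line in it and will be reported as active.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the product): succeeds if the given
# file contains a vovselect call on a line that is not a '#' comment.
has_active_vovselect () {
    # Drop comment lines first, then look for vovselect in what remains.
    grep -v '^[[:space:]]*#' "$1" | grep -q 'vovselect'
}

# Check the installed script, or a file given as the first argument.
script=${1:-${VOVDIR:-}/scripts/vov_diagnostics_no_start}
if [ -f "$script" ]; then
    if has_active_vovselect "$script"; then
        echo "$script: vovselect stanza is still active"
    else
        echo "$script: no uncommented vovselect call found"
    fi
fi
```

Run it without arguments to check the copy under $VOVDIR, or pass the path of another copy of the script as the first argument.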