Question : Stopped and Zombie processes

I have three executables; let's call them script1, script2, and script3.

Script1 has a for loop which calls script2 several times (up to 200 times).
Script2 is a binary executable which calls script3 with some parameters.
Script3 is an expect script which connects to devices via ssh, depending on the parameters.

so we have:
script1 => script2 => script3
This was working for a while, but now the for loop occasionally seems to freeze. When I run ps aux, script1 has gone to state "T" and script2 has become a zombie, state "Z".
If I press Ctrl+Z and then type fg, it seems to resume as normal without skipping the current item in the for loop. The only other option is killing script1 with kill -9, which I don't want to do.
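As an aside, Ctrl+Z/fg only works from the controlling terminal; a stopped process can also be resumed from anywhere by sending it SIGCONT, which avoids reaching for kill -9. A minimal sketch, using a background sleep as a stand-in for the stuck script:

```shell
# Stand-in for the stuck script1: a long-running background job.
sleep 60 &
pid=$!

# SIGSTOP puts it into the same "T" (stopped) state seen in ps aux.
kill -STOP "$pid"
sleep 0.2
stopped=$(ps -o stat= -p "$pid")
echo "after SIGSTOP: $stopped"

# SIGCONT resumes it in place -- the same effect as typing fg, but it
# works on any PID, so kill -9 is never needed for a merely stopped process.
kill -CONT "$pid"
sleep 0.2
resumed=$(ps -o stat= -p "$pid")
echo "after SIGCONT: $resumed"

kill "$pid"    # clean up the stand-in
```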

I can understand why script2 has become a zombie, but I don't understand why script1 goes to "T", which means "stopped".
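For reference, a zombie is just an exited child whose parent has not yet called wait() on it, so a parent stuck in "T" cannot reap its dead child and the child stays "Z". A tiny self-contained demonstration (the sleeps are only stand-ins for a parent that never waits):

```shell
# Parent that never wait()s: it forks a short-lived child (sleep 0.2),
# then execs into "sleep 5", which knows nothing about that child. When
# the child exits it lingers as <defunct> (state Z) until the parent
# itself dies and init adopts and reaps it.
bash -c 'sleep 0.2 & exec sleep 5' &
parent=$!
sleep 0.5
zstate=$(ps --ppid "$parent" -o stat= | head -n 1)
echo "child state: $zstate"
kill "$parent"    # once the parent is gone, the zombie is reaped
```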

What would cause a script to do this?

Answer : Stopped and Zombie processes

Well, I've now fixed it, but I'm still not too sure what is happening.

omarfarid: I've tried the input redirection from /dev/null, but I'm still getting the same results. The process doesn't require any input other than the arguments it is called with. Either way, even with the redirection, it was still stopping randomly.
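For context, the suggestion being tested here is redirecting the child's stdin from /dev/null so it can never block waiting for terminal input (a background job that reads the terminal gets SIGTTIN and goes to state "T"). A runnable sketch, with a hypothetical stand-in function in place of script2:

```shell
# Stand-in for script2: something that tries to read from stdin.
read_stdin() { read -r line; echo "got: '${line:-<EOF>}'"; }

# With stdin on a terminal this read would block (and, in a background
# job, earn the process SIGTTIN and state "T"); /dev/null supplies an
# immediate EOF instead, so the call returns at once.
read_stdin < /dev/null
```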

woolmilk: It doesn't look like it's anything to do with ssh being unable to log in. When that happens (and it does), the expect script errors out and times out, and script1 moves on to the next item in the for loop. Also, when it freezes, I run ps aux to see exactly which command was run with which arguments. When I run that command manually, it works perfectly.

The way I found the issue was by putting echo statements all over the script to see exactly where it stops, then looking at the echoed strings to identify the exact point.
It turned out to be in the for loop: the first command there calls script2 with the arguments, and that call runs and immediately turns into a zombie.
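As an alternative to hand-placed echo statements, the shell can do this instrumentation itself: in bash, `set -x` prints every command (prefixed with `+`) to stderr before running it, so the last traced line shows exactly where the script stalled. A small sketch (the loop variable is hypothetical):

```shell
set -x                    # start tracing: each command is echoed before it runs
item="device1"            # hypothetical stand-in for the loop variable
echo "processing $item"
set +x                    # tracing off again
echo "tracing disabled"
```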
As there is nothing wrong with script2 or the way it is called, and it stalls at apparently random intervals, I thought it might be a timing issue, so I put a "sleep 1" as the first command in the for loop. This now appears to work perfectly; it never stalls.
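The fix described above, sketched with a hypothetical stand-in function in place of the real script2 binary (device names are also made up):

```shell
# Stand-in for the real script2 binary.
script2() { echo "handling $1"; }

for device in dev1 dev2; do
    sleep 1               # the pacing delay that made the stalls disappear
    script2 "$device"
done
```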

So my question now is: why is it doing this? Why do I need a "sleep 1" in there? In theory I could do without the for loop and have all these processes running at the same time; they run independently and will not interfere with each other.
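If the runs really are independent, the loop could launch them all in the background and then reap every child with a single `wait`, which also guarantees none is left a zombie. A sketch under the same stand-in assumption:

```shell
# Stand-in for the real script2 binary.
script2() { sleep 0.2; echo "done: $1"; }

for device in dev1 dev2 dev3; do
    script2 "$device" &   # launch all of them concurrently
done
wait                      # block until every child has exited and been reaped
echo "all devices processed"
```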

If it makes any difference, I am running this on Red Hat AS 4, 32-bit. Do you think I am running out of file descriptors or some other resource, so that the process has to stop until more resources become available?
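The file-descriptor theory is easy to probe while the loop is running. A few quick Linux-specific checks (here inspecting the current shell via `$$`; substitute script1's PID to inspect the stuck process instead):

```shell
ulimit -n                      # per-process open-file limit for this shell
fds=$(ls /proc/$$/fd | wc -l)  # descriptors this shell currently holds
echo "descriptors currently open: $fds"
cat /proc/sys/kernel/pid_max   # system-wide ceiling on PIDs
```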