Question : Understanding the behaviour of waitpid(-1, NULL, WNOHANG)

As far as I understand from waitpid's man page, when calling waitpid with WNOHANG, the function should return 0 if there are no child processes to wait for.
When running the following program -

===
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

#define N 5

void error_exit(char *str)
{
      perror(str);
      exit(1);
}

void sigchld_handler(int sig)
{
      pid_t      pid;
      do {
            pid = waitpid(-1, NULL, WNOHANG);
            printf("child %d\n",pid);
      } while ( pid > 0 );
      
      if ( errno == ECHILD )
            puts("errno=ECHILD");
      if ( pid < 0 )
            error_exit("waitpid");

      if ( signal(SIGCHLD, sigchld_handler) == SIG_ERR )
            error_exit("signal");
}

int main()
{
      int       i;
      pid_t       pid;
      
      if ( signal(SIGCHLD, sigchld_handler) == SIG_ERR )
            error_exit("signal");
      
      for ( i = 0; i < N; i += 1 ) {
            pid = fork();
            if ( pid < 0 )
                  // fork error
                  error_exit("fork");
            else if ( pid == 0 ) {
                  // child process - sleep
                  sleep(N-i+1);
                  exit(0);
            }
      }
      while(1);
      
      return 0;
}
===

I get this output -

===
child 5398
child 0
child 5397
child 0
child 5396
child 0
child 5395
child 0
child 5394
child -1
errno=ECHILD
waitpid: No child processes
===

As you can see, after all child processes have been "collected", waitpid returns -1 (which means an error occurred) and sets errno to ECHILD. I'd like to know why I get this error, and what the problem with my code is.

TIA

Edit:
My kernel version is 2.6.4 (also tried 2.4.22), and my GCC version is 3.3.3 (also tried 3.2.3).

Answer : Understanding the behaviour of waitpid(-1, NULL, WNOHANG)

In your signal handler you have

     do {
          pid = waitpid(-1, NULL, WNOHANG);
          printf("child %d\n",pid);
     } while ( pid > 0 );


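With WNOHANG, waitpid returns 0 when you still have children but none of them has exited yet. That is why you see "child 0" after each child is collected: the others are still sleeping. It does not return 0 when there are no children at all. Once the last child has been collected, the loop calls waitpid one more time; now there are no children left, so it returns -1 and sets errno to ECHILD. This is the behaviour the man page documents, not a fault in your kernel or compiler.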
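To make the three possible results concrete, here is a minimal sketch. The function name demo_wnohang and the pipe used to keep the child alive are my own illustration, not part of the original program, and it assumes the calling process has no other children.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns 0 if waitpid(-1, NULL, WNOHANG) yields
 * 0, then the child's pid, then -1 with ECHILD, in that order. */
int demo_wnohang(void)
{
    int fds[2];
    char c;
    pid_t child, r1, r2, r3;
    int e3;

    signal(SIGCHLD, SIG_DFL);          /* make sure children stay waitable */
    if (pipe(fds) < 0)
        return -1;

    child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        close(fds[1]);
        if (read(fds[0], &c, 1) < 0)   /* blocks until the parent closes its end */
            _exit(1);
        _exit(0);
    }
    close(fds[0]);

    r1 = waitpid(-1, NULL, WNOHANG);   /* child alive, not exited: expect 0 */

    close(fds[1]);                     /* child sees EOF and exits */
    while ((r2 = waitpid(-1, NULL, WNOHANG)) == 0)
        usleep(1000);                  /* poll until the exit is visible */

    r3 = waitpid(-1, NULL, WNOHANG);   /* no children left: expect -1 */
    e3 = errno;                        /* only meaningful if r3 == -1 */

    printf("running=%d reaped_child=%d none=%d errno_is_ECHILD=%d\n",
           (int)r1, r2 == child, (int)r3, e3 == ECHILD);
    return (r1 == 0 && r2 == child && r3 == -1 && e3 == ECHILD) ? 0 : 1;
}
```

Calling demo_wnohang() from main should show the same pattern as your output: 0 while a child is still running, a pid when one has exited, and -1 with ECHILD once nothing is left.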

Also, the error checking after the loop is not quite right ...

Instead of

     if ( errno == ECHILD )
          puts("errno=ECHILD");
     if ( pid < 0 )
          error_exit("waitpid");

It should have been

     if ( pid < 0 )
     {
          if ( errno == ECHILD )
               puts("errno=ECHILD");
          error_exit("waitpid");
     }

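You should not test errno on its own. errno is only meaningful immediately after a call that has reported failure through its return value; otherwise it may still hold a value left over from some earlier call. Check the return value first, and only then look at errno.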