8.4.Process Control

Posted on 2024-08-16 | Updated on 2024-08-18 | Category: CSAPP, Chapter 8.Exceptional Control Flow | Word count: 2.1k | Reading time ≈ 8 minutes

\(8.4.\)Process Control

1.Obtaining Process IDs

Each process has a unique positive process ID (PID).
The getpid function returns the PID of the calling process.
The getppid function returns the PID of the caller's parent (i.e., the process that created the calling process).

#include <sys/types.h>
#include <unistd.h>

pid_t getpid(void);
pid_t getppid(void);

/* Returns: PID of either the caller or the parent */

2.Terminating Processes

We can think of a process as being in one of three states:

Running: The process is either executing on the CPU or waiting to be executed and will eventually be scheduled by the kernel.
Stopped: The execution of the process is suspended and will not be scheduled.
Terminated: The process is stopped permanently. A process becomes terminated for one of three reasons:
1. Receiving a signal whose default action is to terminate the process.
2. Returning from the main routine.
3. Calling the exit function.

#include <stdlib.h>

void exit(int status);

/* This function does not return */

The exit function terminates the process with an exit status of status.

The other way to set the exit status is to return an integer value from the main routine.

3.Creating Processes

A parent process creates a new running child process by calling the fork function.

#include <sys/types.h>
#include <unistd.h>

pid_t fork(void);

/* Returns: 0 to child, PID of child to parent, −1 on error */

The newly created child process is almost, but not quite, identical to the parent:

The child gets an identical(but separate) copy of the parent’s user-level virtual address space.
The child also gets identical copies of any of the parent’s open file descriptors, which means the child can read and write any files that were open in the parent when it called fork.
However, the parent and the child have different PIDs.

The fork function is called once but it returns twice.

Since the PID of the child is always nonzero, the return value provides an unambiguous way to tell whether the program is executing in the parent or the child.

Take the following program as an example:

#include "csapp.h" /* Provides Fork, the error-checking wrapper around fork */

int main()
{
    pid_t pid;
    int x = 1;

    pid = Fork();
    if (pid == 0) { /* Child */
        printf("child : x=%d\n", ++x);
        exit(0);
    }

    /* Parent */
    printf("parent: x=%d\n", --x);
    exit(0);
}

When we run the program on a Unix system, we get the following result:

linux> ./fork
parent: x=0
child : x=2

The parent and the child are separate processes that run concurrently. The kernel can interleave their instructions in any order, so whether parent: x=0 or child : x=2 is printed first depends on how the two processes happen to be scheduled, and we can never assume a particular order.
Since the parent and the child are separate processes, they each have their own private address spaces, so that any subsequent changes that a parent or child makes to x are private and are not reflected in the memory of the other process.

We can use a process graph to describe the process:

Let's take a look at a more complicated example:

#include "csapp.h"

int main() {
    Fork();
    Fork();
    printf("hello\n");
    exit(0);
}

The process graph of the process is as below:

For a program running on a single processor, any topological sort of the vertices in the corresponding process graph represents a feasible total ordering of the statements in the program.

4.Reaping Child Processes

\(a.\)Basic concept

When a process terminates for any reason, the kernel does not remove it from the system immediately. Instead, the process is kept around in a terminated state until it is reaped by its parent.

When the parent reaps the terminated child, the kernel passes the child's exit status to the parent and then discards the terminated process.

A terminated process that has not yet been reaped is called a zombie.
Even though zombies are not running, they still consume system memory resources.

When a parent process terminates, the kernel arranges for the init process to become the adopted parent of any orphaned children. The init process:

has a PID of 1,
is created by the kernel during system start-up,
never terminates, and
is the ancestor of every process.

\(b.\)Reaping operations

\(i.\)The `waitpid` function

A process waits for its children to terminate or stop by calling the waitpid function:

#include <sys/types.h>
#include <sys/wait.h>

pid_t waitpid(pid_t pid, int *statusp, int options);

/* Returns: PID of child if OK, 0 (if WNOHANG), or −1 on error */

By default (when options = 0), waitpid suspends execution of the calling process until a child process in its wait set terminates.
If a process in the wait set has already terminated at the time of the call, then waitpid returns immediately.
In either case, waitpid returns the PID of the terminated child that caused waitpid to return. At this point, the terminated child has been reaped and the kernel removes all traces of it from the system.

If pid > 0, then the wait set is the singleton child process whose process ID is equal to pid.
If pid = -1, then the wait set consists of all of the parent's child processes.

\(ii.\)Modifying the default behavior

The default behavior can be modified by setting options to various combinations of the WNOHANG, WUNTRACED, and WCONTINUED constants:

WNOHANG: Return immediately (with a return value of 0) if none of the child processes in the wait set has terminated yet.
- The default behavior suspends the calling process until a child terminates. This option, however, allows us to do useful work while waiting for a child to terminate.
WUNTRACED: Suspend execution of the calling process until a process in the wait set becomes either terminated or stopped. Return the PID of the terminated or stopped child that caused the return.
- The default behavior returns only for terminated children; this option is useful when you want to check for both terminated and stopped children.
WCONTINUED: Suspend execution of the calling process until a running process in the wait set is terminated or until a stopped process in the wait set has been resumed by the receipt of a SIGCONT signal.

The options can be combined by ORing them together, for example:

WNOHANG | WUNTRACED: Return immediately, with a return value of 0, if none of the children in the wait set has stopped or terminated, or with a return value equal to the PID of one of the stopped or terminated children.

\(iii.\)Checking the exit status of a reaped child

If the statusp argument is non-NULL, then waitpid encodes status information about the child that caused the return in status, which is the value pointed to by statusp.

The wait.h include file defines several macros for interpreting the status argument:

WIFEXITED(status): Returns true if the child terminated normally, via a call to exit or a return.
WEXITSTATUS(status): Returns the exit status of a normally terminated child. This status is only defined if WIFEXITED() returned true.
WIFSIGNALED(status): Returns true if the child process terminated because of a signal that was not caught.
WTERMSIG(status): Returns the number of the signal that caused the child process to terminate. This status is only defined if WIFSIGNALED() returned true.
WIFSTOPPED(status): Returns true if the child that caused the return is currently stopped.
WSTOPSIG(status): Returns the number of the signal that caused the child to stop. This status is only defined if WIFSTOPPED() returned true.
WIFCONTINUED(status): Returns true if the child process was restarted by receipt of a SIGCONT signal.

\(iv.\)Error conditions

If the calling process has no children, then waitpid returns −1 and sets errno to ECHILD.
If the waitpid function was interrupted by a signal, then it returns −1 and sets errno to EINTR.

\(v.\)The `wait` function

#include <sys/types.h>
#include <sys/wait.h>

pid_t wait(int *statusp);

/* Returns: PID of child if OK or −1 on error */

Calling wait(&status) is equivalent to calling waitpid(-1, &status, 0).

5.Putting Processes to Sleep

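The sleep function suspends a process for a specified period of time.

#include <unistd.h>

unsigned int sleep(unsigned int secs);

/* Returns: seconds left to sleep */

The sleep function returns zero if the requested amount of time has elapsed, and the number of seconds still left to sleep otherwise. The latter case is possible if sleep returns prematurely because it was interrupted by a signal.

Another function that we will find useful is the pause function, which puts the calling process to sleep until a signal is received by the process.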

#include <unistd.h>

int pause(void);

/* Always returns −1 */

6.Loading and Running Programs

\(a.\)The `execve` function & executing process

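The execve function loads and runs the executable object file filename with the argument list argv and the environment variable list envp.

#include <unistd.h>

int execve(const char *filename, char *const argv[], char *const envp[]);

/* Does not return if OK; returns −1 on error */

Unlike fork, which is called once but returns twice, execve is called once and never returns unless there is an error, such as not being able to find filename.

Both argv and envp point to NULL-terminated arrays of pointers. Each pointer in argv points to an argument string, and each pointer in envp points to an environment variable string of the form name=value.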

By convention, argv[0] is the name of the executable object file.

After execve loads filename, it calls the start-up code. The start-up code sets up the stack and passes control to the main routine of the new program, which has a prototype of the form:

int main(int argc, char **argv, char **envp);
/* or, equivalently: int main(int argc, char *argv[], char *envp[]); */

argc: gives the number of non-null pointers in the argv[] array.
argv: points to the first entry in the argv[] array.
envp: points to the first entry in the envp[] array.

The stack has the organization as below:

\(b.\)Manipulating the environment array

#include <stdlib.h>

char *getenv(const char *name);

/* Returns: pointer to the value of name if it exists, NULL if no match */

The getenv function searches the environment array for a string name=value. If found, it returns a pointer to value; otherwise, it returns NULL.

#include <stdlib.h>

int setenv(const char *name, const char *newvalue, int overwrite);

/* Returns: 0 on success, −1 on error */

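int unsetenv(const char *name);

/* Returns: 0 on success, −1 on error */

If the environment array contains a string of the form name=oldvalue, then unsetenv deletes it and setenv replaces oldvalue with newvalue, but only if overwrite is nonzero. If name does not exist, then setenv adds name=newvalue to the array.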

7.Using `fork` and `execve` to Run Programs

A shell is an interactive application-level program that runs other programs on behalf of the user. The original shell was the sh program, which was followed by variants such as csh, tcsh, ksh, and bash.

A shell performs a sequence of read/evaluate steps and then terminates.

The read step reads a command line from the user.
The evaluate step parses the command line and runs programs on behalf of the user.

\(a.\)The `main` routine of a simple shell

#include "csapp.h"
#define MAXARGS 128

/* Function prototypes */
void eval(char *cmdline);
int parseline(char *buf, char **argv);
int builtin_command(char **argv);

int main()
{
    char cmdline[MAXLINE]; /* Command line */

    while (1) {
        /* Read */
        printf("> ");
        Fgets(cmdline, MAXLINE, stdin);
        if (feof(stdin))
            exit(0);

        /* Evaluate */
        eval(cmdline);
    }
}

\(b.\)The evaluating routine

/* eval - Evaluate a command line */
void eval(char *cmdline)
{
    char *argv[MAXARGS]; /* Argument list execve() */
    char buf[MAXLINE]; /* Holds modified command line */
    int bg; /* Should the job run in bg or fg? */
    pid_t pid; /* Process id */

    strcpy(buf, cmdline);
    bg = parseline(buf, argv);
    if (argv[0] == NULL)
        return; /* Ignore empty lines */

    if (!builtin_command(argv)) {
        if ((pid = Fork()) == 0) { /* Child runs user job */
            if (execve(argv[0], argv, environ) < 0) {
                printf("%s: Command not found.\n", argv[0]);
                exit(0);
            }
        }

        /* Parent waits for foreground job to terminate */
        if (!bg) {
            int status;
            if (waitpid(pid, &status, 0) < 0)
                unix_error("waitfg: waitpid error");
        }
        else
            printf("%d %s", pid, cmdline);
    }
    return;
}

/* If first arg is a builtin command, run it and return true */
int builtin_command(char **argv)
{
    if (!strcmp(argv[0], "quit")) /* quit command */
        exit(0);
    if (!strcmp(argv[0], "&")) /* Ignore singleton & */
        return 1;
    return 0; /* Not a builtin command */
}

/* parseline - Parse the command line and build the argv array */
int parseline(char *buf, char **argv)
{
    char *delim; /* Points to first space delimiter */
    int argc; /* Number of args */
    int bg; /* Background job? */

    buf[strlen(buf)-1] = ' '; /* Replace trailing '\n' with space */
    while (*buf && (*buf == ' ')) /* Ignore leading spaces */
        buf++;

    /* Build the argv list */
    argc = 0;
    while ((delim = strchr(buf, ' '))) {
        argv[argc++] = buf;
        *delim = '\0';
        buf = delim + 1;
        while (*buf && (*buf == ' ')) /* Ignore spaces */
            buf++;
    }
    argv[argc] = NULL;

    if (argc == 0) /* Ignore blank line */
        return 1;

    /* Should the job run in the background? */
    if ((bg = (*argv[argc-1] == '&')) != 0)
        argv[--argc] = NULL;

    return bg;
}

The parseline function parses the space-separated command-line arguments and builds the argv vector that will eventually be passed to execve.
- If the last argument is an '&' character, then parseline returns 1, indicating that the program should be executed in the background (the shell does not wait for it to complete).
- Otherwise, it returns 0, indicating that the program should be run in the foreground (the shell waits for it to complete).
After parsing the command line, the eval function calls the builtin_command function, which checks whether the first command-line argument is a built-in shell command. If so, it interprets the command immediately and returns 1. Otherwise, it returns 0.
If builtin_command returns 0, then the shell creates a child process and executes the requested program inside the child.
- If the user has asked for the program to run in the background, then the shell returns to the top of the loop and waits for the next command line.
- Otherwise, the shell uses the waitpid function to wait for the job to terminate. When the job terminates, the shell goes on to the next iteration.