Sending signals to a running container using 'kill'

This is Part 6 of the series Building a container runtime from scratch in Go.

In the sixth part of the series we learn how to send signals to a running container using the 'kill' operation.

The source code to accompany this post is available on GitHub.

A container on Linux is simply a running process that has some particular properties associated with it (cgroups, namespaces, etc.). As such, it can be interacted with much like any other process on the system, the aforementioned properties notwithstanding.

Sending signals to a container process is handled by the container runtime’s kill operation. The kill operation of the container runtime is analogous to the Linux kill command*, so let’s have a quick refresher on that.

* In fact, we’ll use the kill syscall to implement the kill operation for our container runtime.

Signals and the Linux kill command

Signals[1] are a kind of software interrupt that is sent asynchronously to a process to inform it of an event. The process may then take action based on the signal it received.

A signal can be sent to a process using a number of methods - raise, kill, sigqueue, and others, including user input, like pressing CTRL+C in a terminal. For our purposes, we’re going to be focusing on kill[2].

The kill command takes the signal to send, given as an option, followed by the PID of the process to send it to.

kill -9 40135

The Chromium Project has a pretty decent Linux Signal Table describing the available signals.

We can get a list of signals and their names by running kill -l. Appending a signal name or number will get the corresponding number or name, respectively.

kill -l
 1) SIGHUP       2) SIGINT       3) SIGQUIT     4) SIGILL        5) SIGTRAP
 6) SIGABRT      7) SIGBUS       8) SIGFPE      9) SIGKILL      10) SIGUSR1
11) SIGSEGV     12) SIGUSR2     13) SIGPIPE    14) SIGALRM      15) SIGTERM
16) SIGSTKFLT   17) SIGCHLD     18) SIGCONT    19) SIGSTOP      20) SIGTSTP
21) SIGTTIN     22) SIGTTOU     23) SIGURG     24) SIGXCPU      25) SIGXFSZ
26) SIGVTALRM   27) SIGPROF     28) SIGWINCH   29) SIGIO        30) SIGPWR
31) SIGSYS      34) SIGRTMIN    35) SIGRTMIN+1 36) SIGRTMIN+2   37) SIGRTMIN+3
38) SIGRTMIN+4  39) SIGRTMIN+5  40) SIGRTMIN+6 41) SIGRTMIN+7   42) SIGRTMIN+8
43) SIGRTMIN+9  44) SIGRTMIN+10 45) SIGRTMIN+11 46) SIGRTMIN+12 47) SIGRTMIN+13
48) SIGRTMIN+14 49) SIGRTMIN+15 50) SIGRTMAX-14 51) SIGRTMAX-13 52) SIGRTMAX-12
53) SIGRTMAX-11 54) SIGRTMAX-10 55) SIGRTMAX-9 56) SIGRTMAX-8   57) SIGRTMAX-7
58) SIGRTMAX-6  59) SIGRTMAX-5  60) SIGRTMAX-4 61) SIGRTMAX-3   62) SIGRTMAX-2
63) SIGRTMAX-1  64) SIGRTMAX

kill -l stop
19

kill -l 9
KILL

The 0 signal, which isn’t listed above (or even usually in signal tables), is a special case in that it can be used to check whether a process exists, assuming that we have the necessary privileges (more on these later) to send it a signal.

# process exists
kill -0 37174

# process doesn't exist
kill -0 343453
bash: kill: (343453) - No such process

# process exists but we don't have permission to signal it
kill -0 40135
bash: kill: (40135) - Operation not permitted

When a signal is sent to a process, the process is able to catch, ignore or block it, with the exception of SIGSTOP and SIGKILL. If a process chooses not to catch a signal then the kernel will take the default action for that signal (which is often to terminate the process).

We can see what action a process is going to take when it receives a signal, and any signals pending, by inspecting the relevant fields in the process’ status file.

❯ grep ^Sig...: /proc/1/status

SigPnd: 0000000000000000
SigBlk: 7fefc1fe28014a03
SigIgn: 0000000000001000
SigCgt: 00000000000004ec
  • SigPnd indicates signals that have been raised but have yet to be acted on.
  • SigBlk indicates signals that are blocked.
  • SigIgn indicates signals that are ignored.
  • SigCgt indicates signals that are caught, i.e. the process has a handler installed for them.

The value to the right of the identifier is a hex number which, when converted to binary, is a bitmask of the signals. Taking SigCgt from above as an example:

# https://nixpig.dev/til/convert-binary-hex/

0    0    0    0    0    0    0    0    0    0    0    0    0    4    e    c
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0100 1110 1100

0100 1110 1100
 │   │││  ││
 │   │││  ││
 │   │││  │└──  3 SIGQUIT
 │   │││  └───  4 SIGILL
 │   │││
 │   ││└──────  6 SIGABRT
 │   │└───────  7 SIGBUS
 │   └────────  8 SIGFPE
 └──────────── 11 SIGSEGV
 

This indicates that the process will catch SIGQUIT, SIGILL, SIGABRT, SIGBUS, SIGFPE and SIGSEGV signals.

Sending a signal to a container process

As previously mentioned, the kill operation is used to send a signal to a running container process.

Since it can only be executed against a container that’s in either the created or running state, we’ll create a helper function to check if the container can be ‘killed’.

func (c *Container) canBeKilled() bool {
	return c.State.Status == specs.StateRunning ||
		c.State.Status == specs.StateCreated
}

Now we can implement the kill functionality on the container.

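func (c *Container) Kill(sig unix.Signal) error {
	// Only containers in the created or running state can be signalled.
	if !c.canBeKilled() {
		return fmt.Errorf("container cannot be killed in current state (%s)", c.State.Status)
	}

	// unix.Signal is an alias of syscall.Signal, so it can be passed directly.
	if err := syscall.Kill(c.State.Pid, sig); err != nil {
		return fmt.Errorf("send signal '%d' to process '%d': %w", sig, c.State.Pid, err)
	}

	// Simplification: assumes the signal sent terminates the container process.
	c.State.Status = specs.StateStopped

	if c.Spec.Hooks != nil {
		if err := hooks.ExecHooks(
			c.Spec.Hooks.Poststop, c.State,
		); err != nil {
			// A failing poststop hook is reported but doesn't fail the operation.
			fmt.Printf("Warning: failed to execute poststop hooks: %v\n", err)
		}
	}

	return nil
}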

Kill receives a unix.Signal to send to the container process.

First, we check if the container is in a state that can be killed. Then, we issue the Kill syscall to the PID of the container process which we get from the state. After updating the container status to stopped, we execute the poststop hook.

Finally, we just need to hook it up to the kill operation.

func Kill(opts *KillOpts) error {
	cntr, err := container.Load(opts.ID)
	if err != nil {
		return fmt.Errorf("load container: %w", err)
	}

	sig, err := strconv.Atoi(opts.Signal)
	if err != nil {
		return fmt.Errorf("convert signal to int: %w", err)
	}

	if err := cntr.Kill(unix.Signal(sig)); err != nil {
		return fmt.Errorf("kill container: %w", err)
	}

	if err := cntr.Save(); err != nil {
		return fmt.Errorf("save container: %w", err)
	}

	return nil
}

As with other operations, we first load the container using its ID. The signal is passed via the CLI, meaning it will be a string, so we convert it to an int. We then send that signal by calling cntr.Kill. Finally, we save the latest state of the container.

To test that the kill operation is working, we’ll first need to update the config.json from which we’re creating a container to execute a long-running process. We can just run sleep 60, which will ‘wait’ for 60 seconds - plenty of time.

{
    // ...

    "process": {
        "args": ["sleep", "60"],
        "env": ["PATH=/bin"],
        "user": {
            "uid": 10,
            "gid": 10
        },
        "cwd": "/"
    },

    // ...
}

To give it a try, let’s first create and start a container.

❯ ./anocir create --bundle alpinefs test1
❯ ./anocir start test1

Then, check the state of the container.

❯ ./anocir state test1

{
    "ociVersion": "1.2.0",
    "id": "test1",
    "status": "running",
    "pid": 87799,
    "bundle": "/home/nixpig/projects/alpinefs"
}

Note that the container is running.

We can also get details of the process using ps and the pid from the state.

❯ ps -p 87799

   PID TTY          TIME CMD
 87799 ?        00:00:00 sleep

Now, send a SIGKILL signal to the container process.

# SIGKILL = 9
❯ ./anocir kill test1 9

…and check the state of the container again to see that the container is stopped.

❯ ./anocir state test1

{
    "ociVersion": "1.2.0",
    "id": "test1",
    "status": "stopped",
    "pid": 87799,
    "bundle": "/home/nixpig/projects/alpinefs"
}

…and verify the process has indeed gone, using ps.

❯ ps -p 87799

    PID TTY          TIME CMD
# nothing to see here

So far, for testing, we’ve just been running commands ourselves - creating, starting, stopping, deleting containers, inspecting them via querying their state and using external tools, like ps. But that’s rather inefficient. As we start to build out our runtime’s functionality, we’re going to want a much more efficient process for testing. To that end, in the next installment we’re going to set up the OCI Runtime test suite so we can:

  1. Get immediate feedback on the changes we’re making to ensure the features we’re implementing are doing what we expect them to, without introducing regressions.
  2. Validate that the features we’re implementing are satisfying the requirements of the OCI Runtime Spec.

Part 7: Setting up the OCI Runtime Spec test suite    🔜 Coming soon!

References


  1. https://man7.org/linux/man-pages/man7/signal.7.html “signal(7) - Linux manual page”

  2. https://man7.org/linux/man-pages/man1/kill.1.html “kill(1) - Linux manual page”
