Sending signals to a running container using 'kill'
This is Part 6 of the series Building a container runtime from scratch in Go.
In the sixth part of the series we learn how to send signals to a running container using the 'kill' operation.
The source code to accompany this post is available on GitHub.
A container on Linux is simply a running process that has some particular properties associated with it (cgroups, namespaces, etc.). As such, it can be interacted with much like any other process on the system, the aforementioned properties notwithstanding.
Sending signals to a container process is handled by the container runtime’s kill operation. The kill operation of the container runtime is analogous to the Linux kill command*, so let’s have a quick refresher on that.
* In fact, we’ll use the kill syscall to implement the kill operation for our container runtime.
Signals and the Linux kill command
Signals [1] are a kind of software interrupt that is sent asynchronously to a process to inform it of an event. The process may then take action based on the signal it received.
A signal can be sent to a process using a number of methods: raise, kill, sigqueue and others, including user input, like pressing CTRL+C in a terminal. For our purposes, we’re going to be focusing on kill [2].
The kill command takes the signal to send and the PID of the process to send it to.
❯ kill -9 40135
The Chromium Project has a pretty decent Linux Signal Table describing the available signals.
We can get a list of signals and their names by running kill -l. Appending a signal name or number will get the corresponding number or name, respectively.
❯ kill -l
1) SIGHUP 2) SIGINT 3) SIGQUIT 4) SIGILL 5) SIGTRAP
6) SIGABRT 7) SIGBUS 8) SIGFPE 9) SIGKILL 10) SIGUSR1
11) SIGSEGV 12) SIGUSR2 13) SIGPIPE 14) SIGALRM 15) SIGTERM
16) SIGSTKFLT 17) SIGCHLD 18) SIGCONT 19) SIGSTOP 20) SIGTSTP
21) SIGTTIN 22) SIGTTOU 23) SIGURG 24) SIGXCPU 25) SIGXFSZ
26) SIGVTALRM 27) SIGPROF 28) SIGWINCH 29) SIGIO 30) SIGPWR
31) SIGSYS 34) SIGRTMIN 35) SIGRTMIN+1 36) SIGRTMIN+2 37) SIGRTMIN+3
38) SIGRTMIN+4 39) SIGRTMIN+5 40) SIGRTMIN+6 41) SIGRTMIN+7 42) SIGRTMIN+8
43) SIGRTMIN+9 44) SIGRTMIN+10 45) SIGRTMIN+11 46) SIGRTMIN+12 47) SIGRTMIN+13
48) SIGRTMIN+14 49) SIGRTMIN+15 50) SIGRTMAX-14 51) SIGRTMAX-13 52) SIGRTMAX-12
53) SIGRTMAX-11 54) SIGRTMAX-10 55) SIGRTMAX-9 56) SIGRTMAX-8 57) SIGRTMAX-7
58) SIGRTMAX-6 59) SIGRTMAX-5 60) SIGRTMAX-4 61) SIGRTMAX-3 62) SIGRTMAX-2
63) SIGRTMAX-1 64) SIGRTMAX
❯ kill -l stop
19
❯ kill -l 9
KILL
Signal 0, which isn’t listed above (or even usually in signal tables), is a special case: no signal is actually delivered, but the usual existence and permission checks are still performed. It can therefore be used to check whether a process exists and whether we have the necessary privileges (more on these later) to send it a signal.
# process exists
❯ kill -0 37174
# process doesn't exist
❯ kill -0 343453
bash: kill: (343453) - No such process
# process exists but we don't have permission to signal it
❯ kill -0 40135
bash: kill: (40135) - Operation not permitted
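The same check can be made from Go. The following is a minimal sketch, assuming a Linux host; processExists is a hypothetical helper name and is not part of the runtime’s source. It sends signal 0 and interprets the resulting error, mirroring the three cases above.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
)

// processExists (hypothetical helper) reports whether a process with the
// given PID exists. It sends signal 0, which makes the kernel run its
// existence and permission checks without delivering a signal.
func processExists(pid int) (bool, error) {
	err := syscall.Kill(pid, syscall.Signal(0))

	switch {
	case err == nil:
		// The process exists and we are allowed to signal it.
		return true, nil
	case errors.Is(err, syscall.ESRCH):
		// "No such process".
		return false, nil
	case errors.Is(err, syscall.EPERM):
		// "Operation not permitted": the process exists,
		// but we lack the privileges to signal it.
		return true, nil
	default:
		return false, fmt.Errorf("check process %d: %w", pid, err)
	}
}

func main() {
	// Our own process certainly exists.
	exists, err := processExists(os.Getpid())
	fmt.Println(exists, err)
}
```

Treating EPERM as "exists" is a design choice for this sketch; a caller that needs to distinguish the two cases could return the error instead.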
When a signal is sent to a process, the process is able to catch, ignore or block it, with the exception of SIGSTOP and SIGKILL. If a process chooses not to catch a signal then the kernel will take the default action for that signal (which is often to terminate the process).
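To make "catching" a signal concrete, here is a small sketch using Go’s os/signal package. The helper name catchOnce is hypothetical; it registers a handler for a signal, sends that signal to its own process and waits for it to arrive. The default action of SIGUSR1 is to terminate the process, so the program surviving shows the signal was caught.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// catchOnce (hypothetical helper) installs a handler for the given signal
// number, sends that signal to the current process and returns the number
// of the signal that was caught.
func catchOnce(signum int) (int, error) {
	sig := syscall.Signal(signum)

	// Buffered, so the notification isn't dropped if it arrives
	// before we start receiving from the channel.
	ch := make(chan os.Signal, 1)
	signal.Notify(ch, sig)
	defer signal.Stop(ch)

	if err := syscall.Kill(os.Getpid(), sig); err != nil {
		return 0, fmt.Errorf("send signal %d to self: %w", signum, err)
	}

	select {
	case caught := <-ch:
		return int(caught.(syscall.Signal)), nil
	case <-time.After(2 * time.Second):
		return 0, fmt.Errorf("timed out waiting for signal %d", signum)
	}
}

func main() {
	caught, err := catchOnce(int(syscall.SIGUSR1))
	if err != nil {
		fmt.Println("error:", err)
		os.Exit(1)
	}

	fmt.Printf("caught signal %d\n", caught)
}
```

Don’t try this with SIGKILL or SIGSTOP: registering a handler for them has no effect, so the process would simply be killed or stopped.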
We can see what action a process is going to take when it receives a signal, and any signals pending, by inspecting the relevant fields in the process’ status file.
❯ grep ^Sig...: /proc/1/status
SigPnd: 0000000000000000
SigBlk: 7fefc1fe28014a03
SigIgn: 0000000000001000
SigCgt: 00000000000004ec
- SigPnd indicates signals that have been raised but have yet to be acted on.
- SigBlk indicates signals that are blocked.
- SigIgn indicates signals that are ignored.
- SigCgt indicates signals that are caught, i.e. those for which the process has installed a handler.
The value to the right of the identifier is a hex number which, when converted to binary, is a bitmask of the signals: counting from 1 at the rightmost bit, bit n being set means that signal n is in the set. Taking SigCgt from above as an example:
# https://nixpig.dev/til/convert-binary-hex/
0    0    0    0    0    0    0    0    0    0    0    0    0    4    e    c
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0100 1110 1100
0100 1110 1100
 │   │││  ││
 │   │││  ││
 │   │││  │└── 3 SIGQUIT
 │   │││  └─── 4 SIGILL
 │   │││
 │   ││└────── 6 SIGABRT
 │   │└─────── 7 SIGBUS
 │   └──────── 8 SIGFPE
 │
 │
 └──────────── 11 SIGSEGV
This indicates that the process will catch SIGQUIT, SIGILL, SIGABRT, SIGBUS, SIGFPE and SIGSEGV signals.
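The decoding can also be done programmatically. Below is a minimal sketch in Go; decodeSigMask is a hypothetical helper, not something from the runtime. It parses the hex value and returns the number of every signal whose bit is set, with bit 1 being the rightmost.

```go
package main

import (
	"fmt"
	"strconv"
)

// decodeSigMask (hypothetical helper) converts a hexadecimal signal mask,
// as found in the Sig* fields of /proc/<pid>/status, into the list of
// signal numbers whose bits are set.
func decodeSigMask(hexMask string) ([]int, error) {
	mask, err := strconv.ParseUint(hexMask, 16, 64)
	if err != nil {
		return nil, fmt.Errorf("parse signal mask %q: %w", hexMask, err)
	}

	var signals []int
	for bit := 0; bit < 64; bit++ {
		// Bit 0 of the mask corresponds to signal 1, bit 1 to signal 2, etc.
		if mask&(1<<uint(bit)) != 0 {
			signals = append(signals, bit+1)
		}
	}

	return signals, nil
}

func main() {
	signals, err := decodeSigMask("00000000000004ec")
	if err != nil {
		fmt.Println("error:", err)
		return
	}

	fmt.Println(signals) // prints [3 4 6 7 8 11]
}
```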
Sending a signal to a container process
As previously mentioned, the kill operation is used to send a signal to a running container process.
Since it can only be executed against a container that’s in either the created or running state, we’ll create a helper function to check if the container can be ‘killed’.
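// canBeKilled reports whether the container is in a state
// (created or running) in which it can receive a signal.
func (c *Container) canBeKilled() bool {
	return c.State.Status == specs.StateRunning ||
		c.State.Status == specs.StateCreated
}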
Now we can implement the kill functionality on the container.
// Kill sends sig to the container process and runs the poststop hooks.
func (c *Container) Kill(sig unix.Signal) error {
	if !c.canBeKilled() {
		return fmt.Errorf("container cannot be killed in current state (%s)", c.State.Status)
	}

	// Send the signal to the container process using the kill syscall.
	if err := syscall.Kill(c.State.Pid, sig); err != nil {
		return fmt.Errorf("send signal '%d' to process '%d': %w", sig, c.State.Pid, err)
	}

	c.State.Status = specs.StateStopped

	// A failing poststop hook is reported as a warning rather than an error.
	if c.Spec.Hooks != nil {
		if err := hooks.ExecHooks(
			c.Spec.Hooks.Poststop, c.State,
		); err != nil {
			fmt.Printf("Warning: failed to execute poststop hooks: %s\n", err)
		}
	}

	return nil
}
Kill receives a unix.Signal to send to the container process.
First, we check if the container is in a state that can be killed. Then, we issue the Kill syscall to the PID of the container process which we get from the state. After updating the container status to stopped, we execute the poststop hook.
Finally, we just need to hook it up to the kill operation.
func Kill(opts *KillOpts) error {
	cntr, err := container.Load(opts.ID)
	if err != nil {
		return fmt.Errorf("load container: %w", err)
	}

	// The signal arrives from the CLI as a string.
	sig, err := strconv.Atoi(opts.Signal)
	if err != nil {
		return fmt.Errorf("convert signal to int: %w", err)
	}

	if err := cntr.Kill(unix.Signal(sig)); err != nil {
		return fmt.Errorf("kill container: %w", err)
	}

	// Persist the updated container state.
	if err := cntr.Save(); err != nil {
		return fmt.Errorf("save container: %w", err)
	}

	return nil
}
As with other operations, we first load the container using its ID. The signal is passed via the CLI, meaning it will be a string, so we convert it to an int. We then send that signal by calling cntr.Kill. Finally, we save the latest state of the container.
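The conversion above only accepts numeric signals. As an optional variation, the sketch below also accepts names such as KILL or SIGTERM. Note that parseSignal and the signalNames table are hypothetical and deliberately incomplete; they are not part of the runtime’s source, and the sketch uses the standard library’s syscall.Signal rather than unix.Signal to stay free of external dependencies.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"syscall"
)

// signalNames is a deliberately small, hypothetical lookup table;
// a real implementation would cover every signal.
var signalNames = map[string]syscall.Signal{
	"HUP":  syscall.SIGHUP,
	"INT":  syscall.SIGINT,
	"QUIT": syscall.SIGQUIT,
	"KILL": syscall.SIGKILL,
	"USR1": syscall.SIGUSR1,
	"USR2": syscall.SIGUSR2,
	"TERM": syscall.SIGTERM,
	"CONT": syscall.SIGCONT,
	"STOP": syscall.SIGSTOP,
}

// parseSignal (hypothetical helper) accepts either a signal number ("9")
// or a signal name with or without the SIG prefix ("KILL", "sigkill").
func parseSignal(s string) (syscall.Signal, error) {
	// Try the numeric form first.
	if n, err := strconv.Atoi(s); err == nil {
		if n < 0 || n > 64 {
			return 0, fmt.Errorf("signal number out of range: %d", n)
		}
		return syscall.Signal(n), nil
	}

	// Otherwise normalise the name and look it up.
	name := strings.TrimPrefix(strings.ToUpper(s), "SIG")
	sig, ok := signalNames[name]
	if !ok {
		return 0, fmt.Errorf("unknown signal %q", s)
	}

	return sig, nil
}

func main() {
	for _, arg := range []string{"9", "SIGKILL", "term"} {
		sig, err := parseSignal(arg)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %d\n", arg, int(sig))
	}
}
```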
To test that the kill operation is working, we’ll first need to update the config.json from which we’re creating a container to execute a long-running process. We can just run sleep 60, which will ‘wait’ for 60 seconds - plenty of time.
{
  // ...
  "process": {
    "args": ["sleep", "60"],
    "env": ["PATH=/bin"],
    "user": {
      "uid": 10,
      "gid": 10
    },
    "cwd": "/"
  },
  // ...
}
To give it a try, let’s first create and start a container.
❯ ./anocir create --bundle alpinefs test1
❯ ./anocir start test1
Then, check the state of the container.
❯ ./anocir state test1
{
"ociVersion": "1.2.0",
"id": "test1",
"status": "running",
"pid": 87799,
"bundle": "/home/nixpig/projects/alpinefs"
}
Note that the container is running.
We can also get details of the process using ps and the pid from the state.
❯ ps -p 100249
PID TTY TIME CMD
100249 ? 00:00:00 sleep
Now, send a SIGKILL signal to the container process.
# SIGKILL = 9
❯ ./anocir kill test1 9
…and check the state of the container again to see that the container is stopped.
❯ ./anocir state test1
{
"ociVersion": "1.2.0",
"id": "test1",
"status": "stopped",
"pid": 87799,
"bundle": "/home/nixpig/projects/alpinefs"
}
…and verify the process has indeed gone, using ps.
❯ ps -p 100249
PID TTY TIME CMD
# nothing to see here
So far, for testing, we’ve just been running commands ourselves - creating, starting, stopping, deleting containers, inspecting them via querying their state and using external tools, like ps. But that’s rather inefficient. As we start to build out our runtime’s functionality, we’re going to want a much more efficient process for testing. To that end, in the next installment we’re going to set up the OCI Runtime test suite so we can:
- Get immediate feedback on the changes we’re making to ensure the features we’re implementing are doing what we expect them to, without introducing regressions.
- Validate that the features we’re implementing are satisfying the requirements of the OCI Runtime Spec.
Part 7: Setting up the OCI Runtime Spec test suite 🔜 Coming soon!
References
- [1] “signal(7) - Linux manual page” https://man7.org/linux/man-pages/man7/signal.7.html
- [2] “kill(1) - Linux manual page” https://man7.org/linux/man-pages/man1/kill.1.html