Docker recently resolved a runc privilege escalation vulnerability that a malicious program could exploit to escape a container and access the host.
Tracked as CVE-2016-9962, the issue arises because runc passes a file descriptor from the host's filesystem to the "runc init" bootstrap process when joining a container. This means that a malicious process inside the container can gain access to the host filesystem with its current privilege set.
Discovered by Alexander Bergmann, the vulnerability is rather difficult to exploit, because the race window between joining the container and calling execve is quite small. According to Docker’s CVE database, the privilege escalation issue is the result of insecure opening of a file descriptor. Docker 1.12.6 resolves the bug.
Because the issue resides in the runc code, other container platforms that rely on runc might also be affected, Aqua Security’s Sagie Dulce says. The vulnerability is triggered when exec-ing an application in an already running container, the security researcher explains.
Through the inherited file descriptor, a malicious process inside the container can reach a directory that resides on the host and, from there, the rest of the host's filesystem. Because the bug can be leveraged for directory traversal to the host's filesystem, it results in an effective container escape, Dulce notes.
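To illustrate why a single inherited directory descriptor is enough, here is a minimal Python sketch. It is not runc's code and not an exploit: a temporary directory stands in for the host, and the names `read_relative_to_dir_fd`, `demo`, `secret.txt` and `sub` are hypothetical. It only shows that a process holding a directory descriptor can resolve paths, including `..`, relative to that directory without knowing or reaching its path.

```python
import os
import tempfile


def read_relative_to_dir_fd(dir_fd: int, relative_path: str) -> bytes:
    """Read a file resolved relative to an already-open directory descriptor."""
    # os.open(..., dir_fd=...) maps to openat(2): the path is resolved from the
    # directory the descriptor refers to, not from the current working directory.
    fd = os.open(relative_path, os.O_RDONLY, dir_fd=dir_fd)
    try:
        return os.read(fd, 4096)
    finally:
        os.close(fd)


def demo() -> bytes:
    """Stand-in scenario: a temp dir plays the 'host', 'sub' is the leaked directory."""
    with tempfile.TemporaryDirectory() as host_root:
        with open(os.path.join(host_root, "secret.txt"), "wb") as f:
            f.write(b"host data")
        sub_dir = os.path.join(host_root, "sub")
        os.mkdir(sub_dir)
        # The only thing handed to the "untrusted" code is this descriptor.
        leaked_fd = os.open(sub_dir, os.O_RDONLY | os.O_DIRECTORY)
        try:
            # ".." walks out of the leaked directory into its parent.
            return read_relative_to_dir_fd(leaked_fd, "../secret.txt")
        finally:
            os.close(leaked_fd)
```

In the real vulnerability the descriptor refers to a directory on the host while the process using it sits inside the container, which is what turns this ordinary behavior into an escape.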
The open file descriptor is apparently only one part of a bigger issue with exec-ing commands inside a running container. However, the window of opportunity in which the container has access to the runc init process on the host is very small: it closes as soon as the runc init process execs the command inside the container.
“This is because runc enters the namespace of the container before it execs the final command. This window could enable a container, for example, to list file descriptors on the host process, which can then lead it to the host’s file system. Because many containers run as root, this indeed has serious implications,” the researcher notes.
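Listing the file descriptors of a process, as the researcher describes, is possible on Linux through the `/proc/<pid>/fd` directory, where each entry is a symbolic link to the object the descriptor refers to. The following is a small sketch of that mechanism only; the function name `list_open_fds` is hypothetical, and by default it inspects the calling process rather than any runc process.

```python
import os


def list_open_fds(pid="self"):
    """Return {fd_number: target} for the open descriptors of a process."""
    fd_dir = f"/proc/{pid}/fd"
    targets = {}
    for name in os.listdir(fd_dir):
        try:
            # Each entry is a symlink such as "/some/dir" or "socket:[12345]".
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return targets
```

Reading another process's `/proc/<pid>/fd` is subject to the kernel's ptrace access checks, which is why capabilities and the dumpable flag matter for this bug.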
The issue can be exploited in containers that lack the CAP_SYS_PTRACE capability, although it is much easier to access the file descriptors if the capability exists. A correctly timed exploit can leverage the vulnerability without having control of the runc init process. “One can escape a container […] by simply patching runc to sleep before calling exec,” Dulce says.
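Whether a process holds CAP_SYS_PTRACE can be checked from the `CapEff` line of `/proc/<pid>/status`, which encodes the effective capability set as a hexadecimal bitmask. The sketch below assumes Linux, where CAP_SYS_PTRACE is capability number 19; the helper names are hypothetical.

```python
CAP_SYS_PTRACE = 19  # capability number from <linux/capability.h>


def has_capability(cap_mask: int, cap_number: int) -> bool:
    """Test a single capability bit in a capability bitmask."""
    return bool(cap_mask & (1 << cap_number))


def effective_capabilities(status_text: str) -> int:
    """Extract the effective capability mask from /proc/<pid>/status contents."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def current_process_has_ptrace() -> bool:
    """Report whether the calling process holds CAP_SYS_PTRACE."""
    with open("/proc/self/status") as f:
        return has_capability(effective_capabilities(f.read()), CAP_SYS_PTRACE)
```

Running `current_process_has_ptrace()` inside a container gives a quick indication of which of the two cases described above applies.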
According to Red Hat’s Dan Walsh, SELinux mitigates the vulnerability. “SELinux is the only thing that protects the host file system from attacks from inside of the container. If the processes inside of the container get access to a host file and attempt to read and write the content SELinux will check the access,” he explains.
The released patch for this issue ensures that no host file descriptors are present in the runc init process. Moreover, the fix marks the runc init process as non-dumpable before it calls setns to enter the container, which apparently protects it from processes inside the container.
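Both measures correspond to standard Linux interfaces, which the following Python sketch illustrates. It is not the actual runc patch: the function names are hypothetical, and `prctl` is called through `ctypes` with the `PR_GET_DUMPABLE` (3) and `PR_SET_DUMPABLE` (4) constants from `<linux/prctl.h>`.

```python
import ctypes
import os

PR_GET_DUMPABLE = 3
PR_SET_DUMPABLE = 4

# Load the C library of the running process to reach prctl(2).
_libc = ctypes.CDLL(None, use_errno=True)


def _prctl(option: int, arg: int = 0) -> int:
    result = _libc.prctl(
        ctypes.c_int(option), ctypes.c_ulong(arg),
        ctypes.c_ulong(0), ctypes.c_ulong(0), ctypes.c_ulong(0),
    )
    if result < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return result


def set_dumpable(enabled: bool) -> None:
    """Set or clear the process's dumpable flag."""
    # A non-dumpable process cannot be ptrace-attached by unprivileged
    # processes, and its /proc/<pid> entries become owned by root.
    _prctl(PR_SET_DUMPABLE, 1 if enabled else 0)


def get_dumpable() -> int:
    """Return the current dumpable flag (0 means non-dumpable)."""
    return _prctl(PR_GET_DUMPABLE)


def mark_close_on_exec(fd: int) -> None:
    """Make sure a descriptor is not carried across exec into the new program."""
    os.set_inheritable(fd, False)
```

A process following this pattern would call `mark_close_on_exec` (or simply close the descriptor) for every host descriptor it still holds, and `set_dumpable(False)` before joining the container's namespaces.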