Posts tagged "sandboxing"

Tips for Sandboxing Docker Containers

In the world of virtualization, two approaches dominate: virtual machines and containers. Both provide sandboxing: virtual machines through hardware-level abstraction, and containers through process-level isolation on a shared kernel. Docker containers are reasonably secure by default, but do they provide complete isolation? Let us look at the various ways sandboxing can be achieved in containers and what we need to do to get closer to complete isolation.

Namespaces

Namespaces are one of the building blocks of containers and provide the first level of sandboxing. A namespace gives a process its own view of the system, limiting what the process can see of, and do to, other processes in the container environment or on the host system. Today there are six namespaces available in Linux, all of which are supported by Docker (a short demonstration follows the list):

  • PID namespace: Provides isolation such that a process belonging to a particular PID namespace can only see other processes in the same namespace. Processes in one PID namespace cannot even learn of the existence of processes in other PID namespaces, and hence cannot inspect or kill them.
  • User namespace: Provides isolation such that a process can run as root within its own user namespace while being mapped to a non-privileged user on the host system. This is a significant security improvement in a Docker environment.
  • Mount namespace: Provides isolation of the host filesystem from the new filesystem created for the process. This allows processes in different namespaces to change the mount points without affecting each other.
  • Network namespace: Provides isolation such that a process belonging to a particular network namespace gets its own network stack, including routing tables, iptables rules, sockets, and interfaces. Ethernet bridges (or similar constructs) are additionally required to allow networking between the host and the namespaces.
  • UTS namespace: Isolates two system identifiers, nodename and domainname. This allows each container to have its own hostname and NIS domain name, which is helpful during initialization.
  • IPC namespace: Provides isolation of inter-process communication resources, including IPC message queues, semaphores, etc.
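To see some of this in action, here is a minimal sketch, assuming a local Docker installation and the public alpine image (the hostname value is arbitrary, and exact flag syntax varies slightly across Docker versions):

    # PID namespace: the container sees only its own processes, with its
    # entrypoint as PID 1 rather than the host's init.
    docker run --rm alpine ps aux

    # UTS namespace: the container carries its own hostname, settable at run time.
    docker run --rm --hostname ns-demo alpine hostname

    # User namespace: enable remapping on the daemon so that root inside a
    # container maps to an unprivileged subordinate UID on the host.
    dockerd --userns-remap=default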

Although namespaces provide a great level of isolation, there are resources a container can access that are not namespaced. These resources are common to all containers on the host machine, which raises security concerns and presents a risk of attack or information exposure. Resources that are not sandboxed include the following:

  • The kernel keyring: The kernel keyring separates keys by UID. Since users in different containers may share the same UID, all of those users have access to the same keys in the keyring. Applications that use the kernel keyring for handling secrets are therefore much less secure due to this lack of sandboxing.
  • /proc and system time: Due to the “one size fits all” nature of Docker, a number of Linux capabilities remain enabled. With certain capabilities enabled, the exposure of /proc offers a source of information leakage and a large attack surface, since /proc includes files that contain kernel configuration information and details about host system resources. Other capabilities, such as SYS_TIME and SYS_ADMIN, allow changes to the system time not just inside the container, but also for the host and other containers.
  • Kernel modules: If an application loads a kernel module, the newly added module becomes available to all containers in the environment and to the host system. Some modules enforce security policies; access to such modules would allow an application to change those policies, which is again a significant concern.
  • Hardware: The underlying hardware of the host system is shared among all the containers running on it. A proper cgroup configuration and access control are required for a fair distribution of resources. In other words, namespaces divide a larger area into smaller areas, and cgroups govern how much of each area may be used. Cgroups control resources such as memory, CPU, and disk I/O, and a well-defined cgroup configuration helps prevent DoS attacks (a sketch follows this list).
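As a sketch of such a configuration (the limit values are illustrative, and flag availability depends on the Docker version):

    # Cap memory, CPU time, and process count so that a runaway or compromised
    # container exhausts only its own quota rather than the whole host:
    #   --memory      memory cgroup limit
    #   --cpus        CPU quota (roughly one core's worth of time)
    #   --pids-limit  pids cgroup limit, which blunts fork-bomb style DoS
    docker run --rm --memory=512m --cpus=1.0 --pids-limit=100 \
        alpine sh -c 'echo "running with resource limits"'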

Capabilities

Capabilities split the privileged operations traditionally reserved for the root user into distinct units. An individual non-root process cannot perform any privileged operation, but by dividing those operations into capabilities, we can grant them to individual processes without elevating their overall privilege level. This way we can sandbox a container with a restricted set of actions, so that if it is compromised, it can do less damage than it could with full “root” access. Be careful when using capabilities:

  • Defaults: As mentioned earlier, given the “one size fits all” nature of Docker, a number of capabilities remain enabled by default. This default set does not provide complete isolation. A better approach is to drop all capabilities for the container and then add back only those required by the application process running in it; finding the right set is a trial-and-error process using various test scenarios for the application (see the sketch after this list).
  • SYS_ADMIN capability: Another issue is that capabilities themselves are not fine-grained. The most frequently cited example is the SYS_ADMIN capability, which bundles a large number of privileged operations, making it nearly equivalent to full root access and another reason for concern.
  • SETUID binaries: The setuid bit causes a binary to run with the privileges of its owner, which for many system binaries is root. Many Linux distributions set the setuid bit on several binaries, despite the fact that capabilities can serve as an alternative, offering a smaller attack surface in case of a breakout from a non-privileged container. Defang setuid binaries by removing the setuid bit or by mounting filesystems with nosuid.
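Here is a minimal sketch of the drop-then-add approach and of the nosuid mount option. The images and the capability set are illustrative assumptions (a plausible starting set for an nginx-style workload); the exact set for a given application must be found through the trial-and-error testing described above:

    # Drop every capability, then grant back only what the workload needs.
    # nginx binds port 80 and switches to an unprivileged worker user, so it
    # needs NET_BIND_SERVICE plus the identity-switching capabilities.
    docker run --rm --cap-drop=ALL \
        --cap-add=NET_BIND_SERVICE --cap-add=CHOWN \
        --cap-add=SETUID --cap-add=SETGID \
        -p 8080:80 nginx

    # Mount writable scratch space with nosuid so any setuid bit is ignored
    # on files placed there.
    docker run --rm --tmpfs /tmp:rw,nosuid,size=64m alpine mount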

Seccomp

Seccomp (secure computing mode) is a sandboxing facility in the Linux kernel that filters incoming system calls. It allows a process to declare a filter for the system calls it may make, with an action taken whenever a call is not permitted by the filter. Thus, if an attacker gains access to the container, they have a limited number of system calls in their arsenal. Seccomp filters are expressed using the Berkeley Packet Filter (BPF) system, the same mechanism used for socket filters. In other words, seccomp allows a user to catch a syscall and “allow”, “deny”, “trap”, “kill”, or “trace” it based on the syscall number and the arguments passed. This adds a further layer of granularity, locking down the processes in one's containers to do only what is needed.

Docker provides a default seccomp profile for containers that acts as a whitelist of allowed calls. This profile disables only 44 of the 300+ available system calls, a consequence of the vast range of container use cases in current deployments; making it stricter would break many applications running in Docker containers. For example, the reboot system call is disabled, because there is never a situation in which a container should reboot the host machine.
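The default profile is applied without any extra flags; opting out must be explicit. A quick sketch (the unconfined override is shown only to illustrate the flag, not as a recommendation):

    # The default seccomp profile is applied automatically:
    docker run --rm alpine echo "running under the default seccomp profile"

    # Disable seccomp filtering entirely (debugging only; this widens the
    # attack surface back to the full syscall table):
    docker run --rm --security-opt seccomp=unconfined alpine echo "unfiltered"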

Another good example is keyctl, a system call in which a vulnerability was recently found (CVE-2016-0728); keyctl is now also disabled by default. The most secure approach is to create a custom seccomp profile that blocks these 44 system calls as well as any others not required by the app running in the container. This can be done with the help of DockerSlim (http://dockersl.im), which auto-generates seccomp profiles; a hand-written sketch follows.
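As a sketch of what a custom profile looks like, the snippet below is a small deny-list in the older single-“name” profile format, blocking the keyring syscalls and reboot discussed above. The file name is a placeholder, and a true whitelist (defaultAction set to SCMP_ACT_ERRNO) would have to enumerate every call the app makes, which is exactly what DockerSlim automates. Save it as custom-seccomp.json:

    {
      "defaultAction": "SCMP_ACT_ALLOW",
      "syscalls": [
        { "name": "keyctl",  "action": "SCMP_ACT_ERRNO", "args": [] },
        { "name": "add_key", "action": "SCMP_ACT_ERRNO", "args": [] },
        { "name": "reboot",  "action": "SCMP_ACT_ERRNO", "args": [] }
      ]
    }

Then point the container at it:

    # Apply the custom profile in place of Docker's default:
    docker run --rm --security-opt seccomp=custom-seccomp.json alpine echo ok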

The good part about the seccomp feature is that it makes the attack surface much narrower. However, the roughly 250+ calls that remain available still leave containers susceptible to attack. For example, CVE-2014-3153 is a vulnerability that was found in the futex system call, enabling privilege escalation through a kernel exploit; this system call remains enabled, and unavoidably so, since it has legitimate uses in implementing basic resource locking for synchronization. Although the seccomp feature makes containers more secure than in earlier versions of Docker, it provides only moderate security in the container environment. It needs to be hardened further, especially for enterprises, in a way that remains compatible with the applications running in the containers.

Conclusion

Through the hardening methods for namespaces and cgroups and the use of seccomp profiles, we are able to sandbox our containers to a great extent. By following established benchmarks and applying least privilege, we can make our container environment secure. However, this only scratches the surface, and there are plenty of other things to take care of.

Rahul Gajria
Cloud Security Researcher Intern

Adobe @ NullCon Goa 2015

The ASSET team in Noida recently attended NullCon, a well-known Indian information security conference held in Goa. My team and I attended different trainings on client-side security, malware analysis, mobile pen-testing, and fuzzing, delivered by industry experts in their respective fields. A training I found particularly helpful was the one on client-side security by Mario Heiderich. It revealed several interesting aspects of browser parsing engines, demonstrated various ways XSS protections can be defeated, and showed how modern JavaScript frameworks like AngularJS can also expand the attack surface. This knowledge can help us build better protective “shields” for web applications.

Of the two night talks, the one I found most interesting was on the Google fuzzing framework. The speaker, Abhishek Arya, discussed how fuzz testing for Chrome is scaled using a large, automated infrastructure that reveals exploitable bugs with minimal human intervention. During the main conference, I attended several good talks on topics such as the “sandbox paradox”, an attacker's perspective on ECMAScript 2015, drone attacks, and the Cuckoo sandbox. James Forshaw's talk on sandboxing was of particular interest, as it provided useful insight into how special APIs on the Windows platform can be used to build better sandboxes. Another beneficial session was Jurriaan Bremer's on the Cuckoo sandbox, where he demonstrated how his tool can be used to automate the analysis of malware samples.

Day 2 started with keynote sessions from Paul Vixie (Farsight Security) and Katie Moussouris (HackerOne). A couple of us also attended a lock-picking workshop, where we were given picks for some well-known lock types, walked through the process of picking those particular locks, and succeeded in opening quite a few of them. I also played in the Bug Bash along with Gineesh (EchoSign team) and Abhijeth (IT team), where we were given live targets in which to find vulnerabilities. We found a couple of critical issues, winning our team some nice prize money. 🙂

Adobe has been a sponsor of NullCon for several years. At this year’s event, we were seeking suitable candidates for openings on our various security teams. In between talks, we assisted our HR team at the Adobe booth, explaining the technical aspects of our jobs to prospective candidates. We were successful in getting many attendees interested in our available positions.

Overall, the conference was a perfect blend of learning, technical discussion, networking, and fun.

Vaibhav Gupta
Security Researcher - ASSET

The Evolution of Exploit Sophistication

When we look at the exploits that Adobe patched in February and March of this year, it is clear that today’s zero-day exploits are increasingly sophisticated. This increase in sophistication is not limited to the skills needed to find and exploit the vulnerability. The code used to exploit the environment is also more robust in terms of code quality and testing. In short, exploit creation today requires the same level of rigor as professional software engineering projects.

Today’s advanced exploits need to be written to work in any target environment. For instance, February’s Reader 0-day supported 10 different versions of Reader with 2 sub-versions dependent on the end-user’s language. In addition, Flash Player CVE-2013-0634 had shell code for Windows XP, Vista, Windows 7, Server 2003, Server 2003 R2, Server 2008 and Server 2008 R2 as well as supporting six versions of Flash Player. Variants of CVE-2013-0634 also supported Firefox and Safari on Mac OS X. An exploit developer would need a robust testing environment to ensure that the exploit would work in that many different environments for each version of Flash Player. The exploit writers even took into account different CPU architectures by including a signed 32-bit payload and a 64-bit payload. This reflects the fact that these exploits are written with professional code quality and stability requirements for distribution across a dynamic target base.

As vendors increase software defenses through techniques such as sandboxing, attackers are now combining multiple vulnerabilities from different vendors to achieve their goals. When I look at the reports from Pwn2Own and some of the recent zero-day reports such as CVE-2013-0643, attacks are moving toward combining vulnerabilities from multiple products, some of which come from different vendors. We are moving away from the model of single-vulnerability exploits.

This is all part of the natural evolution of the threat landscape and the commercialization of exploits. It will require an equal evolution on the part of vendors in their software defenses. Karthik Raman and I will be discussing this topic, “Security Response in the Age of Mass Customized Attacks,” in more detail at the upcoming Hack in the Box (HITB) conference in Amsterdam next week. Please stop by our talk if you would like to discuss this further.

Peleus Uhley
Platform Security Strategist