Containment

Isolating programs

Paul Krzyzanowski

March 12, 2025

Experience has taught us two lessons: applications can be compromised, and applications may not be trustworthy.

The first risk is compromise. Server applications in particular, such as web servers, databases, and mail servers, have been compromised time and again. This is especially harmful because they often run with elevated privileges and on systems on which normal users do not have accounts, so a compromised server gives an attacker a way into a system they could not otherwise access.

The second risk is trust. We may not always trust an application. We cannot necessarily trust that the game we downloaded from some unknown developer will not try to upload our files, destroy our data, or change our system configuration. Unless we can scrutinize an application’s source code, we cannot know for sure whether it modifies system settings or writes files to unexpected places.

Because we cannot assume we are immune to attacks, security in modern computing depends on containment: creating isolation mechanisms that protect sensitive processes and data from unauthorized access or modification. Two widely used approaches to isolation have been containerization and full virtualization. Both provide security benefits, but they differ in their design and security guarantees.

Access control isn’t enough

Our first approach to containment may be to use access controls properly. For example, we can run server applications as low-privilege users and ensure we have set proper read/write/execute permissions on files, read/write/search permissions on directories, or even set up role-based policies.
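Running a server as a low-privilege user usually means starting as root (for example, to bind a port below 1024) and then permanently giving up root. The following is a minimal Python sketch of that step, assuming a Linux or other POSIX system. The function name is our own, and the uid/gid values in any call are assumptions: 65534 is the conventional `nobody` account on many Linux systems, and a real server would look up its account with `pwd.getpwnam`.

```python
import os


def drop_privileges(uid: int, gid: int) -> None:
    """Permanently switch the current process from root to uid/gid.

    Order matters: supplementary groups and the group ID must be changed
    while we still have root, because after setuid() we no longer can.
    """
    if os.geteuid() != 0:
        raise PermissionError("must start as root to drop privileges")
    os.setgroups([])  # drop supplementary groups (e.g., root's extra groups)
    os.setgid(gid)    # as root, sets real, effective, and saved group IDs
    os.setuid(uid)    # as root, sets real, effective, and saved user IDs
    # Sanity check: regaining root must now fail.
    try:
        os.setuid(0)
    except PermissionError:
        return
    raise RuntimeError("privileges were not dropped")
```

A server would call `drop_privileges(65534, 65534)` right after opening its listening socket and before handling any request, so that a later compromise runs with the unprivileged account.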

However, access controls usually do not allow us to set permissions for “don’t allow access to anything else.” For example, we may want our web server to have access to all files in /home/httpd but nothing outside of that directory. Access controls do not let us express that rule. Instead, we are responsible for changing the protections of every file on the system and making sure it cannot be accessed by “other”.
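Because the permission model cannot state “everything under /home/httpd and nothing else,” programs often enforce such a rule themselves. Here is a minimal Python sketch of that check; `is_within_root` is a hypothetical helper of our own. Note the weakness: the check runs inside the very application we are worried about, so a compromised server can simply skip it, and a file swapped between the check and the open (a race) can still slip through.

```python
import os


def is_within_root(root: str, requested: str) -> bool:
    """Return True if `requested` (relative to `root`) resolves inside `root`.

    realpath() collapses '..' components and follows symbolic links, so
    tricks such as '../../etc/passwd' or a symlink pointing outside the
    root are caught. An absolute `requested` path replaces the root in
    os.path.join and is therefore rejected unless it lies inside it.
    """
    root_real = os.path.realpath(root)
    target = os.path.realpath(os.path.join(root_real, requested))
    return os.path.commonpath([root_real, target]) == root_real
```

For example, `is_within_root("/home/httpd", "docs/index.html")` is true, while `is_within_root("/home/httpd", "../../etc/passwd")` is false.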

We also have to hope that no user changes those permissions. In effect, we must prevent anyone from making files publicly accessible, since anything accessible by “other” is accessible to the web server. We may be able to use mandatory access control mechanisms if the system provides them but, depending on the system, they may not let us express the restriction properly either. More likely, the complexity of the policy will lead to comprehension errors and a configuration mistake that leaves parts of the system vulnerable. In short, even if access controls can help, we will not have high assurance that they do.
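Even auditing this takes work. The following is a minimal Python sketch that walks a directory tree and reports files and directories whose permission bits let “other” read or write them; the function name is our own. It captures only a snapshot: the owner of any file can change its mode a moment later.

```python
import os
import stat


def world_accessible(top: str) -> list[str]:
    """Return paths under `top` whose mode grants read or write to "other"."""
    found = []
    for dirpath, dirnames, filenames in os.walk(top):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.lstat(path).st_mode  # lstat: don't follow symlinks
            except OSError:
                continue  # vanished or unreadable; skip it
            if stat.S_ISLNK(mode):
                continue  # symlink permission bits are not meaningful on Linux
            if mode & (stat.S_IROTH | stat.S_IWOTH):
                found.append(path)
    return sorted(found)
```

Running `world_accessible("/")` on a typical system returns a very long list, which illustrates why “make sure nothing else is accessible by other” is not a practical policy.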

Access controls also focus only on protecting access to files and devices. A system has other resources, such as CPU time, memory, disk space, and network. We may want to control how much of these an application is allowed to use.
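POSIX systems offer a coarse, per-process form of such limits through `setrlimit`; Linux containers build on richer mechanisms such as control groups. The following is a minimal Python sketch, assuming a Unix-like system, that runs a command with caps on CPU time and address space. The function name and the limit values are our own choices for illustration.

```python
import resource
import subprocess


def run_limited(cmd: list[str], cpu_seconds: int,
                mem_bytes: int) -> subprocess.CompletedProcess:
    """Run `cmd` in a child process with CPU-time and memory limits.

    The limits are applied in the child just before exec, so the parent
    is unaffected; rlimits are inherited across exec.
    """
    def apply_limits() -> None:
        # No core files if the child is killed by a signal.
        resource.setrlimit(resource.RLIMIT_CORE, (0, 0))
        # Exceeding the soft CPU limit delivers SIGXCPU (fatal by default);
        # the hard limit, one second later, is enforced with SIGKILL.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds + 1))
        # Cap virtual address space; larger allocations fail.
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))

    return subprocess.run(cmd, preexec_fn=apply_limits,
                          capture_output=True, text=True)
```

A runaway loop is terminated by a signal after about one CPU second, and an oversized allocation fails instead of exhausting the machine's memory. Network bandwidth and total disk usage are not covered by these per-process limits.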

Security implications

Unlike app containment mechanisms such as jails, containers, or sandboxes, virtual machines enable isolation all the way through the operating system. A compromised application, even with escalated privileges, can wreak havoc only within the virtual machine. Even compromises to the operating system kernel are limited to that virtual machine. However, a compromised virtual machine is not much different from a compromised physical machine sitting inside your organization: not desirable and capable of attacking other systems in your environment.

Multiple virtual machines are usually deployed on one physical system. In cloud services such as those provided by Amazon Web Services, Microsoft Azure, or Google Cloud, a single physical system may host virtual machines from different organizations or virtual machines running applications with different security requirements. If a malicious application on a highly secure system can detect that it is co-resident with another operating system that imposes fewer restrictions, the malware may be able to create a covert channel between the highly secure system, with its classified data, and the more open system.

A covert channel is a general term for the ability of processes to communicate via some hidden mechanism when policy forbids them to do so. In this case, the channel can be created via a side channel. A side channel is the ability to obtain or transmit information through some aspect of a system’s behavior, such as changes in power consumption, radio emissions, acoustics, or performance. For example, processes on both systems, even though they are not allowed to send network messages, may communicate by altering and monitoring system load. The malware on the classified VM can run CPU-intensive tasks at specific times. Listener software on the unclassified VM can run CPU-intensive tasks at a constant rate and periodically measure their completion times. These completion times vary depending on whether the classified system is doing CPU-intensive work, and that variation lets the sender transmit 1s and 0s and hence a message.
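The load-based channel can be illustrated with a small simulation. The Python sketch below does not measure real hardware. It models each of the listener's completion times as a baseline, plus a slowdown whenever the sender is busy, plus bounded random jitter. The timing constants are assumptions chosen for illustration, and real measurements would be far noisier and would need error correction.

```python
import random

BASE_MS = 10.0      # listener task time when the sender is idle (assumed)
SLOWDOWN_MS = 5.0   # extra time when the sender is busy (assumed)
JITTER_MS = 1.0     # maximum random jitter per measurement (assumed)


def to_bits(message: str) -> list[int]:
    """Encode a message as bits, most significant bit of each byte first."""
    return [(byte >> i) & 1 for byte in message.encode() for i in range(7, -1, -1)]


def from_bits(bits: list[int]) -> str:
    """Reassemble bytes from groups of 8 bits and decode them as text."""
    data = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        data.append(byte)
    return data.decode(errors="replace")


def listener_timings(bits: list[int], rng: random.Random) -> list[float]:
    """Simulated completion time of the listener's task in each time slot.

    The sender runs a CPU-intensive task during slots carrying a 1 and
    stays idle for a 0; contention slows the listener down.
    """
    return [BASE_MS + SLOWDOWN_MS * bit + rng.uniform(-JITTER_MS, JITTER_MS)
            for bit in bits]


def decode(timings: list[float]) -> list[int]:
    """Treat any slot slower than the midpoint threshold as a 1."""
    threshold = BASE_MS + SLOWDOWN_MS / 2
    return [1 if t > threshold else 0 for t in timings]


def transmit(message: str, seed: int = 0) -> str:
    """Send `message` over the simulated channel; return what the listener decodes."""
    rng = random.Random(seed)
    return from_bits(decode(listener_timings(to_bits(message), rng)))
```

Because the jitter in this model never exceeds half the slowdown, `transmit("attack at dawn")` always returns `"attack at dawn"`. Making the jitter larger than the slowdown margin is a quick way to see why practical covert channels are slow and error-prone.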

Microsoft Virtualization-Based Security (VBS)

Virtualization-Based Security (VBS) is a Windows security feature that uses hardware virtualization to create secure, isolated memory regions. This isolated environment is used to protect sensitive security features, such as credential protection and code integrity enforcement.

The main goal of VBS is to protect critical security functions from being compromised by malware, privilege escalation attacks, or other unauthorized system modifications. Unlike traditional security measures that rely solely on software-based access controls, VBS uses Microsoft’s Hyper-V hypervisor to strictly enforce isolation at the hardware level without deploying separate instances of the operating system.

VBS enclaves extend this concept to provide a more secure execution environment for security-focused operations. Within a VBS enclave, data and processes remain completely inaccessible to the normal Windows operating system, even if the OS itself is compromised. This level of protection keeps credentials, encryption keys, and authentication mechanisms secure even from an attacker who has taken full control of the operating system.
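The key idea is that secrets never cross the boundary: callers can only ask the enclave to perform operations on them. The Python sketch below models only that interface shape. The class and method names are hypothetical, HMAC-SHA256 stands in for whatever cryptographic operation a real enclave performs, and Python itself enforces nothing (the name-mangled attribute can still be read by any code in the process). In a real VBS enclave, it is the hypervisor that makes the key's memory unreadable from outside.

```python
import hashlib
import hmac
import secrets


class SigningEnclave:
    """Toy model of an enclave that holds a key and exposes only operations on it."""

    def __init__(self) -> None:
        # The key is generated inside the "enclave" and never returned.
        # The double underscore is only a naming convention, not protection.
        self.__key = secrets.token_bytes(32)

    def sign(self, message: bytes) -> bytes:
        """Return an HMAC-SHA256 tag over `message` computed with the hidden key."""
        return hmac.new(self.__key, message, hashlib.sha256).digest()

    def verify(self, message: bytes, tag: bytes) -> bool:
        """Check a tag in constant time, again without revealing the key."""
        return hmac.compare_digest(self.sign(message), tag)
```

A caller can use `enclave.sign(b"...")` and `enclave.verify(...)`, but no method returns the key itself, which is the property a hardware-enforced enclave guarantees even against a compromised kernel.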

VBS Enclave Design

At the core of VBS enclaves is the Hyper-V hypervisor, a lightweight virtualization layer that operates below the operating system. The hypervisor is responsible for creating and managing secure memory regions that cannot be accessed by the main OS or any user-mode processes. This provides a fundamental security boundary that protects enclave data from unauthorized access.
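One way to picture this boundary is as two layers of page permissions: the guest kernel's own page tables, and a second set of permissions that the hypervisor controls through the processor's second-level address translation (Intel EPT or AMD NPT). Windows calls the normal world Virtual Trust Level 0 (VTL0) and the secure world VTL1. The Python model below is a deliberately simplified sketch, and the page names and permission sets are invented for illustration. The effective permission on a page is the intersection of what the guest kernel grants and what the hypervisor allows.

```python
from enum import Flag


class Perm(Flag):
    NONE = 0
    R = 1
    W = 2
    X = 4


# Permissions the hypervisor grants to VTL0 (the normal Windows OS) for
# each guest-physical page. Only the hypervisor, not the guest kernel,
# can change this table. Page names are illustrative.
HYPERVISOR_VTL0 = {
    "app_data": Perm.R | Perm.W,
    "kernel_code": Perm.R | Perm.X,   # executable but never writable
    "enclave_secret": Perm.NONE,      # reserved for VTL1
}


def effective(page: str, guest_perm: Perm) -> Perm:
    """Access VTL0 actually gets: guest page-table bits AND hypervisor bits."""
    return guest_perm & HYPERVISOR_VTL0[page]
```

A compromised kernel can set its own page-table entry for `enclave_secret` to read/write/execute, but `effective("enclave_secret", Perm.R | Perm.W | Perm.X)` is still `Perm.NONE`: loosening the guest's tables cannot grant access the hypervisor withholds.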

To further strengthen this protection, Windows also employs a specialized execution environment known as the secure kernel. Unlike the traditional Windows kernel, the secure kernel operates in a highly restricted mode where only trusted, verified code is allowed to execute. The secure kernel works in conjunction with the hypervisor to enforce strict access controls, ensuring that even if an attacker gains administrative privileges over the Windows OS, they cannot manipulate or extract data from VBS enclaves. When encryption and decryption processes take place inside a VBS enclave, cryptographic keys remain inaccessible to any unauthorized process, even if the operating system is compromised. This makes VBS enclaves useful for securing sensitive communications, digital signatures, and encrypted data storage.
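Allowing “only trusted, verified code” amounts to a check that runs before code may execute, performed by a component the checked code cannot tamper with. The Python sketch below shows only the shape of that check, with stated simplifications. Windows verifies digital signatures on drivers, whereas this sketch substitutes an allowlist of SHA-256 digests, and `approve_for_execution` is a hypothetical name. The strength of the real mechanism comes from where the check runs (in the secure kernel, protected by the hypervisor), not from the check itself.

```python
import hashlib

# Digests of images the policy trusts (illustrative content). In this
# sketch the set is fixed at start-up; in VBS the equivalent policy is
# enforced outside the reach of the normal kernel.
TRUSTED_SHA256 = frozenset({
    hashlib.sha256(b"known-good driver image").hexdigest(),
})


def approve_for_execution(image: bytes) -> bool:
    """Allow an image to run only if its digest is on the allowlist."""
    return hashlib.sha256(image).hexdigest() in TRUSTED_SHA256
```

Any modification to an approved image, even a single byte, changes its digest and causes the check to fail.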

Here are a few examples of how VBS protections go beyond Linux mechanisms such as namespaces, cgroups, and capabilities:

Protecting credentials:
Linux capabilities and namespaces can remove certain root privileges and restrict filesystem access to limit damage. However, an attacker who gets root access can still read another process’s memory, for example via /proc/<pid>/mem. VBS prevents this for authentication secrets by isolating them in a protected memory region that even a compromised kernel cannot read.
Enforcing code integrity:
Withholding capabilities such as CAP_SYS_MODULE prevents a process from loading kernel modules, but a root user who holds that capability, or a compromised kernel, can still load unsigned or malicious kernel modules. With VBS, the hypervisor enforces that only signed and verified kernel drivers can be loaded and run.
Protecting kernel memory (read-only):
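While Linux supports configuring hardened kernels, in general a root user has several ways to modify kernel code and kernel data structures. With VBS, kernel memory pages are made executable only after passing code integrity checks, and executable pages are never writable, so kernel code cannot be patched even by code running in the kernel.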
Secure execution environments:
On processors that support it (such as those with Intel SGX), Linux allows enclave-based secure execution, but Linux namespaces do not provide this capability. VBS enclaves create hardware-protected execution environments within normal applications; even the OS kernel cannot access enclave-protected memory.
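The /proc/<pid>/mem interface mentioned in the credentials example is easy to demonstrate. The Python sketch below, assuming Linux, reads a process's own memory through /proc/self/mem at the address of a buffer. Reading another process's memory the same way requires ptrace-level permission, which root normally has. The function name and the “secret” value are invented for illustration.

```python
import ctypes


def read_own_memory(address: int, length: int) -> bytes:
    """Read `length` bytes at virtual `address` via /proc/self/mem (Linux only)."""
    with open("/proc/self/mem", "rb", buffering=0) as mem:
        mem.seek(address)
        return mem.read(length)


# A "secret" held in an ordinary buffer (illustrative value).
secret = ctypes.create_string_buffer(b"hunter2-session-token")
leaked = read_own_memory(ctypes.addressof(secret), len(secret.value))
print(leaked)
```

Nothing in the process's own permissions stops this read, and a root attacker targeting /proc/<pid>/mem of a credential-handling process gets the same result. VBS moves such secrets into memory the normal kernel cannot access at all.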

VBS Limitations

Despite their strong security guarantees, VBS enclaves are not without limitations. They require hardware support for virtualization, such as Intel VT-x or AMD-V. VBS also introduces some performance overhead due to the additional processing required to maintain secure execution environments, which can impact system responsiveness, particularly in latency-sensitive applications. Compatibility is also a challenge, as some legacy software may not function correctly when VBS is enabled.
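On x86 systems, whether the processor advertises these extensions can be read from the CPU feature flags; Linux exposes them in /proc/cpuinfo as `vmx` (Intel VT-x) and `svm` (AMD-V). The following is a minimal Python sketch, assuming that file's usual `flags : ...` line format; the function name is our own. The flag being present does not guarantee that virtualization is enabled in firmware.

```python
from typing import Optional


def virtualization_extension(cpuinfo_text: str) -> Optional[str]:
    """Return 'Intel VT-x', 'AMD-V', or None based on /proc/cpuinfo flags."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":  # first CPU's flags line is enough
            flags = set(value.split())
            if "vmx" in flags:
                return "Intel VT-x"
            if "svm" in flags:
                return "AMD-V"
            return None
    return None
```

Usage on a Linux host: `with open("/proc/cpuinfo") as f: print(virtualization_extension(f.read()))`.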

Containerization vs. Virtualization vs. Virtualization-Based Security

Containerization, used in technologies like Docker and Kubernetes, provides process-level isolation by creating separate user-space environments that share the same underlying operating system kernel. Containers offer a lightweight and efficient way to isolate applications since they eliminate the need to duplicate the full operating system stack for each instance. However, because all containers share the same kernel, they remain vulnerable to kernel-level attacks. If an attacker compromises the kernel, they could potentially compromise all running containers.
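On Linux, these separate user-space environments are built from namespaces, and each process's namespace memberships are visible as symbolic links under /proc/<pid>/ns. The following is a minimal Python sketch, assuming Linux, that lists them; the function name is our own. Processes in the same container show identical identifiers and processes in different containers differ in some of them. Every process, containerized or not, still runs on the one shared kernel, which is why `os.uname().release` reports the same value inside every container on a host.

```python
import os


def namespace_ids(pid: str = "self") -> dict[str, str]:
    """Map namespace type (pid, net, mnt, ...) to its identifier for a process."""
    ns_dir = f"/proc/{pid}/ns"
    ids = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            # Each entry is a symlink such as 'net:[4026531840]'.
            ids[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # entries for other processes may need privilege
    return ids


print("kernel:", os.uname().release)  # shared by every container on this host
for kind, ident in namespace_ids().items():
    print(f"{kind:>18}  {ident}")
```

Running the same script on the host and inside a container typically shows different `mnt`, `pid`, and `net` identifiers but the same kernel release.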

Full virtualization provides a much stronger form of isolation by running each virtual machine with its own dedicated OS and kernel. Hypervisors such as VMware, VirtualBox, and Microsoft Hyper-V allow multiple virtual machines (VMs) to operate on the same hardware, each completely isolated from the others. This approach greatly enhances security by ensuring that a compromise in one virtual machine does not affect others. However, full virtualization comes with significant performance overhead. Each VM requires its own OS, consuming additional memory and processing power, making it less efficient than containerization.

Microsoft’s Virtualization-Based Security (VBS) enclaves offer a hybrid approach that balances the efficiency of containers with the strong isolation properties of full virtualization. Unlike traditional virtual machines, VBS enclaves do not require an entire OS instance for each process, but unlike containers, they provide hardware-enforced memory isolation through virtualization. This means that even if an attacker gains control of the operating system kernel, they cannot access the secure enclave’s data or execution environment. VBS enclaves are particularly useful in protecting sensitive security functions such as cryptographic key management and authentication processes. By leveraging hardware-based protections, VBS enclaves ensure that critical security operations remain isolated from both user-space and kernel-space threats, providing a powerful mechanism for modern cybersecurity defenses.

Last modified March 24, 2025.