Part VI·Systems and Software·Chapter 49 of 62

Security and Isolation

May 16, 2026·16 min read·advanced

Computer security is a vast topic. This chapter focuses on the hardware mechanisms that support security and the OS features that build on them. The threat models include: malicious user-space code escalating to kernel privilege, malicious processes reading another process's memory, malicious code injecting itself into a target program, and side-channel attacks that extract secrets without violating privilege rules.

Security is not a single feature; it's a set of layered defenses. No mechanism is sufficient on its own. The best modern systems combine hardware features (NX, SMEP, SMAP, PAC, BTI, MTE, CET), OS policies (ASLR, KASLR, capabilities, MAC), and software hardening (stack canaries, control-flow integrity, fortified libc). This chapter walks through the major mechanisms.

01. The Threat Landscape

Common attack categories:

Memory corruption. Bugs in C/C++ programs (buffer overflows, use-after-free, double free, integer overflows leading to short allocations) let attackers overwrite memory they shouldn't. Classic exploitation: overwrite a return address on the stack to redirect execution.

Code injection. The attacker writes attacker-controlled bytes to memory and causes execution to jump there. Defended by making memory either writable or executable, but not both (W^X).

Code reuse. With W^X preventing direct injection, attackers chain together "gadgets" — short sequences of legitimate instructions ending in a return or indirect jump. ROP (Return-Oriented Programming) and JOP (Jump-Oriented Programming) are the dominant techniques.

Privilege escalation. A user-space attacker exploits a kernel bug to gain kernel privileges. Defended by reducing kernel attack surface, sandboxing system calls (seccomp), and strictly separating kernel and user memory.

Side channels. The attacker observes timing, cache state, branch predictor state, or other indirect signals to learn secrets. Spectre and Meltdown are the famous examples; Chapter 51 covers them in detail.

Physical attacks. The attacker has physical access: cold-boot attacks on RAM, fault injection, side channels via power analysis, bus probing. Defended by memory encryption, tamper-resistant designs, secure enclaves.

Supply chain. Malicious code introduced during build or distribution. Defended by signed binaries, reproducible builds, verified boot.

This chapter focuses on hardware-supported defenses for the first several categories.

02. Memory Protection: The Foundations

Two mechanisms underlie almost everything:

Privilege separation. User code (ring 3 / EL0 / U-mode) cannot execute privileged instructions or access kernel memory.

Virtual memory. Each process has its own address space; a process cannot directly access another process's memory.

These were covered in earlier chapters. They are necessary but not sufficient.

03. Execute-Disable / NX

The first significant hardware addition for security was the NX (No-Execute) bit in page-table entries. AMD added it to AMD64 in 2003; Intel followed with XD (eXecute Disable) in 2004; ARM has XN (execute-never) bits, with later architecture versions providing separate UXN and PXN bits for unprivileged and privileged execution; RISC-V has the X bit (positive: set means executable).

With NX, a page can be marked non-executable. Attempting to execute (fetch instructions from) such a page raises a fault. This kills the simple "inject shellcode and jump to it" attack: data pages are non-executable, so the injected bytes can't be run as code.

Combined with making code pages read-only (no W on code), the result is W^X (Write XOR Execute): every page is either writable or executable, never both. This is universal in modern OSes.

Some legitimate code needs to generate executable code at runtime (JIT compilers such as V8, the JVM, and the .NET CLR). They allocate memory as writable, write the code, then change the protection to executable (and read-only) before jumping to it. The kernel API (mprotect on Unix, VirtualProtect on Windows) supports this. Some hardened JITs instead use double mapping: the same physical pages are mapped at two different virtual addresses, one writable and one executable, so no single mapping is ever both, and the writable alias is kept where an attacker cannot easily find it.
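To make the W^X rule and the JIT workflow concrete, here is a toy Python model of a single page. It is a sketch, not a real memory API: the `Prot` flags, the `Page` class, and the `jit_emit` helper are hypothetical names. `protect()` stands in for an mprotect/VirtualProtect-style call that refuses writable-and-executable combinations, and `fetch()` models an instruction fetch that faults on a non-executable page.

```python
from enum import Flag, auto


class Prot(Flag):
    NONE = 0
    R = auto()
    W = auto()
    X = auto()


class ProtectionFault(Exception):
    """Models the page fault raised when an access violates page permissions."""


class Page:
    """Toy model of one page whose permissions obey W^X."""

    def __init__(self, size: int = 4096) -> None:
        self.data = bytearray(size)
        self.prot = Prot.R | Prot.W  # freshly mapped: read/write, not executable

    def protect(self, prot: Prot) -> None:
        # Hypothetical mprotect-like call that enforces W^X.
        if Prot.W in prot and Prot.X in prot:
            raise PermissionError("W^X: page may not be both writable and executable")
        self.prot = prot

    def write(self, offset: int, payload: bytes) -> None:
        if Prot.W not in self.prot:
            raise ProtectionFault("write to non-writable page")
        self.data[offset:offset + len(payload)] = payload

    def fetch(self, offset: int, length: int) -> bytes:
        # Instruction fetch requires X; an NX page faults here.
        if Prot.X not in self.prot:
            raise ProtectionFault("instruction fetch from non-executable page")
        return bytes(self.data[offset:offset + length])


def jit_emit(code: bytes) -> Page:
    """Hypothetical JIT step: write code while RW, then flip the page to RX."""
    page = Page()
    page.write(0, code)            # 1. emit machine code while the page is R+W
    page.protect(Prot.R | Prot.X)  # 2. drop W and add X before jumping to it
    return page


if __name__ == "__main__":
    page = jit_emit(b"\x90\xc3")   # bytes standing in for x86 "nop; ret"
    print(page.fetch(0, 2).hex())  # prints 90c3
```

Injected data in an ordinary read/write page can be written but not fetched, and the finished JIT page can be fetched but no longer written: at every moment each page is one or the other.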

04. SMEP, SMAP, PXN, PAN

W^X defends against user-mode injection. But the kernel was historically vulnerable to a related attack: a kernel bug causes the kernel to jump to (or read from) a user-controlled address, where the attacker has placed malicious code or data.

Hardware additions:

SMEP (Supervisor Mode Execution Prevention, Intel Ivy Bridge 2012): the kernel cannot fetch instructions from user-accessible pages. Even with a kernel pointer-corruption bug, the attacker can't put their payload in user memory and have the kernel execute it.

SMAP (Supervisor Mode Access Prevention, Broadwell 2014): the kernel cannot read or write user-accessible pages by default. Explicit instructions (STAC / CLAC) toggle access for legitimate copy_from_user / copy_to_user paths.

PXN (Privileged Execute-Never, ARM): the kernel cannot execute user-accessible pages. ARM's counterpart to SMEP.

PAN (Privileged Access Never, ARMv8.1): the kernel cannot read or write user-accessible pages unless it explicitly lifts the restriction. ARM's counterpart to SMAP.

These features close common kernel-exploit primitives. An attacker who manages to redirect the kernel's instruction pointer must now find a target inside the kernel's own code (which is harder).
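The rules above reduce to a small decision function. The following is a simplified Python sketch using x86 names; `CpuState` and `access_allowed` are hypothetical, and the model ignores the ordinary read/write/NX bits so that only the user/supervisor checks remain. ARM's PXN and PAN behave analogously, with one polarity difference: PAN blocks access while PSTATE.PAN is set, so the kernel clears it to open a window instead of setting a flag as STAC does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuState:
    supervisor: bool  # running in kernel mode (ring 0)?
    smep: bool        # CR4.SMEP enabled
    smap: bool        # CR4.SMAP enabled
    ac_flag: bool     # EFLAGS.AC: set by STAC, cleared by CLAC


def access_allowed(cpu: CpuState, page_is_user: bool, kind: str) -> bool:
    """Simplified check; kind is "fetch", "read", or "write"."""
    if not cpu.supervisor:
        return page_is_user               # user mode may touch only user pages
    if not page_is_user:
        return True                       # kernel accessing kernel pages
    if kind == "fetch":
        return not cpu.smep               # SMEP: never execute user pages; STAC doesn't help
    return not cpu.smap or cpu.ac_flag    # SMAP: data access only inside a STAC/CLAC window


if __name__ == "__main__":
    kernel = CpuState(supervisor=True, smep=True, smap=True, ac_flag=False)
    print(access_allowed(kernel, page_is_user=True, kind="read"))  # False: SMAP blocks it
```

A copy_from_user-style path would set AC with STAC, perform the copy, and clear it with CLAC, so a stray kernel dereference of a user pointer anywhere else still faults.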

05. ASLR and KASLR

ASLR (Address Space Layout Randomization): the OS places executables, libraries, the heap, the stack, and mmap regions at randomized addresses on each process launch. An attacker who has a memory-corruption primitive doesn't know where in memory legitimate code lives, so they can't reliably build ROP chains.

ASLR was first implemented in PaX/grsecurity in 2001 and gradually adopted by mainstream OSes: OpenBSD (2003), Linux 2.6.12 (2005), Windows Vista (2007), macOS 10.5 (2007), and iOS 4.3 (2011).

The entropy varies. On 64-bit systems, randomization can use 30+ bits of entropy, making brute force impractical (a brute-force attempt has minuscule probability per try, and most failures crash the process). On 32-bit systems, entropy was limited to ~16-19 bits, brute-forceable.
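The arithmetic behind "brute force is impractical" is short. With n randomized bits there are N = 2^n equally likely layouts. The sketch below (`expected_attempts` is a hypothetical helper) computes the expected number of guesses in two situations. If every crash restarts the target with a fresh layout, each guess is an independent trial and the expected count is N. If the layout survives crashes, as with worker processes forked from one parent that share its layout, the attacker can enumerate without repeats and needs (N + 1) / 2 on average.

```python
def expected_attempts(entropy_bits: int, rerandomized: bool) -> float:
    """Expected guesses to hit one of 2**entropy_bits equally likely layouts.

    rerandomized=True:  fresh layout after every failed guess (geometric, mean N).
    rerandomized=False: layout fixed across crashes, guesses never repeat,
                        so the mean position of the right guess is (N + 1) / 2.
    """
    n = 2 ** entropy_bits
    return float(n) if rerandomized else (n + 1) / 2


if __name__ == "__main__":
    for bits in (16, 28):
        print(bits, expected_attempts(bits, rerandomized=False),
              expected_attempts(bits, rerandomized=True))
    # 16 32768.5 65536.0
    # 28 134217728.5 268435456.0
```

At 16 bits, tens of thousands of attempts are feasible against a service that restarts automatically; every additional bit doubles the work, and each failed attempt is a crash that monitoring can notice.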

ASLR fails when:

A separate vulnerability leaks an address (an "info leak").
Code is not position-independent (PIE binaries are needed for binary randomization; otherwise, the executable itself is at a fixed address).
The randomization uses too few bits.

Modern systems require PIE for everything important. Most major Linux distributions now build their packages as PIE by default.

KASLR (Kernel ASLR): the kernel itself is loaded at a randomized address. Less effective than user-space ASLR because the kernel image is a single, large, observable target — once an attacker leaks any kernel address, they can compute the base. Spectre-class side channels make KASLR particularly fragile; KASLR is best treated as raising the bar, not as a hard barrier.
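Why a single leak defeats KASLR: the distance between any symbol and the kernel's load base is fixed for a given kernel build, so one leaked pointer plus that build's symbol table yields the base, and from it every other kernel address. The Python sketch below illustrates this with hypothetical addresses and offsets; the 2 MiB slot alignment is an assumption (a common choice for x86-64 kernels), used here only as a sanity check on the result.

```python
KERNEL_ALIGN = 2 * 1024 * 1024  # assumption: base randomized in 2 MiB-aligned slots


def kaslr_base_from_leak(leaked_addr: int, symbol_offset: int) -> int:
    """Recover the randomized kernel base from one leaked kernel pointer.

    symbol_offset is the leaked symbol's distance from the image base, which is
    constant for a given build (readable from that build's symbol table).
    """
    base = leaked_addr - symbol_offset
    if base % KERNEL_ALIGN:
        raise ValueError("inconsistent leak: computed base is not slot-aligned")
    return base


def relocate(base: int, other_symbol_offset: int) -> int:
    """Address of any other symbol once the base is known."""
    return base + other_symbol_offset


if __name__ == "__main__":
    base = kaslr_base_from_leak(0xFFFFFFFF9A2C3F10, 0x2C3F10)  # hypothetical values
    print(hex(base))                    # 0xffffffff9a000000
    print(hex(relocate(base, 0x1000)))  # 0xffffffff9a001000
```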

06. Stack Protections

Stack-based buffer overflows historically overwrote the return address to redirect execution. Defenses:

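Stack canary. Compiler-inserted: at function entry, a secret random value (typically chosen once at process start) is placed on the stack between the local buffers and the saved return address; at function exit, it's checked. A contiguous overflow that reaches the return address must overwrite the canary first, so the program detects the corruption and aborts before returning.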

GCC's -fstack-protector-strong or -fstack-protector-all enables canaries.
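The canary check can be simulated with a byte array standing in for one stack frame. This is a Python model of the idea, not compiler output: the frame layout, sizes, and the names `vulnerable_function` and `StackSmashingDetected` are hypothetical. The "bug" is an unchecked copy into a 16-byte buffer, mimicking strcpy or memcpy without a length check.

```python
import secrets

BUF_SIZE = 16     # local buffer
CANARY_SIZE = 8   # canary slot between the buffer and the saved return address
RET_SIZE = 8      # saved return address

PROCESS_CANARY = secrets.token_bytes(CANARY_SIZE)  # chosen once per process


class StackSmashingDetected(Exception):
    pass


def vulnerable_function(user_input: bytes, return_addr: int) -> int:
    # Simulated frame, low to high addresses: [buffer][canary][return address]
    frame = bytearray(BUF_SIZE + CANARY_SIZE + RET_SIZE)
    frame[BUF_SIZE:BUF_SIZE + CANARY_SIZE] = PROCESS_CANARY              # prologue
    frame[BUF_SIZE + CANARY_SIZE:] = return_addr.to_bytes(RET_SIZE, "little")

    # Bug: unchecked copy. (Clamped to the frame only so the model stays in bounds.)
    n = min(len(user_input), len(frame))
    frame[0:n] = user_input[:n]

    # Epilogue: verify the canary before "returning".
    if bytes(frame[BUF_SIZE:BUF_SIZE + CANARY_SIZE]) != PROCESS_CANARY:
        raise StackSmashingDetected("canary overwritten; aborting")
    return int.from_bytes(frame[BUF_SIZE + CANARY_SIZE:], "little")
```

A short input returns normally to the original address; an input long enough to reach the return address necessarily tramples the canary first and is caught in the epilogue. The defense depends on the canary staying secret: an info leak that reveals it lets an attacker write it back unchanged.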

Shadow stack. The hardware (or the runtime) maintains a parallel stack containing only return addresses. On call, the return address is pushed to both stacks. On return, both are checked; mismatch indicates corruption.

Hardware support:

Intel CET-SS (Control-flow Enforcement Technology, Shadow Stack): introduced in Tiger Lake (2020). The CPU maintains the shadow stack itself; once the OS enables it, it is transparent to most application code.
ARM GCS (Guarded Control Stack): introduced in ARMv9.4-A.

Shadow stacks defend against return-address corruption (the most common ROP variant).
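A minimal Python model of the shadow-stack check follows. `ShadowStackCpu` is a hypothetical name, and the two lists stand in for the ordinary stack (which buggy code can overwrite) and the shadow stack (which, in hardware, ordinary store instructions cannot write). The exception stands in for the fault the CPU raises on a mismatch, the Control Protection exception (#CP) in CET.

```python
class ControlProtectionFault(Exception):
    """Stands in for the fault raised on a shadow-stack mismatch (#CP on x86)."""


class ShadowStackCpu:
    def __init__(self) -> None:
        self.stack: list[int] = []   # ordinary stack: reachable by buggy stores
        self.shadow: list[int] = []  # shadow stack: not writable by ordinary stores

    def call(self, return_addr: int) -> None:
        # CALL pushes the return address onto both stacks.
        self.stack.append(return_addr)
        self.shadow.append(return_addr)

    def ret(self) -> int:
        # RET pops both copies and faults if they disagree.
        addr, expected = self.stack.pop(), self.shadow.pop()
        if addr != expected:
            raise ControlProtectionFault(f"return to {addr:#x}, expected {expected:#x}")
        return addr
```

Overwriting `cpu.stack[-1]` (the model's equivalent of a stack buffer overflow) makes the next `ret()` fault instead of jumping to the attacker's address.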

Non-executable stack. Already covered under W^X — the stack is not executable.

07. Control-Flow Integrity (CFI)

ROP and JOP rely on the attacker redirecting forward (calls and jumps) and backward (returns) edges of the control-flow graph to arbitrary code. CFI is a family of techniques that constrain control flow to legitimate edges.

Coarse-grained CFI: at every indirect call, check that the target is a function entry. At every return, check that the return address points just after a call instruction. This prevents the most flexible ROP gadgets but is still permissive.

Fine-grained CFI: at every indirect call, check that the target's type signature matches the one expected at that call site. Each call site may reach only the functions with a matching signature.
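The difference between the two policies shows up with a corrupted function pointer. In the Python sketch below, the symbol table, addresses, and function names are all hypothetical; a real implementation compiles these checks into the binary (for example as type-tagged jump tables) rather than consulting a dictionary.

```python
# Hypothetical symbol table: entry address -> (name, type signature).
FUNCTIONS = {
    0x1000: ("add", "int(int,int)"),
    0x1040: ("mul", "int(int,int)"),
    0x1080: ("run_command", "int(const char*)"),
}


def coarse_cfi_allows(target: int) -> bool:
    """Coarse-grained: any function entry is an acceptable indirect-call target."""
    return target in FUNCTIONS


def fine_cfi_allows(target: int, call_site_type: str) -> bool:
    """Fine-grained: the entry's signature must match the call site's expected type."""
    entry = FUNCTIONS.get(target)
    return entry is not None and entry[1] == call_site_type
```

At a call site expecting `int(int,int)`, redirecting the pointer to `run_command` (0x1080) passes the coarse check, because it is a genuine function entry, but fails the fine-grained one. A mid-function address such as 0x1044 fails both.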

Hardware support:

Intel CET-IBT (Indirect Branch Tracking). Every indirect branch target must be an ENDBR64 (or ENDBR32) instruction. Branches to instructions that aren't ENDBR fault. This restricts indirect branches to predefined entry points in the code.

ARM BTI (Branch Target Identification, ARMv8.5). Equivalent to IBT: indirect branches must target a BTI instruction. BTI's variants (BTI c, j, jc) distinguish between call targets, jump targets, and both.
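Landing-pad enforcement amounts to checking the instruction at the branch target. The Python sketch below models it with a dictionary from address to instruction text. The table of which landing pad accepts which branch kind is simplified and ignores some architectural special cases; `indirect_branch` and the sample addresses are hypothetical.

```python
class BranchTargetFault(Exception):
    """Stands in for the fault raised on an indirect branch to a non-landing-pad."""


# Landing-pad instruction -> kinds of indirect branch it accepts (simplified).
LANDING_PADS = {
    "endbr64": {"call", "jump"},  # x86 CET-IBT
    "bti c": {"call"},            # AArch64: indirect-call (BLR) targets
    "bti j": {"jump"},            # AArch64: indirect-jump (BR) targets
    "bti jc": {"call", "jump"},   # AArch64: either
}


def indirect_branch(code: dict[int, str], target: int, kind: str) -> int:
    """Return target if the instruction there accepts this kind of branch."""
    accepted = LANDING_PADS.get(code.get(target, ""), set())
    if kind not in accepted:
        raise BranchTargetFault(f"{kind} to {target:#x} ({code.get(target)!r}) rejected")
    return target


if __name__ == "__main__":
    x86_code = {0x400: "endbr64", 0x404: "push rbp", 0x405: "pop rdi"}
    print(hex(indirect_branch(x86_code, 0x400, "call")))  # function entry: allowed
```

A gadget in the middle of a function (the `pop rdi` at 0x405 here) is not preceded by a landing pad, so an attacker can no longer reach it through a hijacked indirect call or jump.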

ARM PAC (Pointer Authentication, ARMv8.3). A different approach: authenticate pointers. The CPU computes a MAC over a pointer using a key (set by the OS) and a context (a salt), packing the MAC into unused upper bits of the pointer. Before using the pointer (e.g., for a return), the CPU verifies the MAC; corruption causes a fault.

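PAC is used heavily on Apple silicon (since the A12, which implements ARMv8.3): every return address is signed in the function prologue and authenticated before returning (Apple's arm64e ABI uses PACIBSP/RETAB; other AArch64 toolchains commonly use PACIASP/AUTIASP). To forge a return address, an attacker needs the key, which the kernel manages (typically per process) in system registers that user code cannot read.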

PAC is more flexible than IBT/BTI: it can authenticate any pointer (function pointers, vtable pointers), not just branch targets.
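The sign-and-verify flow can be modeled in a few lines. This Python sketch is an illustration under stated assumptions, not the hardware algorithm: it assumes 48-bit virtual addresses with top-byte-ignore off (so 16 bits hold the PAC), and it substitutes truncated HMAC-SHA256 for the block cipher real cores use. The modifier plays the role of the context salt; for return addresses it is the stack pointer. Real AUT instructions without ARMv8.6's FPAC do not fault immediately but corrupt the pointer so that using it faults; the model simply raises. All function names are hypothetical.

```python
import hashlib
import hmac

VA_BITS = 48                    # assumption: 48-bit virtual addresses, TBI off
VA_MASK = (1 << VA_BITS) - 1
PAC_BITS = 64 - VA_BITS         # the upper 16 bits hold the PAC in this model
PAC_MASK = (1 << PAC_BITS) - 1


class PacAuthFailure(Exception):
    """Stands in for a failed authentication."""


def _compute_pac(key: bytes, pointer: int, modifier: int) -> int:
    # Stand-in MAC: truncated HMAC-SHA256 instead of the hardware cipher.
    msg = pointer.to_bytes(8, "little") + modifier.to_bytes(8, "little")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "little") & PAC_MASK


def pac_sign(key: bytes, pointer: int, modifier: int) -> int:
    """Like PACIASP: pack a MAC of (pointer, modifier) into the unused top bits."""
    addr = pointer & VA_MASK
    return (_compute_pac(key, addr, modifier) << VA_BITS) | addr


def pac_auth(key: bytes, signed: int, modifier: int) -> int:
    """Like AUTIASP: verify the MAC and return the plain pointer, or fail."""
    addr = signed & VA_MASK
    if (signed >> VA_BITS) != _compute_pac(key, addr, modifier):
        raise PacAuthFailure(f"bad PAC on {signed:#x}")
    return addr
```

Overwriting the address bits of a signed return address while keeping the old PAC fails authentication except with probability about 2^-16 per attempt; with the key hidden, the attacker cannot compute the correct PAC for a new target.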

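Software CFI schemes (Clang's CFI, MSVC's Control Flow Guard /guard:cf, and kCFI in the Linux kernel) insert checks at compile time that execute before each indirect call; they complement the hardware mechanisms where those are available.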

08. Memory Tagging

A different angle: detect memory-safety bugs by tagging memory and pointers with matching tags.

ARM MTE (Memory Tagging Extension, ARMv8.5). Each 16-byte granule of memory has a 4-bit allocation tag, held in separate tag storage. Each pointer carries a 4-bit tag in its top byte (using ARM's top-byte-ignore feature, TBI). On every memory access, hardware compares the pointer tag with the memory tag; a mismatch generates a fault (reported synchronously or asynchronously, depending on the mode).

When a memory allocator returns a pointer to the program, it sets a random tag on the underlying memory and the same tag on the pointer. If the program uses the pointer after free (when the allocator has retagged the memory), the tags mismatch and the bug is detected.

MTE catches:

Heap use-after-free: the freed memory is retagged before reuse; the stale pointer's tag no longer matches.
Out-of-bounds access: the adjacent allocation has a different tag.
Heap buffer overflow: same — tag mismatch on the next allocation.

MTE does not catch:

Type confusion within the same allocation.
Stack buffer overflows (in the basic mode; stack tagging is a separate, more involved feature).
Use-after-free where the freed memory is reused by the same allocator with the same tag (probabilistic — 1/16 chance with 4-bit tags).
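The allocator's role can be sketched as a toy Python bump allocator that tags granules and pointers the way an MTE-aware malloc might. `TaggedHeap` and its methods are hypothetical. Two simplifications are deliberate: each new allocation avoids the previous allocation's tag, so adjacent blocks always differ, and `free` always picks a tag different from the old one. Real allocators choose tags with their own policies, and once memory is reallocated the new tag can coincide with a stale pointer's tag, which is the probabilistic miss described above.

```python
import secrets

GRANULE = 16                     # MTE tags memory in 16-byte granules
TAG_BITS = 4
TAG_SHIFT = 56                   # pointer tag in bits 56-59 (top byte, via TBI)
ADDR_MASK = (1 << TAG_SHIFT) - 1


class TagCheckFault(Exception):
    """Stands in for a synchronous MTE tag-check fault."""


class TaggedHeap:
    def __init__(self, size: int) -> None:
        self.mem = bytearray(size)
        self.tags = [0] * (size // GRANULE)  # one allocation tag per granule
        self.sizes: dict[int, int] = {}
        self.top = 0
        self.last_tag = 0

    def _random_tag(self, exclude: set[int]) -> int:
        return secrets.choice([t for t in range(1 << TAG_BITS) if t not in exclude])

    def _set_tags(self, addr: int, size: int, tag: int) -> None:
        for g in range(addr // GRANULE, (addr + size) // GRANULE):
            self.tags[g] = tag

    def malloc(self, size: int) -> int:
        size = -(-size // GRANULE) * GRANULE           # round up to whole granules
        addr, self.top = self.top, self.top + size
        tag = self._random_tag({self.last_tag})         # neighbours always differ here
        self.last_tag = tag
        self._set_tags(addr, size, tag)
        self.sizes[addr] = size
        return (tag << TAG_SHIFT) | addr                # tagged pointer

    def free(self, ptr: int) -> None:
        addr, old_tag = ptr & ADDR_MASK, ptr >> TAG_SHIFT
        # Retag so stale copies of ptr no longer match.
        self._set_tags(addr, self.sizes.pop(addr), self._random_tag({old_tag}))

    def load(self, ptr: int) -> int:
        addr, ptag = ptr & ADDR_MASK, ptr >> TAG_SHIFT
        mtag = self.tags[addr // GRANULE]
        if ptag != mtag:
            raise TagCheckFault(f"pointer tag {ptag:#x} != memory tag {mtag:#x}")
        return self.mem[addr]
```

Reading `a + 16` from a 16-byte block `a` lands in the next allocation's granule and faults, as does any load through `a` after `free(a)`.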

Apple has deployed an enhanced variant, EMTE (Enhanced MTE), in production on its recent silicon. Google has made MTE available on Pixel devices. Intel and AMD have related ideas (Intel's LAM, Linear Address Masking, provides the pointer-side support; hardware tag checking on x86 is a more recent development).

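HWASan (Hardware-assisted AddressSanitizer): a compiler-instrumentation tool that implements MTE-style tagging in software, using top-byte-ignore to carry pointer tags and inserted checks against shadow memory. It finds the same classes of bugs on AArch64 hardware without MTE, but it is slower than native MTE and is used mainly for testing.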

09. Sandboxing and Capabilities

OS-level mechanisms for restricting what a process can do:

Unix DAC (Discretionary Access Control): file permissions (rwx for owner/group/other), suid/sgid. The classic Unix model.

MAC (Mandatory Access Control): SELinux, AppArmor, Smack. Policy is set system-wide and is not at the discretion of file owners. SELinux is used on Red Hat/Fedora; AppArmor on Ubuntu/SUSE.

seccomp: a Linux mechanism for restricting which syscalls a process can make. Used heavily by browsers (Chrome's renderer processes) and container runtimes. Can filter syscalls by a BPF program (seccomp-bpf), allowing complex policies.
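A seccomp-bpf filter is a small program that the kernel runs on every syscall, inspecting the architecture, syscall number, and arguments and returning an action. The Python function below models only that decision logic; real filters are written in BPF (often generated with libseccomp) and installed with prctl or seccomp(). The policy, the names `SeccompData`, `Action`, and `renderer_filter`, and the choice of syscalls are hypothetical and are not any browser's actual policy. The syscall numbers are x86-64's.

```python
from dataclasses import dataclass
from enum import Enum

AUDIT_ARCH_X86_64 = 0xC000003E

# x86-64 syscall numbers
SYS_read, SYS_write, SYS_close = 0, 1, 3
SYS_socket, SYS_execve, SYS_exit_group = 41, 59, 231


class Action(Enum):
    ALLOW = "allow"
    ERRNO = "errno"          # fail the syscall with an error code
    KILL_PROCESS = "kill"    # terminate the whole process


@dataclass(frozen=True)
class SeccompData:
    nr: int
    arch: int
    args: tuple[int, ...] = ()


ALLOWED = {SYS_read, SYS_write, SYS_close, SYS_exit_group}


def renderer_filter(d: SeccompData) -> Action:
    if d.arch != AUDIT_ARCH_X86_64:
        return Action.KILL_PROCESS   # numbers differ per ABI; reject unexpected ones
    if d.nr in ALLOWED:
        return Action.ALLOW
    if d.nr == SYS_socket:
        return Action.ERRNO          # fail quietly instead of crashing
    return Action.KILL_PROCESS       # default-deny everything else
```

Checking the architecture first matters: the same number means different syscalls under different ABIs, so a filter that skips this check can be bypassed by switching to a compatibility ABI.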

Linux namespaces: per-process views of various kernel resources (PID namespace, network namespace, mount namespace, user namespace, etc.). Containers use these to give each container an isolated view.

Linux capabilities (based on the withdrawn POSIX.1e draft): split root's powers into discrete capabilities (CAP_NET_ADMIN, CAP_SYS_ADMIN, etc.). A process holds only the specific capabilities its task needs, not full root.
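Capability sets are bitmasks indexed by capability number. The sketch below uses the Linux numbers for three capabilities and parses the `CapEff:` line that Linux exposes in `/proc/<pid>/status`. The helper names are hypothetical, and reading `/proc` works only on Linux.

```python
import os

CAP_NET_BIND_SERVICE = 10
CAP_NET_ADMIN = 12
CAP_SYS_ADMIN = 21


def cap_mask(*caps: int) -> int:
    """Build a capability bitmask from capability numbers."""
    mask = 0
    for cap in caps:
        mask |= 1 << cap
    return mask


def has_cap(effective: int, cap: int) -> bool:
    return bool((effective >> cap) & 1)


def parse_cap_eff(status_text: str) -> int:
    """Extract the effective set from /proc/<pid>/status text (hex field)."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line")


if __name__ == "__main__":
    if os.path.exists("/proc/self/status"):
        with open("/proc/self/status") as f:
            eff = parse_cap_eff(f.read())
        print(f"CapEff={eff:#x}", "has CAP_SYS_ADMIN:", has_cap(eff, CAP_SYS_ADMIN))
```

A web server that only needs to bind port 80 can run with `cap_mask(CAP_NET_BIND_SERVICE)` (0x400) instead of full root, so a compromise yields that one privilege rather than CAP_SYS_ADMIN and the rest.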

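Pledge / unveil (OpenBSD): with pledge, a process declares which categories of operations it intends to use (e.g., stdio, rpath, inet), and a later call outside those promises terminates it; unveil restricts which filesystem paths remain visible. Simpler than seccomp; widely admired.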

These are all software mechanisms. Hardware contributes by providing the privilege separation and memory protection that the OS uses to enforce them.

10. Trusted Execution Environments

TEEs are isolated execution environments designed to protect sensitive code from the rest of the system, including (in some cases) the OS.

ARM TrustZone: the original TEE. Splits the CPU into secure-world and normal-world states; the secure world has its own memory, peripherals, and software stack. Used in mobile devices for DRM, biometrics, payments. Software running there: OP-TEE, Trusty (Google), various proprietary TEE OSes.

Intel SGX (Software Guard Extensions): enclaves, small isolated regions of memory protected even from the OS. Code in an enclave runs in ring 3 but with hardware-enforced memory isolation and attestation. SGX has had a tumultuous history: the foundation of many security products, but also the target of multiple side-channel and fault-injection attacks (Foreshadow, MDS, Plundervolt). Intel deprecated SGX on consumer CPUs (Tiger Lake and later); it remains on server-class Xeons.

AMD SEV / SEV-ES / SEV-SNP: VM-level confidential computing, mentioned in Chapter 48. Protects entire VMs, not individual enclaves.

Intel TDX (Trust Domain Extensions): Intel's response to SEV. VM-level confidential computing.

ARM CCA (Confidential Compute Architecture): ARM's VM-level confidential computing, with realms as isolated execution environments.

RISC-V CoVE: the analogous RISC-V framework, in development.

The trend over the last few years: from process/enclave-level TEEs (SGX) to VM-level confidential computing (SEV-SNP, TDX, CCA, CoVE). VM-level is easier to use (existing OSes work, no enclave-aware code), more performant, and has fewer side-channel surfaces — though not zero.

11. Speculative Execution Attacks (Spectre, Meltdown)

A topic large enough to deserve its own chapter (Chapter 51). A summary here:

Meltdown (2018): on vulnerable CPUs (most notably many Intel designs), a user-mode load from kernel memory, which was mapped into every process's page tables but marked supervisor-only, could speculatively return data. The permission fault was raised only when the load retired, but during the speculation window dependent loads could execute and leave traces in the cache. Mitigation: KPTI (kernel page-table isolation), which removes most of the kernel mapping from user page tables.

Spectre (2018): the branch predictor could be trained to mispredict in a way that causes a victim process or kernel routine to speculatively execute code that depends on secret data, leaking the secret via cache side channels. Mitigations: speculation barriers (LFENCE before potentially dangerous loads), IBRS to restrict and IBPB to flush indirect-branch predictor state, and retpolines, which replace indirect branches with a return-based sequence whose misspeculation is trapped in a harmless loop.
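For the bounds-check-bypass variant, the Linux kernel also uses a cheaper alternative to LFENCE: after the bounds check, it clamps the index with arithmetic that has no branch to mispredict, so even a speculatively executed out-of-bounds path reads index 0. The Python sketch below reproduces the arithmetic of the generic C formula behind Linux's `array_index_nospec()` helper, `~(long)(index | (size - 1 - index)) >> 63`, using explicit 64-bit wraparound. It assumes index and size are below 2^63. Python itself gives no speculation guarantees; this only shows why the mask comes out right.

```python
MASK64 = (1 << 64) - 1


def array_index_mask_nospec(index: int, size: int) -> int:
    """All-ones if index < size, else 0, computed without a branch.

    If index >= size, size - 1 - index wraps around and sets bit 63 of t;
    inverting t and smearing bit 63 across all 64 bits yields the mask.
    """
    t = (index | ((size - 1 - index) & MASK64)) & MASK64
    return ((~t & MASK64) >> 63) * MASK64


def array_index_nospec(index: int, size: int) -> int:
    """Clamp index to 0 when out of bounds, using only data dependencies."""
    return index & array_index_mask_nospec(index, size)
```

In C, the pattern is to re-derive the index inside the bounds check (`if (x < size) { x = array_index_nospec(x, size); y = table[x]; }`), so that a mispredicted branch still produces a safe index.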

These two papers opened a vast research area. Subsequent variants include MDS (Microarchitectural Data Sampling), L1TF (L1 Terminal Fault), TAA (TSX Asynchronous Abort), Zenbleed, Downfall, and many others. Each requires hardware microcode updates, software mitigations, or both.

The takeaway: speculation crosses security boundaries in ways the original architects didn't anticipate. Closing the gaps without crippling performance has been a multi-year industry effort. We'll cover the technical details in Chapter 51.

12. Hardware Security Modules

Some operations need to happen in hardware that the rest of the system can't tamper with: storing root keys, performing crypto with keys that never leave hardware, maintaining secure counters.

TPM (Trusted Platform Module): per-system hardware for measured boot, sealed storage, and attestation (Chapter 47). TPMs have a small key store, a few PCRs, an RNG, and crypto primitives. Not a general-purpose enclave; a small fixed set of functions.
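PCRs are never written directly; they can only be extended, and in TPM 2.0 an extend sets the new value to a hash of the old value concatenated with the new digest. The Python sketch below models a SHA-256 PCR bank; `pcr_extend` and `replay` are hypothetical helpers. Because each step folds in the previous value, the final PCR commits to the entire sequence of measurements in order, and a verifier can recompute it from the boot event log and compare.

```python
import hashlib

PCR_SIZE = hashlib.sha256().digest_size  # 32 bytes in the SHA-256 bank


def pcr_extend(pcr: bytes, measurement: bytes) -> bytes:
    """PCR_new = SHA-256(PCR_old || SHA-256(measurement))."""
    digest = hashlib.sha256(measurement).digest()
    return hashlib.sha256(pcr + digest).digest()


def replay(measurements: list[bytes]) -> bytes:
    """Recompute a PCR from an event log, starting from the all-zero reset value."""
    pcr = bytes(PCR_SIZE)  # most PCRs reset to all zeros
    for m in measurements:
        pcr = pcr_extend(pcr, m)
    return pcr
```

Swapping, dropping, or altering any boot component changes the final value, and malicious software that runs later cannot roll a PCR back to the expected value; it can only extend it further.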

HSM (Hardware Security Module): enterprise-class device for key management at scale. Banks, certificate authorities, large enterprises use HSMs to protect private keys.

Apple Secure Enclave: a dedicated security coprocessor on Apple silicon, running its own RTOS (sepOS). Manages biometric data, Touch ID/Face ID, encryption keys. Architecturally similar to a small ARM-based TEE.

Google Titan / Pixel security chips: equivalent in role to Apple's Secure Enclave.

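These specialized chips supplement the main CPU, providing functions that must not depend on the integrity of the main CPU's software, or even its TEE. For example, an attacker who compromises the main CPU's OS or firmware still cannot directly read keys held inside the Secure Enclave, whose own firmware is verified separately.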

13. Defense in Depth: Putting It Together

A modern hardened system layers many defenses:

Hardware features active: NX, SMEP, SMAP/PAN, CET, MTE/PAC where available.
Kernel hardened: KPTI, KASLR, slab freelist randomization, stack canaries, kCFI, BPF JIT-spray protections.
User space hardened: PIE, RELRO (read-only relocations), -z now (immediate binding, so the GOT is fully resolved at load time and then made read-only), stack canaries, _FORTIFY_SOURCE for libc, AArch64 PAC for return addresses, MTE for the heap.
OS policies: ASLR, mandatory access control (SELinux/AppArmor), seccomp filters, namespaces.
Application sandboxing: browsers run renderers in restricted containers; container runtimes use seccomp + namespaces + capabilities.
Verified boot: secure boot, measured boot, signed firmware updates.
Confidential computing: when needed, run sensitive workloads in TEEs or confidential VMs.

No layer is sufficient on its own. An attacker with one bug must defeat multiple layers to actually compromise the system. The defense-in-depth model has held up well against most real-world attacks, with the notable exception of speculative-execution side channels, which often bypass several traditional defenses.

14. What Hardware Security Doesn't Solve

It's worth being honest about limits:

Side channels remain hard. Spectre, MDS, Foreshadow, and dozens of others, along with related hardware attacks such as RowHammer (a DRAM disturbance fault rather than a side channel). Mitigations exist; new variants keep appearing; complete elimination would require radical microarchitectural changes (and likely large performance costs).

Software bugs remain. Hardware can detect some classes of bugs (MTE for heap UAF) but cannot prevent logic bugs, design flaws, or incorrectly-applied authorization checks.

Supply-chain attacks: these are not primarily a hardware problem. Compromise during build, distribution, or update is a software and process problem.

Physical attacks: with sufficient resources, physical attacks bypass most defenses. Cold-boot, fault injection, side channels via power analysis, decapsulation. Defended by tamper-resistant designs and operational security; not by general-purpose hardware features.

Insider threats: any system administrator with sufficient privilege can bypass defenses. Defended by organizational policy, audit, and least privilege.

Hardware security raises the cost of attack but does not eliminate it.

15. Summary of Part VI

Part VI has covered the OS / firmware / virtualization / security stack, the layer between hardware and applications:

Chapter 46 examined the OS interface: privilege levels, virtual memory, system calls, signals, scheduling, NUMA, power management. The negotiated boundary between hardware and kernel.
Chapter 47 covered boot and firmware: from reset vector through multi-stage firmware to OS handoff, including UEFI, ARM Trusted Firmware, OpenSBI, secure boot, and measured boot.
Chapter 48 was virtualization: VT-x / SVM / ARM virt / RISC-V H, two-stage translation, paravirtualized devices, IOMMU, KVM/QEMU, containers, confidential computing.
Chapter 49 is security and isolation: NX, SMEP/SMAP, ASLR, stack protections, CFI (CET, BTI, PAC), memory tagging (MTE), TEEs (TrustZone, SGX, SEV, TDX, CCA), and the limits.

Together, these chapters cover the system view of the CPU — what runs on top of the hardware, how it manages resources, and how it isolates components. Modern systems are extraordinary not so much in what they can do as in how reliably and securely they do it; that reliability is the cumulative product of decades of architecture, OS engineering, and security research.

Part VII shifts to advanced topics that revisit several earlier chapters in greater depth: cache hierarchies beyond the basics, branch prediction and speculative-execution attacks, power and thermal physics, reliability and validation, performance analysis tools, and modern packaging (chiplets, 3D-stacked memory). Chapter 50 begins with advanced cache design, picking up where Chapter 17 left off.
