Part IFoundations

Number Systems and Data Representation

May 16, 2026·39 min read·beginner

A computer, taken at face value, is just an enormous collection of switches. Each switch is either on or off, and that is the only distinction the hardware can directly make. Yet from this single,…

A computer, taken at face value, is just an enormous collection of switches. Each switch is either on or off, and that is the only distinction the hardware can directly make. Yet from this single, almost embarrassingly limited primitive, every number, letter, image, sound, and program in modern computing is somehow built. The bridge between switches that are on or off and the things we actually want to compute on is the subject of this chapter.

We will start with the simplest question — how to write down numbers — and work outward. By the end of the chapter you should be comfortable converting between bases, reasoning about signed and unsigned integers, recognizing when arithmetic will overflow, understanding how characters and decimals are stored, and appreciating why the machine cares about whether your data is properly aligned in memory. These ideas will recur in every later chapter, often without further comment, so it is worth slowing down and getting them right.

01. Binary, Decimal, and Hexadecimal

The number systems we use to write numbers are called positional number systems. In a positional system, the value of a digit depends on its position. The string 327 in everyday decimal does not mean "three plus two plus seven"; it means

327=3102+2101+7100.327 = 3 \cdot 10^2 + 2 \cdot 10^1 + 7 \cdot 10^0.

The number 10 here is the base or radix of the system. Decimal uses base 10 because human beings have ten fingers, but there is nothing mathematically special about it. Any integer greater than 1 can serve as a base. Computers, built from two-state switches, naturally use base 2.

Binary

In binary, the only digits are 0 and 1, and each position represents a power of 2. The binary string 1011 means

123+022+121+120=8+0+2+1=11.1 \cdot 2^3 + 0 \cdot 2^2 + 1 \cdot 2^1 + 1 \cdot 2^0 = 8 + 0 + 2 + 1 = 11.

A single binary digit is called a bit, short for binary digit. A string of nn bits can represent 2n2^n distinct patterns. With four bits you get sixteen patterns, with eight bits you get 256, with sixteen bits you get 65,536, and with thirty-two bits you get a little over four billion. Doubling the bit count squares the number of patterns, which is why hardware tends to advance in factors of two.

To convert a decimal number into binary by hand, repeatedly divide by 2 and record the remainders. The remainders, read in reverse order, give the binary digits. For example, to convert 13:

Plain Text
13 ÷ 2 = 6 remainder 1 (least significant bit)
6 ÷ 2 = 3 remainder 0
3 ÷ 2 = 1 remainder 1
1 ÷ 2 = 0 remainder 1 (most significant bit)

Reading the remainders bottom to top gives 1101, and indeed 18+14+02+11=131 \cdot 8 + 1 \cdot 4 + 0 \cdot 2 + 1 \cdot 1 = 13.

Hexadecimal

Binary is the language of the hardware, but it is awful to read. A 32-bit number written in binary is a wall of zeroes and ones in which the eye gets lost almost immediately. Hexadecimal, or base 16, was adopted as a practical compromise. It uses the digits 0 through 9 and the letters A through F to represent the values 10 through 15:

HexDecimalBinary
000000
110001
220010
330011
440100
550101
660110
770111
881000
991001
A101010
B111011
C121100
D131101
E141110
F151111

Because 16=2416 = 2^4, every hexadecimal digit corresponds to exactly four binary digits. This makes the conversion between hex and binary trivial: group the bits in fours, starting from the right, and replace each group with the matching hex digit. For instance,

Plain Text
binary: 1011 0010 1111 0001
hex: B 2 F 1

In source code, hexadecimal numbers are usually written with a 0x prefix (0xB2F1) and binary with a 0b prefix (0b1011). When the base is unclear from context, it is often given as a subscript, as in 101121011_2 or B2F116\text{B2F1}_{16}.

Octal, and a brief warning about leading zeros

You will occasionally encounter octal (base 8), which uses the digits 0 through 7 and groups bits in threes. It was popular on early computers whose word sizes were multiples of three bits, but it has largely been displaced by hexadecimal. The one place octal still bites people is in C and its descendants, where a numeric literal that starts with a 0 is interpreted as octal: 0123 does not mean one hundred twenty-three but rather 164+28+3=831 \cdot 64 + 2 \cdot 8 + 3 = 83. This is a famous source of bugs. Modern languages either deprecate the convention or require an explicit 0o prefix.

02. Bits, Bytes, Words, and Machine Sizes

A bit by itself is not very useful. Hardware almost always groups bits into larger fixed-size chunks for storage and processing.

The byte is the most familiar grouping: eight bits. Eight is not a law of nature — early machines used six-bit, seven-bit, and nine-bit bytes — but the eight-bit byte became standard in the 1970s and is now universal. A byte can hold 28=2562^8 = 256 distinct values, which happens to be enough to represent every character in the English alphabet, both cases, all the digits, common punctuation, and a fair amount of control information. Memory is almost always byte-addressable, meaning the smallest unit the CPU can read from or write to memory is one byte. You cannot, on a normal machine, ask for "bit number 17 of memory" directly.

A word is a larger grouping that the processor handles naturally as a single unit. The word size is one of the defining characteristics of a machine. On a 32-bit processor the word is typically 32 bits (4 bytes); on a 64-bit processor it is 64 bits (8 bytes). The word size determines, among other things, the size of the registers in the CPU, the largest integer the ALU can add in a single operation, and — usually — the maximum amount of memory the machine can address.

To make matters slightly confusing, the word word also has a historical meaning that does not always agree with the natural word size of a modern machine. In the x86 family, for example, the term "word" was fixed at 16 bits in the 1970s and stayed there even after the architecture grew to 32 and 64 bits. So Intel documentation talks about a word (16 bits), a doubleword (32 bits), and a quadword (64 bits), regardless of how wide the registers actually are. Other ISAs, such as RISC-V, use word to mean 32 bits and doubleword to mean 64 bits. When reading a manual, always check what the author means by the term.

Common sizes you will see again and again:

NameBitsBytesDistinct values
Nibble4½16
Byte81256
Halfword16265,536
Word (32-bit ISA)324≈ 4.29 × 10⁹
Doubleword / 64-bit word648≈ 1.84 × 10¹⁹

A useful rule of thumb is that every additional bit doubles the range. Twenty bits gets you about a million, thirty bits about a billion, forty bits about a trillion. This is why a 32-bit address space tops out at four gigabytes: 2322^{32} bytes is roughly 4.29×1094.29 \times 10^9.

There is also a long-running confusion about the prefixes kilo, mega, and giga in computing. In physics and engineering, "kilo" means exactly 1,000. In computing, particularly when describing memory, it has historically meant 1024=2101024 = 2^{10}, because powers of two are what the hardware actually handles. To resolve the ambiguity, the IEC introduced the binary prefixes kibi, mebi, gibi, and so on:

  • 1 KiB=210 bytes=1,024 bytes1\ \text{KiB} = 2^{10}\ \text{bytes} = 1{,}024\ \text{bytes},
  • 1 MiB=220 bytes=1,048,576 bytes1\ \text{MiB} = 2^{20}\ \text{bytes} = 1{,}048{,}576\ \text{bytes},
  • 1 GiB=230 bytes=1,073,741,824 bytes1\ \text{GiB} = 2^{30}\ \text{bytes} = 1{,}073{,}741{,}824\ \text{bytes}.

Storage manufacturers still tend to use the decimal prefixes (a "1 TB" disk really holds 101210^{12} bytes, not 2402^{40}), while operating systems often report capacities in binary prefixes. This is why a brand-new 1 TB drive shows up as "931 GB" in the file manager. Both numbers are correct; they simply use different definitions.

03. Unsigned Integers

The simplest interpretation of a string of bits is as a non-negative whole number, written in binary. This is called an unsigned integer. An nn-bit unsigned integer can take any value from 00 up to 2n12^n - 1 inclusive. For 8-bit bytes that range is 0 to 255; for 32-bit words it is 0 to 4,294,967,295.

Arithmetic on unsigned integers follows the ordinary rules of binary addition. Adding two binary digits produces a sum digit and possibly a carry into the next position, exactly as in decimal:

    1110000001101101(109)+00101100(44)10011001(153)\begin{array}{rcl} & \;\;1\,1\,1\phantom{\,0\,0\,0\,0\,0} & \\ & 0\,1\,1\,0\,1\,1\,0\,1 & (109) \\ + & 0\,0\,1\,0\,1\,1\,0\,0 & (44) \\ \hline & 1\,0\,0\,1\,1\,0\,0\,1 & (153) \end{array}

This is precisely what the hardware adder in Chapter 4 will compute, one bit position at a time, with the carry from each position fed into the next.

A subtle but important property of fixed-width arithmetic is that it is modular. If you have an 8-bit register and you add 200 to 100, the true mathematical answer is 300, but 300 does not fit in 8 bits (300=256+44300 = 256 + 44). The hardware silently throws away the high carry and the register is left holding 44. In other words, nn-bit unsigned arithmetic operates modulo 2n2^n:

a+nb=(a+b)mod2n.a +_n b = (a + b) \bmod 2^n.

Whether this is a bug or a feature depends on what you were trying to do. For computing memory addresses or hashing, modular arithmetic is exactly what you want. For counting people in a stadium, it can be a disaster.

04. Signed Integers and Two's Complement

So far we have only described non-negative numbers. To represent negatives, we have to dedicate part of the bit pattern to indicating sign. Several schemes have been tried over the years, and one has decisively won.

Sign-magnitude

The most obvious idea is to reserve the leftmost bit as a sign bit: 0 for positive, 1 for negative, and the remaining bits as the magnitude. With 8 bits, 00000101 would mean +5 and 10000101 would mean −5. This scheme, called sign-magnitude, is easy to read but has two ugly problems. There are two representations of zero (00000000 and 10000000), which complicates equality testing. And the rules for adding numbers depend on the signs of the operands, so the hardware needs separate logic for addition and subtraction.

One's complement

Another scheme, one's complement, represents a negative number by flipping every bit of its positive form. With 8 bits, +5 is 00000101 and −5 is 11111010. This is closer to what we want — addition almost works correctly — but it still has two zeroes, and additions involving negatives need an awkward "end-around carry" correction. Modern machines do not use it.

Two's complement

The scheme used by essentially every modern computer is two's complement. To negate a number, flip all the bits and add one. With 8 bits, +5 is 00000101; flipping gives 11111010; adding 1 gives 11111011, which is the two's-complement representation of −5.

The interpretation rule is best stated mathematically. Treating the high-order bit as having a negative place value, an nn-bit two's-complement number bn1bn2b1b0b_{n-1} b_{n-2} \cdots b_1 b_0 has the value

v=bn12n1+i=0n2bi2i.v = -b_{n-1} \cdot 2^{n-1} + \sum_{i=0}^{n-2} b_i \cdot 2^i.

So 11111011 is interpreted as 128+64+32+16+8+0+2+1=5-128 + 64 + 32 + 16 + 8 + 0 + 2 + 1 = -5, which agrees with the construction above.

Two's complement has three properties that explain why it has crowded out every other scheme.

There is exactly one zero. The pattern 00000000 is zero, and there is no negative zero.

The range is slightly asymmetric. An nn-bit two's-complement integer covers 2n1-2^{n-1} to 2n112^{n-1} - 1. For 8 bits, that is −128 to +127. Notice that −128 has no positive counterpart inside the range; trying to negate it gives back −128, which is a real and occasional source of bugs.

The hardware needs only one adder. Subtraction aba - b can be computed as a+(b)a + (-b), and since negation is just bit-flipping plus one, the same circuit that adds unsigned numbers also adds signed numbers without any modification. This is the decisive practical advantage. The CPU has a single ALU that does not need to know whether its operands are signed; the difference between signed and unsigned shows up only in how the result and the flags are interpreted.

A handy mental shortcut for negating a two's-complement number by hand: starting from the right, copy bits up to and including the first 1, then flip every bit to its left. Try it on 00001100 (12) and you should get 11110100 (−12).

05. Overflow and Underflow

Because registers have a fixed width, every arithmetic operation has the potential to produce a result that does not fit. When that happens we say the operation has overflowed (the result is too large in magnitude) or underflowed (in floating-point, the result is too small to represent). Overflow does not throw an error in hardware by default; the CPU simply stores the wrapped or saturated result and sets a flag. Whether that becomes a real bug depends on whether the program checks the flag.

The conditions for unsigned and signed overflow are different, and getting them muddled is a classic mistake.

For unsigned addition, overflow occurs exactly when the operation produces a carry out of the most significant bit. Adding 200 + 100 in 8 bits is 300, which is 1  0010110021\;00101100_2; the leading 1 falls off the top and the register is left with 44. The CPU records this in a carry flag.

For signed (two's-complement) addition, the rule is different. Overflow occurs when two operands of the same sign produce a result of the opposite sign. Adding two positives can never give a negative real result, so if the sign bit of the result is 1, something has gone wrong. Symmetrically for two negatives. Adding a positive and a negative can never overflow, because the true result is bounded by the larger of the two in magnitude. The CPU records signed overflow in a separate overflow flag.

A worked example. In 8-bit two's complement, add 100 + 50:

01100100(100)+00110010(50)10010110(106 signed)\begin{array}{rcl} & 0\,1\,1\,0\,0\,1\,0\,0 & (100) \\ + & 0\,0\,1\,1\,0\,0\,1\,0 & (50) \\ \hline & 1\,0\,0\,1\,0\,1\,1\,0 & (-106 \text{ signed}) \end{array}

The mathematical answer is 150, but 150 does not fit in the signed range −128 to 127. The result, read as a signed number, is −106. Two positives produced a negative; signed overflow occurred. Note that there was no carry out of the top bit, so as far as unsigned arithmetic is concerned, nothing went wrong: 100+50=150100 + 50 = 150, and 150 is a perfectly valid 8-bit unsigned number. The same bit pattern, the same operation; only the interpretation differs.

This is why the hardware computes both flags on every addition. The compiler and programmer choose which one to consult based on whether they were treating the operands as signed or unsigned.

In high-level languages, the response to overflow is a language-level decision. C and C++ leave signed overflow as undefined behavior, which means the compiler is free to assume it does not happen — a permission it routinely exploits when optimizing loops. C's unsigned overflow, by contrast, is fully defined as wrap-around. Rust panics on overflow in debug builds and wraps in release builds. Python uses arbitrary-precision integers and does not overflow at all; it just allocates more memory. Each choice is a tradeoff between speed, predictability, and safety.

06. Widening, Narrowing, and Sign Extension

Real programs constantly move values between registers and memory locations of different widths. A byte is loaded into a 32-bit register; a 64-bit value is stored back into a 32-bit slot; an int16_t is added to an int64_t. Each of these conversions has to answer one question: when the destination is wider than the source, what should the new high-order bits be?

There are two answers, and choosing the right one is essential for correctness.

Zero extension fills the new high bits with zeroes. The bit pattern 1111 0001 (8 bits) becomes 0000 0000 1111 0001 (16 bits). This preserves the value when the source is interpreted as unsigned: 241 stays 241.

Sign extension copies the most significant bit of the source into every new high-order bit. The bit pattern 1111 0001 becomes 1111 1111 1111 0001. This preserves the value when the source is interpreted as signed in two's complement: 15-15 stays 15-15.

The two operations are different, and choosing wrong silently changes the number. Most ISAs therefore provide separate load instructions for the two cases. On x86-64, movzx zero-extends and movsx sign-extends. On ARM, the suffixes LDRB/LDRSB and LDRH/LDRSH distinguish unsigned and signed loads of byte and halfword values. On RISC-V, the load opcodes lb and lbu differ only in whether they sign- or zero-extend a loaded byte.

The inverse operation, narrowing, is simpler in mechanism: just discard the high bits. But it is dangerous in meaning. Truncating a 32-bit integer to 8 bits keeps only the low byte, which agrees with the original value only when the original fits in eight bits. Otherwise the result is the original value modulo 256, with no warning. Many serious bugs trace to a wide value being silently truncated on its way into a narrow field.

A related operation is bit extension by shifting. Shifting a value right by kk positions produces a different result depending on whether the top bit is filled with zero (a logical shift right) or with copies of the original sign bit (an arithmetic shift right). For unsigned values, logical shift right by kk is division by 2k2^k, rounded toward zero. For signed values, arithmetic shift right by kk is division by 2k2^k, rounded toward -\infty, which is not the same as ordinary integer division and is one of the rare places C explicitly leaves the behaviour implementation-defined. Most ISAs provide both kinds of right shift; left shifts behave the same way for both interpretations because they fill from the right with zeroes.

07. Fixed-Point Representation

Integers are useful, but a great deal of real-world data — sensor readings, currency, audio samples — has a fractional part. Two main strategies exist for representing such values: fixed-point and floating-point. We treat fixed-point here because it follows naturally from the integer ideas above; floating-point is a deeper topic and gets its own treatment alongside the SIMD chapters.

A fixed-point number is just an integer with an implied decimal (or rather, binary) point at a fixed position. If we agree, for instance, that an 8-bit value will be interpreted with three fractional bits, then the bit pattern 00010110 is read as

121+12223+1 \cdot 2^1 + 1 \cdot 2^2 \cdot 2^{-3} + \ldots

— more cleanly, you place the point between bits 3 and 2 and read off 0010.1102=2.750010.110_2 = 2.75. The hardware sees only an 8-bit integer; the position of the binary point lives in the programmer's head and in the surrounding code.

This style of representation is sometimes written as Q-notation: a Qm.n format has mm bits before the implied point and nn bits after. So Q4.4 is an 8-bit signed format with four integer bits and four fractional bits, ranging from −8 to just under +8 in steps of 24=0.06252^{-4} = 0.0625.

Fixed-point arithmetic is attractive in three settings. Embedded processors often lack hardware floating-point units, so fixed-point gives fractional values at integer-arithmetic speeds. Digital signal processing, where samples have a known dynamic range, is often easier to reason about with fixed-point. And anywhere absolute precision matters, such as financial calculations, fixed-point avoids the rounding surprises of floating-point: storing currency amounts as integer cents is a form of fixed-point.

The price is that the programmer has to manage the scaling. Adding two Q4.4 numbers is exactly the same as adding two 8-bit integers, which is convenient. But multiplying two Q4.4 numbers gives a Q8.8 result that takes 16 bits, and bringing it back to Q4.4 requires shifting right by four bits and possibly rounding. Get the shift wrong and your filters drift, your audio clips, and your bank balances quietly lose accuracy.

08. Decimal Representations and BCD

A related but distinct family of formats stores numbers as binary-coded decimal, or BCD. The idea is to encode each decimal digit, 0 through 9, in its own four-bit group, instead of converting the whole number to base 2. The decimal value 1234 in BCD is

Plain Text
0001 0010 0011 0100
1 2 3 4

This wastes some bits — four bits can encode sixteen patterns but BCD uses only ten of them — and arithmetic is more complicated, because each decimal digit can carry into the next at the value 10, not 16. Early processors compensated with dedicated decimal-adjust instructions, such as x86's DAA and DAS, which fix up the result of an ordinary binary addition so that each nibble represents a valid decimal digit again.

Despite the inefficiency, BCD survives in three places. Financial software sometimes uses it, or its denser cousin packed decimal, because every decimal fraction (0.10, 0.01, and so on) is exact, with none of the rounding surprises that floating-point produces when it tries to represent a tenth in binary. Mainframe systems built around the IBM zSeries provide hardware decimal arithmetic precisely for this market. And anywhere humans read raw values — the seven-segment displays of microwaves, alarm clocks, and old calculators — BCD makes the digit-by-digit translation to glowing bars trivial.

A closely related approach is decimal floating-point, standardized in IEEE 754-2008, which combines BCD-like storage of the significand with an exponent so that values like 1.23 can be stored exactly. It is supported in hardware on some IBM Power and zSeries processors and in software libraries on most others.

09. Gray Code and Other Special Encodings

Not every binary encoding is a positional number system. Some are designed so that adjacency in the encoding has a special meaning.

The most familiar example is the Gray code, also called the reflected binary code, named after Frank Gray. In a Gray code, consecutive values differ in exactly one bit. The 3-bit Gray code, for instance, runs

Plain Text
0: 000
1: 001
2: 011
3: 010
4: 110
5: 111
6: 101
7: 100

Notice that incrementing by one always flips a single bit. This property matters whenever the bits of a value are sampled by physically separate hardware, because if multiple bits change at the moment of sampling, a glitch can produce an entirely wrong intermediate value. Optical and magnetic rotary encoders — the kind found on robot joints, digital callipers, and shaft sensors — use Gray code on their tracks for this reason. Asynchronous FIFOs that cross between two clock domains traditionally store their read and write pointers in Gray code so that the receiver, sampling at a different clock, can never see a transient pointer value that did not actually exist. Gray-coded counts also reduce the switching activity of a counter, saving a small amount of power.

Other specialized encodings show up where they earn their keep: one-hot encoding, in which exactly one bit out of nn is set, is widely used inside hardware state machines because decoding which state you are in becomes free; thermometer codes, in which the low kk bits of a width-nn field are set to indicate the value kk, simplify certain analog-to-digital converters and priority encoders. None of these are number systems in the positional sense, but they are part of the working vocabulary of digital design.

10. Booleans, Flags, and Enumerations

Not every value a program manipulates is numeric. A great deal of data is categorical: a switch is on or off, a request succeeded or failed, an object is in one of a small number of named states. Hardware has no native concept of any of these; they are all encoded as bit patterns and given meaning by convention.

A Boolean value, true or false, in principle needs only one bit. In practice it is almost always stored in a whole byte (or wider), because byte addressing is cheaper than bit addressing and because compilers prefer that every variable have its own address. C's _Bool and C++'s bool are typically a single byte; Java's boolean is conceptually one bit but usually consumes a byte in arrays and four bytes elsewhere. The encoding is similarly ad hoc: false is always the all-zeroes pattern, but any non-zero value is true, so a bool field can hold the values 0 and 1 in well-behaved code and any pattern at all in code that reads memory loosely.

When many independent on/off conditions need to be tracked together, they are often packed into the bits of a single integer as a bit field or flags word. The classic example is the operating system's representation of file-open modes: read access, write access, append, create, exclusive, and so on, each assigned to one bit, ORed together to express any combination. Hardware status registers work the same way: a single 32-bit word might contain a carry flag, a zero flag, an interrupt-enable bit, a privilege level, and a dozen other things, each at a known position. The standard idioms are

C
flags |= MASK; // set
flags &= ~MASK; // clear
flags ^= MASK; // toggle
if (flags & MASK) // test

These operations — covered as Boolean algebra in the next chapter — are how programs read and write individual bits of a packed word.

An enumeration assigns symbolic names to a small set of integer values, usually starting at zero. C's enum, Java's enum, and Rust's plain enum are all variations on the same idea: at runtime the value is just an integer, but the source code uses readable names. Enumerations have to be designed with the same care as any other encoding. If new values are added later, code that switched on the old set may need updating; if values are persisted to disk or sent over a network, the numeric assignments become part of an external interface and can no longer be changed freely.

11. Pointers and Addresses as Data

One especially important kind of value is an address: the location in memory where some other value lives. To the hardware, an address is just an integer of a particular width. On a 32-bit machine, addresses are 32 bits and look exactly like 32-bit unsigned integers; on a 64-bit machine they are 64 bits, although for reasons of cost and software complexity most current processors implement only the lower 48 or 57 bits and leave the rest reserved.

In high-level languages, an address that is being used to refer to data is called a pointer. C's pointers, Rust's references, Java's object references, and Python's everything-is-a-reference model are all built on top of the same hardware machinery. The differences are in what the language allows you to do with the value: C lets you do arithmetic on it, Rust restricts that arithmetic to maintain memory safety, Java forbids it entirely.

Three facts about pointers are worth flagging now because they will keep coming up.

A pointer is the same size as an address, which is the same as a machine word on most systems. Doubling the address width doubles the memory cost of every pointer-rich data structure. This is one reason 64-bit machines, despite their many advantages, can use noticeably more memory than 32-bit machines for the same workload, and why some systems use a 32-on-64 mode (called x32 on Linux) in which addresses are 32 bits but registers are 64.

A pointer's bit pattern is meaningful only inside a single process's address space. The same pointer value loaded into a different process refers to a different (or no) object, because each process has its own private mapping from addresses to physical memory. Chapter 19 will explain why.

Addresses often have known low bits because the data they point at is aligned. A pointer to a 4-byte integer always has its bottom two bits zero, on architectures that require alignment. Some language implementations exploit this by stuffing extra information — type tags, garbage-collection bits — into those unused bits, recovering them with a mask before dereferencing. This trick is called pointer tagging, and it is the reason small integers in many dynamic-language runtimes are stored inside the pointer rather than as separate objects.

12. Character Encodings

So far we have been talking about numbers. To represent text, we need to agree on a mapping between characters and bit patterns. Such an agreement is called a character encoding, and getting them straight is one of the persistent headaches of practical computing.

ASCII

The earliest widely adopted encoding was ASCII, the American Standard Code for Information Interchange, defined in 1963. ASCII assigns a 7-bit code to each of 128 characters: the uppercase and lowercase Latin letters, the digits 0–9, common punctuation, the space, and a set of control characters such as newline (0x0A), carriage return (0x0D), and tab (0x09). Because seven bits fit easily inside an 8-bit byte, ASCII text uses one byte per character with the high bit set to zero.

A useful property of ASCII is that the codes for the digits and letters are arranged sequentially. The digit '0' is 0x30, '1' is 0x31, and so on. The uppercase 'A' is 0x41, 'B' is 0x42, and the lowercase letters start at 0x61. This is why a programmer can convert a digit character to its numeric value by subtracting '0', and toggle case by flipping a single bit (the bit at position 252^5).

C
int digit_value = ch - '0'; // works because digits are sequential
char to_upper = ch & ~0x20; // forces bit 5 off, only correct for ASCII letters

ASCII is fine if your text is in English, but the world contains many more languages and writing systems than 128 characters can accommodate.

Code pages and the early extensions

Through the 1980s and 1990s, the industry papered over this with code pages: an 8-bit encoding where the lower 128 codes still meant ASCII but the upper 128 were reinterpreted differently in different regions. ISO 8859-1 covered Western European languages, ISO 8859-7 covered Greek, Windows-1251 covered Cyrillic, and so on. The result was that the byte 0xE9 could be é, щ, or some other character entirely depending on which code page the reader's machine assumed. Mixing languages in a single document was nearly impossible. Files emailed across regions arrived as garbage. The world clearly needed a unified scheme.

Unicode and UTF-8

Unicode is that scheme. It assigns a unique number, called a code point, to every character in essentially every script in human use, with room for many more. Code points are written U+ followed by hexadecimal digits: U+0041 is the Latin capital letter A, U+00E9 is é, U+4E2D is the Chinese character 中, and U+1F600 is 😀. The current standard defines well over 100,000 code points and continues to grow.

Unicode by itself is just a numbering scheme. To store Unicode text in memory or on disk, you need an encoding that turns each code point into a sequence of bytes. Several exist:

  • UTF-32 uses four bytes for every code point. Simple, but wasteful for ASCII-heavy text.
  • UTF-16 uses two bytes for code points up to U+FFFF and four for the rest, via surrogate pairs. It is the native encoding inside Windows and Java.
  • UTF-8 uses one byte for ASCII characters, two bytes for most European scripts, three for most Asian scripts, and four for less common ones. It is fully backward-compatible with ASCII, in the sense that any valid ASCII file is also a valid UTF-8 file.

UTF-8 has emerged as the dominant encoding for text on the web, in source code, and in modern file formats. Its variable-length design encodes the length in the leading bits of the first byte: a byte starting with 0 is a one-byte sequence, 110 introduces a two-byte sequence, 1110 introduces three bytes, 11110 introduces four. Continuation bytes always start with 10. This means a decoder can resynchronize quickly after a corruption, and a search for an ASCII substring in UTF-8 text never produces a false match across character boundaries.

The cost of UTF-8 is that the number of bytes and the number of characters are no longer the same. A program that wants to count, slice, or reverse text must be careful to operate on whole code points, not raw bytes. Many real-world bugs in text-handling code come from confusing the two.

13. Strings, Arrays, and Aggregate Data

Individual characters are rarely interesting on their own; programs work with strings. Once the encoding for a single character is fixed, the next question is how a sequence of them is laid out in memory and where the sequence ends.

Two conventions dominate.

Null-terminated strings, inherited from C, store the characters one after another and mark the end with a single zero byte. The string "Hello" occupies six bytes: H, e, l, l, o, \0. The advantage is simplicity: a string is just a pointer, and any function that processes one walks forward until it sees the terminator. The disadvantage is that finding the length is an O(n)O(n) operation, the string cannot itself contain a zero byte, and many security-critical bugs come from forgetting to leave room for the terminator or from forgetting to write it at all.

Length-prefixed strings, used by Pascal and most modern systems internally, store the length first, followed by the characters. A string is now a small structure rather than just a pointer, but the length is known in constant time, embedded zero bytes are allowed, and bounds checking is straightforward. Rust's String, Go's string, Java's String, and the C++ std::string all use length-prefixed (or pointer-plus-length) layouts internally even when they interoperate with C through a null-terminated view.

The same considerations apply, in slightly different form, to arrays. The bare hardware-level array is a contiguous block of equally sized elements: an array of nn values of size ss bytes occupies exactly nsn \cdot s bytes, and the address of element ii is the base address plus isi \cdot s. This linear arithmetic is the reason arrays are so fast, and the reason that out-of-bounds access in low-level languages reads or writes whatever happens to be next in memory rather than producing a clean error. Higher-level array types add a length and possibly a separate capacity to enable safe access and dynamic growth, but the underlying contiguous storage is the same.

Multi-dimensional arrays raise an additional choice: which dimension varies fastest as you walk through memory. Row-major layout, used by C, C++, Python's NumPy by default, and most modern systems, stores a[0][0], a[0][1], a[0][2], ..., a[1][0], ..., so that the rightmost index varies fastest. Column-major layout, used by Fortran, MATLAB, and Julia, reverses this. Numerically, the two are equivalent. Performance-wise they are not: a loop that walks an array against the grain of its layout can run several times slower than one that walks with the grain, because cache lines are wasted. We will return to this in Chapter 17.

14. Error Detection and Correction Codes

Every encoding considered so far assumes the bits arrive intact. In reality, bits flip. A cosmic ray strikes a memory cell. A connector vibrates loose. A weak signal on a cable is misread. The longer the data lives, and the further it travels, the more likely some bit will be wrong by the time anyone reads it. The standard response is to spend some bits on error-detecting or error-correcting codes that let the receiver notice or repair the damage.

The simplest scheme is the parity bit. Append a single extra bit to a group of nn data bits, chosen so that the total number of 1 bits is always even (in even parity) or always odd (in odd parity). If exactly one bit later flips, the parity will no longer match and the error is detected. Two simultaneous flips, however, cancel out, and parity will incorrectly say the data is fine. Parity is therefore a one-bit-error detector, not a corrector, and it is fooled by any even number of errors. It was historically used on serial links and inside main memory for many decades.

More powerful schemes use multiple check bits arranged so that the pattern of failed checks identifies which bit went wrong. The classical example is Hamming codes, invented by Richard Hamming at Bell Labs in 1950, which can correct any single-bit error and detect (but not correct) any double-bit error. A common variant, SECDED (single-error correct, double-error detect), is the basis of the ECC memory used in servers and workstations: a 64-bit data word is stored alongside 8 check bits, and the memory controller can silently fix any single-bit flip in the resulting 72-bit codeword.

Long-distance and storage codes go further. Cyclic redundancy checks (CRCs) compute a hash-like remainder from a long message under polynomial arithmetic; they are used everywhere from Ethernet frames to ZIP archives to detect bursts of errors very efficiently. Reed–Solomon codes, treating each symbol as an element of a finite field, can correct multiple symbol errors and underpin everything from CDs and DVDs to QR codes, deep-space communication, and modern flash memory. Disk arrays add their own RAID layouts that distribute parity or Reed–Solomon information across multiple drives so that a complete drive failure can be tolerated.

For the purposes of this chapter, the point is just that error-detecting and error-correcting codes are part of how data is represented on real storage and real wires. The architectural consequences — ECC in DRAM, parity in caches, redundancy in interconnects — will appear in the relevant chapters of Part IV.

15. Range, Precision, and Resolution

A recurring source of confusion when discussing numerical representations is the difference between range, precision, and resolution. The three words sound alike and are often used loosely, but they describe different properties.

Range is the spread between the smallest and largest values a representation can hold. An 8-bit unsigned integer has a range of 0 to 255; an 8-bit signed integer, 128-128 to +127+127. Range is a property of the representation alone.

Resolution, sometimes called step size or granularity, is the smallest difference between two adjacent representable values. For an integer it is 1. For a Q4.4 fixed-point number it is 24=0.06252^{-4} = 0.0625. Resolution constrains how finely the representation can distinguish nearby values.

Precision is a measure of how many significant digits a value has. It is most useful when discussing floating-point, but it applies anywhere: an integer count of milliseconds and an integer count of nanoseconds may have the same range in principle but very different precisions when used as timestamps.

Range, resolution, and precision can be traded against each other for a fixed bit budget. Spending more bits on the integer part of a fixed-point number widens the range at the cost of resolution. Spending more bits on the fractional part does the opposite. Floating-point, which we touch on only briefly here, makes this tradeoff dynamic by separating an exponent from a significand, so that the resolution scales with the magnitude of the value. The fundamental constraint, though, is that an nn-bit representation has at most 2n2^n distinct values; no clever scheme can produce more. Choosing where to place those values along the number line is one of the more interesting design problems in computing.

16. Data Size, Alignment, and Padding

A program does not just store individual numbers; it stores compound data — structures, arrays, records — laid out in memory. How those pieces are packed together has consequences for both correctness and performance.

Why alignment matters

Memory hardware is built to read and write data in fixed-size chunks. A typical processor's memory bus might transfer 8 bytes at a time, and the cache below it might operate on 64-byte lines. When you ask the CPU to load a 4-byte integer, the hardware would much rather pick it up from an address that is a multiple of 4, because then the integer sits entirely inside one transfer. An integer straddling the boundary between two 8-byte chunks requires two loads and a stitch-together step. A value is said to be aligned when its address is a multiple of its size; otherwise it is misaligned or unaligned.

The cost of misalignment varies wildly across architectures. On many x86 implementations, misaligned loads and stores work transparently but run a little slower. On older ARM cores and many embedded processors, a misaligned access raises a hardware fault and the program crashes. SIMD instructions, which we will meet in the chapter on data-level parallelism, are particularly sensitive: many of them simply will not function on misaligned data, or have separate "aligned" and "unaligned" forms with different performance.

A reasonable rule for portable code is that values should be aligned to their natural size: 1-byte values anywhere, 2-byte values on even addresses, 4-byte values on multiples of 4, 8-byte values on multiples of 8, and so on.

Structure layout and padding

Consider a C structure containing a char, an int, and a short:

C
struct example {
char a; // 1 byte
int b; // 4 bytes
short c; // 2 bytes
};

You might expect this to occupy 1+4+2=71 + 4 + 2 = 7 bytes. On a typical 32-bit or 64-bit system, it does not. The compiler must place b on a 4-byte boundary, so it inserts three bytes of unused space — padding — after a. Then c follows immediately, since 2-byte alignment is satisfied. Finally, the structure as a whole is padded out so its size is a multiple of its strictest member's alignment, which is 4 here, giving a final size of 12 bytes:

Plain Text
offset 0: a
offset 1: pad
offset 2: pad
offset 3: pad
offset 4-7: b
offset 8-9: c
offset 10-11: trailing pad

Reordering the members can eliminate most of this waste:

C
struct better {
int b; // offset 0..3
short c; // offset 4..5
char a; // offset 6
// trailing pad at offset 7
};

Now the structure fits in 8 bytes. In a program that allocates millions of these objects, the difference is real, both in memory consumed and in cache behavior.

Endianness

There is one more subtle issue when bytes are grouped into multi-byte values: the order in which they sit in memory. A 32-bit integer with the value 0x12345678 is made up of four bytes — 12, 34, 56, 78 — but which byte goes at the lowest address?

In big-endian ordering, the most significant byte comes first:

Plain Text
address: 1000 1001 1002 1003
content: 12 34 56 78

In little-endian ordering, the least significant byte comes first:

Plain Text
address: 1000 1001 1002 1003
content: 78 56 34 12

Both conventions are internally consistent; neither is mathematically preferable. Big-endian is sometimes called "network byte order" because most internet protocols use it. Little-endian is what x86, x86-64, most ARM systems in practice, and RISC-V use. The endianness almost never matters within a single machine, because the CPU reads multi-byte values back the same way it wrote them. It matters intensely the moment data crosses a machine boundary — a file written on one architecture and read on another, or a network packet exchanged between hosts. Whenever you see code calling functions like htonl, ntohs, or bswap, it is dealing with this issue.

17. Summary

Every piece of data inside a computer is, at the bottom, a string of bits. The number system used to write those bits down is positional, almost always binary inside the hardware and hexadecimal in the documentation. Bits are grouped into bytes, halfwords, words, and doublewords; the choice of word size shapes the entire machine. Unsigned integers are read as ordinary binary values; signed integers are almost universally stored in two's complement, which lets a single adder handle both. Fixed-width arithmetic wraps around modulo 2n2^n, so overflow is silent and easy to miss, and the conditions for signed and unsigned overflow are different. Moving between widths requires a deliberate choice between zero extension and sign extension, and narrowing always risks silent truncation. Fractional values can be stored in fixed-point form by agreeing on the position of an implied binary point, while applications that need exact decimals use BCD or decimal floating-point, and specialized encodings like Gray code, one-hot, and thermometer codes earn their place wherever bit-by-bit transitions or ease of decoding matter more than density. Booleans, flags, enumerations, and pointers are all just bit patterns under the hood, given meaning by convention. Text relies on a separate layer of conventions, of which Unicode encoded as UTF-8 is now the de facto standard, and strings are arranged in memory either as null-terminated sequences or with explicit length fields. Real bits flip occasionally, so storage and interconnects spend extra bits on parity, Hamming, CRC, and Reed–Solomon codes that detect or correct the damage. Finally, when these values are laid out in memory, alignment, padding, and endianness all matter, and a program that ignores them either crashes, runs slowly, or fails to talk to its peers.

With this representational machinery in hand, we can now turn in Chapter 3 to the logic that operates on it: the algebra of true and false, and the gates that physically realize it.

Book mode
computer-architecturefundamentalsfoundations
Was this helpful?