
The theory behind Yantra

A ground-up CPU RTL design in SystemVerilog — from combinational building blocks through a fully pipelined processor, with open-source tooling.

What is a CPU?

CPU stands for Central Processing Unit. As the name indicates, the primary task of a CPU is to process data and instructions according to defined rules, called algorithms, and it does this over and over, billions of times per second. At the hardware level, a CPU is a complex integrated circuit containing billions of microscopic transistors. These transistors act as switches that process binary data and instructions, executing the machine-level code that drives operating systems and software applications.

Core Architectural Components

The internal architecture of a modern CPU relies on several integrated components working in tandem to process data efficiently.

  • Control Unit (CU): This component directs operation inside the CPU, telling the other components when to fetch, decode, and execute instructions.
  • Arithmetic Logic Unit (ALU): The ALU performs all arithmetic operations (addition, subtraction, etc.) and logical operations (AND, OR, NOT, etc.).
  • Registers: These are small, extremely fast memory locations built directly into the CPU, right next to the ALU. They hold the data and instructions that the CPU is currently processing.
  • Cache Memory: This is a small amount of high-speed static RAM (SRAM) located directly on or very close to the CPU die. It stores copies of frequently used data from main memory to reduce access latency. Modern architectures typically divide this into L1, L2, and L3 cache tiers.
  • Memory Management Unit (MMU): This component handles translating virtual memory addresses to physical memory addresses, enabling the operating system to manage memory efficiently and securely.

Instruction Cycle

The functioning of the CPU, at a fundamental level, is best understood through the instruction cycle, commonly referred to as the fetch-decode-execute-writeback (FDEW) cycle.

  • Fetch: The Control Unit retrieves the next instruction from the main memory (or cache) using the address stored in the Program Counter register.
  • Decode: The instruction is broken down into signals that control other parts of the CPU.
  • Execute: The CPU performs the operation dictated by the decoded instruction. This might involve the ALU performing a calculation or moving data between registers.
  • Writeback: The result of the execution is stored back into a register or written to main memory.

Once the writeback is complete, the CPU immediately returns to the Fetch stage, and the cycle repeats.
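The FDEW cycle above can be sketched as a tiny software loop. The machine below is a hypothetical three-instruction toy (the opcode names LOAD, ADD, and HALT are made up for illustration, not Yantra's actual ISA):

```python
# Toy fetch-decode-execute-writeback loop. This is a hypothetical
# illustration of the FDEW cycle, not the real Yantra design.

memory = [
    ("LOAD", 0, 5),      # r0 <- 5
    ("LOAD", 1, 7),      # r1 <- 7
    ("ADD",  2, 0, 1),   # r2 <- r0 + r1
    ("HALT",),
]
regs = [0] * 4           # a tiny register file
pc = 0                   # program counter

while True:
    # Fetch: read the instruction addressed by the program counter.
    instr = memory[pc]
    pc += 1
    # Decode: split the instruction into an opcode and operands.
    op, *args = instr
    # Execute / Writeback: perform the operation, store the result.
    if op == "LOAD":
        rd, imm = args
        regs[rd] = imm
    elif op == "ADD":
        rd, rs1, rs2 = args
        regs[rd] = regs[rs1] + regs[rs2]
    elif op == "HALT":
        break

print(regs)  # r2 now holds 12
```

Note how the loop itself never changes: only the fetched instruction does, which is exactly why the cycle can run forever.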

Functions that Yantra CPU can perform

So far we have established that the CPU executes instructions and produces output. We now need to figure out what functions our Yantra CPU can perform (or, equivalently, what kinds of instructions Yantra can execute). These functions can be broadly classified into four distinct categories:

  • Compute: These are the add, subtract, compare, bitwise logic (AND, OR, XOR), shift left or right operations. This is the "calculator" part of the CPU. Without this, Yantra cannot do any useful work.
  • Remember: The CPU needs fast scratchpad storage to hold the values it is currently working with. These are tiny, ultra-fast storage slots right inside the CPU called registers.
  • Decide: The CPU must decide whether to execute instructions in a straight line, one after another, or do something else. Real programs are full of "if this happens, then do that other thing" logic. Branch instructions let the CPU skip ahead or jump back to a different instruction based on a condition (like "is this number zero?" or "is A greater than B?"). Loops are just branches that jump backward.
  • Move Data: Registers are fast but tiny. A real program, on the other hand, works with arrays, strings, and large data structures that live in the big but slow main memory (RAM). Load/store instructions move data between memory and registers: the CPU pulls data into registers, computes, then pushes results back to memory.
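The "loops are just backward branches" claim is worth seeing concretely. Below is a toy program for a hypothetical mini-ISA (the opcodes LI, ADD, SUBI, BNEZ are illustrative names, not Yantra's) that sums 1 through 5 using only compute, registers, and one conditional branch:

```python
# Sum 1..5 on a toy machine: the only control-flow primitive is a
# backward branch (BNEZ = "branch if not zero"). Hypothetical ISA
# for illustration only.

program = [
    ("LI",   0, 5),       # r0 <- 5        (loop counter)
    ("LI",   1, 0),       # r1 <- 0        (accumulator)
    ("ADD",  1, 1, 0),    # r1 <- r1 + r0  <-- loop body starts here
    ("SUBI", 0, 0, 1),    # r0 <- r0 - 1
    ("BNEZ", 0, 2),       # if r0 != 0, jump back to index 2
    ("HALT",),
]
regs = [0] * 2
pc = 0

while program[pc][0] != "HALT":
    op, *a = program[pc]
    pc += 1                       # default: straight-line execution
    if op == "LI":
        regs[a[0]] = a[1]
    elif op == "ADD":
        regs[a[0]] = regs[a[1]] + regs[a[2]]
    elif op == "SUBI":
        regs[a[0]] = regs[a[1]] - a[2]
    elif op == "BNEZ" and regs[a[0]] != 0:
        pc = a[1]                 # Decide: branch backward -> a loop

print(regs[1])  # 5 + 4 + 3 + 2 + 1 = 15
```

There is no "loop instruction" anywhere: the loop emerges from a branch whose target is an earlier address.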

Design Considerations

We need to remember that instructions and data reside mainly in main memory (RAM), which is slow. Our Yantra CPU must therefore be able to pull data from RAM into registers (slow, but it only has to do it once), do all the work in the registers (fast), then push the result back to RAM when it is done.

So we need registers, but we need to decide on the number of such registers. This is a genuine tradeoff with real consequences. More registers means the CPU can hold more values at once without going back to slow RAM. But more registers also means:

  • More hardware: Each register is a physical circuit. 32 registers cost roughly twice the silicon of 16.
  • Wider instructions: Every instruction that says "use register X" needs enough bits to name which register. If we have 8 registers, we need 3 bits to pick one (because 2^3 = 8). If we have 16, we need 4 bits (because 2^4 = 16). If we have 32, we need 5 bits (because 2^5 = 32).

Thus it is clear that as we increase the number of registers, the number of bits required to address them increases as well. These bit costs add up fast.
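The relationship is just ceil(log2(n)) bits for n registers, and it compounds: a three-operand instruction pays the cost three times. A quick sketch (the function name operand_bits is ours, not from any library):

```python
# Bits needed to name one of n registers: ceil(log2(n)).
# (n - 1).bit_length() computes this for any power-of-two n.
def operand_bits(num_regs: int) -> int:
    return (num_regs - 1).bit_length()

for n in (8, 16, 32):
    print(f"{n:2d} registers -> {operand_bits(n)} bits per operand")

# A three-operand instruction (rd, rs1, rs2) with 32 registers
# spends 3 * 5 = 15 of its bits on register names alone.
print(3 * operand_bits(32))  # 15
```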

The Yantra CPU we are designing is 32 bits wide, which means every instruction must be encoded within 32 bits. Equivalently, we have to describe the operation, the registers involved, and any extra data within this fixed budget. Therefore, as described above, the more registers we use, the more of the available bits are consumed just addressing the registers.
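To make the fixed budget concrete, here is one possible way a 32-bit instruction word gets carved up. The field widths below follow the RISC-V R-type layout purely as an illustration of the accounting; Yantra's actual encoding may differ:

```python
# One possible division of a fixed 32-bit instruction budget.
# Field widths follow the RISC-V R-type format as an example only;
# this is not necessarily Yantra's encoding.
fields = {
    "opcode": 7,   # which class of operation
    "rd":     5,   # destination register (5 bits -> 32 registers)
    "funct3": 3,   # sub-operation selector
    "rs1":    5,   # first source register
    "rs2":    5,   # second source register
    "funct7": 7,   # further sub-operation selector
}
total = sum(fields.values())
print(total)  # 32 -- the entire budget is spoken for
```

With three register fields already costing 15 bits, only 17 bits remain for everything else, which is why register count is a genuine tradeoff and not a free parameter.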
