
Processor

CPU (Central Processing Unit) - interprets instructions stored in main memory. At its core is a word-size storage device called the program counter (PC). The PC contains the address of some machine-language instruction in main memory.

Instruction set architecture (ISA) - defines how the processor appears to operate: the very simple instruction execution model described below.

The processor reads the instruction from the memory location pointed at by the PC, interprets its bits, performs the corresponding simple operation, and updates the PC to point to the next instruction.

Instructions revolve around main memory, the register file (a small storage area consisting of word-size registers) and the ALU.

Each instruction tells the computer to add, subtract, multiply, or divide two numbers, compare numbers to see if they are equal or which is larger, and move numbers between the CPU and a location in memory. The rest of the instructions are mostly housekeeping.

Example of such operations:

  • Load: copy a byte/word from main memory into a register
  • Store: copy a byte/word from a register to a location in main memory
  • Operate
    • copy the contents of two registers to ALU
    • perform arithmetic operation on the words
    • store results in a register
  • Jump: extract a word from the instruction and copy the word into the program counter (PC)

Each of these operations overwrites the previous contents of its destination.

CPU components:

  • PC (Program Counter)
  • Register file (array of processor registers)
  • ALU (Arithmetic Logic Unit)
  • Bus interface (the mechanism by which the CPU communicates with memory and devices)

Because processors run faster than memory, small, faster storage devices called cache memories are used inside the CPU. The main idea is that each level serves as a cache for the level below it. The different levels of caches are described in the Cache section below.

A single CPU can execute multiple processes concurrently by having a processor switch among them. The interleaving is performed by the OS using a mechanism known as context switching.

The transition from one process to another is managed by the kernel (which resides in memory). When an action (read or write a file, etc.) must be performed by the OS, a special system call instruction is executed, which transfers control to the kernel.

OS keeps track of all the state information that the process needs in order to run. This state is known as the context.

Threads

A thread is a unit of execution. A process can have multiple threads, all running in the context of the process and sharing the same code and data. Threads are typically more efficient than processes, and it is easier to share data between threads than between processes.

Hyper-threading - when a single CPU core can execute multiple flows of control. It involves having multiple copies of some CPU hardware (program counter, register file) but single copies of other parts, such as the floating-point arithmetic unit.

Cache

L0 (~1 KB) - the registers themselves
L1 (tens of KB, commonly 2 KB to 64 KB) - holds tens of thousands of bytes and is nearly as fast as the registers. Can be split into two parts: one to hold recently fetched instructions, one to hold data
L2 (~1 MB) - larger cache, hundreds of thousands to millions of bytes; connected to the processor by a special bus, roughly 5 times slower for the processor to access than L1, but 5 to 10 times faster than accessing main memory
L1 & L2 - are implemented using SRAM (static random-access memory)
L3 (~8 MB) - some more powerful systems have this level too; it is shared between cores
L4 (~128 MB) - still quite uncommon and is generally DRAM

Continuing the hierarchy below the caches:

L4 - main memory (DRAM). Memory is laid out in pages; a page fault occurs when a page does not exist in RAM. A dirty bit records whether a page was written to while resident. When space runs out, pages are evicted (e.g. by LRU)
L5 - local disk
L6 - remote storage

Cache lines - the chunks of memory handled by the cache are called cache lines. A cache line is a block of data of fixed size that gets transferred between memory and cache. Cache miss - when the requested data is not found in the cache and must be fetched from the next level down.

Set - a row in the cache. The number of blocks per set is determined by the layout of the cache (direct-mapped, set-associative, or fully associative)

Block - the basic unit for cache storage, may contain multiple bytes/words of data.

Locality - tendency to access data and code in localized regions. By setting up caches to hold data that is frequently accessed, most memory operations can be performed using the fast caches.

Memory Management (Physical / Virtual)

Virtual memory (VM) - is one of the great ideas in CS. It is an abstraction of physical memory that provides each process with the illusion that it has exclusive use of the memory. Every process has the same uniform view of memory, which is known as its virtual address space. Translating every address generated by the process requires an interaction between the hardware and the OS. A Linux virtual address space consists of:

  • kernel virtual memory - reserved for the kernel; applications do not read or write to it
  • user stack - created at run time and grows downward; expands and shrinks dynamically
  • shared libraries - holds code and data for shared libraries (i.e. the C standard library, the Rust std library, etc.)
  • heap (run time) - expands and contracts dynamically as the result of calling malloc and free in C (alloc and dealloc in Rust)
  • program code (read-only code and data) and data (read/write data)

Virtual address can be mapped to either physical memory or disk.

  • Virtual-to-physical page translations - performed by the MMU, which looks up the page table entry mapping a virtual page to its physical frame

Every process has its own virtual memory mapping. Two processes might have the same virtual address mapped to different physical addresses.

VM has 3 main important capabilities:

  1. Keeps only active areas in main memory and transfers data back and forth between disk and memory.
  2. Simplifies memory management by providing each process with a uniform address space.
  3. Protects each process's address space from corruption by other processes.

Virtual memory plays a key role in the design of hardware exceptions, assemblers, linkers, loaders, shared objects, files and processes. The CPU accesses main memory by generating a virtual address, which is then converted on the fly (by the MMU) to the appropriate physical address.

Physical memory can be viewed as an array of fixed-size slots called page frames. Each of these frames can contain a single virtual-memory page.

Q: What is the difference between a page and a frame? A: A page is a contiguous virtual-memory block with a set length (described by a single entry in a page table). A frame is a fixed-length block of RAM (physical memory). It is the smallest unit of data for memory management.

A DRAM cache miss is known as a page fault.

Logical memory - the address that a program uses to reference data (i.e. the address space seen by the programmer). This is provided to the MMU, which translates it to a physical address.

Page table

Virtual memory pages are much larger than cache blocks. A fully associative placement policy is used, with an appropriate replacement policy such as LRU (least recently used).

The data on disk (the lower level) is partitioned into blocks (these are used as transfer units between the disk and memory). Virtual memory handles this by partitioning the virtual address space into fixed-size blocks called virtual pages.

A typical physical memory page size is 4096 bytes (typically between 512 and 8192). The page size is always a power of two. Physical memory is arranged into page frames.

Making virtual memory fast (i.e. TLBs) - a translation lookaside buffer is a small cache in the MMU that holds recently used virtual-to-physical address translations, so that most translations avoid walking the page table.

Paging is when virtual memory is chopped up into fixed-size pieces/units called pages.

Files

A file is just a sequence of bytes. Every device (disk, keyboard, network, etc.) is modeled as a file. All I/O in the system is performed by reading and writing files using system calls (Unix I/O).

Resources

Page table - Walking the page table -

Concurrency

  • Thread-level concurrency - multiple flows of control (processes or threads) making progress at the same time
  • Instruction-level parallelism - processors can execute multiple instructions at one time
  • Single-Instruction, Multiple-Data (SIMD) parallelism - (aka short vector processing) a single instruction causes multiple operations to be performed in parallel. These instructions are provided in order to speed up applications that process image, sound and video data.

CPU

How quickly a CPU can carry out instructions is dependent on: clock speed, cores and cache. Clock speed is the speed at which the CPU can carry out instructions which is controlled by a clock. With every tick of the clock, the CPU fetches and executes instructions. The clock speed is measured in cycles per second. One cycle per second is 1 hertz. 2 GHz is 2 billion cycles per second.

The OS itself has a large number of threads. The processor might only be able to run a few at a time (depending on the CPU).

A CPU might have 8 physical cores but more logical cores via hyper-threading - e.g. 16 logical cores.

A hyper-thread holds CPU state - it knows where it is in the instruction stream. It can run concurrently on the same core as another hyper-thread at the same time.

A processor gets an interrupt (from the OS) asking it to stop what it's doing and do something else instead. This is what makes an OS preemptive.

To increase throughput, make the CPU wider and do more execution at the same time.

  • CPU needs to keep track of where in the code we are - the Instruction Pointer
  • TLB
  • CPU state is what the CPU needs to know in order to execute
    • memory layout
    • register states
  • A process is the state of a memory layout
  • Registers and assembly instructions are not per-process; they exist at the level below the process

Wide execution

  1. Wider. If the CPU can't be any faster, then do operations on many things at once: SIMD on 4x, 8x, 10x things
  2. More execution. Run different programs at once (break the work into multiple simultaneous programs)

Resource

Threads

  • a thread knows where it is, what instructions it's executing

  • main execution threads

  • a thread exists on the hardware

  • fibers - a thread and a fiber are very similar, but a thread is a first-class citizen in OSes (exists on the hardware), while a fiber is a manual, user-space version of a thread. Fibers do not execute concurrently: switching saves the state of the current fiber and resumes another (a way of overlapping work in the olden days)

  • logical processing units

  • core and hyper thread

    • core - a set of units that can do work (instruction decoding, ALU, memory units); can execute several instructions per clock cycle if available
    • hyper-threading - one core exposes multiple logical processors by duplicating some state (program counter, registers) while sharing the execution units
  • Interrupt - save the current state to a piece of memory, load some other state from some other thread, and resume what you were doing

OS

Interrupt - (handled by an interrupt handler) the processor decides to replace the thing that is currently running with something else (e.g. on a scheduling timer); this can happen periodically.

Preemptive (multi-tasking) - (preemption) more reliable, better load balancing; a thread does not give the OS permission to preempt it, the OS comes and preempts it. Cooperative (multi-tasking) - the OS does not use the interrupt; every thread, as it was written, has a point that yields (physically calls out to the scheduler). Much less reliable: if the code is buggy and never calls yield, then we are stuck.

Storage

  • The smallest addressable unit of memory is 8 bits, or 1 byte. Virtual memory is a very large array of bytes, and that's how a machine-level program views memory. Every byte of memory is identified by a unique number (aka its address); the set of all such addresses is the virtual address space. The value of a C pointer is the virtual address of the first byte of some block of storage. Pointers (in C) reference elements of data structures and have two aspects: a value (the location of an object) and a type (the kind of object stored at that location, e.g. integer or floating-point number).

Data Size

Word size is the size of pointer data (e.g. 32-bit or 64-bit). To work out the maximum number of virtual addresses, compute 2 to the power of n (n being the word size in bits). So for a 64-bit word size, this is approximately 18 exabytes. A 32-bit word size means a 4-byte pointer; 64-bit means an 8-byte pointer (in C).

Addressing and Byte Ordering

For objects that span multiple bytes we must establish two conventions: what the address of the object is, and how its bytes are ordered in memory. Multi-byte objects are usually stored as a contiguous sequence of bytes, with the smallest address of the bytes used as the address of the object.

  • Big endian - the most significant (high-order) byte is stored first, at the lowest address.
  • Little endian - the least significant (low-order) byte is stored first, at the lowest address.

Byte ordering becomes important when binary data is communicated over the network and is produced by a machine of one endianness but received by a machine of another.