Reading List and Schedule
Most of the reading links are based on DOIs. You can find the papers in the ACM Digital Library, IEEE Xplore, or on the Morgan & Claypool website. You will have to log in to the campus VPN to access the papers.
Let us know on Teams if you can't find a paper or can't log in to the VPN, and we can upload a copy to Teams for you.
NOTE: All chapter/section ranges are inclusive. For example, if the reading is Sections 4-4.2, you should read Sections 4, 4.1, 4.1.1, 4.1.2, and 4.2.
You're encouraged to discuss the readings outside of class with your classmates. You are welcome to use Teams to discuss the papers and ask questions. You may also find it useful to form "reading groups" to discuss the papers together.
Intro to High-performance Computer Architecture
Intro and technology
Required reading: Watch the 2019 Turing Lecture by Hennessy and Patterson. https://youtu.be/3LVeEjsn8Ts
Required reading: IEEE MICRO papers on AMD's Zen2 and Intel's Skylake.
Optional/Reference: WikiChip's coverage of Zen2 and Skylake.
![Caches! How do they even work?]({{"/assets/images/caches.jpg" | relative_url}})
Cache coherence (intro) and memory consistency
Required reading: Synthesis Lecture: A Primer on Memory Consistency and Cache Coherence, Second Edition
- Chapter 1
- Chapter 2
- Chapter 3 (Skip 3.8-3.11)
- Sections 4.1, 4.2
- Sections 5.1, 5.2-5.2.2
- Optional: Sections 5.4 and 5.9
Choice of papers for presentation on current trends in computer architecture.
- Dark Silicon and the End of Multicore Scaling
- Gables: A Roofline Model for Mobile SoCs
- ACT: designing sustainable computer systems with an architectural carbon modeling tool
- A Systematic Evaluation of Transient Execution Attacks and Defenses
- Ten Lessons From Three Generations Shaped Google's TPUv4i: Industrial Product
- Attack of the Killer Microseconds
- There's plenty of room at the Top: What will drive computer performance after Moore's law?
- Amdahl's Law in the Multicore Era
- User-selected paper. Ask Prof. Jason Lowe-Power at least 1 week in advance for approval.
Required reading: Synthesis Lecture: A Primer on Memory Consistency and Cache Coherence, Second Edition
- Chapter 6
- Sections 7-7.2.5
- Sections 8-8.2.6
- Optional: Chapter 11
Paper presentations on memory consistency models. See paper list below.
- PipeCheck: Specifying and Verifying Microarchitectural Enforcement of Memory Consistency Models
- TriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA
- Heterogeneous-race-free memory models
- Frightening Small Children and Disconcerting Grown-ups: Concurrency in the Linux Kernel
- Non-Speculative Load-Load Reordering in TSO
- Fast RMWs for TSO: semantics and implementation
- Atomic SC for simple in-order processors
- Efficient sequential consistency via conflict ordering
- User-selected paper. Ask Prof. Jason Lowe-Power at least 1 week in advance for approval.
NO CLASS!
Paper presentations on cache coherence protocols. See paper list below.
- Token coherence: decoupling performance and correctness
- Heterogeneous system coherence for integrated CPU-GPU systems
- In-Network Snoop Ordering (INSO): Snoopy coherence on unordered interconnects
- Cache coherence for GPU architectures
- HieraGen: Automated Generation of Concurrent, Hierarchical Cache Coherence Protocols
- Spandex: A Flexible Interface for Efficient Heterogeneous Coherence
- DeNovo: Rethinking the Memory Hierarchy for Disciplined Parallelism
- Crossing Guard: Mediating Host-Accelerator Coherence Interactions
- User-selected paper. Ask Prof. Jason Lowe-Power at least 1 week in advance for approval.
Required reading: [Computer Architecture - A Quantitative Approach, 6th Edition, Appendix F (Interconnection Networks)]({{ '/Appendix_F_online.pdf' | relative_url}})
- Section F.2 and Section F.3 (skim through these sections quickly)
- Section F.4
- Section F.5
- Section F.6
- Section F.8
Project presentations.
This week, you will present a 5-minute "lightning" talk on the problem you are going to work on. See [the project page]({{"/project" | relative_url}}) for details.
Paper presentations on on-chip networks (OCNs). See paper list below.
- Building Many-Core Processor-to-DRAM Networks with Monolithic CMOS Silicon Photonics
- Kilo-NOC: a heterogeneous network-on-chip architecture for scalability and service guarantees
- Network-on-Chip Microarchitecture-based Covert Channel in GPUs
- CryoWire: wire-driven microarchitecture designs for cryogenic computing
- User-selected paper. Ask Prof. Jason Lowe-Power at least 1 week in advance for approval.
![I hear you like virtual memory]({{"/assets/images/xzibit.jpg" | relative_url}})
Hardware support for virtualization
Required reading: Synthesis Lecture: Hardware and Software Support for Virtualization
- Chapter 1
- Sections 2-2.2
- Sections 3.2, 3.3
- Chapter 4
- Chapter 5
Warehouse scale computing
Required reading: The Datacenter as a Computer: Designing Warehouse-Scale Machines, Third Edition
- Chapter 1
- Sections 2-2.3, 2.6.1
- Sections 3-3.2
- Sections 5-5.3.1
Paper presentations on hardware support for virtual machines. See paper list below.
- The Direct-to-Data (D2D) cache: navigating the cache hierarchy with a single lookup
- CHERI: A Hybrid Capability-System Architecture for Scalable Software Compartmentalization
- Every walk’s a hit: making page walks single-access cache hits
- Parallel virtualized memory translation with nested elastic cuckoo page tables
- User-selected paper. Ask Prof. Jason Lowe-Power at least 1 week in advance for approval.
Paper presentations on warehouse-scale computers. See paper list below.
- Software-Defined Far Memory in Warehouse-Scale Computers
- Attack of the killer microseconds
- Cores that don't count
- Architectural Implications of Function-as-a-Service Computing
- SoftSKU: optimizing server architectures for microservice diversity @scale
- Clearing the clouds: a study of emerging scale-out workloads on modern hardware
- Profiling a warehouse-scale computer
- AsmDB: understanding and mitigating front-end stalls in warehouse-scale computers
Required reading: General-Purpose Graphics Processor Architectures
- Chapter 1
- Sections 2-2.1
- Sections 3-3.1.1
- Sections 4-4.3.3
Required reading: Data Orchestration in Deep Learning Accelerators
- Chapter 1
- Sections 2-2.3
- Sections 3-3.2.1
- Sections 6-6.2.2
Paper presentations on GPUs. See paper list below.
- Energy-efficient mechanisms for managing thread context in throughput processors
- MCM-GPU: Multi-chip-module GPUs for continued performance scalability
- Chimera: Collaborative preemption for multitasking on a shared GPU
- Efficient Multi-GPU Shared Memory via Automatic Optimization of Fine-Grained Transfers
- User-selected paper. Ask Prof. Jason Lowe-Power at least 1 week in advance for approval.
Paper presentations on DNN accelerators. See paper list below.
- Buffets: An Efficient and Composable Storage Idiom for Explicit Decoupled Data Orchestration
- TVM: An Automated End-to-End Optimizing Compiler for Deep Learning
- Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture
- Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks
- SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training
- Timeloop: A Systematic Approach to DNN Accelerator Evaluation
- User-selected paper. Ask Prof. Jason Lowe-Power at least 1 week in advance for approval.
Project presentations.
On this day, you will give a 10-minute presentation on your project proposal. This is a pitch to see if you can get the class to buy your solution. See [the project page]({{"/project" | relative_url}}) for details.