Curated list of miscellaneous stuff that will probably eventually help me.
A very incomplete mishmash of occasionally-hard-to-separate stuff about operating systems, databases, distributed systems, etc. I also have a blog where I talk about this kind of stuff.
- Readings in Database Systems - aka the red book (for databases, not the other one), new and in website form. Awesome introductory readings on databases.
- Architecture of a Database System
- The Gamma Database Machine Project - popularized (but did not introduce) the concepts of sharding and hash join algorithms.
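The two ideas fit together. A hash join builds a hash table on one input and probes it with the other. Gamma-style parallelism hash-partitions (shards) both inputs on the join key, so matching rows land on the same node and each node can join its own partitions independently. The Python below is a minimal in-memory sketch under those assumptions. The function names are hypothetical, the "nodes" are just lists, and it ignores inputs that do not fit in memory, which real hash join algorithms must handle.

```python
from collections import defaultdict
from typing import Any, Hashable

Row = dict[str, Any]


def hash_join(build: list[Row], probe: list[Row], key: str) -> list[Row]:
    """In-memory hash join: build a hash table on one input, probe it with the other."""
    table: dict[Hashable, list[Row]] = defaultdict(list)
    for row in build:  # build phase
        table[row[key]].append(row)
    out: list[Row] = []
    for row in probe:  # probe phase
        for match in table.get(row[key], []):
            out.append({**match, **row})
    return out


def partition(rows: list[Row], key: str, n: int) -> list[list[Row]]:
    """Hash-partition rows across n hypothetical nodes by the join key."""
    parts: list[list[Row]] = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts


def parallel_hash_join(left: list[Row], right: list[Row], key: str, n: int = 4) -> list[Row]:
    """Partition both inputs on the join key, then join each partition pair locally."""
    result: list[Row] = []
    for l_part, r_part in zip(partition(left, key, n), partition(right, key, n)):
        result.extend(hash_join(l_part, r_part, key))
    return result
```

Partitioning on the join key is what lets each per-partition join run without looking at any other node's data. The price is one redistribution of each input.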
Distributed hash tables (DHTs) are different from databases because they generally assume a much more decentralized environment, but they're important for understanding some techniques used by modern distributed databases.
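One such technique is consistent hashing. Chord uses it to assign keys to nodes, and Dynamo (and later Cassandra) adopted it for partitioning data. Below is a minimal Python sketch of the idea, assuming SHA-1 positions on a 2^32 ring. The class and function names are hypothetical. Unlike Chord, it keeps a global sorted view of every node instead of routing lookups through finger tables. It also omits the virtual nodes and replication that Dynamo layers on top.

```python
import bisect
import hashlib


def ring_position(name: str, bits: int = 32) -> int:
    """Map a string onto a ring of size 2**bits using SHA-1."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


class ConsistentHashRing:
    """Minimal consistent-hashing ring: each key belongs to its clockwise successor node."""

    def __init__(self, nodes: list[str]) -> None:
        self._positions: list[int] = []  # sorted node positions on the ring
        self._owners: dict[int, str] = {}  # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        pos = ring_position(node)
        bisect.insort(self._positions, pos)
        self._owners[pos] = node

    def remove_node(self, node: str) -> None:
        pos = ring_position(node)
        self._positions.remove(pos)
        del self._owners[pos]

    def lookup(self, key: str) -> str:
        """Return the first node at or after the key's position, wrapping around."""
        if not self._positions:
            raise LookupError("ring is empty")
        idx = bisect.bisect_left(self._positions, ring_position(key))
        if idx == len(self._positions):
            idx = 0  # past the last node: wrap around to the first
        return self._owners[self._positions[idx]]
```

Each key belongs to its clockwise successor, so removing a node only reassigns the keys that node owned. Every other key keeps its owner, which is what makes the scheme attractive for clusters whose membership changes.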
- Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications
- Kademlia: A Peer-to-peer Information System Based on the XOR Metric
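- Dynamo: Amazon’s Highly Available Key-value Store - The original NoSQL datastore. Uses a bunch of really neat tricks, including consistent hashing, vector clocks, sloppy quorums with hinted handoff, and Merkle-tree anti-entropy. Inspiration for Riak, Cassandra, and a bunch of others.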
- Bigtable: A Distributed Storage System for Structured Data - The other original NoSQL datastore. Bigtable is interesting mainly because of its single-machine I/O optimizations.
- Megastore: Providing Scalable, Highly Available Storage for Interactive Services - a database built on top of Bigtable. It is probably better known in the form of Google Cloud Datastore.
- Also see Spanner: Google’s Globally-Distributed Database, whose general design is clearly inspired by Megastore but adds a bunch of interesting new techniques. It literally uses atomic clocks (and GPS receivers, behind its TrueTime API) to assign the timestamps that drive its multi-version concurrency control.
- The Google File System - an early example of a modern distributed file system.
- MapReduce: Simplified Data Processing on Large Clusters - A very simple abstraction for doing large-scale data processing.
- Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing - an abstraction that improves over (and subsumes) MapReduce in a few ways. It is probably better known as the core of Apache Spark.
- The Part-Time Parliament - Leslie Lamport's super confusing paper on the Paxos protocol.
- Paxos Made Simple - A somewhat easier-to-understand paper on the same topic, by the same author.
- Paxos Made Live - An Engineering Perspective - Google's experiences building Chubby, and why Paxos is so hard in real life.
- In Search of an Understandable Consensus Algorithm - introduced Raft, a consensus algorithm that seems to be overtaking Paxos in popularity.
- Calvin: Fast Distributed Transactions for Partitioned Database Systems
- MaaT: Effective and scalable coordination of distributed transactions in the cloud
- Operating Systems: Three Easy Pieces - A very nice introduction to operating systems, with good coverage of newer technology. The whimsical tone makes it a fun read. Better than the Dinosaur Book, in my opinion.
- Showstopper! - About the development of Windows NT. Given the number of computing analogies, I think it is intended for a nontechnical audience. Still a great read regardless, with lots of insight into various OS design decisions.
- The Datacenter as a Computer
- The Datacenter Needs an Operating System - Zaharia et al. make a very good case for why we need OS-like functionality for large clusters of machines.
- Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center - By Hindman et al., with Zaharia as a coauthor; introduced the idea of a two-level datacenter scheduler.
- Large-scale cluster management at Google with Borg - Unbeknownst to most of the world, Google had been running a datacenter operating system for more than a decade by the time this was published. Borg is the precursor to the open-source Kubernetes.
- Omega: flexible, scalable schedulers for large compute clusters - documents a project that attempted to replace Borg. Interestingly, this was published before the Borg paper.
- The Chubby lock service for loosely-coupled distributed systems - describes how Google does metadata storage for distributed systems.
- The Night Watch by James Mickens
This section has some overlap in coverage with the Systems sections above, but is more focused on a real-life software engineering perspective.
- Architecture of Open Source Applications - Contains short chapters on the architectures of various open source applications.
- ZeroMQ - The Guide - Good intro to messaging patterns and ZeroMQ.
- Scaling Pinterest
- How We've Scaled Dropbox
- Facebook and Memcached
- Storage Systems at a Rapidly Scaling Startup with a Small Team - A story from Data@Scale of how Instagram scaled their datastores.
- ZooKeeper Resilience at Pinterest - How Pinterest uses ZooKeeper for service discovery.
- How Does a Relational Database Work?
- Why You Should Never Use MongoDB
- Enterprise Integration Patterns - General primer on distributed messaging.
- Adventures in message queues
- Learn to stop using shiny new things and love MySQL by Marty Weiner - Great advice about technology decisions for a startup/new product, especially on where you should put your data.
- TAO: The Power of the Graph - on Facebook's distributed graph-oriented datastore.
- Dropbox's Edgestore
- Composing Music With Recurrent Neural Networks
- Machine Learning is Way Easier Than It Looks
- Machine Learning is Fun
- Good and Bad Reasons to Become an Entrepreneur by Dustin Moskovitz
- Founders at Work by Jessica Livingston - Collection of interviews with various tech entrepreneurs.
- Secrets of the Rockstar Programmers by Ed Burns et al. - Interviews with software experts.
- Zero to One by Peter Thiel with Blake Masters - based on notes from Peter Thiel's startup class at Stanford.
- The Hard Thing About Hard Things by Ben Horowitz - on the unglamorous parts of running a company.
- Viral Loop by Adam L. Penenberg - How successful tech startups grow virally.
- Once You're Lucky, Twice You're Good by Sarah Lacy - A look at the startup world just before the 2008 crisis.
- What’s Your Hour in ‘Silicon Valley Time’? - explains the concept of Silicon Valley Time, i.e. stages of a tech startup.
- Startup Product Development
- Are "Better" Ideas More Likely to Succeed? An Empirical Analysis of Startup Evaluation
- Work Hard, Live Well by Dustin Moskovitz
- Early Excite History by Ryan McIntyre - slides on the history of Excite. Has some discussions near the end about why Google survived and Excite didn't.
- Meta-Programming: A Software Production Method by Charles Simonyi - a paper on organizing software developers for productivity.
- Three Stories by Justin Kan
- How to Build Great Products - A blog post on identifying good features to build for a product, and how to allocate resources.
- 10x not 10%
- Breakout Career Notes - Collections of career tips from tech entrepreneurs and VCs.
- Pmarca Guide to Career Planning - Career guide from Marc Andreessen.
- Google's Guide for Technical Development
- Brian Bi on K&R C - Blog post about the quirks of early, pre-ANSI C.
- Codepen Thing - A really awesome visualization of a Fourier transform.
- Paul Graham - co-founder of Y Combinator and Viaweb.
- Jamie Zawinski - well-known Netscape/Mozilla engineer.
- Jacques Mattheij - cool stuff about tech
- Evan Miller - writes about applied math, programming languages, and random tech stuff.
- Ben Horowitz - one half of a16z.
- Purple Motes - Interesting stuff from history
- The Codeless Code - Zen koans, but for software development.
- Bay Area to Standard American English Translator