When & Where
The third exam will be held in our regular classroom on Monday, April 27, 2026.
The exam will take up about half the lecture and will start roughly halfway through the class period. Please arrive on time and do not plan on coming in just to take the exam. If you arrive after the exam has started, you will not be allowed to take it.
Exam rules
Be sure to arrive on time. If you arrive after the exam starts, you will not be allowed to take it.
This will be a closed book, closed notes exam. Calculators, phones, augmented reality glasses, laptops, and tablets are neither needed nor permitted. If you have these devices, you must turn them off, put them out of sight, and not access them for the duration of the exam.
No other electronic devices are permitted except for hearing aids, pacemakers, electronic nerve stimulators, other implanted medical devices, or electronic watches that function only as timekeeping devices or chronographs.
Bring a couple of pens or pencils with you.
Plan to use a pen only if you are supremely confident that you will not change your mind about your answers. Check here for information about pencils, sharpeners, and the craft of pencil sharpening.
Past exams
You can use my past exams as a guide to what this exam may look like. Some material has changed, so do not worry about questions that appear to relate to topics we have not covered. This semester, we covered some topics that I did not cover in past classes and removed some topics that seemed too detailed or are no longer relevant.
Expect around 25 multiple-choice questions. I do not refer to old exams when I come up with a new one, so it is likely that some of the topics that I considered important in past exams will show up on future exams.
Study guide
You are responsible for the material since the last exam (the four lectures and recitations starting from week 9).
The study guide is a concatenation of the study guides from the past lectures. It attempts to cover most of the material you should know. It is not a substitute for the lectures, lecture material, and other reading matter. Not all of the material may be in the guide. My goal is to put most of the information you need to know in a concise form with fewer elaborations.
You can also prepare your own guide, which would be a much better way to prepare for the exam!
Topics
Here is a list of topics you are expected to know for the exam. It should be viewed as a coverage guide rather than a catalog of detailed algorithms or definitions to memorize.
Use it as a checklist. Go through each item and ask yourself whether you understand the concept and could recognize or apply it in context. If not, review the relevant notes.
Because the exam is multiple choice, your focus should be on understanding and being able to reason about the material, not on memorizing or reciting formal definitions.
Topics that you should be familiar with and that may be on the exam include:
Distributed Databases
- NoSQL models
- Key-value stores
- Column-family or wide-column stores
Bigtable
- Row keys, column families, column qualifiers
- Timestamps
- Sparse storage
- Tablets, splitting, versions
- Tablet servers, master server
- Memtable, SSTable
- Consistency model
- Single-row atomicity; no transactions across rows
Cassandra
- Peer-to-peer design
- Distributed hash table, hash ring, virtual nodes
- Data model: Partition key, Clustering columns
- Replication factor
- Tunable consistency
- Memtable, SSTable, compaction
Spanner
- Splits and Paxos groups
- Two-phase commit
- Strict two-phase locking
- Wound-wait
- Multiversion concurrency control
- TrueTime and commit wait
- Read-only transactions
- Snapshot reads
- External consistency
Database Comparisons
- Key-range partitioning vs. hash partitioning
- Bigtable vs. Cassandra
- Paxos vs. two-phase commit
- Serializability vs. external consistency
Distributed Computation
MapReduce
- Master and workers
- Shards
- Map tasks, reduce tasks
- Intermediate key-value pairs
- Partitioning by key
- Shuffle and sort
- Locality
- Stragglers, speculative execution
- Failure handling
BSP
- Supersteps
- Barrier synchronization
- Message passing
- Checkpointing
Pregel / Giraph
- Vertex-centric computation
- Supersteps
- Message passing
- Vote to halt
- Active and inactive vertices
- Global termination
Spark
- Driver program, workers, executors
- RDDs, immutability
- Partitioning
- Transformations and actions
- Lazy evaluation
- Lineage
- Caching
- Narrow vs. wide dependencies
- Shuffle
Distributed Machine Learning
- I won’t ask about Distributed Machine Learning
Comparisons
- MapReduce vs. Spark
- BSP vs. Pregel
- Batch vs. iterative workloads
Data in Motion
- Event streaming
- CDNs
- Peer-to-peer distribution
Kafka and event streaming
- Queuing vs. publish-subscribe
- Brokers
- Producers, consumers, consumer groups
- Topics and partitioned logs
- Offsets
- Ordering within a partition
- Replication: Leaders and followers
- Scale and fault tolerance
- Retention and replay
Stream processing
- Stateless vs. stateful processing
- Idempotent sinks
- Backpressure
- Event time vs. processing time
- Tumbling, sliding, session windows
- Watermarks
- Spark Structured Streaming
- Micro-batch model
- Unbounded table
- Checkpointing
- Apache Flink: just know that it does record-at-a-time streaming
CDNs
- Flash crowd problem
- Browser caching, caching proxies, load balancing, mirroring
- Origin servers and edge servers
- Push CDN, pull CDN
- Static content, dynamic content, streaming video
Akamai
- Overlay network, routing
- Caching hierarchy
- Request mapping
- Edge caching
BitTorrent
- Peer-to-peer architecture
- .torrent file
- Swarm
- Trackers, seeders, and leechers
- Pieces, rarest-first
- Tit-for-tat incentive model
- Use of DHT (not how it is implemented)
- Scalability and fault tolerance
Edge computing
- I won’t ask you about edge computing
Comparisons
- RabbitMQ vs. Kafka
- Event streaming vs. traditional messaging
- CDN delivery vs. direct origin delivery
- DNS-based routing vs. anycast
- CDN vs. BitTorrent distribution
Security in Distributed Systems
- Security goals: Confidentiality, integrity, authentication, authorization, non-repudiation
- Lateral movement
Cryptography
- Symmetric and asymmetric cryptography
- Hashes, MACs, digital signatures
- Replay attacks, freshness mechanisms
TLS and certificates
- TLS vs. mTLS: what they do and what they authenticate, not the protocol steps
- Certificates, certificate authorities (CAs), certificate chain concept: the purpose of each
- Expiration, rotation, revocation
Identity and access
- Authentication vs. authorization
- OAuth, OpenID Connect: what is each used for?
- Access tokens, ID tokens, refresh tokens
- JWTs and claims: just the concept
- Workload identity, SPIFFE: just what it does
- Principle of least privilege
Architecture and operations
- Zero Trust concept
- Micro-segmentation
- API gateway: purpose
- Service meshes: purpose
- Don’t bother remembering north-south, east-west
- You don’t have to know secret management
- Short-lived credentials vs. long-lived keys
Design mistakes
- Trusting the internal network
- Authenticating users but not services
- Long-lived shared secrets
- Missing authorization checks
- Over-privileged roles
- Storing secrets
Last update: Sun Apr 19 23:49:11 2026