Assignment 5

Due Wednesday, November 21, 2018, 23:59:59.99999 via Sakai


Please answer the questions precisely and concisely. Every question can be answered in one or at most a few sentences. I will not have the patience to read long paragraphs or essays, and you may lose credit for possibly correct answers.

Note: submissions must be plain text or PDF files or HTML within Sakai. Other formats, such as Microsoft Word, Apple Pages, or Adobe InDesign, will not be accepted.


Wilson Hsieh, Google, Inc:
Spanner: Google’s Globally-Distributed Database, 10th USENIX Symposium on Operating Systems Design and Implementation, October 8–10, 2012.
The linked page contains a presentation video as well as a link to the full paper (cited below) and the slides. Watch the video. It’s about 30 minutes long and provides a great overview of Spanner.


(1) Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber. Bigtable: A Distributed Storage System for Structured Data. ACM Trans. Comput. Syst. 26, 2, Article 4 (June 2008).
This is the definitive paper on Google Bigtable. Read the whole thing if you have time; otherwise, you only need to read the first five pages.
(2a) Hadoop Team, Introduction to MapReduce, Hadoop in Real World, February 20, 2017.
(2b) Hadoop Team, Dissecting MapReduce Components, Hadoop in Real World, February 23, 2017.
These are two really short articles that describe how Hadoop MapReduce works.
(3) (optional) Jeffrey Dean and Sanjay Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, USENIX Symposium on Operating Systems Design and Implementation, 2004.
This is the definitive paper that introduces MapReduce. It is 13 pages long. You don’t need to read this to answer the questions but, if you have the time, it will give you extra insight on the motivation and design behind MapReduce.
(4) (optional) James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, JJ Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson Hsieh, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David Mwaura, David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szymaniak, Christopher Taylor, Ruth Wang, Dale Woodford, Spanner: Google’s globally-distributed database, OSDI’12 Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation. Pages 251–264.
This is the definitive paper on Spanner. You don’t have to read it for this assignment but I recommend reading it if you’re curious about the design.


Question 1 (Bigtable)

What is an SSTable in Bigtable?

Question 2 (Bigtable)

What is a memtable in Bigtable and when does it become an SSTable?

Question 3 (MapReduce)

How does an Input Split differ from an HDFS block?

Question 4 (MapReduce)

What two operations take place during the shuffle phase of MapReduce?

Question 5 (Spanner)

What does Spanner mean by external consistency?

Question 6 (Spanner)

What is a commit wait in Spanner?