CS417 Exam 2

Fall 2013

Paul Krzyzanowski

Part I – 25 points

4 points
Why did Dropbox add notification servers to their architecture?

4 points
As evidenced by the benchmarks from the Google GFS paper (shown), read data rates are much higher than write data rates in a GFS cluster. To what do you attribute this performance disparity?
4 points

A logical ring of five (5) processes is shown. Process P₁ is the leader but dies. Process P₂ detects the death of P₁ and starts a ring election in a counter-clockwise direction.
(a) Describe the message that Process P₄ receives.
(b) What message does P₄ forward and to whom?
4 points
Suppose a Paxos cluster of eight acceptors along with a bunch of clients gets partitioned in half. Some clients end up communicating with one set of acceptors while others communicate with the other set. Four acceptors agree to one value while four others agree to a different value. Consensus is not achieved. What is wrong with this scenario and how does Paxos avoid this problem of network partitioning?
4 points
In MapReduce, the results of each map worker are stored in intermediate files at that worker. It would be more efficient to send a (key, value) pair to the reduce worker immediately, so the reduce worker can sort the keys as it receives each message and does not have to fetch the data from the map workers when they terminate. What is a problem with this proposal?
5 points

You have a database of records representing the inventory of a huge store (e.g., Amazon), each of which contains the fields {item, category, item-price, quantity-in-stock}. Show how you can use MapReduce to compute the total value of the inventory of each category. Outline in pseudocode the map and reduce functions. Assume that each map function gets a single record as input. An emit(key, value) function will emit a {key, value} pair from the map function. A reduce function is called with a key and an array of items that match the key. The reduce function can call print to print values.
map(item, category, item-price, quantity-in-stock):
reduce(key, value[]):

Part II – 75 points – 3 points each

For each statement, select the most appropriate answer. I will give you credit for two incorrect answers.

Which statement best describes a client's interaction with Chubby?
(a) A client contacts any Chubby server, which then forwards the request to the appropriate server in the Chubby cell.
(b) A client contacts any Chubby server, which can process the request since all Chubby servers are replicated.
(c) A client contacts the Chubby master, which then forwards the request to the appropriate server in the Chubby cell.
(d) A client contacts the Chubby master, which handles all client requests.
Dropbox differs from many other Internet-based services (e.g., Twitter, Facebook, Reddit) in that:
(a) Clients send almost as many write requests as read requests.
(b) Client requests are shorter than with most other services but very frequent.
(c) Client requests for data are far larger but very infrequent.
(d) Dropbox is almost exclusively a read-only service, making it easy to replicate servers for load balancing.
Which file systems place file metadata on separate servers from file data?
(a) Dropbox and GFS
(b) Chubby, Dropbox, and GFS
(c) AFS and GFS
(d) GFS and Chubby
A lease is:
(a) A lock that has a timeout.
(b) A lock that multiple processes can share.
(c) A fine-grained lock.
(d) A coarse-grained lock.
Compared with Lamport's mutual exclusion algorithm, Ricart & Agrawala's requires approximately:
(a) 2/3 the messages of Lamport's.
(b) 3/2 (
5x) the messages of Lamport's.
(c) 1/2 the messages of Lamport's.
(d) The same number of messages of Lamport's.
For Lamport's mutual exclusion algorithm to work correctly:
(a) No two requests can have the same timestamp.
(b) No two requests for the same resource should be sent at the same time.
(c) A majority of processes must be running.
(d) One process must be elected to run as a coordinator.
Which one of these approaches works best for choosing a winner in the ring election algorithm?
(a) Choose the first process in the list.
(b) Choose the last process in the list.
(c) Choose the process with the lowest process ID in the list.
(d) Choose a random process from the list
The Chang & Roberts optimization to the ring algorithm:
(a) Optimizes the ring to find a leader quicker.
(b) Drops certain messages to try to stop concurrent elections.
(c) Adds a coordinator to keep track of the progress of the election.
(d) Avoids searching through a list at the end by having every process replace the process ID in an incoming message with its own.
Which is the most accurate statement about active-passive replication?
(a) Only one server processes client requests; it propagates updates to others that are on standby in case it fails.
(b) Client requests are load-balanced among all servers; the server that handles a given request propagates updates to all replicas.
(c) The system consists of only one server, but it backs up its state onto permanent (durable) storage.
(d) The client sends the request to a master server, which then forwards the request to an available passive server.
In virtual synchrony, a state transfer takes place when:
(a) A process joins a group.
(b) A process leaves a group.
(c) A view change takes place.
(d) A replicated backup process takes over for a primary process.
A message is unstable when:
(a) It is not certain whether it has been delivered to all group members.
(b) It is associated with an expiration time.
(c) It has errors in it due to a byzantine fault.
(d) It has been received but not acknowledged.
Which statement is FALSE? In virtual synchrony, flush messages:
(a) Ensure that all unstable messages are delivered to their process group.
(b) Implement a barrier.
(c) Must be acknowledged before a view change completes.
(d) Discard all undelivered messages.
By sending messages through Paxos, processes can ensure that concurrent messages become:
(a) Global time ordered multicasts.
(b) Totally ordered multicasts.
(c) Causally ordered multicasts.
(d) Unordered multicasts.
With Paxos, message sequence numbers are assigned by the:
(a) Acceptor
(b) Client
(c) Learner
(d) Proposer
A write-ahead log does NOT enable a process to:
(a) Undo changes in case of an abort.
(b) Record that a transaction has committed.
(c) Record the response that was sent to a vote in a commit protocol.
(d) Achieve higher performance by prefetching data.
To abort a distributed transaction:
(a) At least one participant must vote to abort.
(b) The majority of participants must vote to abort.
(c) All of the participants must vote to abort.
(d) All live participants must vote to abort.
The three-phase commit protocol adds this to the two-phase commit protocol:
(a) It communicates the result of the "can you commit?" vote to every replica.
(b) It enables a committed transaction to roll back if any participant dies.
(c) It enables a committed transaction to roll back if the coordinator dies.
(d) The coordinator contacts all participants to ask if they agree to commit their transaction.
The motivation for an eventual consistency model was:
(a) It is impossible to have highly available replicated data that is fully consistent in a system that can survive network partitioning.
(b) Data inconsistencies are highly undesirable, so the primary focus should be on ensuring that all replicas are consistent.
(c) Distributed commit protocols can never work reliably, so data is bound to become inconsistent on some participants.
(d) Data might be in an inconsistent state during the execution of transactions but will be made consistent when they commit.
Which of these is NOT an advantage of the Chandy-Misra-Hass probe deadlock detection algorithm on a reliable but asynchronous network?
(a) There is little computation needed at each node.
(b) There is no need to construct and maintain a graph of resource dependencies.
(c) The algorithm is not subject to false deadlock.
(d) The algorithm makes it easy for a node to detect that deadlock does not exist.
The wound-wait deadlock prevention algorithm prevents deadlock by ensuring this condition does not occur:
(a) Mutual exclusion
(b) Hold and wait
(c) Non-preemption
(d) Circular wait
In contrast to two-phase locking, strict two-phase locking:
(a) Uses a two-phase commit protocol to get locks.
(b) Uses a three-phase commit protocol to get locks.
(c) Ensures that no transaction will read uncommitted data.
(d) Preserves serializability (the Isolated part of ACID).
With optimistic concurrency control, transactions:
(a) Do not lock the data that they modify.
(b) Agree to commit at the start of the transaction and do not need to run a commit protocol.
(c) Only lock data that they modify, but not data they read.
(d) Use shared locks instead of exclusive locks on data.
In Bigtable, each server is responsible for serving tablets. A tablet is:
(a) A subset of rows and column families.
(b) A subset of rows but stores all column family data for those rows.
(c) A set of one or more column families but it stores all row data for those column families.
(d) Exactly one row and one column family.
In the Bulk Synchronous Parallel (BSP) framework:
(a) Processes may have different amounts of work to do and a long running process may span multiple supersteps.
(b) One long-running process will cause every other process to wait before starting the next superstep.
(c) Processes are load-balanced so that no one process takes substantially more time to run than any other.
(d) A process may be distributed among multiple processors if it is taking a long time to complete.
A failed process in Pregel means that:
(a) The entire job has to be restarted from the beginning.
(b) The entire job has to be restarted from the last checkpoint.
(c) Only the failed process has to restart from the beginning.
(d) Only the failed process has to restart from the last checkpoint.