CS 417 Exam 3

80 POINTS + 20 FREE POINTS: 20 QUESTIONS - 4 POINTS EACH
For each question, select the most appropriate answer.

Which of the following is a key characteristic of the architecture of Google Bigtable?
(a) Relational schema.
(b) Wide-column storage.
(c) Key-value store.
(d) Time-series database.

In Google Bigtable, how is data partitioned and distributed?
(a) Partitioning by time series: the order in which data was added.
(b) Partitioning by hashing row keys.
(c) Sharding by ranges of column families.
(d) Sharding by ranges of row keys.

Which partitioning strategy does Apache Cassandra use to distribute data across nodes?
(a) Consistent hashing.
(b) Range partitioning.
(c) Hash partitioning.
(d) Round-robin partitioning.

What is a property of a wide-column database?
(a) Each table in the database can be defined to have a large number of columns.
(b) A column can hold arbitrary-size binary data instead of strings or numbers as in traditional databases.
(c) The number and names of columns can vary from row to row.
(d) Each table in the database contains more columns than rows.

Cassandra supports partition and clustering keys for locating data. A clustering key:
(a) Identifies the set of servers holding replicas of the row.
(b) Determines the machine that the row containing that key belongs to.
(c) Controls the replication factor of the row associated with that key.
(d) Defines how data is sorted within a partition.

Spanner’s TrueTime API:
(a) Provides an accurate time of day timestamp by using a combination of atomic clocks and GPS receivers.
(b) Provides a range of time that includes the current time of day.
(c) Provides logical timestamps to ensure that each transaction gets a totally-ordered unique timestamp.
(d) Uses a distributed consensus protocol to enable multiple servers to agree on the current time.

Unlike a regular commit, the purpose of Spanner’s commit wait operation is to:
(a) Ensure that transactions get commit timestamps that reflect their observed commit sequence.
(b) Wait until a transaction has been committed before releasing any locks held by the transaction.
(c) Coordinate with other active transactions to agree on a sequence of commits.
(d) Wait for confirmation that all locks associated with the transaction have been released before committing.

DISTRIBUTED COMPUTATION

In the context of MapReduce, what is the primary role of the master node?
(a) Perform map operations.
(b) Perform reduce operations.
(c) Coordinate worker nodes and handle failures.
(d) Store intermediate results.

In the Bulk Synchronous Parallel (BSP) model, what is a superstep?
(a) A unit of parallel computation followed by a synchronization barrier.
(b) A method for partitioning data across nodes.
(c) A technique for dynamic load balancing.
(d) An approach to fault tolerance using checkpoints.

In a MapReduce problem that identifies unique visitors to a website from a large dataset of web logs, what is the primary role of the map function?
(a) Combining the counts of unique visitors for each IP address.
(b) Generating a list of IP addresses associated with unique visits.
(c) Emitting a key-value pair with the IP address as the key and a unique identifier as the value.
(d) Filtering out duplicate IP addresses before processing the data.

In Pregel, when does a vertex become inactive?
(a) After processing all incoming messages.
(b) When it explicitly votes to halt and receives no messages in the following superstep.
(c) When it explicitly votes to halt and has sent no messages for the next superstep.
(d) When all vertices in the graph have voted to halt.

What does it mean when we say that Spark transformations are lazy?
(a) A transformation is applied only when its result is needed.
(b) Transformations create a new dataset from an existing one.
(c) A transformation is executed in parallel because an RDD is partitioned across multiple workers.
(d) Transformations only process the subset of data that is needed to generate the RDD.

Which of the following is a key property of a Spark RDD?
(a) Immutability.
(b) Efficient indexing.
(c) Fault-tolerant storage.
(d) A tight coupling between data and embedded code.

In Spark, what operation is used to create an RDD?
(a) Action.
(b) Serialization.
(c) Transformation.
(d) Mapping.

CONTENT DELIVERY

How does a content delivery network (CDN) primarily reduce latency for end users?
(a) By compressing data before transmitting it.
(b) By using more efficient routing protocols.
(c) By caching content on edge servers located closer to users.
(d) By providing dedicated bandwidth to each user.

How does Akamai’s overlay network reduce latency and improve user experience in content delivery?
(a) By utilizing a high-capacity data center to store all cached content.
(b) By dynamically selecting the optimal route based on real-time network conditions.
(c) By compressing content to reduce data size.
(d) By employing encryption algorithms to ensure secure delivery.

How does dynamic DNS benefit content delivery networks?
(a) By providing end-to-end encryption for secure communication.
(b) By directing users to the nearest edge server.
(c) By prioritizing network traffic based on content type.
(d) By offloading domain name lookups from the origin server.

What is a Kafka topic?
(a) A unique identifier for a Kafka cluster.
(b) A set of records sequenced in a specific order.
(c) A process that receives records from producers and stores them.
(d) A category to which records are published.

What is the purpose of a Kafka partition?
(a) To store metadata about Kafka topics.
(b) To synchronize the ordering of records.
(c) To ensure delivery of records to consumers.
(d) To split the records of a topic across multiple brokers for parallelism and fault tolerance.

Memcached is likely to be used for:
(a) Handling real-time data streams.
(b) Processing and transforming large datasets.
(c) Ensuring data durability and persistence.
(d) Storing frequently queried database responses.