CS 417 Final Exam info

The final exam will be held in our regular classroom on Monday, December 19, 2016 from 8:00-10:00pm. Note that the exam is scheduled for 8:00 pm, not the usual class time. Please be sure to arrive on time! Expect around 50 multiple-choice questions in the style of those on mid-semester exams.

Remember that the final is optional and will only serve to displace a lower normalized grade on one of the three exams.

Exam rules

Be sure to arrive on time. If you arrive after the exam starts, you will not be allowed to take it.

This will be a closed book, closed notes exam. Calculators, phones, augmented reality glasses, laptops, and tablets are neither needed nor permitted. If you have these devices, you must turn them off, put them out of sight, and not access them for the duration of the exam.

No other electronic devices are permitted except for hearing aids, pacemakers, electronic nerve stimulators, other implanted medical devices, or electronic watches that function only as timekeeping devices or chronographs.

Bring a couple of pens or pencils with you. An extra pencil is affordable fault tolerance. If you want to splurge, the Palomino Blackwing 602 is considered by many to be one of the finest pencils available. The company advertises its key virtue as "half the pressure, twice the speed." If that claim is really true, using this product might help you complete the exam quicker. If you do not choose bring several extra pencils, you may want to bring a pencil sharpener. Palamino makes a companion Blackwing Long Point Sharpener. This, too, is pricey at $11.00. For a bit less money, you can get what looks like a clone: the Alvin Kum Long Point Pencil Sharpener. Both of these feature two-step sharpening: one for the wood case and another for the graphite core of the pencil. A truly beautiful sharpener is the El Casco Pencil Sharpener, but bringing this to class is really overkill, as is spending over $300 on a pencil sharpener. If you would like to learn the craft of pencil sharpening, there are several books available. The best of these may be How to Sharpen Pencils: A Practical & Theoretical Treatise on the Artisanal Craft of Pencil Sharpening for Writers, Artists, Contractors, Flange Turners, Anglesmiths, & Civil Servants by David Rees. Do not be intimidated by the omission of "students" in the title. You can read more about it at artisinalpencilsharpening.com. A video by David Rees is here. You are welcome to bring a full pencil sharpening travel kit to the exam but be aware that a proper sharpening routine may consume too much time during the exam and may be messy.

Past exams

Get past 417 exams and information about exam taking here.

Study Guide

The study guide is a concatanation of the previous three study guides. It is about 64 pages long if you print it out.

Topics

You are responsible for the material from since the start of the semester.

Topics that you should know and may be on the exam include:

Introductory material

  • Metcalfe's law
  • Classification: Flynn's and beyond
    • Flynn's taxonomy: SISD, SIMD, MIMD
    • Multiprocessors vs. multicomputers (networks of computers)
  • Symmetric multiprocessing (SMP)
  • Bus-based multiprocessors
    • Cache coherency problems
    • Snoopy cache
  • Scalability beyond bus-based multiprocessors: switched multiprocessors
    • Crossbar switch connectivity
    • NUMA
      • Processor affinity
      • Directory
      • Home snoop vs. source snoop, home agent
  • Single system image
  • Transparency goals (very high-level understanding)
    • Location
    • Migration
    • Replication
    • Concurrency
  • Service models
    • Centralized
    • Client-server
    • Layered architectures
    • Peer-to-peer
    • Hybrid

Networking

  • Infrastructure sharing
    • Multiple access problem
      • Channel partitioning (TDM, FDM)
      • Taking turns (token passing, polling)
      • Random access (packet switching)
  • End-to-end principle
  • Fate-sharing principle
  • Understand the concept of protocols and layering
  • I will not ask you to enumerate the OSI protocol stack but you should understand the functions of the physical network, transport, presentation, session, and application layers.
  • Know that ethernet gives us: unreliable, connectionless communication
  • Know that IP provides us with unreliable, connectionless communication
  • Clients and servers and services
  • Transport endpoints (addresses) vs. machine endpoints
  • Connection-oriented (virtual circuit) vs. connectionless (datagram) protocols
    • Don't get confused by virtual circuit service at the transport layer, which provides the illusion of a circuit versus a virtual circuit network, which provides end-to-end latency guarantees.
  • Data link addressing vs. network addressing: MAC address vs IP address
    • hub vs. switch
  • Reliable data transfer
    • resulting jitter from implementing reliable delivery
    • stop and wait: problems
    • cumulative acknowledgements
  • IP (Internet Protocol)
    • Internet principles: unreliable (best-effort) packet delivery, routers
    • IP address vs. ethernet address
    • Port number: what is its purpose?
    • TCP vs. UDP
    • What does TCP offer?
    • flow control and congestion control
  • Protocol encapsulation concept
    • Ethernet device driver: encapsulate IP within an ethernet packet
    • IP: encapsulates UDP or TCP packets within an IP packet
    • TCP/UDP: encapsulates user data within a TCP or UDP packet
  • Sockets
    • Concept
    • Operations: create, bind, connect, listen, shutdown, communication (know what they do, not what the parameters are)

Remote procedure calls

  • Concept of a remote procedure call (RPC)
  • Language vs. OS construct
  • Stub functions
  • Marshalling
    • Data representation
    • Pointerless representation
    • Serialization
    • Pass by reference
    • Explicit versus implicit typing
    • Multi-canonical data representation
  • Semantics of remote procedure calls
    • Failure of RPCs
    • at most once and at least once semantics
  • Interface definition language (or notation) – purpose
  • Purpose of an RPC compiler
  • Purpose of an RPC name service

Remote procedure calls: case studies

  • Sun (ONC) RPC
    • Service registration and discovery (portmapper)
    • Need for Interface definition language, IDL
    • Need for RPC compiler, rpcgen
    • Need for versioning
  • DCE RPC (improvements over SUN RPC)
    • Cells and cell directory server
    • Unique universal ID: UUID
    • DCE Network Data Representation: multi-canonical data representation
  • Microsoft COM+/DCOM/ORPC: high-level concepts
    • Object RPC (ORPC): add interface pointer identifier to DCE RPC
    • Build local COM object that uses ORPC to talk to server
    • Surrogate process (loads objects at server)
    • Marshaling: DCE Network Data Representation
    • garbage collection: remote reference counting and pinging
  • Java RMI (general concepts, service registration and lookup, serialiazable class, remote class)
    • rmiregistry
    • Remote interface (for remote references)
    • Serializable interface (for parameters)
    • Garbage collection in RMI: leasing (dirty/clean)
  • Web services (general concepts)
    • Documents and document exchange
    • What is a Service Oriented Architecture
    • XML-RPC and SOAP (very general concepts)
    • SOAP & WSDL: what is the role of WSDL?
    • Microsoft .NET remoting (just garbage collection concepts)
      • Leasing distributed garbage collector (LDGC): just understand leasing, no details
      • Garbage collection: lease timer
    • I will not ask about the Windows Communication Framework
    • REST (very general concepts)
    • Marshaling only: Google Protocol Buffers and JSON

Clock synchronization

  • What is UTC (don't need to know the other UT* versions)
  • Clock drift and skew
  • Linear compensation function
  • What's a time server?
  • Cristian's algorithm
    • Understand the goal and formula for Cristian's algorithm
    • Error bounds. Remember that errors are cumulative (additive).
  • Berkeley algorithm
    • Understand the goal and formula for the Berkeley algorithm
    • Role of master and server
    • Fault tolerant average
  • NTP
    • NTP Synchronization subnet
    • NTP strata
    • Offset
    • Message delay
    • Jitter
    • You don't need to memorize the NTP(SNTP) formula but be sure to understand how it gives you the same result as Cristian's
    • You don't need to know the symmetric mode NTP algorithms
    • You don't need to know how dispersion and jitter are calculated
    • You don't need to know the NTP message structure or validation tests
  • PTP
    • Goal
    • Role of master
    • Don't memorize the formula but understand the reason that there are three messages

Logical clocks

  • Causal vs. concurrent events
  • Lamport's algorithm
    • Goals
    • Happened-before relationship
    • Partial ordering
    • Algorithm
    • Deficiencies
    • Remedy for generating total ordered (unique) timestamps from Lamport timestamps
  • Vector clocks
    • Goals
    • Algorithm
    • Know how to identify concurrent events by comparing timestamps

Group communication

  • I won't ask about anycast
  • Closed vs. open groups; hierarchical groups
  • Don't memorize the list but understand each of these failure modes:
    • Crash failure
    • Omission failure
    • Byzantine failure
    • Partition failure
  • Use of a coordinator
  • Reliability
    • atomic: understand the fault-tolerant nature and the possible need for a persistant log. I will not ask about implementation
    • reliable: understand what it does
      • Positive vs. negative acknowledgements and feedback implosion
    • unreliable
  • Ordering
    • Good vs. bad ordering
    • Sending versus delivering messages, hold-back queue
    • Global-time ordered: understand what this means
    • total ordered: understand how to achieve this
    • causal ordered
      • understand what this is
      • precedence vector: understand how to use it
      • understand when to hold back
    • sync ordered: understand what this means
    • FIFO ordered per source: understand how to achieve this
    • unordered multicasts

Mutual exclusion

  • Types: centralized, token-based, contention-based
  • central-server algorithm
  • distributed mutual exclusion algorithms:
    • Token ring
    • Lamport
    • Ricart & Agrawala

Election algorithms

  • Bully algorithm
  • Ring algorithm
  • Chang and Roberts ring algorithm optimizations
  • network partitioning problem (split-brain/segmentation)

Virtual Synchrony

  • Faults
    • Fail silent: fail-stop vs. fail-recover
    • Byzantine failures
    • Synchronous vs. asynchronous systems
    • Two-army problem: impossibility of consensus in asynchronous systems
  • How does redundancy help achieve scalability and availability?
  • State machine replication
  • Process group
  • Virtual synchrony
    • Group view
    • View change
    • Group Membership Service
    • State transfer
    • stable vs unstable message
    • flushing messages on a view change

Consensus

  • Understand consensus requirements
    • Validity
    • Uniform agreement
    • Integrity
    • Termination (= progress)
  • Paxos
    • Goal: fault tolerant consensus algorithm
    • Quorum
    • Roles: proposer, acceptor, learner
    • Why does electing a leader simplify the role of proposers?
  • Raft (see homework)
    • Leader election
    • Committing log entries
  • Leasing vs. locking
  • Hierarchical leases

Distributed transactions

  • properties of transactions (ACID)
  • nested transactions: private work spaces, write-ahead logs
  • two-phase commit protocol
    • understand the need for a log
    • what happens in each of the phases?
  • three-phase commit protocol
    • precommit
    • coordinator recovery
    • understand the conditions of what happens to the protocol when the coordinator crashes (slide 33)
    • problems with 3PC
  • I will not ask you about Paxos commit
  • Scaling via replication and Brewer's CAP Theorem
    • Consistency
    • Availability
    • Fault tolerance
  • Eventual consistency: BASE approach vs. ACID

Concurrency control

  • schedules
  • lock manager
  • two-phase locking
    • What is the purpose? Preserves serializability ("Isolated" in ACID)
    • cascading aborts
  • strict two-phase locking
  • separating read and write locks
    • read locks, write locks, commit locks
    • two-version concurrency control
  • optimistic concurrency control
    • validation phase
    • update phase
    • timestamp ordering

Distributed deadlock

  • conditions for deadlock, deadlock resolution
  • centralized detection
    • Wait-for graph, Global wait-for graph
    • false deadlock (what's meant by this?)
  • detection: Chandy-Misra-Haas probing algorihm
    • Just the high level: a probe message circulates back to the originator
  • prevention: wait-die, wound-wait
    • Just the high level: cyclies are prevented because transactions can only wait on older (or younger) resources. I will not ask you to remember which is wait-die (old can only wait on young) or which is wound-wait (young can only wait on old)

Google Cluster Architecture

  • system design goals
  • areas of scalability: DNS load balancing, web load balancing, index & content partitioning (shards), replication of shards

Network Attached Storage

  • File service, file server
  • Service models: remote access vs. upload/downloa
  • Consistency: sequential semantics, session semantics, ambiguous semantics
  • Stateful vs. stateless design
  • Caching considerations
  • Case studies
    • NFS
      • Goals
      • Mounting filesystems, goals of automounter
      • UDP vs TCP transport
      • Directory and file access protocol (don't memorize the list of RPC calls but understand why they exist, what lookup does, why there's no open)
      • Caching
      • Inconsistencies because of caching & validation
      • Know that a user-level lock manager was added to NFS but otherwise don't bother with features of successive versions.
      • Don't memorize features of NFS v4 but know:
        • the protocol is now stateful
        • supports better caching (similar to oplocks)
        • compound RPC support added
    • AFS (versions 1,2)
      • Goals
      • Caching
      • Service model, semantics
      • Cache coherence (callbacks)
    • Coda
      • Goals
      • Replicated volumes
      • Accessible Volume Storage Group
      • Client modification log
      • Resolution
      • Disconnected operation
      • Reintegration
    • DFS (AFS v.3)
      • Goals
      • Token mechanism: don't memorize tokens but understand the goals
    • SMB/CIFS
      • Goals
      • High-level protocol (RPC-like)
      • Caching model: oplocks (don't memorize oplocks)
      • Microsoft Dfs: just know that it adds consisten naming, similar to AFS
      • SMB2 – just understand a few concepts that improved performance
        • pipelining
        • credit-based flow control
        • compounding

Distributed file systems

  • Chubby
    • Purpose: lock manager, name server, and simple file system
    • Architecture of a Chubby cell: master and replicas
    • Named locks
    • Event notification
    • Whole file reads and writes
    • Don't try to memorize the API
    • Client cache consistency
    • Use of Paxos
  • Dropbox
    • Goal
    • How does server traffic load differ from mosts other sites that store data (facebook, twitter, ...)?
    • What's a metadata server (metaserver)?
    • Why was a notification server added? Polling vs. notifications
  • GFS
    • Goal
    • chunkservers - what do they do?
    • master - what does it do?
    • Google cluster environment: colocate computation and data on same set of machines
    • chunk handle
    • operation log
    • why large chunks?
    • replication: data flow vs. control flow
  • HDFS
    • HDFS is practically a clone of GFS. Focus on GFS terminology. If I use any HDFS terminoloyg, I will identify the GFS equivalent
    • NameNode = Master
    • DataNode = Chunkserver
    • blocks = chunks
    • Transaction log = Operation log
    • You do not have to know any of the differences between GFS and HDFS (they're minor)
    • Rack-aware distribution - client tries to read from the closest replica

Naming and binding

  • Terminology (don't bother with directory service vs. name service)
  • Understand naming, binding, what a name is, what an address is
  • Pure name vs. impure name
  • Compound name
  • Static, early, and late binding
  • I will not ask you to define naming convention or context
  • DNS
    • IP addresses versus domain names
    • what does the DNS server do?
    • I will not ask you about Regional Internet Registries
    • I will not ask you about the domain name registry, registry operator, or registar but know that the registrar stores name server addresses for registered domain names.
    • How do multiple DNS servers together provide domain name service?
    • What is a zone?
    • What does a DNS resolver do?
    • What are root name servers?
    • Iterative vs. recursive name resolution

Distributed Lookup (Distributed Hash Tables)

  • Purpose: distributed, highly available key-value store
  • Central coordinator
  • Flooding
  • What is an overlay network?
  • Back propagation, time to live
  • Distributed hash table (DHT)
  • What is consistent hashing?
  • Content-Addressable Network
    • Multi-dimensional hashes, zone, node insertion
  • Chord
    • Ring, successor nodes
    • How do finger tables make query routing more efficient?

Amazon Dynamo

  • Purpose: distributed, highly available key-value store
  • distributed hash table (DHT)
  • consistent hashing
  • functional interface: get, put
  • consistency model for replication: optimistic replication
  • goals: incremental scalability, symmetry, decentralization, heterogenous systems
  • conflict resolution
    • who does it?
    • use of vector timestamps for conflict detection
  • partitioning among multiple nodes
  • virtual node

MapReduce

  • Master vs. Worker
  • shards
  • Map worker
    • Map function
    • Role of key
    • Intermediate file
    • Partition operation
  • Reduce worker
    • Sort operation
    • Reduce operation

Bigtable

  • Tablets, column families, columns, column family versions
  • Table splitting,
  • master server, tablet servers
  • role of chubby
  • eventual consistency

Google Spanner

  • Goals of Spanner
  • Techniques used: two-phase commit, strict two-phase locking, multiversion concurrency control
  • Snapshots
  • TrueTime - why do we need it?
  • Commit wait
  • Lock-free reading of snapshots

Bulk Synchronous Parallel & Pregel

  • Bulk Synchronous Parallel (BSP) framework: what is it?
  • What is a superstep?
  • What is a barrier? How does communication work in BSP?
  • What is the purpose of Pregel?
  • Understand that computing is done from the point of view of each vertex
  • What does vote to halt mean?
  • What is a combiner? How does it make messaging more efficient?
  • What is an aggregator? Global state
  • What does it mean to modify the topology of a graph?
  • What does a Pregel master do?
  • What is checkpointing? What happens after the failure of a worker?

Spark

  • Goal
  • What are Resilient Distributed Data (RDD) sets?
  • Transformations and Actions
  • Cluster manager, Executor, Jobs and Tasks
  • Fault tolerance via recomputation

Content Delivery Networks

  • Flash crowd problem
  • Load balancing, multihoming, mirroring, caching proxies
  • Goals: nearest, available, likely
  • Use and advantages of an caching overlay network in a CDN
  • Mapping system, transport system
  • Mapping system and transport system
  • Use of dynamic DNS for mapping
  • Load shedding
  • CDN benefits: caching, routing, security
  • I will not ask you about Amazon Cloudfront or Limelight

Clusters

  • Definition, single system image
  • Clustering types: high performance, batch processing, high availability, load balancing. storage (just the goals)
  • Cluster membership and split-brain
  • Cluster interconnect: understand there's an overhead to leaving the rack and to leaving the datacenter
  • System Area Network vs. a Local Area Network
  • Storage in a clustered system: distributed (network) file systems vs. clustered file systems
  • Shared disk, shared nothing, distributed lock manager
  • What is a heartbeat network?
  • Active/active vs. active/passive
  • Failover types: cold, warm, hot; multidirectional, cascading
  • What is a cluster file system? How does it differ from a network (or distributed) file system?
  • What is a storage area network?
  • What is fencing?
  • What are some of the uses of a load balancer?

Fault tolerance

  • Types of faults (transient, intermittent, permanent)
  • Fail-silent (fail-stop) vs. byzantine faults
  • Fail-restart vs. fail-stop
  • Synchronous vs. asynchronous communication systems
  • Types of redundancy: information, time, physical
  • What does it mean for a system to be k-fault tolerant
  • Active vs. passive replication
  • How many systems to you need to achieve k-fault tolerance with silent faults vs. byzantine faults
  • TMR (what does it do?)
  • Active-active vs. active-passive configurations

Cryptographic systems (just what we covered in class)

  • Types of ciphers:
    • Restricted ciphers: what are they?
    • Public key cryptography: use of two keys
    • Symmetric cryptography
      • Purpose of a secret key
      • Impact of key length
  • Key exchange (using third party, Diffie-Hellman, and public Keys)
    • Diffie-Hellman (know what it does and that it uses an (ab)mod c one-way function; don't memorize the algorithm)
    • Diffie-Hellman common key
    • Know that Diffie-Hellman public and private "keys" are not encryption keys
  • Public key cryptography (how do you use it?)
    • Understand the difference between public key and symmetric algorithms; what private keys are used for, what public keys are used for.
    • How do you use it for encryption, signatures, and key exchange?
    • Do not memorize algorithms: I will not ask you to recite the RSA or Diffie-Hellman algorithms. You should know that RSA is based on the difficulty of factoring large products of primes and that it works via an abmod c function (take a look at the slides for Diffie-Hellman and RSA key generation).
  • Hybrid cryptosystem (understand advantage and how key generation/exchange works)
  • Difference between public key and symmetric cryptography
  • One-way functions (what are they good for?, McCarthy's puzzle)
  • Hash functions (what are they good for?)
  • Digital signatures (with a public key cryptosystem)
  • Secure communciation
    • With symmetric cryptography
    • With public key cryptography
    • With a hybrid cryptosystem

Authentication

  • The three A's
    • Authentication: versus identification
    • Authorization: granting access control (based on user ID and access control list, permissions in a ticket such as Kerberos, or consulting a trusted authorization server)
    • Accounting: logging
  • Multi-factor authentication
    • What are the three factors?
  • Reusable passwords (PAP, password authentication protocol)
    • Hashed passwords, salt
    • Dictionary attacks
  • One-time passwords
    • Types: sequence-based, time-based, challenge-based
    • S/key
    • Challenge handshake authentication prtocol (CHAP)
    • SecurID: just understand that both sides have a PIN and seed and the hash includes the time of day
  • Replay attacks
  • Session key
  • Authentication with symmetric cryptography
    • Kerberos (know how it works - don't worry about memorizing details, know how tickets work and the purpose of the server). See the lecture slides. Don't worry about the distinction between the authentication server and ticket granting server.
    • Ticket Granting Server: a part of Kerberos that allows you to use a session key to request access to services instead of using your personal secret key over and over
  • I will not ask about RADIUS
  • Public key authentication
    • How does it work? Understand the use of the nonce
    • Digital (X.509) certificates (don't memorize fields but understand the purpose and how a certificate works)
  • Understand the principles of SSL - it's just a hybrid cryptosystem

OAuth & OpenID Connect

  • OAuth
    • Purpose of OAuth vs. OpenID Connect
    • Role of service provider and consumer
    • Role of access tokens: an authorization code is sent back via a browser redirect. Since the browser may not be fully trusted, the server uses the code to contact the service provider to get look up the real permission grant and get the access token.
  • OpenID Connect
    • Role of an identity provider
    • Use of HTTP redirection
    • Interaction with OAuth

Virtual Private Networks

  • Principles, not details
  • What is a network tunnel?
  • How does tunnel mode communication differ from transport mode?
  • IPsec AH (authentication header) = signature of encapsulated packet
  • IPsec ESP (encapsulating security payload) = encrypted encapsulated packet + signature
    • Don't memorize the names - just know that there are two modes: one that signs the packet and the other that encrypts it and signs it.