pk.org: CS 417/Lecture Notes

Distributed Transactions

Terms you should know

Paul Krzyzanowski – 2026-03-21

Transactions

Transaction
A sequence of read and write operations treated as a single logical unit of work that either commits (all changes made permanent) or aborts (all changes rolled back).
Commit
The successful completion of a transaction, making all of its changes permanent and visible to other transactions.
Abort (rollback)
The cancellation of a transaction, undoing all of its changes and returning the system to the state it was in before the transaction began.
Write-ahead log (WAL)
A durability mechanism in which changes are written to a sequential log before being applied to the data, enabling crash recovery by replaying the log to redo committed transactions and undo incomplete ones.
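As a minimal sketch of the idea (class and record layout are invented for illustration, with a list standing in for a log file on stable storage): every update is appended to the log before the store is modified, and recovery replays the log, redoing only the writes of committed transactions.

```python
# Toy write-ahead log (illustrative; names are invented).
# Rule: append the log record BEFORE applying the change, so the log
# always knows at least as much as the data store after a crash.

class MiniWAL:
    def __init__(self):
        self.log = []        # stands in for a sequential file on stable storage
        self.data = {}       # the in-place data store

    def write(self, txn_id, key, value):
        old = self.data.get(key)
        self.log.append(("update", txn_id, key, old, value))  # log first...
        self.data[key] = value                                # ...then apply

    def commit(self, txn_id):
        self.log.append(("commit", txn_id))   # durable once this record lands

    def recover(self):
        # Rebuild state from the log alone: redo updates whose transaction
        # committed; drop (undo) updates of transactions that never did.
        committed = {rec[1] for rec in self.log if rec[0] == "commit"}
        data = {}
        for rec in self.log:
            if rec[0] == "update" and rec[1] in committed:
                data[rec[2]] = rec[4]
        return data
```

A real WAL also forces the log to disk before acknowledging a commit and keeps old values so in-place changes by losers can be undone; this sketch only shows the ordering discipline.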
Stable storage
Storage that survives crashes, power outages, and reboots; typically a file system with writes flushed to disk before returning.
ACID
The set of properties that define a correct database transaction: Atomicity, Consistency, Isolation, and Durability.
Atomicity
The property that a transaction is all-or-nothing: either all of its operations commit or none do.
Consistency
The property that a transaction moves the database from one valid state to another, preserving all integrity constraints.
Isolation
The property that concurrent transactions do not observe each other’s intermediate state; serializability is the standard isolation guarantee.
Durability
The property that committed transactions persist across crashes, typically implemented with a write-ahead log on stable storage.

Concurrency Control

Concurrency control
The mechanism by which a system ensures that concurrent transactions do not interfere with each other, enforcing the isolation property.
Serializability
An isolation property for multi-operation transactions requiring that the outcome of concurrent transactions be equivalent to some serial execution of those transactions, with no real-time constraint on which serial order is chosen.
Schedule
A sequence of interleaved read and write operations from multiple concurrent transactions; a serializable schedule is one whose outcome is equivalent to some serial execution.
Pessimistic concurrency control
An approach that assumes conflicts between transactions are likely and prevents them proactively using locks.
Optimistic concurrency control (OCC)
An approach that assumes conflicts are rare, allows transactions to proceed without locks, and checks for conflicts only at commit time, aborting and restarting the transaction if a conflict is found.
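The commit-time check can be sketched as follows (the store, versioning scheme, and method names are invented for illustration): each item carries a version number, a transaction remembers the versions it read, and commit succeeds only if none of those versions have changed.

```python
# OCC sketch: run without locks, validate the read set at commit time.

class OCCStore:
    def __init__(self):
        self.data = {}                       # key -> (version, value)

    def read(self, key):
        return self.data.get(key, (0, None))  # returns (version, value)

    def commit(self, read_set, writes):
        # read_set: {key: version observed during the transaction}.
        # Validation: if any version moved on, a conflicting transaction
        # committed in the meantime -> abort (caller retries).
        for key, seen_version in read_set.items():
            if self.data.get(key, (0, None))[0] != seen_version:
                return False
        # Validation passed: install writes, bumping each item's version.
        for key, value in writes.items():
            version = self.data.get(key, (0, None))[0]
            self.data[key] = (version + 1, value)
        return True
```

The first commit below validates cleanly; the second reuses a stale read set and is rejected, modeling the abort-and-restart path.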
Two-phase locking (2PL)
A pessimistic concurrency control protocol in which a transaction has a growing phase (acquires locks, releases none) followed by a shrinking phase (releases locks, acquires none), guaranteeing serializability.
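The two-phase discipline itself is just a rule about ordering lock operations within one transaction, which a small sketch can enforce (class and method names are invented; lock conflicts between transactions are omitted):

```python
# Toy 2PL transaction: the first unlock ends the growing phase, and
# any later lock attempt is a protocol violation.

class TwoPhaseTxn:
    def __init__(self, name):
        self.name = name
        self.held = set()
        self.shrinking = False   # False = growing phase

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: cannot lock after unlocking")
        self.held.add(item)

    def unlock(self, item):
        self.shrinking = True    # permanently enter the shrinking phase
        self.held.discard(item)
```

Strict and strong strict 2PL (below) tighten this further by deferring unlocks until commit or abort.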
Read lock (shared lock)
A lock that allows multiple transactions to hold it simultaneously on the same data item, but prevents any transaction from acquiring a write lock on that item.
Write lock (exclusive lock)
A lock that grants exclusive access to a data item; no other transaction may hold any lock on the item while a write lock is held.
Cascading abort
A failure condition in plain 2PL where one transaction’s abort forces other transactions that read its uncommitted data to also abort; prevented by strict or strong strict 2PL.
Strict two-phase locking
A variant of 2PL in which write locks are held until the transaction commits or aborts, preventing cascading aborts.
Strong strict two-phase locking (SS2PL)
A variant of 2PL in which all locks, both read and write, are held until the transaction commits or aborts; the standard implementation in most commercial databases.
Multi-Version Concurrency Control (MVCC)
A concurrency control technique in which the system maintains multiple versions of each data item, allowing readers to see a consistent snapshot without blocking writers.
Snapshot isolation
A property of MVCC systems in which each transaction reads from a consistent snapshot of the data taken at the transaction’s start time, so reads never block and are unaffected by concurrent writes.
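A sketch of how multi-version reads give snapshot isolation (store layout and names are invented for illustration): writers append timestamped versions, and a reader sees the newest version committed at or before its snapshot timestamp, unaffected by later writes.

```python
# MVCC read path under snapshot isolation (sketch).

class MVStore:
    def __init__(self):
        self.versions = {}   # key -> [(commit_ts, value), ...], ts ascending

    def commit_write(self, key, value, commit_ts):
        self.versions.setdefault(key, []).append((commit_ts, value))

    def read(self, key, snapshot_ts):
        # Visible versions: those committed at or before the snapshot.
        visible = [v for ts, v in self.versions.get(key, [])
                   if ts <= snapshot_ts]
        return visible[-1] if visible else None   # newest visible version
```

A transaction that took its snapshot at time 7 keeps reading the old value even after a later write commits at time 10.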
Lease
A lock with a time limit that is automatically released when the lease expires, allowing the system to recover from the failure of a lock holder without waiting for explicit release.
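The mechanism is small enough to sketch directly (class and parameter names are invented; timestamps are passed explicitly so the example is deterministic): a lease is valid only until its expiration time, so a crashed holder's lease simply runs out.

```python
# Lease sketch: a lock that expires on its own.

import time

class Lease:
    def __init__(self, holder, duration_s, now=None):
        self.holder = holder
        start = now if now is not None else time.time()
        self.expires = start + duration_s

    def valid(self, now=None):
        # Once this returns False, the system may grant the resource
        # to someone else without hearing from the (possibly dead) holder.
        return (now if now is not None else time.time()) < self.expires
```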

Deadlock

Deadlock
A situation in which a set of transactions each hold locks needed by another transaction in the set, forming a cycle of dependencies with no transaction able to proceed.
Wait-for graph (WFG)
A directed graph used for deadlock detection where each node represents a transaction and a directed edge from T1 to T2 means T1 is waiting for a resource held by T2; a cycle indicates deadlock.
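Detection then reduces to cycle search over that graph. A sketch, with the WFG represented as a dictionary mapping each transaction to the set of transactions it waits for (representation and function name are invented for illustration):

```python
# Deadlock detection = cycle detection in the wait-for graph.

def has_deadlock(wfg):
    """wfg: dict mapping txn -> set of txns it is waiting for."""
    def cycle_from(node, path):
        if node in path:              # revisited a waiter: cycle found
            return True
        return any(cycle_from(nxt, path | {node})
                   for nxt in wfg.get(node, ()))
    return any(cycle_from(t, set()) for t in wfg)
```

T1 waiting for T2 while T2 waits for T1 is a deadlock; a simple chain of waits is not.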
Phantom deadlock
A false positive in deadlock detection caused by asynchronous collection of wait-for graph snapshots; the cycle appears in the merged graph but does not exist in the current system state.
Centralized deadlock detection
A deadlock detection approach in which one designated node collects local wait-for graphs from all nodes, merges them into a global graph, and searches for cycles.
Edge chasing
A distributed deadlock detection approach in which probe messages propagate along wait-for graph edges; a deadlock is confirmed when a probe returns to its origin.
Chandy-Misra-Haas algorithm
A distributed edge-chasing deadlock detection algorithm that uses probe messages to detect cycles in a distributed wait-for graph without requiring a central coordinator.
Wait-die
A deadlock prevention scheme in which an older transaction waits for a younger one to release a resource, but a younger transaction requesting a resource held by an older one is aborted and retried.
Wound-wait
A deadlock prevention scheme in which an older transaction preempts a younger one holding a needed resource, but a younger transaction requesting a resource held by an older one waits.
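Both schemes reduce to a comparison of transaction timestamps, where a smaller timestamp means older; the difference is only in which side waits and which side aborts. A sketch (function names are invented for illustration):

```python
# Wait-die vs. wound-wait, as pure decision rules on timestamps.
# Smaller timestamp = older transaction. Both rules kill waits in one
# direction only, so no cycle (hence no deadlock) can ever form.

def wait_die(requester_ts, holder_ts):
    # Older requester waits; younger requester dies (aborts, retries later).
    return "wait" if requester_ts < holder_ts else "die"

def wound_wait(requester_ts, holder_ts):
    # Older requester wounds (preempts) the holder; younger requester waits.
    return "wound" if requester_ts < holder_ts else "wait"
```

Note the symmetry: in wait-die the younger transaction is the one aborted; in wound-wait the older one does the aborting.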

Atomic Commit Protocols

Two-Phase Commit (2PC)
A distributed protocol that coordinates a commit-or-abort decision across multiple nodes by separating a voting phase from a decision phase, requiring unanimous agreement to commit.
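The two phases can be sketched from the coordinator's side (class and method names are invented; timeouts, logging, and recovery are omitted): phase one collects votes, and phase two broadcasts a decision that is commit only if every vote was yes.

```python
# 2PC coordinator sketch: unanimous "yes" -> commit; otherwise abort.

class Participant:
    def __init__(self, can_commit):
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        if self.can_commit:
            self.state = "uncertain"   # voted yes; cannot decide alone now
            return "yes"
        self.state = "abort"           # a "no" vote lets it abort at once
        return "no"

    def finish(self, decision):
        self.state = decision          # carry out the coordinator's decision

def two_phase_commit(participants):
    # Phase 1 (voting): ask every participant to prepare.
    votes = [p.prepare() for p in participants]
    decision = "commit" if all(v == "yes" for v in votes) else "abort"
    # Phase 2 (decision): broadcast the outcome to everyone.
    for p in participants:
        p.finish(decision)
    return decision
```

The "uncertain" state in the sketch is exactly the window that makes 2PC a blocking protocol: a participant stuck there cannot decide on its own if the coordinator crashes.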
Three-Phase Commit (3PC)
An extension of 2PC that adds a pre-commit phase to eliminate blocking under single-node failure, at the cost of requiring bounded network delays; not used in practice.
Coordinator
The node in a distributed transaction that drives the commit protocol, collects votes from participants, and broadcasts the final commit or abort decision.
Participant (cohort)
A node in a distributed transaction that executes part of the transaction, votes on whether it can commit, and carries out the coordinator’s final decision.
Prepare phase
The first phase of 2PC in which the coordinator requests that all participants vote on whether they can commit, and each participant writes a durable prepare record.
Uncertain state
The state a 2PC participant enters after voting yes but before receiving the coordinator’s decision; the participant cannot unilaterally commit or abort while in this state.
Blocking protocol
A commit protocol in which the failure of one node can prevent other nodes from making progress; 2PC is a blocking protocol because coordinator failure leaves participants in the uncertain state.
Fail-recover model
A failure model in which nodes that crash eventually recover and resume normal operation; assumed by 2PC.

Consistency Models

Consistency model
A specification of what values a read operation is permitted to return given a history of writes; stronger models provide more intuitive guarantees at the cost of coordination.
Linearizability
The strongest practical consistency model, requiring that every operation appear to take effect instantaneously at some point between its invocation and completion, in an order consistent with real time.
Sequential consistency
A consistency model requiring that all operations appear in some total order consistent with each process’s program order, without requiring agreement with real-time order.
Causal consistency
A consistency model that requires causally related operations to appear in the same order for all processes; causally independent operations may be observed in different orders.
Eventual consistency
A weak consistency model that guarantees only that, if no new updates are made, all replicas will eventually converge to the same value; reads may return stale or divergent data in the interim.
Strong Eventual Consistency (SEC)
A strengthening of eventual consistency guaranteeing that any two nodes that have received the same set of updates will be in identical states, regardless of the order updates were received.
CRDT (Conflict-Free Replicated Data Type)
A data structure designed so that concurrent updates from any replica can be merged automatically in any order without conflicts, enabling strong eventual consistency.
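One of the simplest CRDTs, a grow-only counter (G-Counter), shows the idea (class layout is a standard textbook construction, not tied to any particular library): each replica increments only its own slot, and merge takes the per-slot maximum, so merges commute and replicas that have seen the same updates end up identical.

```python
# G-Counter CRDT sketch: a counter that only grows, replicated across
# n replicas. Merge is element-wise max, which is commutative,
# associative, and idempotent -> strong eventual consistency.

class GCounter:
    def __init__(self, replica_id, n_replicas):
        self.id = replica_id
        self.counts = [0] * n_replicas   # one slot per replica

    def increment(self):
        self.counts[self.id] += 1        # touch only our own slot

    def merge(self, other):
        self.counts = [max(a, b) for a, b in zip(self.counts, other.counts)]

    def value(self):
        return sum(self.counts)          # total across all replicas
```

After exchanging state in either order, both replicas converge to the same counts and the same total.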

CAP Theorem and PACELC

CAP theorem
The theorem stating that when a network partition occurs, a distributed system cannot simultaneously guarantee both consistency (linearizability) and availability.
Partition tolerance
The ability of a distributed system to continue operating correctly despite arbitrary message loss or delay between nodes; required of any real-world distributed system.
CP system
A distributed system that prioritizes consistency over availability during a network partition, preferring to return an error rather than serve potentially stale data.
AP system
A distributed system that prioritizes availability over consistency during a network partition, preferring to serve potentially stale data rather than return an error.
PACELC
A framework extending CAP: if a Partition occurs, a system must trade off Availability against Consistency; Else, during normal operation, it must trade off Latency against Consistency.
Latency-consistency trade-off
The tension between responding quickly from a local replica (low latency, possibly stale) and coordinating with a quorum of replicas before responding (higher latency, strongly consistent).

BASE

BASE
A design philosophy for large-scale distributed systems trading strict consistency for availability and scale: Basically Available, Soft State, Eventually Consistent.
Basically Available
The property that a system prioritizes responding to requests even if the response may be stale or incomplete, rather than refusing requests to preserve consistency.
Soft state
The property that a system’s state may change over time even without new input, because replicas are asynchronously reconciling diverged data.
