NoSQL and Data Models
- NoSQL
- A broad category of databases designed for horizontal scalability, flexible schemas, or high availability.
- Key-value store
- A database that maps each key to one value and offers little structure beyond key lookup.
- Column-family store
- A database that groups columns into families and allows different rows to have different columns.
- Wide-column store
- Another name for a column-family store, emphasizing that rows may contain many columns.
- Document store
- A database that stores structured documents, usually in a JSON-like form.
- Graph database
- A database optimized for storing entities and traversing relationships among them.
- Distributed hash table (DHT)
- A decentralized structure that partitions data across nodes by hashing keys.
- Hash ring
- The logical ring used in DHT-style systems to map hashed keys to nodes.
- Key-range partitioning
- Partitioning data by contiguous ranges of sorted keys.
- Hash partitioning
- Partitioning data by hashing keys so that load is spread across nodes.
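The hash ring and the two partitioning schemes above can be sketched in Python. This is a minimal illustration, not any particular system's code. `stable_hash`, `HashRing`, and `RangePartitioner` are hypothetical names, MD5 is an arbitrary stand-in hash, and the number of virtual points per node (`vnodes`) is an assumed tuning knob.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    """Map a string to a 64-bit integer (MD5 is an illustrative choice)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Consistent hashing: a key belongs to the first node point at or after its hash."""

    def __init__(self, nodes, vnodes=8):
        # Several virtual points per node even out the size of each node's arcs.
        self._points = sorted(
            (stable_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._points]

    def node_for(self, key: str) -> str:
        # Walk clockwise from the key's hash, wrapping past the top of the ring.
        idx = bisect.bisect_left(self._hashes, stable_hash(key)) % len(self._points)
        return self._points[idx][1]


class RangePartitioner:
    """Key-range partitioning: sorted split points divide the key space."""

    def __init__(self, split_points, nodes):
        self._splits = list(split_points)
        assert self._splits == sorted(self._splits), "split points must be sorted"
        assert len(nodes) == len(self._splits) + 1
        self._nodes = list(nodes)

    def node_for(self, key: str) -> str:
        # Keys below the first split go to nodes[0]; a split point starts a new range.
        return self._nodes[bisect.bisect_right(self._splits, key)]
```

Because each node owns several arcs, adding a node only moves the keys that now land on the new node's points. Range partitioning instead keeps neighboring keys together, which makes range scans cheap but can concentrate load on a single node.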
Bigtable
- Bigtable
- Google’s distributed wide-column database for large-scale structured data.
- Row key
- The primary key for a row in Bigtable; rows are stored in lexicographic order by row key.
- Column family
- A named group of related columns; a family must be created before data is stored under it, and a table usually has a small number of families that rarely change.
- Column qualifier
- The part of a column name after the family, allowing dynamic columns within a family.
- Column name
- A full Bigtable column identifier of the form family:qualifier.
- Timestamp
- A version identifier that lets Bigtable store multiple versions of one cell.
- Sparse layout
- A storage model in which empty cells are not stored.
- Tablet
- A contiguous range of row keys in Bigtable; the basic unit of partitioning.
- Tablet server
- A server that stores and serves tablets.
- Master server
- A server that tracks tablet placement and reassigns tablets on failure.
- Memtable
- An in-memory sorted buffer holding recent writes before they are flushed to disk.
- SSTable
- An immutable sorted file storing key-value data on disk.
- Compaction
- The background process that merges on-disk files and removes obsolete data.
- GFS
- The Google File System, which stores Bigtable’s persistent files.
- Chubby
- Google’s distributed lock service; Bigtable uses it to ensure there is at most one active master, to store the bootstrap location of its metadata, and to track live tablet servers.
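The write and read path described above (recent writes in a memtable, flushes to immutable SSTables, compaction merging them) can be sketched as a toy Python class. Everything here is hypothetical: `MiniTablet`, the in-memory lists standing in for SSTables stored in GFS, and the `max_versions` garbage-collection rule are assumptions for illustration. Deletions, block indexes, and tablet location lookup are not modeled.

```python
from typing import Dict, List, Optional, Tuple

Cell = Tuple[str, str, int]  # (row key, "family:qualifier", timestamp)


class MiniTablet:
    """Toy tablet: memtable for recent writes, immutable sorted runs, compaction."""

    def __init__(self, families: List[str], memtable_limit: int = 4):
        self.families = set(families)  # declared before use, as column families are
        self.memtable: Dict[Cell, str] = {}  # only written cells exist: a sparse layout
        self.sstables: List[List[Tuple[Cell, str]]] = []  # oldest first; never mutated
        self.memtable_limit = memtable_limit

    def put(self, row: str, column: str, timestamp: int, value: str) -> None:
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"undeclared column family {family!r}")
        self.memtable[(row, column, timestamp)] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        """Freeze the memtable into a new sorted, immutable run (a toy SSTable)."""
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, row: str, column: str) -> Optional[str]:
        """Return the newest version of one cell, checking newest sources first."""
        best: Optional[Tuple[int, str]] = None
        for source in [list(self.memtable.items())] + self.sstables[::-1]:
            for (r, c, ts), value in source:
                if r == row and c == column and (best is None or ts > best[0]):
                    best = (ts, value)
        return None if best is None else best[1]

    def compact(self, max_versions: int = 1) -> None:
        """Merge all runs into one, keeping only the newest versions of each cell."""
        self.flush()
        merged: Dict[Cell, str] = {}
        for run in self.sstables:  # oldest to newest, so newer duplicates win
            merged.update(run)
        versions: Dict[Tuple[str, str], List[Tuple[int, str]]] = {}
        for (row, column, ts), value in merged.items():
            versions.setdefault((row, column), []).append((ts, value))
        kept = []
        for (row, column), vs in versions.items():
            vs.sort(reverse=True)  # newest timestamp first
            kept.extend(((row, column, ts), v) for ts, v in vs[:max_versions])
        self.sstables = [sorted(kept)] if kept else []
```

Reads must consult the memtable and every SSTable, which is why compaction matters: it bounds the number of files a read touches and discards versions the retention rule no longer needs.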
Cassandra
- Cassandra
- A decentralized wide-column database that combines Dynamo’s partitioning and replication design with Bigtable’s column-family data model and storage design.
- Peer-to-peer architecture
- A design in which all nodes are equal and there is no master node.
- Token
- A position on Cassandra’s hash ring that defines part of a node’s key range.
- Partition key
- The part of a row’s primary key whose hash (token) determines which replica nodes store the row’s partition.
- Clustering column
- A column that determines the sorted order of rows within a partition.
- Partition
- The set of rows sharing the same partition key.
- Replication factor
- The number of replica copies Cassandra stores for each partition.
- Tunable consistency
- Cassandra’s ability to let the application choose how many replicas must respond to a read or write.
- Coordinator
- The node that receives a client request and coordinates the replicas involved in that operation.
- Commit log
- Cassandra’s durable write log used for crash recovery.
- Consistency level
- The read or write policy specifying how many replicas must respond before the operation is considered complete.
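Replica placement and the arithmetic behind tunable consistency can be sketched as follows. The sketch assumes a simple placement rule in the spirit of Cassandra’s SimpleStrategy (walk clockwise from the key’s token and take the next `rf` distinct nodes). MD5 stands in for Cassandra’s default Murmur3 partitioner, and `Ring`, `token_for`, `quorum`, and `replica_sets_overlap` are hypothetical names.

```python
import bisect
import hashlib
from typing import Dict, List


def token_for(partition_key: str) -> int:
    """Hash a partition key to a 64-bit token (MD5 here; Cassandra defaults to Murmur3)."""
    return int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest()[:8], "big")


class Ring:
    """Token ring: each token marks the end of the key range its node owns."""

    def __init__(self, node_tokens: Dict[str, List[int]]):
        self._ring = sorted((tok, node) for node, toks in node_tokens.items() for tok in toks)
        self._tokens = [tok for tok, _ in self._ring]

    def replicas(self, partition_key: str, rf: int) -> List[str]:
        """Walk clockwise from the key's token, collecting rf distinct nodes."""
        start = bisect.bisect_left(self._tokens, token_for(partition_key))
        chosen: List[str] = []
        for step in range(len(self._ring)):
            node = self._ring[(start + step) % len(self._ring)][1]
            if node not in chosen:
                chosen.append(node)
                if len(chosen) == rf:
                    break
        return chosen


def quorum(rf: int) -> int:
    """Majority of replicas, as required by a QUORUM consistency level."""
    return rf // 2 + 1


def replica_sets_overlap(read_replicas: int, write_replicas: int, rf: int) -> bool:
    """True when every read set must share at least one replica with every write set."""
    return read_replicas + write_replicas > rf
```

With a replication factor of 3, QUORUM is 2, so QUORUM writes followed by QUORUM reads always share at least one replica (2 + 2 > 3), while ONE writes with ONE reads (1 + 1) do not.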
Spanner
- Spanner
- Google’s globally distributed relational database with strong transactional guarantees.
- NewSQL
- A category of systems that retain relational semantics and SQL while scaling horizontally.
- Split
- A contiguous key range in Spanner; the basic unit of partitioning.
- Spanserver
- A server that stores and serves replicated Spanner data.
- Zone
- Spanner’s unit of administrative deployment and physical isolation; a datacenter may contain one or more zones.
- Paxos group
- The set of replicas that manage one split and use Paxos to agree on updates.
- Directory
- A set of contiguous keys sharing a common prefix; the unit of data placement, which Spanner moves between Paxos groups for locality and load balancing.
- Snapshot read
- A read that returns the database as it existed at a chosen timestamp.
- External consistency
- Spanner’s guarantee that if one transaction commits before another starts, the first receives the smaller commit timestamp, so commit order respects real time; also called strict serializability.
- TrueTime
- Spanner’s time API, which returns a bounded interval rather than one exact clock value.
- Commit wait
- The delay Spanner uses before exposing a commit so that the commit timestamp is definitely in the past.
- Universe
- The top-level name for a Spanner deployment.
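TrueTime and commit wait can be simulated in a few lines. This is an assumption-laden sketch, not Spanner’s implementation: `TrueTimeSim` widens the local clock by a fixed, assumed error bound `epsilon_s`, whereas real TrueTime derives its bound from GPS and atomic-clock references. The commit timestamp is taken as the latest possible current time, and the commit is held back until that timestamp is certainly in the past.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: float
    latest: float


class TrueTimeSim:
    """Simulated TrueTime: the local clock widened by an assumed error bound."""

    def __init__(self, epsilon_s: float) -> None:
        self.epsilon_s = epsilon_s  # assumed worst-case clock error, in seconds

    def now(self) -> TTInterval:
        t = time.time()
        return TTInterval(earliest=t - self.epsilon_s, latest=t + self.epsilon_s)

    def after(self, t: float) -> bool:
        """True only if t has definitely passed."""
        return self.now().earliest > t

    def before(self, t: float) -> bool:
        """True only if t has definitely not yet arrived."""
        return self.now().latest < t


def commit_with_wait(tt: TrueTimeSim) -> float:
    """Pick a commit timestamp, then wait until it is certainly in the past."""
    s = tt.now().latest  # if the error bound holds, s is not earlier than true time
    while not tt.after(s):  # commit wait
        time.sleep(tt.epsilon_s / 4)
    return s  # only now would locks be released and the commit made visible
```

In this simulation the wait lasts roughly `2 * epsilon_s`, which is why a tight clock-uncertainty bound matters for commit latency.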
Transactions and Concurrency
- Two-phase commit (2PC)
- A protocol that coordinates commit or abort across multiple participants.
- Strict two-phase locking (strict 2PL)
- A locking protocol in which a transaction acquires locks as it runs and holds its exclusive (write) locks until it commits or aborts.
- Shared lock
- A lock that allows multiple readers but blocks writers.
- Exclusive lock
- A lock that allows one writer and blocks all other access.
- Wound-wait
- A deadlock-prevention scheme: an older transaction that requests a lock held by a younger one wounds (aborts) it, while a younger transaction that requests a lock held by an older one waits.
- MVCC
- Multi-Version Concurrency Control; a scheme that keeps multiple timestamped versions of each item so that readers can see a consistent snapshot without blocking writers.
- Serializability
- The property that concurrent transactions produce a result equivalent to some serial order.
- Linearizability
- A consistency model in which each operation appears to take effect atomically at some instant between its invocation and its response, so the order of operations respects real time.
- Strict serializability
- Serializability plus real-time ordering; the transaction-level form of linearizability.
- Read-write transaction
- A transaction that may both read and modify data.
- Read-only transaction
- A transaction that only reads data and can often use a snapshot instead of locks.
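Shared/exclusive locking with wound-wait can be sketched as a small lock table. The names (`WoundWaitLockTable`, `acquire`, the `"S"`/`"X"` mode strings) are hypothetical, a smaller start timestamp means an older transaction, and a real lock manager would also queue waiters and wake them on release, which this sketch omits. Locks are released only in `commit` or `abort`, as strict 2PL requires.

```python
from typing import Dict, List, Set, Tuple

SHARED, EXCLUSIVE = "S", "X"


def compatible(held: str, requested: str) -> bool:
    """Only shared locks can coexist; anything involving an exclusive lock conflicts."""
    return held == SHARED and requested == SHARED


class WoundWaitLockTable:
    """Lock table using wound-wait; a smaller start timestamp means an older transaction."""

    def __init__(self) -> None:
        self.start_ts: Dict[str, int] = {}
        self.holders: Dict[str, Dict[str, str]] = {}  # key -> {txn: mode}
        self.aborted: Set[str] = set()

    def begin(self, txn: str, ts: int) -> None:
        self.start_ts[txn] = ts

    def acquire(self, txn: str, key: str, mode: str) -> Tuple[str, List[str]]:
        """Return ("granted" or "wait", list of transactions wounded by this request)."""
        held = self.holders.setdefault(key, {})
        conflicts = [t for t, m in held.items() if t != txn and not compatible(m, mode)]
        # An older requester wounds (aborts) every younger conflicting holder.
        wounded = [t for t in conflicts if self.start_ts[txn] < self.start_ts[t]]
        for victim in wounded:
            self.abort(victim)
        if len(wounded) < len(conflicts):
            return "wait", wounded  # an older holder remains, so the requester waits
        if held.get(txn) != EXCLUSIVE:  # grant, or upgrade S to X; never downgrade X
            held[txn] = mode
        return "granted", wounded

    def _release_all(self, txn: str) -> None:
        for held in self.holders.values():
            held.pop(txn, None)

    def commit(self, txn: str) -> None:
        self._release_all(txn)  # strict 2PL: locks are released only at the end

    def abort(self, txn: str) -> None:
        self.aborted.add(txn)
        self._release_all(txn)
```

Deadlock cannot arise under this rule because a transaction only ever waits for an older one, so every waits-for edge points toward an older transaction and no cycle can form.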
Replication and Fault Tolerance
- Paxos
- A consensus protocol that lets replicas agree on one sequence of updates despite failures.
- Leader
- The replica in a consensus group that coordinates updates.
- Replica
- One of several copies of the same data, each stored on a different server for fault tolerance and availability.
- Fault tolerance
- The ability of a system to continue operating despite failures.
- Availability
- The property that a system continues responding to requests.
- Consistency
- The rules that define what values reads are allowed to return.
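Paxos agreement on a single value can be sketched with in-memory acceptors. This is single-decree Paxos for illustration only; production systems such as Spanner run Multi-Paxos with a long-lived leader, which this sketch does not model. `Acceptor`, `propose`, and the `reachable` parameter (used to simulate failed replicas) are hypothetical names.

```python
from typing import Iterable, List, Optional, Tuple


class Acceptor:
    def __init__(self) -> None:
        self.promised = -1  # highest ballot this acceptor has promised
        self.accepted_ballot = -1  # ballot of the last accepted proposal
        self.accepted_value: Optional[str] = None

    def prepare(self, ballot: int) -> Tuple[bool, int, Optional[str]]:
        """Phase 1: promise to ignore lower ballots and report any prior acceptance."""
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted_ballot, self.accepted_value
        return False, -1, None

    def accept(self, ballot: int, value: str) -> bool:
        """Phase 2: accept unless a higher ballot has already been promised."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted_ballot = ballot
            self.accepted_value = value
            return True
        return False


def propose(
    acceptors: List[Acceptor],
    ballot: int,
    value: str,
    reachable: Optional[Iterable[int]] = None,
) -> Optional[str]:
    """Run one proposer round; return the chosen value, or None without a majority."""
    indices = range(len(acceptors)) if reachable is None else reachable
    up = [acceptors[i] for i in indices]
    majority = len(acceptors) // 2 + 1
    promises = [(b, v) for ok, b, v in (a.prepare(ballot) for a in up) if ok]
    if len(promises) < majority:
        return None
    # Safety rule: adopt the value of the highest-ballot proposal already accepted.
    prior_ballot, prior_value = max(promises, key=lambda p: p[0])
    if prior_ballot >= 0:
        value = prior_value
    accepted = sum(a.accept(ballot, value) for a in up)
    return value if accepted >= majority else None
```

Because any two majorities of 2f + 1 replicas intersect, a group of five replicas keeps making progress with two failures but not with three, and a later proposer cannot overturn a value that a majority has already accepted.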