Goal: provide a way to order and reason about events across multiple machines when physical clocks are skewed or uncertain, so distributed systems can preserve causality and make consistent decisions about “what happened before what.”
Time is an illusion. Lunchtime doubly so.
– Douglas Adams, The Hitchhiker’s Guide to the Galaxy.
Why Physical Timestamps Are Not Enough
Synchronizing clocks cannot solve all timing problems in distributed systems. Even perfectly synchronized clocks face fundamental limits.
Consider a distributed database processing hundreds of thousands of transactions per second. If each transaction updates replicated data on multiple machines, those updates happen in microseconds or less. Even with nanosecond-resolution timestamps, multiple events can occur simultaneously from the clock’s perspective. When events occur faster than clock resolution, timestamps alone cannot order them.
More fundamentally, network transmission delays mean that the time an event is timestamped at one machine may not reflect when it becomes visible at another machine. If machine A timestamps an event at 12:00:00.000000 and sends it to machine B, which receives it at 12:00:00.000500 (500 microseconds later according to B’s clock), we cannot determine from timestamps alone whether A’s event happened before B’s local events at 12:00:00.000200. Even with perfect synchronization, network delays obscure the true ordering of events.
Consider a distributed database where two clients concurrently update the same record. What matters is not when the updates occurred in absolute time, but whether one update could have seen the other. If the updates are truly concurrent (neither saw the other), the system needs conflict resolution. If one happened after the other, the later one supersedes (overwrites) the earlier.
This is a question of causality, not chronology. Logical clocks answer this question without relying on synchronized physical clocks.
The Happened-Before Relationship
Leslie Lamport’s 1978 paper, Time, Clocks, and the Ordering of Events in a Distributed System, revolutionized how we think about time in distributed systems. Lamport realized that physical time does not matter. What matters is the potential causal relationship between events.
Lamport defined the happened-before relationship, written as \(\rightarrow\), as a partial ordering on events. A partial ordering allows some pairs of elements to be incomparable; in our case, concurrent events have no ordering. This contrasts with a total ordering, where every pair of elements must be ordered (we’ll see how to create a total ordering from Lamport timestamps later).
Definition: For events \(a\) and \(b\):
- If \(a\) and \(b\) occur on the same process and \(a\) occurs before \(b\) in that process's execution, then \(a \rightarrow b\)
- If \(a\) is the event of sending a message and \(b\) is the event of receiving that message, then \(a \rightarrow b\)
- If \(a \rightarrow b\) and \(b \rightarrow c\), then \(a \rightarrow c\) (transitivity)
If neither \(a \rightarrow b\) nor \(b \rightarrow a\), we say \(a\) and \(b\) are concurrent, written \(a \parallel b\).
The happened-before relationship captures potential causality. If \(a \rightarrow b\), then \(a\) could have caused \(b\). Information from event \(a\) could have reached event \(b\) through the system’s communication channels. If \(a \parallel b\), then neither event could have influenced the other. They happened independently.
This might seem abstract, but it solves concrete problems. A version control system needs to know if two edits to the same file are concurrent (conflicting) or causally ordered (one supersedes the other). A distributed database needs to know if two writes are concurrent (requiring conflict resolution) or if one came after seeing the other.
Lamport Timestamps
Lamport timestamps assign each event a logical clock value such that if \(a \rightarrow b\), then the timestamp of \(a\) is less than the timestamp of \(b\).
Note the direction of this implication. We can guarantee that causally ordered events have increasing timestamps. However, the converse is not true: if timestamp(a) < timestamp(b), we cannot conclude that \(a \rightarrow b\). The events might be concurrent and merely assigned different timestamps. For instance, a process that generates many more events than another but never communicates will have a faster-growing counter, even though its events are concurrent with the other process's.
The Algorithm
Each process maintains a counter, initially set to zero.
On an internal event: Increment the counter.
When sending a message: Increment the counter and include the counter value in the message.
When receiving a message with timestamp T: Set counter = max(counter, T) + 1.
This simple protocol ensures the happened-before property. If \(a \rightarrow b\), then timestamp(a) < timestamp(b).
Proof sketch: The counter never decreases. On internal events and sends, it increases. On receipt, it is set to at least one more than the sender’s counter. If \(a \rightarrow b\) through a chain of events and messages, each step increases the counter, so timestamp(a) < timestamp(b).
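The three rules above can be sketched in a few lines of Python (the class and method names are my own, for illustration, not from any particular library):

```python
class LamportClock:
    """A minimal sketch of a Lamport logical clock: one counter per process."""

    def __init__(self):
        self.counter = 0

    def internal_event(self):
        # Internal event: just increment the counter.
        self.counter += 1
        return self.counter

    def send_event(self):
        # Sending: increment, then attach the counter value to the message.
        self.counter += 1
        return self.counter  # timestamp to include in the message

    def receive_event(self, msg_timestamp):
        # Receiving: jump past everything the sender had seen, then increment.
        self.counter = max(self.counter, msg_timestamp) + 1
        return self.counter
```

Note how `receive_event` is the only rule that couples two processes: it forces the receiver's counter past the sender's, which is exactly what makes the happened-before chain through a message increase the timestamp.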
Creating a Total Order
Lamport timestamps only partially order events. Two concurrent events might have the same timestamp. To create a total ordering (every pair of events is ordered), combine timestamps with process IDs.
Define the ordering: \((t_1, p_1) < (t_2, p_2)\) if:
- \(t_1 < t_2\), or
- \(t_1 = t_2\) and \(p_1 < p_2\)
This breaks ties by process ID. Now every pair of events has a definite order, even if they are concurrent.
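In a language with lexicographic tuple comparison this ordering falls out directly. A sketch, using hypothetical (timestamp, process ID) pairs:

```python
def total_order_key(timestamp, process_id):
    # Lexicographic comparison: lower timestamp first,
    # ties broken by process ID.
    return (timestamp, process_id)

# Three events; two are concurrent and share timestamp 5:
events = [(5, "B"), (3, "C"), (5, "A")]
ordered = sorted(total_order_key(t, p) for t, p in events)
# ordered is [(3, "C"), (5, "A"), (5, "B")]
```

Because every process sorts the same pairs the same way, all processes arrive at the same total order without further coordination.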
Total ordering is useful for some algorithms. For example, a distributed mutual exclusion algorithm can use timestamps to order lock requests: the request with the lowest timestamp wins, with process ID breaking ties. Because every (timestamp, process ID) pair is unique, all processes make the same comparisons and reach the same decision.
The Limitation
Lamport timestamps cannot detect concurrency. If you observe timestamp(a) < timestamp(b), you cannot determine whether:
- \(a \rightarrow b\) (a causally precedes b), or
- \(a \parallel b\) (a and b are concurrent)
For a replicated database, this is a problem. Suppose two replicas receive writes with timestamps 42 and 45. The system cannot tell if these writes are concurrent (conflicting) or if write 45 occurred after seeing write 42 (non-conflicting).
This limitation led to the development of vector clocks.
Where Lamport Timestamps Appear
Pure Lamport timestamps are rarely used in production systems because most applications need to detect concurrency, not just order events. However, the Lamport happened-before relationship is foundational to all distributed systems.
You will encounter Lamport-style ordering in:
Total-order broadcast: Some message queue implementations use Lamport-style ordering to guarantee consistent message delivery order across all consumers.
Distributed mutual exclusion: Algorithms that require processes to agree on resource access order use Lamport timestamps to arbitrate between competing requests.
Academic literature: Lamport’s 1978 paper fundamentally changed how computer scientists think about time and causality. Every logical clock mechanism builds on his insight that physical time is less important than the potential for causal influence.
While you may not directly implement Lamport timestamps in most systems, understanding the happened-before relationship is essential for reasoning about the correctness of distributed systems.
Vector Clocks
Vector clocks, independently developed by Colin Fidge and Friedemann Mattern in 1988, fully capture causal relationships. Unlike Lamport timestamps, vector clocks can detect concurrent events.
The Idea
Instead of maintaining a single counter, each process maintains a vector of counters, one for each process in the system. Think of it as each process tracking “what I know about everyone’s progress.”
For a system with n processes:
- Process P1 maintains vector V1 = [c11, c12, …, c1n]
- Process P2 maintains vector V2 = [c21, c22, …, c2n]
- And so on
The entry V1[2] represents “process P1’s knowledge of how many events process P2 has executed.” Initially, all entries are zero.
The Algorithm
On an internal event at process Pi: Increment Vi[i] (your own position in your vector).
When sending a message from process Pi: Increment Vi[i] and include the entire vector Vi in the message.
When process Pi receives a message with vector Vmsg:
- For each j, set Vi[j] = max(Vi[j], Vmsg[j])
- Increment Vi[i]
This propagates knowledge: when you receive a message, you learn everything the sender knew, plus you advance your own position.
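A fixed-size implementation might look like the following sketch (it assumes process indices 0 through n-1 are known up front; names are illustrative):

```python
class VectorClock:
    """Sketch of a fixed-size vector clock for a system of n processes.

    `pid` is this process's own index (0-based).
    """

    def __init__(self, pid, n):
        self.pid = pid
        self.v = [0] * n

    def internal_event(self):
        # Internal event: advance only our own slot.
        self.v[self.pid] += 1

    def send_event(self):
        # Sending: advance our own slot and ship a copy of the whole vector.
        self.v[self.pid] += 1
        return list(self.v)

    def receive_event(self, v_msg):
        # Merge: take the component-wise maximum (absorb the sender's
        # knowledge), then advance our own slot for the receive event.
        self.v = [max(a, b) for a, b in zip(self.v, v_msg)]
        self.v[self.pid] += 1
```

Sending a full copy of the vector is what lets the receiver learn everything the sender knew at send time.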
Comparing Vector Clocks
Vector clocks enable us to determine the relationship between any two events by comparing their vectors component-wise:
Vectors Va and Vb are:
Causal (\(a \rightarrow b\)): Va < Vb if:
- Va[i] \(\leq\) Vb[i] for all i, AND
- Va[i] < Vb[i] for at least one i
Same event: Va = Vb if:
- Va[i] = Vb[i] for all i
Concurrent (\(a \parallel b\)): Va || Vb if:
- Neither Va < Vb nor Vb < Va
- That is, there exist indices where Va[i] > Vb[i] and other indices where Va[i] < Vb[i]
This comparison tells us exactly the causal relationship. If two events have concurrent vector clocks, they truly are concurrent. Neither could have influenced the other.
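The comparison rules translate directly into code. A sketch that classifies two equal-length vectors:

```python
def compare(va, vb):
    """Classify the causal relationship between two vector clocks.

    Returns 'before' (va happened-before vb), 'after', 'equal',
    or 'concurrent'. Assumes equal-length vectors.
    """
    less = all(a <= b for a, b in zip(va, vb))       # va <= vb everywhere
    greater = all(a >= b for a, b in zip(va, vb))    # va >= vb everywhere
    if less and greater:
        return "equal"
    if less:
        return "before"
    if greater:
        return "after"
    return "concurrent"  # each vector wins at some index
```

Running this on the group-lunch vectors from the next section gives `compare([1, 0, 0], [0, 1, 0]) == "concurrent"` and `compare([1, 0, 0], [1, 1, 1]) == "before"`, matching the intuition.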
An Example: Group Lunch
Imagine a group of friends trying to decide where to eat lunch. They use a group chat to propose and discuss options.
- Alice suggests pizza. Her message gets timestamp [1, 0, 0] (Alice is process 1).
- Bob independently suggests sushi before seeing Alice's message. His message gets timestamp [0, 1, 0] (Bob is process 2).
- Carol sees both suggestions. Her message "How about we vote?" gets timestamp [1, 1, 1]. Why? She saw Alice's proposal (timestamp 1 for Alice) and Bob's proposal (timestamp 1 for Bob), and this is her first message (timestamp 1 for Carol).
Now compare:
- Alice's message [1, 0, 0] and Bob's message [0, 1, 0]:
  - Alice[1] = 1 > 0 = Bob[1], but Alice[2] = 0 < 1 = Bob[2]
  - Neither vector is less than the other: concurrent
  - This makes sense: they sent independent proposals without seeing each other's messages
- Alice's message [1, 0, 0] and Carol's message [1, 1, 1]:
  - Alice[i] \(\leq\) Carol[i] for all i, with strict inequality at positions 2 and 3
  - \(Alice \rightarrow Carol\): Alice's message happened-before Carol's
  - This makes sense: Carol saw Alice's message before replying
Vector clocks capture this intuition: if you have seen someone’s message, your vector reflects their progress. If you have not, it does not.
Vector Clocks in Actual Use: Sets of Tuples
The description above assumes a fixed set of processes with known IDs. In practice, distributed systems often do not know all participants in advance. Processes join and leave. How do vector clocks work in this environment?
Instead of maintaining a fixed-size array, processes maintain a set of (processID, counter) tuples. Each tuple tracks one process’s logical time.
For example:
{(A, 5), (B, 3), (D, 2)}
This means:
- Process A has executed 5 events (that I know about)
- Process B has executed 3 events
- Process D has executed 2 events
- I have not heard from processes C, E, or any others
When merging vectors on message receipt, you take the union of the two sets, keeping the maximum counter for each process ID:
Local: {(A, 5), (B, 3), (D, 2)}
Received: {(A, 4), (C, 7), (D, 3)}
Merged: {(A, 5), (B, 3), (C, 7), (D, 3)}
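Representing the tuple set as a pid → counter map, the merge is a dictionary union keeping the per-process maximum. A sketch:

```python
def merge(local, received):
    """Merge two tuple-set vector clocks represented as dicts (pid -> counter).

    Union of process IDs, keeping the maximum counter for each.
    A pid present in only one clock keeps its counter unchanged.
    """
    merged = dict(local)
    for pid, counter in received.items():
        merged[pid] = max(merged.get(pid, 0), counter)
    return merged

# The example above:
local = {"A": 5, "B": 3, "D": 2}
received = {"A": 4, "C": 7, "D": 3}
merged = merge(local, received)
# merged is {"A": 5, "B": 3, "C": 7, "D": 3}
```

After merging, the receiving process would still increment its own entry, exactly as in the fixed-size algorithm.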
This representation naturally handles systems where:
- Processes join dynamically (a new tuple appears when you first receive a message from that process)
- Not all processes communicate with all others (you only track processes you have heard from)
- Process IDs are globally unique but not sequentially numbered (they could be UUIDs or machine names)
This is how vector clocks are actually implemented in distributed storage systems like Amazon Dynamo and Riak. The vector is a dictionary or map, not an array.
Applications of Vector Clocks
Vector clocks are used in:
Distributed databases: Dynamo, Riak, and some eventually consistent key-value stores use version vectors (a variant of vector clocks) to detect conflicting writes. If two replicas have concurrent versions of a record, the system knows a conflict exists and can invoke application-specific resolution logic.
Version control: Systems like Git and Mercurial use related concepts. Each commit knows its ancestors. If one commit’s ancestry includes another, one supersedes the other. If not, they are on divergent branches (concurrent).
CRDTs: Conflict-free replicated data types use vector clocks to determine when updates can be applied or merged automatically.
Causal consistency: Database systems providing causal consistency use vector clocks to ensure that if a read sees a write, it also sees all writes that causally precede that write.
Debugging: Distributed tracing systems use vector clock ideas to reconstruct causality from logged events across multiple services.
The Scalability Problem
The main drawback of vector clocks is the space they require. Each process maintains O(n) state, where n is the number of processes. In a system with thousands or millions of nodes, this becomes impractical.
For systems with many processes but limited concurrent activity, the tuple-based implementation helps: you only track processes you have actually communicated with, not all possible processes in the system. A node in a 10,000-node cluster might only have vector entries for the 50 nodes it has recently interacted with.
However, in truly massive systems or systems requiring very frequent synchronization across many nodes, even optimized vector clocks can be impractical. This is one reason why modern distributed databases increasingly adopt hybrid logical clocks, which provide O(1) space overhead while still capturing causality.
Advanced Topic: Matrix Clocks
For specialized applications requiring common knowledge tracking, matrix clocks extend vector clocks by maintaining what each process knows about every other process’s knowledge. Process Pi maintains an n×n matrix where M[i][j] represents “how many events I know that Pj has executed.”
Matrix clocks enable determining when all processes have seen an event, which is useful for distributed garbage collection and checkpointing. However, they require O(n²) space per process and are rarely used in practice. Vector clocks suffice for nearly all distributed systems.
Hybrid Logical Clocks
Hybrid Logical Clocks (HLC), proposed by Demirbas and Kulkarni in 2014, bridge the gap between physical and logical time.
The Motivation
Physical clocks provide real-world time but can drift apart or jump forward and backward. Logical clocks provide perfect causality but present sequence numbers rather than wall-clock time, and are thus disconnected from real time.
Many applications need both:
- A database wants causal consistency (logical clocks) but also wants to answer "give me the data as of 2:00 PM yesterday" (physical time)
- A monitoring system wants to correlate events across services using physical timestamps, but also needs causal ordering
HLC provides both properties in a single timestamp.
The Structure
An HLC timestamp consists of two components:
L (logical component) tracks the maximum physical time the process has seen—either from its own clock or from timestamps in received messages. Think of L as “the latest time I know about.”
C (counter) distinguishes events that occur within the same L value. When time advances (L increases), C resets to 0. When time stays the same (L unchanged), C increments.
Together, (L, C) acts like a logical clock but stays close to physical time.
How HLC Works
On local events: If your physical clock has advanced beyond L, update L to match and reset C to 0. Otherwise, keep L unchanged and increment C.
On receiving a message: Take the maximum of your current L, the message's L, and your physical clock. That becomes your new L. The counter C updates based on which value "won":
- If both the old L and the message's L won (they are equal): set C = max(C_old, C_msg) + 1
- If only the old L won: increment your counter (C = C_old + 1)
- If only the message's L won: take the message's counter plus one (C = C_msg + 1)
- If the physical clock won: reset C to 0
This keeps L within clock synchronization error ε of true physical time while preserving the happened-before property: if event a happened-before event b, then (L, C) at a is lexicographically less than (L, C) at b.
The Algorithm (formally)
Let P denote the current physical clock reading. Each node maintains an HLC timestamp (L, C).
On any local event (including send):
- Read the physical clock: P = now()
- If P > L: set L = P and C = 0
- Otherwise: leave L unchanged and increment the counter: C = C + 1
When receiving a message with timestamp (L_msg, C_msg):
- Read the physical clock: P = now()
- Let L_old and C_old be the local values before the update
- Set L = max(L_old, L_msg, P)
- Update C:
  - If L == L_old and L == L_msg: set C = max(C_old, C_msg) + 1
  - Else if L == L_old and L > L_msg: set C = C_old + 1
  - Else if L == L_msg and L > L_old: set C = C_msg + 1
  - Else (P > L_old and P > L_msg): set C = 0
This keeps L close to physical time while guaranteeing monotonic, causality-respecting timestamps; C distinguishes events that share the same L.
The key insight is that when the physical clock P is greater than both the old logical clock and the received logical clock, time has advanced naturally, and we reset C to 0. This keeps the counter bounded.
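Putting the cases together, here is a sketch in Python. The physical clock is injectable so the example is deterministic; in production you would read the system clock in a fixed unit such as milliseconds:

```python
import time

class HybridLogicalClock:
    """Sketch of a hybrid logical clock (Demirbas & Kulkarni, 2014)."""

    def __init__(self, now=lambda: int(time.time() * 1000)):
        self.now = now  # physical clock source, injectable for testing
        self.l = 0      # L: max physical time seen so far
        self.c = 0      # C: counter among events sharing the same L

    def local_event(self):
        p = self.now()
        if p > self.l:
            self.l, self.c = p, 0   # physical time advanced: reset counter
        else:
            self.c += 1             # same L: bump counter
        return (self.l, self.c)

    def receive(self, l_msg, c_msg):
        p = self.now()
        l_old, c_old = self.l, self.c
        self.l = max(l_old, l_msg, p)
        if self.l == l_old and self.l == l_msg:
            self.c = max(c_old, c_msg) + 1  # both logical clocks tied
        elif self.l == l_old:
            self.c = c_old + 1              # our L won
        elif self.l == l_msg:
            self.c = c_msg + 1              # message's L won
        else:
            self.c = 0                      # physical clock won: reset
        return (self.l, self.c)
```

With a frozen physical clock you can see the counter doing the work: repeated local events at the same L produce (L, 0), (L, 1), …, and a message from a node whose L is ahead is adopted with its counter plus one.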
Properties
HLC preserves the happened-before property: if \(a \rightarrow b\), then HLC(a) < HLC(b) (comparing tuples lexicographically).
HLC timestamps stay close to physical time. If clocks are synchronized to within \(\varepsilon\) (using NTP), then L is within \(\varepsilon\) of true physical time. This allows time-based queries while maintaining causality.
Applications of Hybrid Logical Clocks
HLC is used in modern distributed databases:
CockroachDB: Uses HLC for transaction timestamps. This provides serializable isolation without atomic clocks.
MongoDB: Uses hybrid timestamps for multi-version concurrency control (MVCC) in its replication protocol.
YugabyteDB: Uses HLC to timestamp transactions and maintain causal consistency.
The appeal is that HLC provides strong consistency guarantees without the waiting and atomic clocks required by Google's approach (which we'll cover later). It works with commodity servers running NTP.
The Trade-off
HLC is more complex than pure logical clocks. The timestamp is larger (two components instead of one). The algorithm has more cases to handle.
But for distributed databases that need both causal consistency and time-based queries, HLC is an elegant solution. It combines the best properties of physical and logical time in a practical, deployable system.
Choosing a Logical Clock
Different applications need different logical clocks:
Use Lamport timestamps when:
- You only need a total ordering of events
- You do not need to detect concurrency
- Space efficiency matters (one integer per event)
- Examples: distributed mutual exclusion, totally ordered multicast
Use vector clocks when:
- You need to detect concurrent conflicting updates
- Causality is important for correctness
- The number of processes is moderate (dozens to hundreds)
- Examples: replicated databases with conflict resolution, CRDTs, version control
Use hybrid logical clocks when:
- You need both causality and approximate real time
- You want time-based queries with causal consistency
- You can tolerate slightly larger timestamps
- Examples: distributed databases with MVCC, systems requiring both consistency and timestamped logs
For most distributed systems, the choice is between vector clocks (when you need concurrency detection) and hybrid logical clocks (when you also need correlation with real time). Lamport timestamps provide the conceptual foundation but are rarely used directly in modern systems.
Summary
Logical clocks provide event ordering based on causality rather than physical time. This solves problems where what matters is potential causal influence, not absolute timestamps.
Lamport timestamps use a single counter per process, ensuring causally ordered events have increasing timestamps. They provide a total ordering but cannot detect concurrency. While rarely used directly in production, Lamport’s happened-before relationship is the foundation for all logical clock mechanisms.
Vector clocks use a vector of counters (one per process) to fully capture causal relationships. By comparing vectors component-wise, systems can determine if events are causally ordered or concurrent. In practice, vector clocks are implemented as sets of (processID, counter) tuples to handle dynamic systems. Vector clocks are widely used in replicated databases, CRDTs, and version control systems.
Hybrid logical clocks combine physical time and logical counters, providing both causal ordering and timestamps close to wall-clock time. Modern distributed databases like CockroachDB, MongoDB, and YugabyteDB use HLC to provide strong consistency without specialized hardware.
Understanding logical clocks is essential for building distributed systems that need to order events, detect conflicts, and maintain consistency without relying on perfectly synchronized physical clocks.