Distributed Agreement
Mutual Exclusion
Paul Krzyzanowski
March 2, 2021
Goal: Allow processes to request and be granted exclusive access to named resources. The algorithms should be designed to avoid starvation and ensure that exactly one requesting process gets the resource even if multiple processes make requests concurrently.
Mutual exclusion is responsible for making sure that only one process or thread accesses a resource at a time. In non-distributed systems, it is accomplished through mechanisms such as semaphores and monitors and, at a low level, through instructions such as test-and-set locks. For higher-level objects, such as files, the operating system acts like a gatekeeper to disallow access if a file is locked by another process. None of these mechanisms works for processes that are distributed across a network of computers.
We need an algorithm that will allow processes to compete for access to a resource and ensure that at most one process will be granted that access. Such an algorithm also needs to ensure that every process that wants a resource will eventually be granted access to it. If this is not the case, starvation occurs: a process must wait indefinitely for a resource.
The “resource” is an arbitrary name that all processes agree to use. It may refer to a critical section of code, a file, a field in a database, a physical device, or some network service. We do not care what the underlying component is but rather that we can get processes to agree on exclusive access to it.
A mutual exclusion algorithm allows a process to request access to this resource and be granted exclusive access to it. This is known as a lock. If the lock has a timeout associated with it, then it is known as a lease.
Two important properties that we need from a mutual exclusion algorithm (and other algorithms too, of course) are:
Safety: This simply means the algorithm does its job and only one process can hold a resource at a time.
Liveness: The algorithm should make progress and no process should wait forever for a message that will never arrive.
We also would like a less tangible property, fairness. Not only should the algorithm make progress but every process that wants a resource should get a fair chance to get it. For example, two processes should not be able to hand the resource back and forth between themselves, shutting out a third process entirely or forcing it to wait unduly long.
Distributed algorithms fall into three categories.
A centralized approach uses a central coordinator that is responsible for granting access permissions.
A token-based approach allows a process to access a resource only if it is holding a “token”: an unsolicited message whose possession grants the right to access the resource. Sending the token to another process means giving it up and forfeiting access to the resource.
A contention-based algorithm is one where all processes coordinate together on deciding who gets access to the resource.
The centralized algorithm runs on a single server that accepts REQUEST messages for a resource. If nobody is using the resource, it responds with a GRANT message and places the request on a FIFO (first-in, first-out) queue of requests for that resource. If another process is using the resource, then the server does not respond with a GRANT but adds the request to the queue of requests.
When a process is done with a resource, it sends the server a RELEASE message. The server removes the process' ID from the queue and sends a GRANT message to the next process in the queue (if there is one).
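As a sketch, the coordinator’s logic might look like the following. The send() transport and the message names here are assumptions for illustration, not part of any particular implementation; note that the current holder sits at the head of each queue, which is why a RELEASE is handled by removing the head and granting to the new head.

    from collections import defaultdict, deque

    class Coordinator:
        """Central lock server: one FIFO queue per resource.
        The process at the head of a queue holds the lock."""

        def __init__(self, send):
            self.send = send                  # send(pid, message): assumed transport
            self.queues = defaultdict(deque)  # resource -> queue of process IDs

        def on_request(self, resource, pid):
            q = self.queues[resource]
            q.append(pid)
            if len(q) == 1:                   # nobody was using the resource
                self.send(pid, ("GRANT", resource))
            # otherwise the request simply waits in the queue

        def on_release(self, resource, pid):
            q = self.queues[resource]
            q.popleft()                       # drop the releasing holder from the head
            if q:                             # grant to the next process, if any
                self.send(q[0], ("GRANT", resource))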
The token ring algorithm creates a logical communication ring among the processes in the group, where each process communicates with a single neighbor. A message, called a token, is created for each resource. A process is not allowed to access the resource until it has the token. This token is passed from process to process along the ring. If a process receives a token and does not need to access that resource, it simply sends the token to the next process. Otherwise, it will hold on to the token until it is done with the resource.
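A minimal sketch of one node’s behavior, assuming a hypothetical send_to_neighbor() function that delivers the token to the next process in the ring:

    class RingNode:
        """One process in the logical ring for a single resource."""

        def __init__(self, send_to_neighbor):
            self.send_next = send_to_neighbor  # assumed transport to our ring neighbor
            self.wants_resource = False

        def on_token(self, token):
            if self.wants_resource:
                self.use_resource()            # hold the token for the critical section
                self.wants_resource = False
            self.send_next(token)              # then pass the token along the ring

        def use_resource(self):
            pass                               # critical section body goes here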
Lamport’s mutual exclusion algorithm requires each process that wants a resource to send a timestamped request for the resource to all processes in the group, processing the message itself as well (effectively sending it to itself). The message contains the resource ID, the requesting process ID, and a totally-ordered (unique) Lamport timestamp. Every process immediately acknowledges the receipt of each request with a timestamped Reply message.
Each process, including the sender, places the received message in a local priority queue that is sorted by the Lamport timestamp that is in each message. A process decides whether it can access the resource by checking whether its own request is the earliest (first) one in the queue of all requests that have been received. If this is the case, it can access the resource. When done, the process sends a release message to all members. The receipt of a release message causes each process, including the one that sent the message, to remove that process ID from the queue.
If a process now finds its own request at the head of the queue, it knows that it is its turn to access the resource, and so does everyone else, since all processes hold identical queues.
Total order is important in Lamport’s mutual exclusion algorithm because it ensures a fair and deterministic way to resolve conflicts when multiple processes are requesting access to the same shared resource.
In a distributed system, processes may have different physical clocks, and network latency can cause messages to arrive out of order. Lamport’s algorithm uses logical clocks and timestamps to establish a total order of events in the system, independent of the physical clocks. Recall that total order with Lamport timestamps is a way of ensuring each timestamp is unique and can be achieved by treating a machine address and process ID as decimal components of the timestamp (a small illustration of this appears after the list below). This total order ensures that all processes have a consistent view of the order in which requests for the shared resource are made. Total ordering is important for three reasons:
First, total order provides a fair and deterministic way to resolve conflicts when multiple processes are trying to access the same shared resource. By ensuring that all processes have a consistent view of the order in which requests for the shared resource are made, the algorithm can prevent any process from never getting a chance to obtain the shared resource, a situation known as starvation.
Second, this total order guarantees that the system remains deadlock-free for the task of getting a lock on a resource. Deadlocks can occur when two or more processes are waiting for each other to release a resource, causing a circular dependency. By establishing a deterministic order in which processes gain access to the resource, Lamport’s algorithm eliminates the possibility of such a circular wait.
Lastly, total order maintains the safety property, ensuring that only one process can access the shared resource at a time. By enforcing a consistent order of requests, the algorithm can grant access to the resource in a sequential manner, guaranteeing mutual exclusion.
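To make the tie-breaking concrete, a totally ordered Lamport timestamp can be modeled as a (clock, process ID) pair, compared lexicographically so that every request gets a unique position in the order. This snippet is only an illustration of the idea, not part of the algorithm’s specification:

    # Totally ordered Lamport timestamps as (clock, process_id) pairs.
    # Python compares tuples element by element, so two requests with
    # the same clock value are ordered by their (unique) process IDs.
    ts_a = (12, 3)   # clock 12 at process 3
    ts_b = (12, 7)   # clock 12 at process 7: concurrent with ts_a
    ts_c = (13, 1)   # clock 13 at process 1

    assert ts_a < ts_b < ts_c   # every pair of timestamps is comparable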
Lamport’s mutual exclusion algorithm summary: send a request message to everyone in the group and process the message yourself as well. Get a reply from everyone. Each process stores all request messages in a priority queue that is sorted by message timestamps. The process whose request is at the head of the queue is allowed to access the resource. When it is done, it sends a release message to everyone, which causes all group members to remove its request from the queue.
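The sketch below puts these pieces together for a single resource. It assumes reliable delivery, a send(pid, message) transport, and a dispatch loop (not shown) that routes incoming messages to the handlers; those names and the message tags are illustrative, not prescribed by the algorithm:

    import heapq

    class LamportMutex:
        """Sketch of Lamport's mutual exclusion for one resource."""

        def __init__(self, pid, group, send):
            self.pid, self.group, self.send = pid, group, send
            self.clock = 0
            self.queue = []         # heap of ((clock, pid), requester) entries
            self.replies = set()    # who has acknowledged our current request

        def tick(self):
            self.clock += 1
            return self.clock

        def request(self):
            ts = (self.tick(), self.pid)         # totally ordered timestamp
            self.replies = set()
            for p in self.group:                 # the group includes ourselves
                self.send(p, ("REQUEST", ts, self.pid))

        def on_request(self, ts, sender):
            self.clock = max(self.clock, ts[0])  # Lamport clock update
            heapq.heappush(self.queue, (ts, sender))
            self.send(sender, ("REPLY", (self.tick(), self.pid), self.pid))

        def on_reply(self, ts, sender):
            self.clock = max(self.clock, ts[0])
            self.replies.add(sender)

        def may_enter(self):
            # Our request is earliest in the queue and everyone has replied.
            return (bool(self.queue) and self.queue[0][1] == self.pid
                    and self.replies >= set(self.group))

        def release(self):
            for p in self.group:
                self.send(p, ("RELEASE", self.pid))

        def on_release(self, sender):
            self.queue = [e for e in self.queue if e[1] != sender]
            heapq.heapify(self.queue)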
The Ricart & Agrawala algorithm, like Lamport’s, is also based on using reliable multicasts. A process that wants to access a resource sends a request to all other processes in the group and waits for all of them to respond. As with Lamport’s mutual exclusion algorithm, the request message contains the resource ID, process ID, and a unique Lamport timestamp.
If another process is currently using the resource, that process delays its reply until it is done. If process A and process B send out requests concurrently (i.e., two systems want the same resource at approximately the same time), each of those systems compares the timestamp of the received request with that of its own request. If process A received a request from process B that is older (has a lower timestamp) than the one it sent, then process A will give process B priority to access the resource by sending a reply message to process B. Otherwise, if process A has the earlier timestamp, it will queue the request from B and continue to wait for all replies to come in, sending a reply to B (and any other processes that wanted the resource) only when it is done using the resource. The key point is that processes A and B will make the same comparison and exactly one will hold back its reply.
Ricart & Agrawala’s algorithm summary: send a request message to everyone in the group and wait for reply messages from everyone. Any process using the resource will delay sending a reply and queue it for sending after it is done with the critical section. A process may access the resource once it has received replies from all other group members.
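Under the same assumptions as the previous sketch (reliable delivery, an assumed send() transport, illustrative message tags), the deferred-reply logic might look like this:

    class RicartAgrawala:
        """Sketch of Ricart & Agrawala for one resource."""

        def __init__(self, pid, group, send):
            self.pid, self.group, self.send = pid, group, send
            self.clock = 0
            self.my_ts = None       # timestamp of our outstanding request, if any
            self.using = False
            self.replies = set()
            self.deferred = []      # requesters we will answer after releasing

        def request(self):
            self.clock += 1
            self.my_ts = (self.clock, self.pid)    # totally ordered timestamp
            self.replies = set()
            for p in self.group:
                if p != self.pid:
                    self.send(p, ("REQUEST", self.my_ts, self.pid))

        def on_request(self, ts, sender):
            self.clock = max(self.clock, ts[0]) + 1
            if self.using or (self.my_ts is not None and self.my_ts < ts):
                self.deferred.append(sender)       # we have priority: hold the reply
            else:
                self.send(sender, ("REPLY", self.pid))

        def on_reply(self, sender):
            self.replies.add(sender)
            if len(self.replies) == len(self.group) - 1:
                self.using = True                  # all replies in: enter critical section

        def release(self):
            self.using = False
            self.my_ts = None
            for p in self.deferred:                # now answer everyone we made wait
                self.send(p, ("REPLY", self.pid))
            self.deferred = []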
The Lamport and Ricart & Agrawala algorithms are similar in that they are both contention-based and truly distributed. Ricart & Agrawala’s algorithm requires fewer messages since there is no explicit release message that needs to be sent; replies are simply delayed. Neither is efficient compared with the centralized algorithm … but they are distributed.
There are many, many more mutual exclusion algorithms; these examples illustrated the main categories. For instance, the Suzuki-Kasami algorithm adds a token to the Ricart & Agrawala algorithm, improving the performance to (N-1) requests and 1 reply for a group of N processes. Maekawa’s algorithm partitions the group into subgroups such that each subgroup has at least one process in common with every other subgroup, avoiding the need to communicate with all group members. It reduces the message load to between 3√N and 6√N messages per request.