Distributed Transactions
Distributed deadlock
Paul Krzyzanowski
November 2, 2022
Goal: Determine if a transaction will wait for resources in such a manner as to create an indefinite wait, called deadlock. Find ways to ensure that this will not happen.
To ensure consistency, a transaction get locks for resources (e.g., objects, database rows, files) it needs to access. This ensures that other transactions will not modify the data it needs during its execution or read intermediate results. A deadlock occurs when there is a circular dependency on processes holding and requesting locks on resources.
Note that we will use processes and transactions interchangeably. A transaction is simply a process that can be aborted. That is, it can revert any modifications that it made before exiting.
The four conditions that must hold are:
- mutual exclusion: A resource can be held by at most one process.
- hold and wait: Processes that already hold resources can wait for another resource.
- non-preemption: A resource, once granted, cannot be taken away.
- circular wait: Two or more processes are waiting for resources held by one of the other processes.
Three approaches can be used for managing deadlock in distributed systems.
- Detection
- A centralized deadlock detection approach uses a central coordinator to manage a resource graph of processes and the resources they are using. Each time a process gets a lock or releases a lock on a resource, it sends a message to this coordinator (waiting-for or releasing). The coordinator builds a graph of all the processes and the resources they are holding or waiting for. This is called a wait-for graph. If a cycle is detected, in the graph then the coordinator knows a deadlock exists. In some cases, if release and waiting-for messages are received out of order, they can lead the coordinator to believe that there is a deadlock cycle when none really exists. In reality, the release message should have been processed first and would cause the deadlock to not happen. This condition is known as phantom deadlock.
- The Chandy-Misra-Haas distributed deadlock detection algorithm has a process send a probe message to a process that is holding a resource prior to waiting for the resource. The receiving process forwards the probe to every process that contains resources it is waiting for. This is called edge chasing. If the original process receives its own probe message then it knows that a dependency cycle, and hence deadlock, will exist if it waits for the resource it wants.
- Prevention
- Deadlock prevention requires the system to ensure that at least one of the conditions that is necessary for deadlock cannot occur. This can be implemented by making decisions based on the timestamps of each transaction competing for resources.
- The wait-die algorithm states that if a younger process is using the resource, then the older process (that wants the resource) waits. If an older process is holding a resource, then the younger process that wants the resource kills itself (that’s ok; transactions are designed to be restartable). This forces the resource utilization graph to be directed from older to younger processes, making cycles impossible.
- The wound-wait algorithm ensures that the graph flows from young to old and cycles are again impossible. an old process will preempt (kill) the younger process that holds a resource. If a younger process wants a resource that an older one is using, then it waits until the old process is done. Avoidance
- This requires knowing which locks will be needed by transactions and then managing resource allocation or transaction scheduling to ensure that deadlock will not happen. This requires a priori knowledge of which resources each transaction will need and when each transaction is scheduled to execute. As such, it is usually not a practical approach in distributed environments (it can be used in operating systems; the banker’s algorithm tries to find an execution sequence that will avoid deadlock … but we will not cover it here).
). One way in which this can be implemented is that a transaction will try to get all the locks it needs before it starts to execute. If it cannot get every one of those locks, it will abort rather than wait.