Explain deadlock detection and recovery in detail
Deadlock Detection and Recovery in Operating Systems
Deadlock occurs when a set of processes are permanently blocked because each one is holding resources while waiting for other resources held by others. In a detection-and-recovery approach, the operating system allows deadlocks to occur, periodically checks for them, and then takes action to recover. This is common in general-purpose systems where maximum resource utilization is desired.
Why Use Deadlock Detection?
- Prevention and avoidance can be overly restrictive and reduce throughput.
- Detection lets the system run with high utilization and resolves deadlocks only when they actually happen.
- Useful when resource requests are unpredictable and dynamic.
Key Ideas and Terminology
- Resource Allocation Graph (RAG): A directed graph showing which process holds or requests which resource.
- Wait-For Graph (WFG): A simplified graph with only processes as nodes; an edge Pi → Pj means Pi is waiting for a resource currently held by Pj.
- Available: Vector showing how many instances of each resource type are free.
- Allocation: Matrix showing how many resource instances are currently allocated to each process.
- Request (or Need): Matrix showing how many instances each process is currently requesting.
When to Run Deadlock Detection
- Periodically: Run the detector at fixed intervals.
- On demand: Trigger when CPU utilization drops, when processes wait too long, or when resource contention spikes.
- Per resource class: Some systems detect per subsystem (e.g., I/O vs. memory) for faster checks.
Detection with Single Instance per Resource Type (Using Wait-For Graph)
If every resource type has only one instance, detection reduces to cycle finding in the Wait-For Graph. A cycle implies deadlock.
- Build WFG: draw Pi → Pj if Pi is waiting for a resource currently held by Pj.
- Run a cycle-detection algorithm (DFS or topological reduction).
- If a cycle exists, all processes on the cycle are deadlocked.
// Detect cycle in Wait-For Graph (WFG) using DFS
DFS(node u):
mark u as visiting
for each v in Adj[u]:
if v is visiting: cycle found (deadlock)
else if v is unvisited: DFS(v)
mark u as visited
for each process p:
if p is unvisited: DFS(p)
if any DFS reported a cycle: deadlock present
Time complexity is roughly O(V + E), where V is the number of processes and E is the number of wait-for edges.
Detection with Multiple Instances per Resource Type (Matrix-Based Algorithm)
When each resource type may have multiple instances (e.g., several printers, memory frames), detect deadlock using a safety-like test.
- Let Available, Allocation, and Request represent the current state.
- Initialize:
- Work = Available
- Finish[i] = false for all processes that hold resources (Allocation[i] ≠ 0); otherwise true
- Find a process i such that:
- Finish[i] = false, and
- Request[i] ≤ Work (component-wise)
- If found, pretend i completes: Work = Work + Allocation[i]; set Finish[i] = true; repeat step 3.
- If no such i exists, then any process with Finish[i] = false is deadlocked.
// Multi-instance deadlock detection
Work = Available
for each process i:
Finish[i] = (Allocation[i] == 0) ? true : false
repeat:
found = false
for each process i:
if !Finish[i] and Request[i] <= Work:
Work = Work + Allocation[i]
Finish[i] = true
found = true
// stop if we cannot make progress
until found == false
// Processes with Finish[i] == false are deadlocked
Quick Examples
- Single-instance (WFG): P1 waits for a file locked by P2; P2 waits for a device held by P1. Cycle P1 → P2 → P1 means a deadlock.
- Multi-instance: If no process’s Request fits into Available and some processes still hold resources (Finish = false), those processes are deadlocked.
Deadlock Recovery Techniques
Once a deadlock is detected, the OS must break it. Common strategies:
1) Process Termination
- Abort all deadlocked processes: Fast but wastes work.
- Abort one process at a time (select a victim), re-run detection after each abort until the cycle is broken.
2) Resource Preemption
- Temporarily take resources from some processes and give them to others to resolve the wait cycle.
- Requires choosing:
- Victim resource/process to preempt
- Rollback method to return the victim to a consistent state
- Starvation control to ensure the same process is not always chosen
3) Rollback and Checkpointing
- Periodically checkpoint process state to stable storage.
- On deadlock, rollback one or more processes to a safe checkpoint (freeing resources) and restart them later.
Victim Selection: How to Choose What to Kill or Preempt
- Priority: Prefer to preempt/abort lower-priority processes.
- Cost of lost work: Avoid killing a process that has done a lot of computation or is near completion.
- Resources held: Prefer victims holding many or critical resources (to free more capacity).
- Rollback distance: Choose processes with recent checkpoints.
- Process type: Interactivity, user importance, or SLA requirements can influence choice.
- Starvation prevention: Use aging or rotation so the same process isn’t always the victim.
Comparing Recovery Methods
- Terminate all: Simple, but heavy loss of progress.
- Terminate one-by-one: Balanced; repeated detection adds overhead.
- Preempt resources: Preserves more work, but needs checkpointing and careful consistency handling.
Practical Tips for BTech CSE Students
- Use WFG cycle detection for single-instance resource models.
- Use the matrix-based detection for multi-instance resource models.
- After detection, apply victim selection with clear criteria to minimize cost and avoid starvation.
- Run detection at sensible intervals to balance overhead and responsiveness.
Summary
Deadlock detection identifies circular waits either by finding cycles in a Wait-For Graph (single-instance) or by simulating allocation using Available/Allocation/Request (multi-instance). After detection, the OS recovers by terminating processes, preempting resources, or rolling back to checkpoints. Careful victim selection and reasonable detection frequency ensure good performance and fairness in a real operating system.
