block/blk-mq: use atomic_t for quiesce_depth to avoid lock contention on RT#802

Open
blktests-ci[bot] wants to merge 1 commit into linus-master_base from series/1090278=>linus-master

blktests-ci Bot commented May 6, 2026

Pull request for series with
subject: block/blk-mq: use atomic_t for quiesce_depth to avoid lock contention on RT
version: 6
url: https://patchwork.kernel.org/project/linux-block/list/?series=1090278

… on RT

On RT kernels (PREEMPT_RT), commit 6bda857 ("block: fix ordering
between checking QUEUE_FLAG_QUIESCED request adding") causes a severe
performance regression on systems with multiple MSI-X interrupt
vectors.

That commit takes the spinlock_t queue_lock in
blk_mq_run_hw_queue() to synchronize QUEUE_FLAG_QUIESCED checks
with blk_mq_unquiesce_queue(). This works correctly on a standard
kernel, but it causes catastrophic serialization on an RT kernel,
where spinlock_t is converted to a sleeping rt_mutex.

Problem in RT kernel:
- blk_mq_run_hw_queue() is called from IRQ thread context
- With multiple MSI-X vectors, all IRQ threads contend on
  the same queue_lock
- queue_lock becomes rt_mutex (sleeping) in RT kernel
- IRQ threads serialize and enter D-state waiting for lock
- Throughput drops from 640 MB/s to 153 MB/s

Solution:
Convert quiesce_depth to atomic_t and use it directly for quiesce
state checking, eliminating QUEUE_FLAG_QUIESCED entirely. This
removes the need for any locking in the hot path.

The atomic counter serves as both the depth tracker and the quiesce
indicator (depth > 0 means quiesced). This eliminates the race
window that existed between updating the depth and the flag.
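
The single-counter idea can be illustrated with a small userspace model in C11. This is only a sketch, not the patch itself: `struct sketch_queue` and the `sketch_*` helpers are hypothetical names, `<stdatomic.h>` stands in for the kernel's atomic_t API, and the default sequentially consistent operations are used instead of the kernel barriers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the request queue; only the counter is modeled. */
struct sketch_queue {
	atomic_int quiesce_depth;	/* > 0 means the queue is quiesced */
};

/* Each quiesce call nests by bumping the depth; no separate flag is set. */
static void sketch_quiesce(struct sketch_queue *q)
{
	atomic_fetch_add(&q->quiesce_depth, 1);
}

/* Each unquiesce call undoes exactly one quiesce call. */
static void sketch_unquiesce(struct sketch_queue *q)
{
	int prev = atomic_fetch_sub(&q->quiesce_depth, 1);

	/* An unbalanced unquiesce would drive the depth negative. */
	assert(prev > 0);
	(void)prev;
}

/* The depth doubles as the quiesce indicator, so one load answers the query. */
static bool sketch_queue_quiesced(struct sketch_queue *q)
{
	return atomic_load(&q->quiesce_depth) > 0;
}
```

Because the state is derived from one variable, there is no moment at which the depth and a separate flag can disagree; nested quiesce calls keep the queue quiesced until the last matching unquiesce.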

Memory ordering is ensured by:
- smp_mb__after_atomic() after modifying quiesce_depth in
  blk_mq_quiesce_queue_nowait() and blk_mq_unquiesce_queue()
- smp_rmb() in blk_mq_run_hw_queue() before re-checking the
  quiesce state, paired with the writer-side barriers above
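
The barrier pairing described above can be approximated in userspace C11 as follows. This is a hedged model under stated assumptions, not the kernel code: the `model_*` names and the `pending` counter are hypothetical, a sequentially consistent fence stands in for smp_mb__after_atomic(), and an acquire fence stands in for smp_rmb(). The kernel primitives are not exact equivalents of these C11 fences.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the queue state touched by the hot path. */
struct model_queue {
	atomic_int quiesce_depth;	/* > 0 means quiesced */
	atomic_int pending;		/* model-only count of queued requests */
};

static void model_queue_init(struct model_queue *q)
{
	atomic_init(&q->quiesce_depth, 0);
	atomic_init(&q->pending, 0);
}

/* Writer side: modify the depth, then issue a full fence. */
static void model_quiesce_nowait(struct model_queue *q)
{
	atomic_fetch_add_explicit(&q->quiesce_depth, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst); /* ~ smp_mb__after_atomic() */
}

static void model_unquiesce(struct model_queue *q)
{
	int prev = atomic_fetch_sub_explicit(&q->quiesce_depth, 1,
					     memory_order_relaxed);

	assert(prev > 0);
	(void)prev;
	atomic_thread_fence(memory_order_seq_cst); /* ~ smp_mb__after_atomic() */
}

static bool model_need_run(struct model_queue *q)
{
	return atomic_load_explicit(&q->quiesce_depth,
				    memory_order_relaxed) == 0 &&
	       atomic_load_explicit(&q->pending, memory_order_relaxed) > 0;
}

/* Reader side: lockless check, then a fenced re-check before giving up. */
static bool model_run_hw_queue(struct model_queue *q)
{
	bool need_run = model_need_run(q);

	if (!need_run) {
		atomic_thread_fence(memory_order_acquire); /* ~ smp_rmb() */
		need_run = model_need_run(q);
	}
	return need_run;	/* true means the queue would be dispatched */
}
```

The re-check exists so that a reader that first observed a stale quiesced state gets a second look after the fence, without taking any lock in the hot path.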

Performance impact:
- RT kernel: eliminates lock contention, restores full throughput
- Non-RT kernel: atomic ops cost about the same as the previous
  spinlock acquire/release, so no regression is expected

Test results on RT kernel:
Hardware: Broadcom/LSI MegaRAID 12GSAS/PCIe Secure SAS39xx
  (megaraid_sas driver, 128 MSI-X vectors, 120 hw queues)
- Before: 153 MB/s, IRQ threads in D-state
- After:  640 MB/s, no IRQ threads blocked

Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Fixes: 6bda857 ("block: fix ordering between checking QUEUE_FLAG_QUIESCED request adding")
Cc: stable@vger.kernel.org
Signed-off-by: Ionut Nechita <ionut.nechita@windriver.com>

blktests-ci Bot commented May 6, 2026

Upstream branch: 6d35786
series: https://patchwork.kernel.org/project/linux-block/list/?series=1090278
version: 6
