block/blk-mq: use atomic_t for quiesce_depth to avoid lock contention on RT#802
Open
blktests-ci[bot] wants to merge 1 commit intolinus-master_basefrom
Open
block/blk-mq: use atomic_t for quiesce_depth to avoid lock contention on RT#802blktests-ci[bot] wants to merge 1 commit intolinus-master_basefrom
blktests-ci[bot] wants to merge 1 commit intolinus-master_basefrom
Conversation
… on RT In RT kernel (PREEMPT_RT), commit 6bda857 ("block: fix ordering between checking QUEUE_FLAG_QUIESCED request adding") causes severe performance regression on systems with multiple MSI-X interrupt vectors. The above change introduced spinlock_t queue_lock usage in blk_mq_run_hw_queue() to synchronize QUEUE_FLAG_QUIESCED checks with blk_mq_unquiesce_queue(). While this works correctly in standard kernel, it causes catastrophic serialization in RT kernel where spinlock_t converts to sleeping rt_mutex. Problem in RT kernel: - blk_mq_run_hw_queue() is called from IRQ thread context - With multiple MSI-X vectors, all IRQ threads contend on the same queue_lock - queue_lock becomes rt_mutex (sleeping) in RT kernel - IRQ threads serialize and enter D-state waiting for lock - Throughput drops from 640 MB/s to 153 MB/s Solution: Convert quiesce_depth to atomic_t and use it directly for quiesce state checking, eliminating QUEUE_FLAG_QUIESCED entirely. This removes the need for any locking in the hot path. The atomic counter serves as both the depth tracker and the quiesce indicator (depth > 0 means quiesced). This eliminates the race window that existed between updating the depth and the flag. Memory ordering is ensured by: - smp_mb__after_atomic() after modifying quiesce_depth in blk_mq_quiesce_queue_nowait() and blk_mq_unquiesce_queue() - smp_rmb() in blk_mq_run_hw_queue() before re-checking the quiesce state, paired with the writer-side barriers above Performance impact: - RT kernel: eliminates lock contention, restores full throughput - Non-RT kernel: atomic ops are similar cost to the previous spinlock acquire/release, no regression expected Test results on RT kernel: Hardware: Broadcom/LSI MegaRAID 12GSAS/PCIe Secure SAS39xx (megaraid_sas driver, 128 MSI-X vectors, 120 hw queues) - Before: 153 MB/s, IRQ threads in D-state - After: 640 MB/s, no IRQ threads blocked Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Fixes: 6bda857 ("block: fix ordering between checking QUEUE_FLAG_QUIESCED request adding") Cc: stable@vger.kernel.org Signed-off-by: Ionut Nechita <ionut.nechita@windriver.com>
Author
|
Upstream branch: 6d35786 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Pull request for series with
subject: block/blk-mq: use atomic_t for quiesce_depth to avoid lock contention on RT
version: 6
url: https://patchwork.kernel.org/project/linux-block/list/?series=1090278