AGC improvements and improve gain control stability #882
UnknownSuperficialNight wants to merge 7 commits into …
…bility

- Replace coefficient-based `attack/release` with direct `Duration` types
- Reduce `RMS_WINDOW_SIZE` from `8192` to `512` samples to lower latency
- Switch RMS calculation from mean-based buffer (`CircularBuffer`) to sum-of-squares approach in `CircularBufferRMS` for accurate root-mean-square values
- Introduce `SlowDownState` struct that manages timing and caching: counts samples in 2ms blocks, computes adaptive `slowdown_factor` using `compute_slowdown_factor` and caches the result for reuse
- Implement `fast_exp` using Horner's method for efficient exponential approximation of release coefficients (third-order Taylor polynomial)
- Add `NaN` handling in RMS calculation to prevent invalid values
- Add rate limiting to gain changes: clamp gain change per sample based on dynamic attack/release duration to prevent overshooting
- Add new `peak_tracking_window` setting to control peak level smoothing
- Tune default timing parameters: 500ms attack, 0.5ms release, 10ms peak tracking window for balanced behaviour
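A minimal sketch of the sum-of-squares idea behind `CircularBufferRMS`, assuming a ring buffer of squared samples plus a running sum. `RmsWindow` is a hypothetical name and this is not the PR's actual code. NaN samples are treated as silence, in the spirit of the NaN handling listed above.

```rust
/// Hypothetical running-RMS window (not the PR's `CircularBufferRMS`):
/// it stores squared samples and keeps their running sum, so each push
/// updates the RMS in O(1) instead of re-scanning the buffer.
struct RmsWindow {
    squares: Vec<f32>,
    index: usize,
    sum_of_squares: f32,
}

impl RmsWindow {
    fn new(size: usize) -> Self {
        assert!(size > 0, "window must hold at least one sample");
        Self { squares: vec![0.0; size], index: 0, sum_of_squares: 0.0 }
    }

    /// Pushes one sample and returns the RMS over the last `size` samples.
    fn push(&mut self, sample: f32) -> f32 {
        // Treat NaN as silence so it cannot poison the running sum.
        let sample = if sample.is_nan() { 0.0 } else { sample };
        let squared = sample * sample;
        // Swap the oldest squared value out of the sum and the newest one in.
        self.sum_of_squares += squared - self.squares[self.index];
        self.squares[self.index] = squared;
        self.index = (self.index + 1) % self.squares.len();
        // Rounding drift can leave the sum marginally negative; clamp before sqrt.
        (self.sum_of_squares / self.squares.len() as f32).max(0.0).sqrt()
    }
}
```

The running sum makes each update O(1); the trade-off is slow floating-point drift, which an implementation might correct by periodically recomputing the sum from the buffer.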
…calculation

- Replace hardcoded `1.0` fallback with `self.current_gain` when `RMS` equals `0.0`
- Add comment explaining this keeps gain stable or allows gradual decay instead of sudden drops
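A sketch of this fallback, assuming the desired gain is computed as `target_level / rms`. The helper follows the `div_or_fallback` described later in the PR (divide only by finite, positive values), but the signature used here is an assumption.

```rust
/// Hypothetical version of a `div_or_fallback` helper: divide only when the
/// denominator is finite and positive (this also rejects NaN), else fall back.
fn div_or_fallback(numerator: f32, denominator: f32, fallback: f32) -> f32 {
    if denominator.is_finite() && denominator > 0.0 {
        numerator / denominator
    } else {
        fallback
    }
}

/// Desired gain from the target level and measured RMS. On silence
/// (RMS == 0.0) the current gain is kept instead of jumping to 1.0.
fn desired_gain(target_level: f32, rms: f32, current_gain: f32) -> f32 {
    div_or_fallback(target_level, rms, current_gain)
}
```

Keeping the current gain on silence avoids a sudden gain change at the start of the next sound; any gradual decay would then be left to the attack/release logic.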
- Cap peak tracking at 1.0 to handle out-of-bounds decoder samples
- Ensure samples from decoders that are not normalised, like `libopus`, do not track out-of-bounds values
- Cap RMS tracking at 1.0 to handle out-of-bounds decoder samples
- Ensure samples from decoders that are not normalised, like `libopus`, do not track out-of-bounds values
- Change `RMS_WINDOW_SIZE` constant from `512` to `1024`
- 1024 samples provide a ~23ms window at 44.1kHz / ~21ms at 48kHz for stable RMS estimation
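The window durations quoted above follow from `window_size / sample_rate`. The second function is a hypothetical helper, not part of the PR, that scales the 1024-sample window with the sample rate and rounds up to a power of two, so 96kHz gets 2048 samples as suggested under Potential Improvements.

```rust
/// Duration of an RMS window in milliseconds.
fn window_ms(window_size: u32, sample_rate: u32) -> f64 {
    window_size as f64 * 1000.0 / sample_rate as f64
}

/// Hypothetical helper: scale a 1024-sample window defined at 48 kHz to
/// another sample rate, rounded up to a power of two, so the window keeps
/// roughly the same duration (e.g. 2048 samples at 96 kHz).
fn scaled_window_size(sample_rate: u32) -> usize {
    const BASE_SIZE: u64 = 1024;
    const BASE_RATE: u64 = 48_000;
    // Ceiling division keeps the result at or above the proportional size.
    let proportional = (BASE_SIZE * sample_rate as u64 + BASE_RATE - 1) / BASE_RATE;
    (proportional as usize).next_power_of_two()
}
```

At 44.1kHz the proportional size is 941 samples, which rounds up to 1024, so common rates keep the current constant while higher rates grow with it.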
```rust
release_time: Duration::from_secs(0), // Recommended release time
absolute_max_gain: 7.0, // Recommended max gain
target_level: 1.0, // Default to original level
attack_time: Duration::from_millis(500), // Recommended attack time
```
This might be too low. I found 500ms or 800ms to be quite nice; I would like some feedback on this if possible.
Sorry, I have no idea what works best for the new algorithm. For speech, quite fast was useful.
yara-blue left a comment
I also like the idea of multiple profiles. Ideally we also give the "current" default a name, maybe "Music" and "Speech"?
Values outside of … Long story short, we should deal with such values without clipping them.
Any ideas on this? The first thing that comes to mind, though I could be wrong, is something like this:

```rust
let full_scale = self.peak_level.max(1.0);

// Calculate max gain change per sample based on dynamic attack/release times
let max_attack_gain_change_per_sample = full_scale / (dynamic_attack_time * sample_rate);
let max_release_gain_change_per_sample = full_scale / (release_duration * sample_rate);
```

Basically, go through and compute a new max per sample and scale for that. Just throwing ideas out there. We would probably have to go through it all again and possibly remove the …
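A sketch of how those limits could be applied in a single gain step, assuming attack governs gain reductions and release governs gain increases (swap the two if the implementation uses the opposite convention). `rate_limited_gain` is a hypothetical function. Durations are clamped to at least one sample so a zero release time, like the `Duration::from_secs(0)` default above, does not divide by zero.

```rust
use std::time::Duration;

/// Hypothetical sketch: move `current_gain` toward `desired_gain`, limiting
/// the per-sample change so a full-scale swing takes at least the attack
/// (gain falling) or release (gain rising) duration.
fn rate_limited_gain(
    current_gain: f32,
    desired_gain: f32,
    peak_level: f32,
    attack: Duration,
    release: Duration,
    sample_rate: u32,
) -> f32 {
    // Scale the step by the tracked peak when it exceeds 1.0 instead of clipping.
    let full_scale = peak_level.max(1.0);
    let rate = sample_rate as f32;
    // `.max(1.0)`: a duration shorter than one sample allows a full step at once.
    let max_attack_step = full_scale / (attack.as_secs_f32() * rate).max(1.0);
    let max_release_step = full_scale / (release.as_secs_f32() * rate).max(1.0);
    let delta = desired_gain - current_gain;
    let step = if delta < 0.0 {
        delta.max(-max_attack_step) // falling gain: limited by attack
    } else {
        delta.min(max_release_step) // rising gain: limited by release
    };
    current_gain + step
}
```

With `full_scale` taken from the peak, a louder-than-1.0 source gets proportionally larger steps, so the time to cover its range stays close to the configured duration.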
AGC maps input to the range [-1.0, 1.0]. To do so without clipping, it needs the width of the input range. It can't look ahead to see what other samples will be emitted and thus what the peak is. All I can think of is to assume the input range is unrealistically big, say [-1.5, 1.5]? Is that unrealistically big?
Let's look at an extreme: what would happen if, halfway through playback, a single sample peaks really high, let's say at 10.0? Would everything get quieter after that sample?
Please excuse me for responding a bit theoretically, without recent study of the current implementation: the gain calculation should fundamentally be the ratio … If the root cause is that the code assumes 1.0 as the ceiling, then ideally we should remove those assumptions.
I was thinking about a running maximum: we track each sample, and if a sample exceeds the current maximum, we replace the old running maximum with the new one. That was my original idea anyway.
Possibly. I guess that would depend on the implementation; maybe there is a peak decay after … Though neither of these seems like a proper solution, I must admit.
This is what happens without the … Though this could be an issue, as we could get, for example, dips to … My approach was to, by default, limit the gain to … One thing we could do here is remove the …
It would be ideal, though. However, how would we scale the …
The ratio arithmetic handles 1.1 as naturally as 0.8; that's not the issue. The issue is pumping: the AGC attenuates correctly on a spike, then takes …
So, in other words, you think it should allow spikes below the current gain but then, right after, bounce back into the 1.0 range when we are not peaking? Something more like this: …
Not quite. The goal isn't to snap the current gain back; it's to smooth the desired gain so it doesn't swing as wildly. Something like this (illustrative numbers): … The current gain climbs gradually, without abrupt jumps, as the tracked peak decays. You might be able to just remove the …
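A minimal sketch of that behaviour, assuming a peak tracker that follows spikes instantly and decays exponentially, with the desired gain capped at `1.0 / peak`. `DecayingPeak` and `peak_limited_gain` are hypothetical names, not code from the PR. Feeding it the single 10.0 spike raised earlier in the thread, the gain drops once and then recovers as the peak decays, instead of staying low for the rest of playback.

```rust
/// Hypothetical peak tracker: jumps up instantly, decays exponentially.
struct DecayingPeak {
    level: f32,
    decay: f32, // per-sample multiplier in (0, 1)
}

impl DecayingPeak {
    fn new(decay_time_secs: f32, sample_rate: f32) -> Self {
        // Time constant: the level falls to about 37% (1/e) after decay_time_secs.
        let decay = (-1.0 / (decay_time_secs * sample_rate)).exp();
        Self { level: 0.0, decay }
    }

    fn update(&mut self, sample: f32) -> f32 {
        // Instant attack via `max`, exponential release via `decay`.
        self.level = sample.abs().max(self.level * self.decay);
        self.level
    }
}

/// Desired gain capped by the tracked peak so output stays within full scale;
/// as the peak decays, the cap rises and the gain recovers smoothly.
fn peak_limited_gain(target_gain: f32, peak: f32) -> f32 {
    if peak > 0.0 {
        target_gain.min(1.0 / peak)
    } else {
        target_gain
    }
}
```

The decay time sets how long a lone spike keeps the gain down; combined with a rate limit on the gain itself, the recovery is gradual rather than a jump.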
Great to see that analysis. Yeah, I agree that the ramp-up should already work as needed, with fast attack and slow release. Without that … The "artist's intention" argument assumes AGC is the only filter in the chain, but that's not a valid assumption. An … If a user wants a ceiling at source level, wouldn't that already be the existing …
- Extract `fast_exp` function from `agc.rs` to `math.rs`
- Export `fast_exp` as `pub(crate)` for reuse across the codebase
- Update imports in `agc.rs` to use the shared `fast_exp`
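A sketch of a third-order Taylor approximation of `e^x` evaluated with Horner's method, matching the description of `fast_exp` in the commit messages. The PR's exact code is not shown here, so treat this as an assumption. It is only accurate for small `|x|`: at `x = -4.0` the cubic already goes negative.

```rust
/// Third-order Taylor approximation of e^x around 0,
/// 1 + x + x^2/2 + x^3/6, evaluated with Horner's method as
/// 1 + x * (1 + x * (1/2 + x * (1/6))).
/// Hypothetical sketch; accurate only for small |x|.
pub(crate) fn fast_exp(x: f32) -> f32 {
    1.0 + x * (1.0 + x * (0.5 + x * (1.0 / 6.0)))
}
```

Horner's form needs three multiplications and three additions, with no calls into the math library, which is why it is cheap enough for per-block coefficient updates.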
Do you think we should bother with a true peak or just use peak as is, since it's good enough to double as true peak?
True.
True, it should, and it would make both use cases possible. It turns out I have already fixed it, though. What do you think about the default floor value at … Personally, I prefer … What do you think?
What do you think of that being an option, just to make it more flexible and modularised? As …
Using peak as-is should be fine. True peak could be an option if users would know it, for example from ReplayGain metadata. I'm saying could be because this could also be moving into "You Ain't Gonna Need It" territory. Arguably, for music with ReplayGain, a limiter may be preferred over AGC.
What's your rationale for 1.0? Not knowing that, I'd lean 0.0. With
Could be interesting. At the same time, I'm wondering if and how we could relate it to …
I use it for all my music and with the … This comes down mostly to what the most popular use case is. Would people rather have source level as the default and change it manually otherwise, or are more people going to need it uncapped by default? Whichever use case is more popular should decide the default limit. For me, I see my use case as a pretty popular one, but I may be wrong.
At that point it kind of is, but only for music that is mixed/mastered well. For songs that are not, it acts like an AGC, giving headroom to adjust the gain up or down to maintain a consistent level.
I tried that with … Originally, I made it … If we had a lookahead buffer, this would be a good addition, as we could then see a peak coming and lower the gain smoothly in time so that it does not peak.
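The lookahead idea could look roughly like this: delay the output by a fixed number of samples so the gain applied to each outgoing sample can already account for peaks still in the buffer. `Lookahead` is a hypothetical type, not part of the PR, and it adds `lookahead` samples of latency.

```rust
use std::collections::VecDeque;

/// Hypothetical lookahead buffer: outputs each sample `lookahead` samples
/// late, together with the largest absolute value in the window that spans
/// the outgoing sample and the `lookahead` samples after it.
struct Lookahead {
    buffer: VecDeque<f32>,
    lookahead: usize,
}

impl Lookahead {
    fn new(lookahead: usize) -> Self {
        Self { buffer: VecDeque::with_capacity(lookahead + 1), lookahead }
    }

    /// Returns `None` while the buffer is still filling (the added latency),
    /// then `Some((delayed_sample, upcoming_peak))` for every new input.
    fn push(&mut self, sample: f32) -> Option<(f32, f32)> {
        self.buffer.push_back(sample);
        if self.buffer.len() <= self.lookahead {
            return None;
        }
        // O(n) scan for clarity; a monotonic deque would make this O(1) amortised.
        let upcoming_peak = self.buffer.iter().fold(0.0f32, |m, s| m.max(s.abs()));
        let delayed = self.buffer.pop_front()?;
        Some((delayed, upcoming_peak))
    }
}
```

The gain for each delayed sample could then ramp toward `1.0 / upcoming_peak` over the lookahead period, so the reduction is complete before the peak itself is emitted.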


This PR focusses mostly on adding stability to the AGC through the `slowdown_factor` and miscellaneous improvements.

I've been experimenting with the AGC to find ways to stabilise it. This is the result.
The `compute_slowdown_factor` function serves as a third control layer that measures proximity to the target gain alongside the standard `RMS` and `peak` metrics. It acts as a dynamic throttle, adjusting the AGC rate of change based on how close the signal is to the desired level. The slowdown logic activates only when the current gain falls within the combined `RMS` + `peak` tolerance window relative to the target. When the input is loud, the tolerance window widens; with quieter signals, it contracts.

Inside this boundary, exponential scaling prevents the harsh jumps and oscillations that occurred with fixed-rate adjustments. As the signal approaches the target, the slowdown increases to reduce the AGC rate of change and produce smoother behaviour. Outside this zone, the AGC uses normal responsiveness, which allows for more rapid correction when needed. The tolerance window is bounded by the combined `RMS` + `peak` metric.

By managing these ranges, the system enables faster attack times without flattening audio dynamics. Previously, aggressive speeds would normalise all sounds to a flat line. Now the AGC can accelerate adjustments when far from the target but slows down exponentially as it approaches the goal. This preserves audio depth while maintaining stability: quick reactions when needed and gradual stabilisation near the final level, preventing the gain overshoot and sudden volume spikes that can occur with fixed-rate adjustments.
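A sketch of the throttle described above, under stated assumptions: the tolerance window is taken as `rms + peak`, the factor is 1.0 outside it and decays exponentially toward the target inside it, and `STEEPNESS` is a made-up tuning constant. This is not the PR's `compute_slowdown_factor`; it uses `f32::exp` rather than `fast_exp` because a third-order polynomial is inaccurate at the arguments used here.

```rust
/// Hypothetical proximity-based slowdown factor in (0, 1].
fn compute_slowdown_factor(current_gain: f32, desired_gain: f32, rms: f32, peak: f32) -> f32 {
    const STEEPNESS: f32 = 4.0; // hypothetical tuning constant
    // Louder input (higher RMS + peak) widens the window; quieter input narrows it.
    let tolerance = (rms + peak).max(f32::EPSILON);
    let distance = (desired_gain - current_gain).abs();
    if distance >= tolerance {
        return 1.0; // far from target: normal responsiveness
    }
    // 0.0 at the target, approaching 1.0 at the edge of the window.
    let normalized = distance / tolerance;
    // Equals 1.0 at the window edge (continuous with the outside) and
    // exp(-STEEPNESS) at the target, so adjustments slow down as gain settles.
    (STEEPNESS * (normalized - 1.0)).exp()
}
```

The factor would multiply the per-sample gain step, so the AGC moves at full speed far from the target and progressively slower as it settles.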
`update_peak_level` Optimisation

This function was a performance hotspot due to per-sample allocation and branching. Previously, we computed a conditional coefficient for each sample: a fast attack coefficient (0.0) when the sample exceeded the peak, and a slow release coefficient otherwise.

I've replaced this with a branchless implementation that uses a fixed `release_coefficient` (which is always cached), eliminating the per-sample `if` branch and allocation.
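The equivalence behind the branchless version can be shown with two small functions. The branching form assumes a one-pole smoother `coefficient * peak + (1 - coefficient) * |sample|`, which the PR text implies but does not show, so both functions are illustrative. Because the smoothed value is a weighted average of the old peak and the sample, it falls below the sample whenever the sample exceeds the peak, so taking the `max` reproduces the instant attack.

```rust
/// Branching form (assumed): pick a coefficient per sample, 0.0 for an
/// instant attack when the sample exceeds the peak, else the release value.
fn update_peak_branching(peak: f32, sample: f32, release_coefficient: f32) -> f32 {
    let s = sample.abs();
    let coefficient = if s > peak { 0.0 } else { release_coefficient };
    coefficient * peak + (1.0 - coefficient) * s
}

/// Branchless form: always apply the cached release coefficient, then take
/// the max with the new sample. If the sample exceeds the old peak, the
/// smoothed value is below it, so `max` selects the sample (instant attack).
fn update_peak_branchless(peak: f32, sample: f32, release_coefficient: f32) -> f32 {
    let s = sample.abs();
    s.max(release_coefficient * peak + (1.0 - release_coefficient) * s)
}
```

`f32::max` typically compiles to branch-free min/max instructions, although the exact codegen depends on the target and optimiser.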
Before (Slow, Branching + Allocation):
Other changes in this PR

- `CircularBufferRMS` now uses sum-of-squares internally and is cleaned up.
- `div_or_fallback` helper to safely divide by non-NaN, non-infinite, positive values.
- `fast_exp` helper using Horner's method for `exp(x)` approximation in `compute_slowdown_factor`.

Benchmarks
Benchmarks before:
Benchmarks after the changes and redesign:
Concerns
The Libopus decoder can output samples above `1.0`, such as `1.1`, `1.064`, and similar values, for both `RMS` and `peak` readings depending on the track. This behaviour is not observed with the FLAC decoder.

These out-of-range samples cause errors downstream, particularly when offsetting the current gain below `1.0` while targeting `1.0`. I've added `.min(1.0)` to ensure the gain never exceeds the cap/limit for RMS and peak.

As far as I can tell, the root cause is the Libopus decoder, which should not output values above `1.0` in the first place.

This is probably worth investigating: is this behaviour by design in Libopus, or is there something wrong upstream of the effect?
Potential Improvements
…`2048` for `96kHz`). This ensures the buffer remains consistent.

Pseudocode Example:
…`AutomaticGainControlSettings` might be a good idea.

Video Comparison
Before:
before_normal.mp4
After:
after_normal.mp4
Before near the loudness limit
near_limit_before.mp4
After near the loudness limit
near_limit_after.mp4
Additional notes
This can be tuned back to how it worked originally if users prefer the more normalised sound.
It might even be worth adding a toggle for the slowdown so that it can be disabled.