Fast-path progressbar: ~24 ns/iter (was ~254), behavior-preserving #316

Open
wolph wants to merge 3 commits into develop from fast-progressbar


@wolph wolph commented Jun 22, 2026

Owner

Fast-path progressbar: ~24 ns/iter (was ~254)

Makes wrapping a loop with progressbar2 an order of magnitude cheaper, without changing observable behavior.

Result — per-iteration overhead (the headline metric)

| library | ns/iter |
| --- | --- |
| rich | 19 |
| progressbar2 | 24 (was 254) |
| tqdm | 56 |
| alive-progress | 247 |
| click | 1878 |

progressbar2 goes from mid-pack to 2nd-fastest — ~11× faster than before, ~2.3× faster than tqdm, and within a few ns of rich. (Beating rich outright is the goal of a planned optional native extension, deliberately scoped as a separate follow-up.)

How

An integer "next-update gate": the common iteration is just value += 1; self.value = value; if value >= next_update: update(); yield. The expensive redraw machinery (clock read + widget formatting) only runs when a redraw could actually happen (rate-limited to ~20×/sec). The gate calibrates from a real timing measurement and self-corrects via a tqdm-style closed loop.

  • Iterator path: __iter__ rewritten as a single peek-first generator; the shortcuts.progressbar wrapper generator is collapsed away.
  • Manual path: update() / += skip the per-call clock read below the gate threshold.
  • The gate only ever skips iterations — whenever it enters the slow path, the unchanged _needs_update() makes the real redraw decision. It can never force a wrong redraw, only defer a check.
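
The gate described above can be sketched in a few lines. This is a minimal illustration, not the code in progressbar/bar.py: `GatedCounter`, `gate_step`, `_slow_path`, and `slow_path_calls` are hypothetical names, and the step is fixed here rather than calibrated from a timing measurement.

```python
class GatedCounter:
    """Toy stand-in for a progress bar with an integer next-update gate."""

    def __init__(self, iterable, gate_step=100):
        self.iterable = iterable
        self.gate_step = gate_step  # assumed fixed; the PR calibrates it
        self.value = 0
        self.slow_path_calls = 0

    def _slow_path(self, value):
        # Stand-in for the clock read and the real redraw decision.
        self.slow_path_calls += 1
        return value + self.gate_step  # next threshold to check at

    def __iter__(self):
        value = 0
        next_update = self.gate_step
        for item in self.iterable:
            value += 1
            self.value = value  # stays live on every iteration
            if value >= next_update:  # cheap integer compare
                next_update = self._slow_path(value)
            yield item
```

With `gate_step=100`, a 1000-item loop enters the slow path 10 times instead of 1000, while `value` is still stored on every iteration.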

Correctness & backward compatibility

  • Public API unchanged. bar.value stays live every iteration (== current item index). Same widgets, same redraw cadence.
  • previous_value now tracks the value at the last redraw (needed for the pixel check once intermediate update()s are skipped).
  • A PROGRESSBAR_DISABLE_FASTPATH env var (and min_poll_interval=0) reverts to the original per-iteration behavior.
  • Equivalence tests assert the gated bar draws the same frame cadence as an ungated bar, which catches any dropped redraw. They were developed test-first against characterization tests of the pre-change behavior.
  • Full suite: 471 passed, 100% branch coverage. pyright clean; zero new mypy errors.
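
As a rough sketch of how the two opt-outs could combine, assuming any non-empty value of the env var disables the gate and that `min_poll_interval` is given in seconds (both are assumptions; `fastpath_enabled` is a hypothetical helper, not part of the library):

```python
import os


def fastpath_enabled(min_poll_interval, environ=None):
    """Decide whether the gate should be used (illustrative only)."""
    environ = os.environ if environ is None else environ
    # Assumption: any non-empty value counts as "disable".
    if environ.get("PROGRESSBAR_DISABLE_FASTPATH"):
        return False
    # min_poll_interval=0 means "check on every iteration", so no gate.
    return min_poll_interval > 0
```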

Benchmark + regression guard

  • benchmarks/ — reproducible suite (bench.py + report.py) pitting progressbar2 against tqdm/rich/alive-progress/click across iteration overhead, forced-render cost, and import time, all rendered to a real pseudo-terminal.
  • tests/test_perf_budget.py + a CI job — fail the build if per-iteration overhead regresses toward the old regime.
  • README documents the claim with the generated chart.
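
A ratio-based budget check along these lines could look like the sketch below. It is not the contents of tests/test_perf_budget.py: `per_iter_ns`, `within_budget`, and the `max_ratio` value are hypothetical. Comparing the wrapped loop against a bare loop on the same machine avoids hard-coding absolute nanosecond figures.

```python
import time


def per_iter_ns(make_iterable, n=200_000, repeats=5):
    """Best-of-`repeats` wall time per iteration, in nanoseconds."""
    best = float("inf")
    for _attempt in range(repeats):
        iterable = make_iterable(n)
        start = time.perf_counter_ns()
        for _item in iterable:
            pass
        best = min(best, time.perf_counter_ns() - start)
    return best / n


def within_budget(wrapped_ns, bare_ns, max_ratio):
    """True when the wrapped loop costs at most `max_ratio` x the bare loop."""
    return wrapped_ns <= bare_ns * max_ratio
```

A test would then call `per_iter_ns(range)` for the baseline, call it again with a factory that wraps `range(n)` in the bar, and assert `within_budget(...)` with a ratio chosen to sit well below the old regime.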

Secondary (honest scope notes)

  • Render cost: 28.5 → 25.7 µs/update via a safe no_color/len_color fast path (skip the ANSI-strip regex on plain text). Getting under tqdm's ~11 µs needs deeper widget-render rework (deferred; negligible in real use at ~20 redraws/sec).
  • Import time: investigated and left unchanged — measurement shows the ~45 ms is dominated by the always-needed python_utils dependency (asyncio + typing_extensions ~30 ms), not by anything lazy-loadable here. Reducing it needs trimming the python_utils dependency or an upstream fix (separate effort).
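
The plain-text fast path can be illustrated as follows. This is a sketch for `str` input only; the regex is an assumed pattern for CSI escape sequences and may differ from the one in progressbar/utils.py.

```python
import re

# Assumed pattern covering CSI sequences such as "\x1b[31m".
ANSI_CSI = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")


def no_color(text):
    """Strip ANSI colour codes, skipping the regex for plain text."""
    if "\x1b" not in text:  # no escape byte: nothing to strip
        return text
    return ANSI_CSI.sub("", text)


def len_color(text):
    """Visible length of `text`, ignoring ANSI colour codes."""
    return len(no_color(text))
```

The substring test is a single C-level scan, so plain widget output never pays for the regex.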

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings June 22, 2026 01:23

Comment thread benchmarks/bench.py Fixed
Comment thread benchmarks/bench.py Fixed
Comment thread tests/test_fastpath.py Fixed

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a performance optimization ("fast-path" gate) to progressbar2 to reduce per-iteration overhead, along with benchmarks, reports, and tests to verify performance and correctness. The review feedback identifies two critical issues: first, _gate_step can grow exponentially before calibration on fast loops, causing severe performance degradation; second, changing the semantics of the public previous_value attribute breaks backward compatibility, which should be resolved by introducing a private _last_drawn_value attribute instead. Additionally, a test update is suggested to align with the calibration fix.

Comment thread progressbar/bar.py Outdated
Comment thread progressbar/bar.py Outdated
Comment thread tests/test_fastpath.py Outdated

Copilot AI left a comment


Pull request overview

This PR introduces a “next-update gate” fast path in ProgressBar to dramatically reduce per-iteration overhead by skipping expensive redraw checks until a computed threshold is reached, while aiming to preserve redraw cadence and public API behavior. It also adds performance regression tests/CI coverage and a reproducible benchmark suite plus documentation of the performance claim.

Changes:

  • Implement a calibrated integer gate in progressbar/bar.py to avoid per-iteration clock reads/redraw predicate work on the common path (iterator + manual update() paths).
  • Add characterization/equivalence tests and a CI “performance budget” test to prevent regressions in iterator-wrap overhead.
  • Add benchmarks tooling/artifacts and document the performance results in README.rst.

Reviewed changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| progressbar/bar.py | Adds gated fast-path logic in `__iter__` and `update()`, plus gate calibration/recompute machinery and env-based disabling. |
| progressbar/shortcuts.py | Collapses the shortcut wrapper generator layer by returning the bar’s iterator directly. |
| progressbar/utils.py | Adds a fast path in `no_color` to skip regex work when no ANSI escape is present. |
| tests/test_fastpath.py | Adds extensive behavioral parity and fast-path characterization tests (iterator + manual update paths). |
| tests/test_perf_budget.py | Adds a tight perf regression guard for iterator-wrap overhead (skips assertion under coverage tracing). |
| tests/conftest.py | Adds a `no_freezegun` marker escape hatch so timing-dependent perf tests run with real clocks/intervals. |
| benchmarks/bench.py | Adds a reproducible pty-based benchmark runner comparing progressbar2 vs alternatives. |
| benchmarks/report.py | Adds chart/report generation from benchmark results. |
| benchmarks/requirements.txt | Pins benchmark-only dependencies. |
| benchmarks/results.json | Adds a captured benchmark results snapshot. |
| benchmarks/report.md | Adds a generated benchmark report snapshot. |
| README.rst | Documents the performance claim and how to reproduce benchmarks. |
| .github/workflows/main.yml | Adds a dedicated CI job to enforce the iterator overhead performance budget. |

Comment thread tests/test_fastpath.py Outdated
@wolph wolph force-pushed the fast-progressbar branch from 2f3aa7d to 7a0e2d2 Compare June 22, 2026 01:43
wolph commented Jun 22, 2026

Owner Author

Review resolution summary

All inline review comments have been addressed and resolved. Recording the resolution here since the bot summary reviews above were written before the fixes and don't auto-update.

| Finding | Reviewer | Resolution |
| --- | --- | --- |
| `_gate_step` could grow exponentially before calibration | gemini (high) | The back-off doubling is now `elif self._gate_calibrated:`, so it never runs before a real timing measurement, and the gate isn't even consulted pre-calibration. Regression test `test_recompute_gate_no_backoff_before_calibration` added. |
| Public `previous_value` semantics / backward compatibility | gemini (high) | `previous_value` keeps its original meaning (the value before the current `update()` call) and is now updated on every iteration of `for x in bar` too. Verified byte-identical to the pre-gate implementation on the manual path, the iterator path, `increment()`/`+=`, and with the fast path disabled. The gate uses a separate private `_last_drawn_value` for its pixel check, so no public attribute changed meaning. A per-iteration `previous_value` assertion was added to the liveness test. |
| Back-off test should set `_gate_calibrated=True` | gemini (medium) | Done, plus a complementary no-growth-before-calibration test. |
| `gen.__qualname__` is brittle | Copilot | Now compares `gen.gi_code is ProgressBar.__iter__.__code__`. |
| Empty `except` without comment (×2) | CodeQL | Explanatory comments added to the pty-teardown `except` clauses. |
| Module imported with both `import` and `import`-`from` | CodeQL | Test module no longer mixes import styles (uses `progressbar.bar`). |
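
The `gi_code` identity check mentioned above can be demonstrated on a toy class. `Bar` and `wrapper` are hypothetical stand-ins, not the library's classes. A generator object keeps a reference to the code object of the function that created it, so the comparison holds only when no extra generator layer sits in between.

```python
class Bar:
    """Toy iterable whose __iter__ is a generator function."""

    def __init__(self, items):
        self.items = items

    def __iter__(self):
        yield from self.items


def wrapper(bar):
    # An extra generator layer, like the one the PR collapses away.
    yield from bar


gen = iter(Bar([1, 2, 3]))
# True: the generator was created directly by Bar.__iter__.
is_direct = gen.gi_code is Bar.__iter__.__code__
# False: the outer generator belongs to wrapper(), not Bar.__iter__.
is_wrapped_direct = wrapper(Bar([1])).gi_code is Bar.__iter__.__code__
```

Unlike a `__qualname__` string comparison, this does not change if the class is renamed or subclassed without overriding `__iter__`.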

Backward compatibility: byte-identical value/previous_value on every iteration; same widgets and same redraw cadence; an env var (PROGRESSBAR_DISABLE_FASTPATH) fully reverts to the original path.

Performance (updated after the back-compat hardening): ~31 ns/iter wrapping a loop (was ~254), ~1.8× faster than tqdm, second only to rich. Full suite green at 100% branch coverage; ruff and pyright clean; CI green.

🤖 Generated with Claude Code

@wolph wolph force-pushed the fast-progressbar branch 2 times, most recently from 9fd1a66 to 433062a Compare June 23, 2026 13:49
Wrapping a loop with progressbar2 dropped from ~254 ns/iter to ~31 ns
(~8x faster, ~1.8x faster than tqdm, 2nd only to rich), with no change
to observable behavior.

How: an integer "next-update" gate. The common iteration is just an
increment, a compare, and the value/previous_value liveness stores; the
expensive redraw machinery (clock read + widget formatting) only runs at
rate-limited crossings (~20x/sec). The gate calibrates _gate_step from a
real timing measurement and self-corrects via a tqdm-style closed loop,
so it can only skip iterations, never force a wrong redraw. The iterator
path is a single inlined generator (the shortcut wrapper layer is
collapsed); the manual update()/+= path skips its per-call clock read
below the threshold.
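
One way to picture the calibration step, in the spirit of tqdm's dynamic miniters: measure how many iterations ran in a known time span and size the step so one step takes roughly one poll interval. `recompute_gate_step` and its parameters are hypothetical; the real logic (including the back-off) lives in progressbar/bar.py and is not reproduced here.

```python
def recompute_gate_step(iterations, elapsed_s, min_poll_interval_s, current_step):
    """Return a step sized so one step takes about one poll interval."""
    if iterations <= 0 or elapsed_s <= 0:
        # No usable measurement yet: keep the current step unchanged.
        return current_step
    rate = iterations / elapsed_s  # observed iterations per second
    # Never drop below 1, so slow loops still check on every iteration.
    return max(1, int(rate * min_poll_interval_s))
```

Because the step is recomputed from a fresh measurement at each crossing, a loop that speeds up or slows down pulls the step back toward the target interval.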

Backward compatibility:
- Public API unchanged; bar.value and previous_value stay byte-identical
  to the pre-gate behavior on every iteration.
- Same widgets, same redraw cadence, same finish/break/exception handling.
- PROGRESSBAR_DISABLE_FASTPATH (and min_poll_interval=0) revert to the
  original per-iteration path.

Also:
- Reproducible benchmark suite (benchmarks/) vs tqdm/rich/alive-progress/
  click, all rendered to a real pseudo-terminal; documented in README.
- CI per-iteration performance budget guard (machine-independent ratio)
  to prevent regressions.
- no_color/len_color skip the ANSI-strip regex on plain text (cuts the
  forced-redraw render cost).
- Full suite green at 100% branch coverage; ruff and pyright clean.
@wolph wolph force-pushed the fast-progressbar branch from 433062a to c89cb10 Compare June 23, 2026 16:06
Comment thread tests/test_native_accelerator.py Fixed
@wolph wolph force-pushed the fast-progressbar branch from a6dc76c to 4f97f5b Compare June 23, 2026 23:20
… rich

`ProgressBar.__iter__` now dispatches to `speedups.progressbar.FastBarIterator`
(the `progressbar2[fast]` extra) when it is importable, falling back to the
pure-Python generator otherwise. The native iterator counts items in a C field
and only calls back into Python at redraw crossings via a small protocol
(`_fast_begin`/`_fast_tick`/`_fast_end`/`_fast_end_dirty`), reusing the existing
gate/redraw/calibration machinery so the redraw cadence is identical. The only
behavioural difference is that `value`/`previous_value` are synced at crossings
rather than every iteration, so reads between redraws lag slightly (like
tqdm.n); `PROGRESSBAR_DISABLE_FASTPATH=1` forces the pure-Python path.
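
The shape of that protocol can be modelled in pure Python. Everything below is an assumption made for illustration: the hook signatures, their return values, and the reading of `_fast_end_dirty` as the break/exception exit are guesses, and `ToyBar`/`fast_iter` are hypothetical stand-ins for the bar and the C iterator.

```python
class ToyBar:
    """Hypothetical host object; hook signatures are assumptions."""

    def __init__(self, step):
        self.step = step
        self.value = 0
        self.events = []

    def _fast_begin(self):
        self.events.append("begin")
        return self.step  # first crossing threshold

    def _fast_tick(self, count):
        self.value = count  # value is synced only at crossings
        self.events.append(("tick", count))
        return count + self.step  # next crossing threshold

    def _fast_end(self, count):
        self.value = count
        self.events.append(("end", count))

    def _fast_end_dirty(self, count):
        self.value = count
        self.events.append(("end_dirty", count))


def fast_iter(bar, iterable):
    """Count locally; call back into the bar only at crossings."""
    count = 0
    threshold = bar._fast_begin()
    try:
        for item in iterable:
            count += 1
            if count >= threshold:
                threshold = bar._fast_tick(count)
            yield item
    except BaseException:
        # Reached on break (generator close) or an exception in the loop.
        bar._fast_end_dirty(count)
        raise
    else:
        bar._fast_end(count)
```

Between crossings `bar.value` lags the local counter, which is the behavioural difference described above.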

This makes progressbar2 the fastest progress bar measured: ~5 ns/iter vs rich
19, tqdm 55. Pure Python stays ~30 ns (no native build), still ~1.8x faster
than tqdm and 2nd to rich.

Also:
- hoist `_gate_enabled` to a local in the pure-Python iterator (free, no
  behaviour change), trimming the fallback hot path a few ns.
- conftest `disable_native_accelerator` autouse fixture forces the pure-Python
  path for the rest of the suite; native behaviour is covered explicitly in
  tests/test_native_accelerator.py (dispatch + hooks covered without the
  compiled package via a fake/direct calls, so CI stays at 100% coverage; real
  end-to-end equivalence + issue #212 break/exception cleanup tests run where
  speedups is installed).
- refresh benchmark artifacts + README performance section.
@wolph wolph force-pushed the fast-progressbar branch from 4f97f5b to c2308c2 Compare June 23, 2026 23:42
Import: with the companion python_utils lazy-import change, `import progressbar`
no longer eagerly pulls in asyncio or typing_extensions, dropping cold import
from ~48ms to ~24ms (net of interpreter startup) -- on par with tqdm/click and
roughly half of rich. (Requires the python_utils release that defers those
imports; progressbar itself imports python_utils lazily where it can.)
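
Deferring an import until first use can be done in several ways; the proxy below is one generic sketch and is not how python_utils or progressbar implement it. `LazyModule` is a hypothetical helper.

```python
import importlib


class LazyModule:
    """Proxy that imports the target module on first attribute access."""

    def __init__(self, name):
        self._name = name
        self._module = None

    @property
    def loaded(self):
        return self._module is not None

    def __getattr__(self, attr):
        # Only called for names not found on the proxy itself,
        # i.e. attributes that belong to the real module.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)
```

The effect of such a change on cold import can be inspected with `python -X importtime -c "import progressbar"`, which prints per-module import timings.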

Render: FormatLabel.__call__ no longer wraps every mapping entry in a
contextlib.suppress on the redraw hot path -- a missing key (the only common
miss) is tested directly and only the value transform is guarded. The bulk of
the forced-per-update render cost (~24us) is inherent to the richer default
widgets (gradient bar, time widgets), so this is a modest trim, not a headline.
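
The difference between the two loop shapes can be sketched as below. The mapping layout (`name -> (key, transform)`) and the suppressed exception types are assumptions for illustration; `fill_suppress` and `fill_direct` are hypothetical functions, not FormatLabel's actual methods.

```python
import contextlib


def fill_suppress(data, mapping):
    """Before: every entry enters a suppress() context manager."""
    for name, (key, transform) in mapping.items():
        with contextlib.suppress(KeyError, ValueError, IndexError):
            value = data[key]
            data[name] = value if transform is None else transform(value)
    return data


def fill_direct(data, mapping):
    """After: test the missing key directly, guard only the transform."""
    for name, (key, transform) in mapping.items():
        if key not in data:  # the common miss, handled without try/with
            continue
        value = data[key]
        if transform is None:
            data[name] = value
            continue
        try:
            data[name] = transform(value)
        except (KeyError, ValueError, IndexError):
            pass  # a failing transform leaves the entry unset
    return data
```

Both produce the same result; the second avoids creating and entering a context manager for every mapping entry on each redraw.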

Benchmark artifacts + README refreshed: import ~24ms, iteration ~5ns (fastest),
forced render ~24us.