Skip to content

[BUGFIX] Ingester: release TSDB appender on early-return paths in Push#7528

Open
sandy2008 wants to merge 3 commits into
cortexproject:masterfrom
sandy2008:fix-ingester-push-appender-leak
Open

[BUGFIX] Ingester: release TSDB appender on early-return paths in Push#7528
sandy2008 wants to merge 3 commits into
cortexproject:masterfrom
sandy2008:fix-ingester-push-appender-leak

Conversation

@sandy2008
Copy link
Copy Markdown
Contributor

@sandy2008 sandy2008 commented May 20, 2026

What this PR does

Ingester.Push acquires a TSDB appender via app := db.Appender(ctx) at the top of the request-processing block, then on certain early-return paths returns without calling Commit or Rollback. Concrete leak path the issue this PR targets: the isLabelSetOutOfOrder check at pkg/ingester/ingester.go:1454-1456 returns before Append/Commit, leaving the appender dangling. The two existing app.Rollback() blocks inside the per-sample / per-histogram failure loops would still leak if a future return path were added between them and Commit.

Symptom: monotonic growth of cortex_ingester_tsdb_head_active_appenders plus retained head series references, mmap'd chunks, and pending appender state per leaked request.

Fix: register a defer immediately after the appender is acquired that calls app.Rollback() unless a local committed flag is true. committed is flipped to true immediately before app.Commit() because Prometheus's headAppender.Commit closes the appender even on failure (it self-rolls-back on WAL error) — calling Rollback again would produce a spurious ErrAppenderClosed warn log. The two now-redundant explicit app.Rollback() calls inside the loop are removed in favor of the single defer.

This was found as part of a wider audit of CPU / memory / goroutine leaks in this codebase.

Which issue(s) this PR fixes

Fixes #7530

Checklist

  • CHANGELOG.md updated — [BUGFIX] Ingester: entry under master / unreleased. PR number will be added once this PR is assigned one.
  • Documentation updated — not applicable, no flags or config changed (make doc not required).
  • Tests added: TestIngester_Push_OutOfOrderLabels_AppenderNotLeaked asserts cortex_ingester_tsdb_head_active_appenders == 0 after repeated out-of-order-label-set pushes (the failing-without-fix behaviour was verified by reverting only the defer and observing cortex_ingester_tsdb_head_active_appenders = 5 after 5 leaky pushes).

Test plan

  • go build -tags "netgo slicelabels" ./pkg/ingester/... — clean
  • go vet -tags "netgo slicelabels" ./pkg/ingester/... — clean
  • gofmt -l / goimports -local github.com/cortexproject/cortex -l — clean
  • go test -tags "netgo slicelabels" -run "^TestIngester_Push" ./pkg/ingester/... -count=1 -timeout 300s — PASS
  • go test -race -tags "netgo slicelabels" -run "TestIngester_Push|TestIngester_OutOfOrder|TestIngester_Append" ./pkg/ingester/... -count=1 -timeout 600s — PASS, 0 data races
  • go test -tags "netgo slicelabels" -run TestIngester_Push_OutOfOrderLabels_AppenderNotLeaked ./pkg/ingester/... -count=3 — PASS x3 (no flakiness)

🤖 Generated with Claude Code

Push acquired a TSDB appender at the start of the function and, on early
returns (e.g. out-of-order label set, hard append errors, Commit failure),
could return without calling Commit or Rollback. This leaked TSDB head
series references, mmap'd chunks, and pending appender state, observable
as monotonic growth of cortex_ingester_tsdb_head_active_appenders.

Defer Rollback right after the appender is acquired and skip it on the
success path. `committed` is flipped to true before app.Commit() because
Prometheus closes the appender even on Commit failure (internal
self-rollback on WAL error), so the deferred Rollback must not run
afterwards. The two now-redundant explicit Rollback calls inside the
sample / histogram failure paths are removed and rely on the same defer.

Add a regression test asserting cortex_ingester_tsdb_head_active_appenders
stays at 0 after repeated out-of-order-label-set pushes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Sandy Chen <Yuxuan.Chen@morganstanley.com>
@dosubot dosubot Bot added component/ingester go Pull requests that update Go code type/bug labels May 20, 2026
@siddarth2810
Copy link
Copy Markdown
Contributor

Hi, might be better to open an issue first and get maintainer confirmation before making code changes

Address `make check-modernize` lint failure by using Go 1.22+ `for range N`
syntax instead of `for n := 0; n < numLeakyPushes; n++`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Sandy Chen <Yuxuan.Chen@morganstanley.com>
@sandy2008
Copy link
Copy Markdown
Contributor Author

Hi, might be better to open an issue first and get maintainer confirmation before making code changes

@siddarth2810 made the issue: #7530

Copy link
Copy Markdown
Member

@friedrichg friedrichg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@friedrichg friedrichg requested a review from Copilot May 21, 2026 12:48
@dosubot dosubot Bot added the lgtm This PR has been approved by a maintainer label May 21, 2026
Comment thread CHANGELOG.md Outdated

This comment was marked as low quality.

Co-authored-by: Friedrich Gonzalez <1517449+friedrichg@users.noreply.github.com>
Signed-off-by: Sandy Chen <sandy19890604@gmail.com>
@sandy2008
Copy link
Copy Markdown
Contributor Author

LGTM

@friedrichg thx, updated CHANGELOG.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

component/ingester go Pull requests that update Go code lgtm This PR has been approved by a maintainer size/M type/bug

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Ingester: TSDB appender leaked on early-return paths in Push (out-of-order label sets etc.)

4 participants