perf(contests): partial index for shadowban CTE on aggregate_user.score #813

Closed
dylanjeffers wants to merge 1 commit into main from dylan/aggregate-user-score-partial-idx

@dylanjeffers
Contributor

Problem

/v1/events/remix-contests?limit=12&offset=0&status=all is timing out / hanging the contests page in production.

Measured from my machine against api.audius.co (presumably hitting whatever replica is cold per request):

Call            First (cold)   Subsequent (warm)
status=all      24.7s          0.12s
status=ended    24.0s          0.12s
status=active   1.96s          0.08s

The contests page lands cold and hangs for ~22s.

Root cause

The shadowban filter PR (#803, api/v1_events_remix_contests.go:46-47) added:

WITH
  low_abuse_score AS (
    SELECT user_id FROM aggregate_user WHERE score < 0
  )
SELECT ...
WHERE e.user_id NOT IN (SELECT user_id FROM low_abuse_score)

aggregate_user has one row per user (millions of rows). There is no index covering score; only idx_aggregate_user_follower_count (user_id, follower_count) exists (ddl/migrations/...0126). So the CTE runs a full sequential scan of aggregate_user on every call. On a cold cache that scan has to read the whole table from disk; afterwards the pages stay in shared_buffers, which is why warm calls are fast.

The same CTE is used in v1_event_comments, v1_fan_club_feed, v1_track_comments, and v1_track_comment_count, so all of them pay the same cost on a cold cache. The contests endpoint is hit hardest because status=all and status=ended keep most events past the WHERE filter, and the sort then forces a per-row LATERAL count for entry_count; even so, the seq scan is the dominant fixed cost.
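
One way to confirm this diagnosis on a replica is to list the indexes that exist on aggregate_user and look at the table's scan counters. This is a minimal sketch, assuming a psql session on the database behind the API; it uses only the standard pg_indexes and pg_stat_user_tables views, and the table name is taken from the description above.

```sql
-- List every index on aggregate_user; per the analysis above, none
-- should mention score before the migration is applied.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'aggregate_user';

-- Cumulative scan counters for the table. A seq_scan value that climbs
-- with each request to the affected endpoints is consistent with the
-- CTE scanning the whole table.
SELECT seq_scan, seq_tup_read, idx_scan
FROM pg_stat_user_tables
WHERE relname = 'aggregate_user';
```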

Fix

Add a partial index on the score < 0 predicate so the planner can resolve the shadowban set with an index scan instead of touching the heap for non-shadowbanned rows:

create index concurrently if not exists idx_aggregate_user_score_negative
    on aggregate_user (user_id)
    where score < 0;

Only a small fraction of users have score < 0 (shadowbanned only), so the partial form is dramatically smaller than a full btree on score — size budget is tens of KB.

CREATE INDEX CONCURRENTLY is used so the migration does not hold ACCESS EXCLUSIVE on aggregate_user. Following the 0197_playlists_albums_partial_idx.sql pattern: not wrapped in BEGIN/COMMIT, and idempotent via IF NOT EXISTS.
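
Because the migration is not wrapped in a transaction, a CREATE INDEX CONCURRENTLY that fails partway leaves an invalid index behind under the same name, and a rerun with IF NOT EXISTS would then skip it. A hedged sketch of a post-migration check, using the standard pg_index and pg_class catalogs and the index name from this PR:

```sql
-- is_valid = false means the concurrent build did not finish; in that
-- case drop the index and rerun the migration.
SELECT c.relname                               AS index_name,
       i.indisvalid                            AS is_valid,
       pg_size_pretty(pg_relation_size(c.oid)) AS index_size
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
WHERE c.relname = 'idx_aggregate_user_score_negative';
```

The index_size column also gives a direct check on the tens-of-KB size estimate.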

Test plan

  • Migration applies cleanly on staging
  • EXPLAIN ANALYZE on SELECT user_id FROM aggregate_user WHERE score < 0 shows Index Only Scan using idx_aggregate_user_score_negative instead of Seq Scan
  • Time /v1/events/remix-contests?status=all cold on staging — expect sub-second
  • Hit /v1/events/remix-contests, /v1/event-comments/..., /v1/fan-club-feed/... and confirm shadowbanned authors still excluded
  • Production rollout: keep an eye on aggregate_user write throughput; the index only takes writes for rows whose new version has score < 0, which should be rare
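
The EXPLAIN check in the test plan could look like the following sketch, run on staging. One assumption worth stating: an Index Only Scan depends on the visibility map being reasonably current, so vacuuming the table first helps avoid a plan that still reports heap fetches.

```sql
-- Refresh the visibility map and planner statistics so an index-only
-- scan is possible and the row estimate for score < 0 is current.
VACUUM (ANALYZE) aggregate_user;

-- Expect "Index Only Scan using idx_aggregate_user_score_negative";
-- BUFFERS reports how many pages were read, which should be small.
EXPLAIN (ANALYZE, BUFFERS)
SELECT user_id FROM aggregate_user WHERE score < 0;
```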

🤖 Generated with Claude Code

….score

Cold-cache /v1/events/remix-contests?status=all takes ~22s end-to-end
(warm: ~100ms). The dominant cost is the sequential scan of
aggregate_user introduced by the shadowban filter:

  SELECT user_id FROM aggregate_user WHERE score < 0

aggregate_user has one row per user (millions), and only a small number
have score < 0, so a partial index covers the filter cheaply.

The same CTE is used in v1_event_comments, v1_fan_club_feed,
v1_track_comments, and v1_track_comment_count — all benefit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dylanjeffers
Contributor Author

Superseded by #814.

After discussion: we don't actually rely on aggregate_user.score for contest / comment moderation — the karma-muting CTE (muted_by_karma, sum of muters' follower_count crossing karmaCommentCountThreshold) already handles abuse filtering on comments, and remix-contest discovery can lean on the same signal. Removing the score < 0 filter entirely is cleaner than indexing around it: no extra index to maintain, fewer CTEs in every contest/comment query, same behavior in practice.

Leaving this open for now in case we want to fall back to the index approach, but the preferred fix is #814.

@dylanjeffers
Contributor Author

Update on the supersession: the revised approach in #814 keeps the shadow-ban semantics — banned users are still filtered from contest discovery and comment streams — but sources the ban list from chat_ban.is_banned = true (the platform's user-ban table, already used by api/comms/validator.go) instead of aggregate_user.score < 0. chat_ban has a PK on user_id and the table is tiny, so the seq scan is gone without needing the partial index this PR proposes.
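
For illustration, the ban-list CTE described here might look roughly like the sketch below. Only chat_ban, is_banned, and user_id come from the comment above; the CTE name banned_users, the events table, and the event_id column are hypothetical placeholders, and the actual query in #814 may differ.

```sql
WITH
  -- Hypothetical CTE name; chat_ban is keyed on user_id and small.
  banned_users AS (
    SELECT user_id FROM chat_ban WHERE is_banned = true
  )
SELECT e.event_id   -- hypothetical column
FROM events e       -- hypothetical table
WHERE e.user_id NOT IN (SELECT user_id FROM banned_users);
```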

dylanjeffers added a commit that referenced this pull request May 15, 2026
`/v1/events/remix-contests?status=all` was hanging the contests page in
production with ~22s cold-cache calls (warm: ~100ms). PR #803's shadowban
filter added:

    low_abuse_score AS (SELECT user_id FROM aggregate_user WHERE score < 0)

`aggregate_user` has one row per user (millions of rows) and no index
covering `score`, so the CTE ran a full sequential scan on every cold
call. Pages then stuck in shared_buffers, which is why warm calls were
fast.

The same CTE is reused in v1_event_comments, v1_fan_club_feed,
v1_track_comments, v1_track_comment_count, so all pay the same cost. The
contests endpoint is hit hardest because status=all/status=ended keeps
most events past the WHERE filter and the sort then forces a per-row
LATERAL entry_count count, but the seq scan is the dominant fixed cost.

Fix: partial index on (user_id) WHERE score < 0. The shadowban set is a
tiny fraction of users, so the index is tens of KB. CREATE INDEX
CONCURRENTLY avoids holding ACCESS EXCLUSIVE on aggregate_user; the
migration follows the existing 0197_playlists_albums_partial_idx.sql
pattern (no BEGIN/COMMIT, IF NOT EXISTS for idempotency).

aggregate_user.score is the canonical shadowban signal — driven by the
AAO `anti_abuse_blocked_users` admin list and the AAO score formula, and
written back to aggregate_user.score by refresh_all_user_scores(). It is
the correct table for this filter; the only issue was the missing index.

Closes #813.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dylanjeffers
Contributor Author

Superseded by #814 — same partial-index fix, carried forward there. Closing this one.
