
Fix background worker pruning #507

Merged
msaroufim merged 1 commit into main from fix-background-worker-pruning on Jul 1, 2026

msaroufim commented Jul 1, 2026

Summary

  • prune finished background worker tasks before counting worker capacity
  • keep worker loops alive when a submission task crashes, and record the failed job status
  • add regression coverage for stale worker pruning and for a status update that fails early
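The second bullet describes a worker loop that survives a crashing submission and still records a failed status. The following is a minimal sketch under stated assumptions, not the code from this PR: worker_loop, update_status, and run_submission are hypothetical names, and a plain asyncio.Queue stands in for the real job queue. The "running" status update sits inside the try block, so an early status-update failure is handled the same way as a crash in the submission itself.

```python
import asyncio
import contextlib
import logging
from typing import Awaitable, Callable, Dict

logger = logging.getLogger(__name__)

# Hypothetical callback types; the real manager's interfaces may differ.
UpdateStatus = Callable[[int, str], Awaitable[None]]
RunSubmission = Callable[[int], Awaitable[None]]


async def worker_loop(queue: asyncio.Queue, update_status: UpdateStatus, run_submission: RunSubmission) -> None:
    """Process jobs until cancelled; one crashing job must not end the loop."""
    while True:
        job_id = await queue.get()
        try:
            # Inside the try, so an early status-update failure is caught too.
            await update_status(job_id, "running")
            await run_submission(job_id)
            await update_status(job_id, "succeeded")
        except Exception:
            # CancelledError derives from BaseException (Python 3.8+), so cancellation still stops the loop.
            logger.exception("job %s failed", job_id)
            try:
                await update_status(job_id, "failed")
            except Exception:
                logger.exception("could not record failed status for job %s", job_id)
        finally:
            queue.task_done()


async def demo() -> Dict[int, str]:
    statuses: Dict[int, str] = {}

    async def update_status(job_id: int, status: str) -> None:
        # Simulate an early failure: marking job 1 as running raises.
        if job_id == 1 and status == "running":
            raise RuntimeError("status store unavailable")
        statuses[job_id] = status

    async def run_submission(job_id: int) -> None:
        # Simulate a submission that crashes mid-run.
        if job_id == 2:
            raise ValueError("submission crashed")

    queue: asyncio.Queue = asyncio.Queue()
    for job_id in (1, 2, 3):
        queue.put_nowait(job_id)

    worker = asyncio.create_task(worker_loop(queue, update_status, run_submission))
    await queue.join()  # all three jobs are handled by the same, still-running worker
    worker.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await worker
    return statuses


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Catching Exception rather than BaseException is the deliberate choice here: crashes in a single job are absorbed, while cancellation during shutdown still ends the loop.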

Root cause

Production showed submissions stuck in the queue with busy=0, active=24, and hundreds of enqueued jobs. Finished or failed worker tasks could remain in _workers and still count toward capacity, so autoscaling believed every worker slot was full and did not start replacement workers.
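The root cause comes down to counting done tasks as live capacity. A minimal sketch of the fix, assuming workers are asyncio.Task objects held in a set: only _workers and max_workers come from the text above, while WorkerPool, _prune_workers, and ensure_capacity are hypothetical names. The demo mirrors the smoke-test scenario, with two dead tasks filling every slot before the pool is asked to scale.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Set

logger = logging.getLogger(__name__)


class WorkerPool:
    """Hypothetical minimal pool; only `_workers` and `max_workers` mirror the PR text."""

    def __init__(self, max_workers: int, worker_factory: Callable[[], Awaitable[None]]) -> None:
        self.max_workers = max_workers
        self._worker_factory = worker_factory
        self._workers: Set[asyncio.Task] = set()

    def _prune_workers(self) -> int:
        """Remove finished, failed, or cancelled tasks; return how many were removed."""
        dead = {task for task in self._workers if task.done()}
        for task in dead:
            # Retrieve the exception so asyncio does not warn that it was never retrieved.
            if not task.cancelled() and task.exception() is not None:
                logger.warning("pruning crashed worker: %r", task.exception())
        self._workers -= dead
        return len(dead)

    def ensure_capacity(self) -> int:
        """Start workers until max_workers live tasks exist; return how many were started."""
        self._prune_workers()  # prune first, so dead tasks do not count as capacity
        started = 0
        while len(self._workers) < self.max_workers:
            self._workers.add(asyncio.create_task(self._worker_factory()))
            started += 1
        return started


async def demo() -> tuple:
    queue: asyncio.Queue = asyncio.Queue()
    for job in range(5):
        queue.put_nowait(job)
    processed = []

    async def worker() -> None:
        while True:
            job = await queue.get()
            processed.append(job)
            queue.task_done()

    async def crashed() -> None:
        raise RuntimeError("worker crashed")

    pool = WorkerPool(max_workers=2, worker_factory=worker)
    # Inject two dead workers that occupy every slot, as in the smoke test.
    for _ in range(2):
        pool._workers.add(asyncio.create_task(crashed()))
    await asyncio.sleep(0)  # let both crashed tasks run and fail

    started = pool.ensure_capacity()
    await queue.join()  # returns only once the replacement workers drain the queue

    for task in pool._workers:
        task.cancel()
    await asyncio.gather(*pool._workers, return_exceptions=True)
    return started, sorted(processed), queue.qsize()


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Without the _prune_workers call, len(self._workers) would already equal max_workers, ensure_capacity would start nothing, and queue.join() would never return, which matches the stuck-queue symptom described above.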

Validation

  • uv run --with pytest --with pytest-asyncio pytest tests/test_background_submission_manager.py -q
  • uv run --with pytest --with pytest-asyncio pytest tests/test_admin_api.py tests/test_background_submission_manager.py -q
  • uv run --with ruff ruff check src/libkernelbot/background_submission_manager.py tests/test_background_submission_manager.py --line-length 120
  • local Uvicorn smoke test against POST /submission/local-lb/A100/test using the real BackgroundSubmissionManager, with two injected dead worker tasks occupying max_workers; verified that submissions reached the succeeded status and the queue drained to zero

CI

  • lint: passing
  • unit-tests: passing
  • integration-tests-github: passing
  • integration-tests-modal: passing

github-actions Bot commented Jul 1, 2026

Coverage report

Files with coverage changes in src/libkernelbot:
  • background_submission_manager.py (lines missing: 98-99, 109, 182-183, 187-197, 236-243, 255-257, 266-268, 271, 292-293)
  • utils.py

This report was generated by python-coverage-comment-action

msaroufim marked this pull request as ready for review on Jul 1, 2026 at 04:26
msaroufim merged commit 625660a into main on Jul 1, 2026
4 checks passed
msaroufim deleted the fix-background-worker-pruning branch on Jul 1, 2026 at 04:26