Skip to content

Make Harbor library downloads async-safe#1887

Draft
rasdani wants to merge 1 commit into
harbor-library-downloadfrom
harbor-library-download-safe
Draft

Make Harbor library downloads async-safe#1887
rasdani wants to merge 1 commit into
harbor-library-downloadfrom
harbor-library-download-safe

Conversation

@rasdani

@rasdani rasdani commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Keep Use Harbor library for taskset downloads #1884's in-process Harbor download path, but import Harbor lazily only when a download is needed.
  • Run Harbor's Typer download callback in a worker thread so its internal asyncio.run(...) does not collide with Verifiers' async eval/validate loop.
  • Capture Harbor's Rich console output and include it in the raised Verifiers RuntimeError on download failures.

Validation

  • uv run --frozen --python 3.12 --extra harbor async smoke: uncached harbor/hello-world download inside asyncio.run(...) loaded hello-world/hello-world.
  • uv run --frozen --python 3.12 --extra harbor selector-error smoke: repo + registry_url raised a Verifiers RuntimeError containing Harbor's own Error: --repo and --registry-url are mutually exclusive.
  • uv run --frozen --python 3.12 --extra harbor local registry smoke: registry_path=Path("~/harbor-registry-test-1884.json") expanded and loaded 3d_print_shop_t0 from general-agent@2026-06-25.
  • Fresh core uv run --frozen --python 3.12 python: import verifiers.v1.tasksets succeeds without Harbor installed.
  • Fresh core uv run --frozen --python 3.11 python: import verifiers.v1.tasksets succeeds without Harbor installed.
  • uv lock --check --python 3.11
  • uv lock --check --python 3.12
  • uv run --python 3.13 ty check verifiers
  • uv run --frozen --python 3.12 pre-commit run --files verifiers/v1/tasksets/harbor/taskset.py
  • git push pre-push hook: ruff check, ruff format, sync check, ty (ci parity)

Note

Make Harbor library downloads async-safe by running them in a dedicated thread

  • Harbor downloads in taskset.py are now executed in a separate thread via a new download_with_harbor helper, making them safe to call from async contexts.
  • The module no longer eagerly imports harbor.cli.download at import time; Harbor presence is validated at call time, raising ImportError with an install hint if missing.
  • Download failures now raise a RuntimeError with the exit code and captured console output for easier debugging.
  • Behavioral Change: code that previously caught ImportError at import time must now handle it at call time.

Macroscope summarized 636a96f.


Note

Medium Risk
Changes how and when Harbor is loaded and how download errors surface; behavior is localized to Harbor taskset caching but affects any async path that triggers uncached downloads.

Overview
Harbor dataset fetching no longer imports Harbor at module load time, so core Verifiers can import tasksets without the optional harbor extra until a download actually runs.

Downloads are routed through a new download_with_harbor helper that lazy-imports Harbor, runs download_command on a dedicated harbor-download thread (so Harbor’s internal asyncio.run does not clash with Verifiers’ async eval loop), and wraps failures in a RuntimeError that includes exit codes and captured Rich console output when available.

Reviewed by Cursor Bugbot for commit 636a96f. Bugbot is set up for automated code reviews on this repo. Configure here.

finally:
output = capture.get().strip()

thread = threading.Thread(target=run_download, name="harbor-download")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Medium harbor/taskset.py:149

download_with_harbor() starts a non-daemon Thread and blocks on thread.join(). If the caller is interrupted (e.g. Ctrl-C) during the join, the main thread unwinds and TemporaryDirectory.__exit__() deletes output_dir, but the background download thread keeps running because it is neither stopped nor a daemon — the process can hang on shutdown waiting for it to finish, and the worker races against deletion of the directory it's writing to. Marking the thread as daemon=True lets the process exit without waiting for it.

Suggested change
thread = threading.Thread(target=run_download, name="harbor-download")
thread = threading.Thread(target=run_download, name="harbor-download", daemon=True)
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file @verifiers/v1/tasksets/harbor/taskset.py around line 149:

`download_with_harbor()` starts a non-daemon `Thread` and blocks on `thread.join()`. If the caller is interrupted (e.g. `Ctrl-C`) during the join, the main thread unwinds and `TemporaryDirectory.__exit__()` deletes `output_dir`, but the background download thread keeps running because it is neither stopped nor a daemon — the process can hang on shutdown waiting for it to finish, and the worker races against deletion of the directory it's writing to. Marking the thread as `daemon=True` lets the process exit without waiting for it.

@rasdani rasdani marked this pull request as ready for review June 26, 2026 20:14

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 636a96f. Configure here.


thread = threading.Thread(target=run_download, name="harbor-download")
thread.start()
thread.join()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worker thread failures not surfaced

Medium Severity

download_with_harbor starts Harbor in a worker thread and only records failures caught as Exception from download_command. Uncaught errors in that thread (including setup before the inner try, or BaseException subclasses) are not stored in error, so after thread.join() the caller can return normally even though the download never completed successfully.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 636a96f. Configure here.

@macroscopeapp

macroscopeapp Bot commented Jun 26, 2026

Copy link
Copy Markdown

Approvability

Verdict: Needs human review

1 blocking correctness issue found. Unresolved review comments identify potential threading bugs: non-daemon thread causing shutdown issues and uncaught BaseException failures. These substantive concerns warrant human review before merging.

You can customize Macroscope's approvability policy. Learn more.

@rasdani rasdani marked this pull request as draft June 26, 2026 20:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant