Use Harbor library for taskset downloads#1884
Conversation
ApprovabilityVerdict: Approved Mechanical refactor replacing subprocess-based Harbor CLI invocation with direct library function calls. Same parameters are passed; the change simplifies the code by removing CLI orchestration and relying on the library's native API. You can customize Macroscope's approvability policy. Learn more. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ee6d53fd9b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| registry_path = config.registry_path | ||
| if registry_path is not None and config.repo is None: | ||
| registry_path = registry_path.expanduser() | ||
| download_command( |
There was a problem hiding this comment.
Use an async-safe Harbor download path
On an uncached Harbor taskset this now invokes Harbor's Typer command in-process; that command dispatches through harbor.cli.utils.run_async, which calls asyncio.run(...). The v1 eval path calls env.taskset.load_tasks() from inside the already-running async run_eval, so first-time prime eval run/uv run eval for Harbor will raise RuntimeError: asyncio.run() cannot be called from a running event loop before any tasks load, whereas the previous subprocess-based download did not share the event loop.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
makingharbor.cli async safe introduces significant boilerplate
#1887
There was a problem hiding this comment.
i timed subprocess vs habor SDK
it's not that much of a difference
if you can report slow-down on your example, we can reconsider
Overview
Use Harbor's installed Python library directly for v1 taskset downloads, replacing the subprocess-based CLI execution introduced in #1882.
Details
ImportErrorwhen the optional dependency is unavailable.This is a focused follow-up stacked on #1882.
Note
Medium Risk
Changes the dataset fetch integration path for Harbor tasksets; behavior should match the CLI but errors and edge cases now depend on the library instead of wrapped subprocess output.
Overview
Harbor v1 taskset caching no longer shells out to the
harborCLI.dataset_dirnow calls Harbor’sharbor.cli.download.download_commandin-process with the same dataset/registry selectors (repo,registry_path,registry_url) and export options.The local
harbor_cli()helper and subprocess-based argv builder are removed. Missing the optional Harbor dependency fails at import time with anImportErrorpointing atuv sync --python 3.12 --extra harbor. Atomic cache publication and registry-specific cache keys are unchanged.Reviewed by Cursor Bugbot for commit ee6d53f. Bugbot is set up for automated code reviews on this repo. Configure here.
Note
Replace subprocess-based Harbor CLI calls with the Harbor Python library for taskset downloads
subprocessinvocations of the Harbor CLI with direct calls toharbor.cli.download.download_commandin taskset.py.harbor_clihelper and the localdownload_commandfunction are removed; the module-leveldownload_commandname now refers to the imported library function with a different signature.RuntimeErrorfrom subprocess output.ImportErrorimmediately if the Harbor library is not installed, rather than deferring the error to runtime.Macroscope summarized ee6d53f.