Skip to content

Use Harbor library for taskset downloads#1884

Open
xeophon wants to merge 1 commit into
harbor-cli-downloadfrom
harbor-library-download
Open

Use Harbor library for taskset downloads#1884
xeophon wants to merge 1 commit into
harbor-cli-downloadfrom
harbor-library-download

Conversation

@xeophon

@xeophon xeophon commented Jun 26, 2026

Copy link
Copy Markdown
Member

Overview

Use Harbor's installed Python library directly for v1 taskset downloads, replacing the subprocess-based CLI execution introduced in #1882.

Details

  • Import Harbor's Python download entrypoint and raise a clear ImportError when the optional dependency is unavailable.
  • Delegate package detection, selector handling, and downloads to Harbor without resolving or spawning a CLI executable.
  • Preserve the existing atomic cache publication and registry-specific cache keys.

This is a focused follow-up stacked on #1882.


Note

Medium Risk
Changes the dataset fetch integration path for Harbor tasksets; behavior should match the CLI but errors and edge cases now depend on the library instead of wrapped subprocess output.

Overview
Harbor v1 taskset caching no longer shells out to the harbor CLI. dataset_dir now calls Harbor’s harbor.cli.download.download_command in-process with the same dataset/registry selectors (repo, registry_path, registry_url) and export options.

The local harbor_cli() helper and subprocess-based argv builder are removed. Missing the optional Harbor dependency fails at import time with an ImportError pointing at uv sync --python 3.12 --extra harbor. Atomic cache publication and registry-specific cache keys are unchanged.

Reviewed by Cursor Bugbot for commit ee6d53f. Bugbot is set up for automated code reviews on this repo. Configure here.

Note

Replace subprocess-based Harbor CLI calls with the Harbor Python library for taskset downloads

  • Replaces subprocess invocations of the Harbor CLI with direct calls to harbor.cli.download.download_command in taskset.py.
  • The harbor_cli helper and the local download_command function are removed; the module-level download_command name now refers to the imported library function with a different signature.
  • Download errors now surface as exceptions from the Harbor library rather than a synthesized RuntimeError from subprocess output.
  • Risk: importing the taskset module now raises ImportError immediately if the Harbor library is not installed, rather than deferring the error to runtime.

Macroscope summarized ee6d53f.

@macroscopeapp

macroscopeapp Bot commented Jun 26, 2026

Copy link
Copy Markdown

Approvability

Verdict: Approved

Mechanical refactor replacing subprocess-based Harbor CLI invocation with direct library function calls. Same parameters are passed; the change simplifies the code by removing CLI orchestration and relying on the library's native API.

You can customize Macroscope's approvability policy. Learn more.

@xeophon xeophon requested a review from rasdani June 26, 2026 10:31

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ee6d53fd9b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

registry_path = config.registry_path
if registry_path is not None and config.repo is None:
registry_path = registry_path.expanduser()
download_command(

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Use an async-safe Harbor download path

On an uncached Harbor taskset this now invokes Harbor's Typer command in-process; that command dispatches through harbor.cli.utils.run_async, which calls asyncio.run(...). The v1 eval path calls env.taskset.load_tasks() from inside the already-running async run_eval, so first-time prime eval run/uv run eval for Harbor will raise RuntimeError: asyncio.run() cannot be called from a running event loop before any tasks load, whereas the previous subprocess-based download did not share the event loop.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makingharbor.cli async safe introduces significant boilerplate
#1887

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i timed subprocess vs habor SDK

it's not that much of a difference
if you can report slow-down on your example, we can reconsider

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants