Add TMax v1 taskset #1847

Open
rasdani wants to merge 1 commit into main from add-tmax-v1-taskset

rasdani (Contributor) commented Jun 23, 2026

Summary

  • Add built-in tmax-v1 under verifiers.v1.tasksets as a narrow Harbor specialization pinned to tmax/TMax-15K-Harbor@latest.
  • Require an explicit [taskset.image_map] entry for each selected TMax task and set the mapped Prime image on the loaded Harbor task.
  • Add configs/tmax-v1-smoke.toml, which maps five smoke tasks to their pushed public Prime images, and document Harbor as the runnable source.

Validation

  • uv run ruff check verifiers/v1/tasksets/tmax_v1/__init__.py verifiers/v1/tasksets/tmax_v1/taskset.py verifiers/v1/tasksets/__init__.py
  • uv run python -m py_compile verifiers/v1/tasksets/tmax_v1/__init__.py verifiers/v1/tasksets/tmax_v1/taskset.py verifiers/v1/tasksets/__init__.py
  • uv lock --check
  • uv run --python 3.13 ty check verifiers
  • git diff --check HEAD~1..HEAD
  • uv run eval @ configs/tmax-v1-smoke.toml --taskset.tasks task_000004_aeddda76 -m openai/gpt-5.5 -n 1 -r 1 -c 1 --rich false -o outputs/smokes/tmax-v1-20260623T211516Z
    • results.jsonl: tmax/task_000004_aeddda76, solved=1.0, completed, no errors

No unit tests were added, per the requested scope.


Note

Low Risk
Additive built-in taskset and docs; runtime behavior is inherited from harbor-v1 with explicit image injection and clear errors for missing mappings.

Overview
Adds a built-in tmax-v1 taskset that specializes harbor-v1 for the TMax Harbor dataset tmax/TMax-15K-Harbor@latest. TMaxConfig pins that dataset, sets ignore_dockerfile so Dockerfile-only tasks are not rejected, and requires [taskset.image_map] entries keyed by Harbor task directory id. TMaxTaskset.load_tasks loads Harbor tasks then injects the mapped Prime image on each row, failing fast if a selected task has no mapping.

Package exports are updated, and the docs (GUIDE.md, README.md) describe Harbor as the execution source and explain the image-map workflow. configs/tmax-v1-smoke.toml wires five smoke task ids to public Prime images, using the bash harness on the prime runtime.

Reviewed by Cursor Bugbot for commit 6201f27. Bugbot is set up for automated code reviews on this repo.

Note

Add TMax v1 taskset backed by Harbor with per-task Prime image mapping

  • Adds TMaxTaskset and TMaxConfig in taskset.py, a HarborTaskset subclass that loads tasks from tmax/TMax-15K-Harbor@latest and injects a Prime image per task from a required image_map.
  • Missing image_map entries cause a hard failure at load time, so all selected tasks must have a mapped image before the taskset can run.
  • Adds a smoke config in configs/tmax-v1-smoke.toml with 5 tasks, 1 rollout, and max_concurrent=1.
  • Updates verifiers/v1/tasksets/__init__.py to export TMaxConfig and TMaxTaskset, and adds documentation in the v1 README and GUIDE.

Macroscope summarized 6201f27.

macroscopeapp Bot commented Jun 23, 2026


Approvability

Verdict: Needs human review

This PR adds a new tmax-v1 taskset capability with new Python classes and configuration. While the implementation cleanly extends existing Harbor abstractions, introducing new user-facing features warrants human review.

