Add TMax v1 taskset #1847
Open
rasdani wants to merge 1 commit into
Approvability verdict: Needs human review. This PR adds a new `tmax-v1` taskset capability with new Python classes and configuration. While the implementation cleanly extends existing Harbor abstractions, introducing new user-facing features warrants human review.
Summary
- Add `tmax-v1` under `verifiers.v1.tasksets` as a narrow Harbor specialization pinned to `tmax/TMax-15K-Harbor@latest`.
- Require a `[taskset.image_map]` entry for each selected TMax task and set the mapped Prime image on the loaded Harbor task.
- Add `configs/tmax-v1-smoke.toml` with the five pushed public Prime smoke images, and document Harbor as the runnable source.

Validation

- `uv run ruff check verifiers/v1/tasksets/tmax_v1/__init__.py verifiers/v1/tasksets/tmax_v1/taskset.py verifiers/v1/tasksets/__init__.py`
- `uv run python -m py_compile verifiers/v1/tasksets/tmax_v1/__init__.py verifiers/v1/tasksets/tmax_v1/taskset.py verifiers/v1/tasksets/__init__.py`
- `uv lock --check`
- `uv run --python 3.13 ty check verifiers`
- `git diff --check HEAD~1..HEAD`
- `uv run eval @ configs/tmax-v1-smoke.toml --taskset.tasks task_000004_aeddda76 -m openai/gpt-5.5 -n 1 -r 1 -c 1 --rich false -o outputs/smokes/tmax-v1-20260623T211516Z`
- `results.jsonl`: `tmax/task_000004_aeddda76`, `solved=1.0`, completed, no errors

No unit tests were added, per the requested scope.
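A smoke run like the one above can be re-checked from its output file without reading it by hand. This is a minimal sketch that assumes each line of `results.jsonl` is a JSON object with `task`, `solved`, and `error` fields, and that the file is written inside the `-o` directory; the field names and file location are assumptions and should be matched to whatever `uv run eval` actually writes.

```python
import json
from collections.abc import Iterable


def summarize_results(lines: Iterable[str]) -> dict[str, float]:
    """Map task id -> solved score from JSONL rows.

    Assumes hypothetical per-row fields "task", "solved", and "error";
    raises if any row reports an error so a failed smoke run is not missed.
    """
    scores: dict[str, float] = {}
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline
        row = json.loads(line)
        if row.get("error"):
            raise RuntimeError(f"row {line_no} ({row.get('task')}) failed: {row['error']}")
        scores[row["task"]] = float(row["solved"])
    return scores


# Usage (assumed location inside the -o directory of the smoke command):
# with open("outputs/smokes/tmax-v1-20260623T211516Z/results.jsonl", encoding="utf-8") as fh:
#     print(summarize_results(fh))
```

Taking an iterable of lines rather than a path keeps the helper usable on an open file, a list of strings, or a streamed log.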
Note
Low Risk
Additive built-in taskset and docs; runtime behavior is inherited from harbor-v1 with explicit image injection and clear errors for missing mappings.
Overview
Adds a built-in `tmax-v1` taskset that specializes `harbor-v1` for the TMax Harbor dataset `tmax/TMax-15K-Harbor@latest`. `TMaxConfig` pins that dataset, sets `ignore_dockerfile` so Dockerfile-only tasks are not rejected, and requires `[taskset.image_map]` entries keyed by Harbor task directory id. `TMaxTaskset.load_tasks` loads the Harbor tasks, then injects the mapped Prime image on each row, failing fast if a selected task has no mapping.

Package exports and docs (`GUIDE.md`, `README.md`) describe Harbor as the execution source and the image-map workflow. `configs/tmax-v1-smoke.toml` wires five smoke task ids to public Prime images, using the bash harness on the Prime runtime.

Reviewed by Cursor Bugbot for commit 6201f27.
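The config specialization (a pinned dataset, `ignore_dockerfile` turned on, and an image map keyed by Harbor task directory id) can be illustrated with plain dataclasses. This is a hypothetical sketch, not the PR's `TMaxConfig`: `HarborConfigStub`, `TMaxConfigSketch`, and their field names are assumptions, and rejecting any other dataset in `__post_init__` is just one way a pin could be enforced.

```python
from dataclasses import dataclass, field

TMAX_DATASET = "tmax/TMax-15K-Harbor@latest"


@dataclass
class HarborConfigStub:
    """Hypothetical stand-in for the harbor-v1 config; field names are assumed."""
    dataset: str = ""
    ignore_dockerfile: bool = False


@dataclass
class TMaxConfigSketch(HarborConfigStub):
    """Narrow specialization: pinned dataset, Dockerfile check off, image map."""
    dataset: str = TMAX_DATASET
    ignore_dockerfile: bool = True  # accept Dockerfile-only tasks
    # Harbor task directory id -> Prime image reference.
    image_map: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # One way to enforce the pin: reject any other dataset outright.
        if self.dataset != TMAX_DATASET:
            raise ValueError(f"tmax-v1 only supports {TMAX_DATASET}, got {self.dataset!r}")
```

Overriding defaults in a subclass keeps every other Harbor setting inherited unchanged, which matches the "narrow specialization" framing.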
Note
Add TMax v1 taskset backed by Harbor with per-task Prime image mapping
- Adds `TMaxTaskset` (a `HarborTaskset` subclass) and `TMaxConfig` in `taskset.py`; the taskset loads tasks from `tmax/TMax-15K-Harbor@latest` and injects a Prime image per task from a required `image_map`.
- Missing `image_map` entries cause a hard failure at load time, so all selected tasks must have a mapped image before the taskset can run.
- `max_concurrent=1`.
- Updates `verifiers/v1/tasksets/__init__.py` to export `TMaxConfig` and `TMaxTaskset`, and adds documentation in the v1 README and GUIDE.

Macroscope summarized 6201f27.