Add NeMo Gym #1886
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 59b494f.
    raise ValueError(f"unknown NeMo Gym tool: {name}")
    response = await self.client.post(f"/{name}", json=arguments)
    response.raise_for_status()
    return response.json()
AsyncClient crosses event loops
High Severity
The in-process `httpx.AsyncClient` for the NeMo resource server is created in `_NeMoGymTools.setup`, which runs during the tool server's pre-serve `asyncio.run` setup phase, while `nemo_gym_call` runs later on uvicorn's separate event loop. Reusing that client after the setup loop has closed can break `call`, even when `/seed_session` succeeded during setup.
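One way to avoid this kind of cross-loop reuse is to build the client lazily on whichever event loop is running when the tool is actually called. The sketch below is hypothetical: `LoopBoundClient` and its `factory` argument are illustrative names, not part of the PR or of httpx. In practice the factory might construct something like `httpx.AsyncClient(transport=httpx.ASGITransport(app=app), base_url=...)`.

```python
import asyncio
from typing import Any, Callable, Optional


class LoopBoundClient:
    """Hold one client per running event loop (hypothetical helper).

    `factory` is any zero-argument callable that builds the client, e.g. an
    httpx.AsyncClient wired to the NeMo Gym ASGI app via an ASGI transport.
    """

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._client: Optional[Any] = None
        self._loop: Optional[asyncio.AbstractEventLoop] = None

    def get(self) -> Any:
        # Must be called from inside a coroutine; raises RuntimeError otherwise.
        loop = asyncio.get_running_loop()
        if self._client is None or self._loop is not loop:
            # No client yet, or the cached one belongs to another (possibly
            # already closed) loop such as a setup-phase asyncio.run():
            # build a fresh client bound to the loop that is running now.
            self._client = self._factory()
            self._loop = loop
        return self._client
```

With this pattern, setup can still seed sessions under its own `asyncio.run`, and the first `nemo_gym_call` on uvicorn's loop gets a new client. The stale client is simply dropped here; closing it cleanly would have to happen on its original loop before that loop shuts down.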
Approvability Verdict: Needs human review. 1 blocking correctness issue found. This PR introduces a new NeMo Gym taskset integration with substantial new runtime behavior. Multiple unresolved review comments identify potential bugs, including a high-severity issue with `AsyncClient` crossing event loops that could cause runtime failures.
    system_prompt=(
        "Call `nemo_gym_call` with the matching tool name and arguments. "
        f"Available NeMo Gym tools: "
        f"{json.dumps(row['responses_create_params'].get('tools', []))}"
    ),
🟡 Medium nemo_gym/taskset.py:140
`load_tasks()` always sets `system_prompt` to instruct the model to call `nemo_gym_call` with the matching tool name, but for rows where `responses_create_params.tools` is missing or empty, `tools()` returns `[]`. Those tasks expose no `nemo_gym_call` tool yet demand its use, making them unsatisfiable. Consider omitting or adjusting the `system_prompt` when no tools are present.
Suggested change:

    system_prompt=(
        "Call `nemo_gym_call` with the matching tool name and arguments. "
        f"Available NeMo Gym tools: "
        f"{json.dumps(row['responses_create_params'].get('tools', []))}"
    ) if row['responses_create_params'].get('tools') else None,
Location: `verifiers/v1/tasksets/nemo_gym/taskset.py`, lines 140-144.
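The same condition can be factored into a small helper so that the prompt and `tools()` are driven by a single check on the row. This is a sketch: `build_system_prompt` is a hypothetical name, and it assumes the taskset accepts `None` for `system_prompt`, as the suggested change does.

```python
import json
from typing import Any, Dict, Optional


def build_system_prompt(row: Dict[str, Any]) -> Optional[str]:
    """Return the tool-use instruction only when the row declares tools.

    Rows whose responses_create_params.tools is missing or empty get no
    system prompt, matching a tools() that exposes nothing for them.
    """
    tools = row["responses_create_params"].get("tools") or []
    if not tools:
        return None
    return (
        "Call `nemo_gym_call` with the matching tool name and arguments. "
        f"Available NeMo Gym tools: {json.dumps(tools)}"
    )
```

Keying both the prompt and the tool list off the same `tools` value keeps answer-only rows answerable without a phantom tool instruction.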
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1d0eb27aaa
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
    return response.json()


    class NeMoGymTaskset(Taskset[NeMoGymTask, NeMoGymConfig]):
Score rollouts with NeMo Gym verification
As introduced, this taskset loads tasks and tools but never defines any @reward or group reward that calls the resource server's /verify endpoint. Taskset.score() only records decorated rewards, and Trace.reward is just the sum of trace.rewards, so uv run --with nemo-gym==0.3.0 eval nemo_gym ... will report 0 reward for every successful rollout even when NeMo Gym's verifier would return a nonzero reward. Add a reward/finalize path that sends the completed response plus responses_create_params to the NeMo Gym verifier and records the returned reward.
💡 Codex Review
Reviewed commit: 1850666644
    name=f"{self.config.resource_server}:{idx}",
    prompt=dialect.parse_request(row["responses_create_params"])[0],
    system_prompt=(
        "Call `nemo_gym_call` with the matching tool name and arguments. "
Avoid prompting for a missing NeMo tool
When a NeMo row has no responses_create_params.tools, tools() returns no MCP server, so the harness never exposes nemo_gym_call; however this unconditional system prompt still tells the model to call it. This affects answer-only NeMo resource servers/tasks and makes the prompt ask for an unavailable tool instead of letting the model answer normally, so condition the instruction on actually exposing the wrapper tool.
💡 Codex Review
Reviewed commit: 0fb0045a1a
    version = "0.1.0"
    description = "nemo-gym-workplace-v1 - NeMo Gym workplace_assistant through the built-in taskset."
    requires-python = ">=3.11"
    dependencies = ["verifiers"]
Add the NeMo Gym runtime dependency
For packaged installs of this environment, this metadata only installs verifiers, but the taskset immediately delegates to NeMoGymTaskset.load_tasks(), which imports nemo_gym and raises if it is absent. The README's manual --with nemo-gym==0.3.0 workaround only helps that one dev command; prime env install/Hub consumers will hit an ImportError before tasks load. Add the NeMo Gym dependency here, and align requires-python with it if needed.
    # nemo-gym-workplace-v1

    NeMo Gym's `workplace_assistant` example through the built-in `NeMoGymTaskset`.
    This environment pins the taskset config only; the standard Verifiers harness owns the
    rollout loop, and NeMo Gym's packaged resource server owns tool execution.
Replace freeform README with generated sections
The root AGENTS.md says, "Environment READMEs must use the generated prime env init section structure; freeform environment READMEs are not allowed." This new README starts as a custom overview/Develop/Layout page instead of that required structure, so it does not satisfy the repository's documented environment README contract. Please regenerate/use the standard sections and fill them in.


Overview
Adds a compact v1 NeMo Gym taskset and a small environment package that shows how to use it with a harder packaged NeMo Gym example.
Details
`nemo-gym-workplace-v1` as a thin config wrapper over the reusable taskset, pinned to NeMo Gym's `workplace_assistant` resource server.
Note
Medium Risk
New optional integration that bootstraps NeMo Gym servers in-process and proxies tool HTTP calls, but it is confined to the new taskset path and does not change core harness or auth flows.
Overview
Adds a built-in `NeMoGymTaskset` so Verifiers v1 can run NVIDIA NeMo Gym packaged JSONL benchmarks through the standard MCP harness, without a NeMo-specific program.

Tasks are built from each row’s `responses_create_params` (via `ResponsesDialect`) while keeping the full `nemo_gym_row` payload. Tooling spins up NeMo Gym’s in-process ASGI resource server for the configured `nemo_env`, seeds a session per task, and exposes a single `nemo_gym_call` MCP tool that forwards only tools declared on that task. `nemo-gym==0.3.0` stays optional; missing installs get a clear `ImportError` with a `uv run --with` hint.

Also ships `nemo-gym-workplace-v1` as a thin taskset that pins `workplace_assistant`, exports `NeMoGymConfig`/`NeMoGymTaskset` from `verifiers.v1.tasksets`, and documents CLI usage in `environments/README.md` and `verifiers/v1/README.md` (replacing the older `nemo_gym_env` listing).

Reviewed by Cursor Bugbot for commit 0fb0045.
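The summary says missing installs get a clear `ImportError` with a `uv run --with` hint. A generic sketch of that kind of optional-dependency guard follows; `require_optional` is a hypothetical helper, not the PR's code, and the message wording is illustrative.

```python
import importlib
from types import ModuleType


def require_optional(module: str, install_hint: str) -> ModuleType:
    """Import an optional dependency or fail with an actionable message."""
    try:
        return importlib.import_module(module)
    except ImportError as exc:  # ModuleNotFoundError is a subclass
        raise ImportError(
            f"{module!r} is required for this taskset but is not installed. "
            f"{install_hint}"
        ) from exc
```

A taskset would call something like `require_optional("nemo_gym", "Try: uv run --with nemo-gym==0.3.0 ...")` inside `load_tasks()`, so the import cost and the failure only hit users of that taskset. Packaged environments still need the dependency declared in their own metadata, as the review of `pyproject.toml` points out.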
Note
Add NeMo Gym taskset and workplace environment to verifiers v1
- `NeMoGymTaskset` and `NeMoGymConfig` to `verifiers/v1/tasksets/nemo_gym/taskset.py`, which loads tasks from a NeMo Gym JSONL dataset and proxies tool calls through an in-process resource server via `httpx.AsyncClient` with ASGI transport.
- … (`POST /seed_session`) and per-tool endpoints (`POST /{name}`), exposing a single `nemo_gym_call` tool to MCP-capable harnesses; tasks without declared tools return no tools.
- `nemo-gym-workplace-v1` environment package in `environments/nemo_gym_workplace_v1/` that pins `nemo_env` to `workplace_assistant` via `NemoGymWorkplaceTaskset`.
- `NeMoGymConfig` and `NeMoGymTaskset` from `verifiers.v1.tasksets` alongside the existing `HarborTaskset`.

Macroscope summarized 0fb0045.