Skip to content

fix(configurator): make env_params first-class to fix the trajectory cache key#901

Merged
podkidyshev merged 14 commits into
NVIDIA:mainfrom
rutayan-nv:rpatro/env-params-first-class
Jun 30, 2026
Merged

fix(configurator): make env_params first-class to fix the trajectory cache key#901
podkidyshev merged 14 commits into
NVIDIA:mainfrom
rutayan-nv:rpatro/env-params-first-class

Conversation

@rutayan-nv

@rutayan-nv rutayan-nv commented May 26, 2026

Copy link
Copy Markdown
Contributor

Issue

Fix

  • Cache correctness: env_params becomes first-class schema (TestDefinition.env_params, TestRun.current_env_params, TrajectoryEntry.env_params); the cache key is now (action, current_env_params).
  • Artifact alignment: CloudAIGymEnv builds an EnvParamsObserver that samples deterministically and writes env.csv before the cache lookup, so it aligns 1:1 with trajectory.csv; workloads without env_params pay zero overhead.

Testing

Stack: main#893#900#901 (top). Depends on #900#893.

@coderabbitai

coderabbitai Bot commented May 26, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds per-trial env parameter sampling and validation, threads sampled env_params through TestRun and CloudAIGymEnv, persists env.csv alongside trajectories, tightens cache matching on env_params, and updates DSE CLI handling and tests for the new flow.

Changes

Env parameter sampling and DSE flow

Layer / File(s) Summary
Env parameter primitives
src/cloudai/configurator/env_params.py, tests/test_env_params.py
EnvParamSpec, EnvParam, EnvParams, and the env-parameter CSV helper are added for weighted sampling, deterministic per-trial draws, persistence, and scenario validation. Tests cover weight rules, sampling, CSV output, and building samples from test definitions.
Workload and TestRun overlays
src/cloudai/configurator/base_agent.py, src/cloudai/models/workload.py, src/cloudai/_core/test_scenario.py, tests/test_env_params.py, tests/test_handlers.py
BaseAgent gains the env-sampling capability flag. TestDefinition adds env_params validation and env-sampled detection. TestRun updates domain-randomization checks, action-space filtering, and parameter application to merge action values with env_params and revalidate the resulting test definition. Tests cover env-parameter overlay validation, DSE classification, and apply_params_set behavior.
CloudAIGymEnv env_params flow
src/cloudai/configurator/cloudai_gym.py, tests/test_cloudaigym.py
CloudAIGymEnv samples env_params per step, applies them with the agent action, records them in trajectory entries, writes env.csv alongside trajectory.csv, and matches cached trajectories on both action and env_params. Tests cover cache keying, step alignment, overlay application, and the no-env_params path.
DSE validation and execution
src/cloudai/cli/handlers.py, tests/test_handlers.py
CLI handlers validate domain-randomization readiness during dry-run, run verification, and DSE execution. Unexpected DSE failures are recorded in dse_failure.txt, reports accept an optional error payload, and the original exception is re-raised after report generation. Tests cover validation acceptance and rejection cases.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

A rabbit hopped by with a sampling grin,
And env_params twinkled within.
The cache said “match,” then softly sighed,
While env.csv kept pace beside.
\o/ 🐇

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: making env_params first-class to fix trajectory cache keying.
Description check ✅ Passed The description is directly related to the changeset and accurately describes the cache and env_params work.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/cloudai/configurator/cloudai_gym.py (2)

143-171: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Keep trajectory.csv aligned on the constraint-failure path.

observer.before_step() has already persisted the sampled env row by Line 147, but the early return on Lines 169-171 skips write_trajectory(). Any trial rejected by constraint_check() now leaves an orphan env.csv row, which breaks the PR’s stated 1:1 env.csvtrajectory.csv contract and makes per-trial artifacts inconsistent.

Suggested fix
         if not self.test_run.test.constraint_check(self.test_run, self.runner.system):
             logging.info("Constraint check failed. Skipping step.")
-            return [-1.0], self.rewards.constraint_failure, True, {}
+            observation = [-1.0]
+            reward = self.rewards.constraint_failure
+            self.write_trajectory(
+                TrajectoryEntry(
+                    step=self.test_run.step,
+                    action=action,
+                    reward=reward,
+                    observation=observation,
+                    env_params=dict(self.test_run.current_env_params),
+                )
+            )
+            for observer in self.observers:
+                observer.after_step(self.test_run, observation, reward)
+            return observation, reward, True, {}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/configurator/cloudai_gym.py` around lines 143 - 171, The early
return on constraint_check leaves an env row without a matching trajectory; to
fix, when test_run.test.constraint_check(test_run, runner.system) returns False,
write a TrajectoryEntry (use self.write_trajectory with step=self.test_run.step,
action=action, reward=self.rewards.constraint_failure, observation=[-1.0],
env_params=dict(self.test_run.current_env_params)) and call each
observer.after_step(self.test_run, observation, reward) before returning the
same quadruple (observation, reward, True, {}); this mirrors the cached-result
path and preserves the 1:1 env.csv ↔ trajectory.csv contract.

186-202: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Preserve current_env_params when recovering from runner failures.

On Line 189 the fallback rebuilds self.test_run from self.original_test_run, which resets current_env_params to its default. The trajectory row written on Line 202 then records {} even though env.csv already contains the sampled params, and the cached failure is stored under (action, {}) instead of (action, sampled_env_params). Repeating the same failing trial will miss the cache and produce disagreeing artifacts.

Suggested fix
         if self.runner.test_scenario.test_runs and self.runner.test_scenario.test_runs[0].output_path.exists():
             self.test_run = self.runner.test_scenario.test_runs[0]
         else:
-            self.test_run = copy.deepcopy(self.original_test_run)
+            self.test_run = copy.deepcopy(self.original_test_run).apply_params_set(action)
             self.test_run.step = new_tr.step
             self.test_run.output_path = new_tr.output_path
+            self.test_run.current_env_params = dict(new_tr.current_env_params)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/configurator/cloudai_gym.py` around lines 186 - 202, When
rebuilding self.test_run in the fallback branch (inside cloudai_gym.py) you must
preserve the sampled env params so the trajectory/cache keys match the actual
env.csv; after creating the deep-copied self.test_run from
self.original_test_run and after setting step/output_path, copy the sampled
params into self.test_run.current_env_params (e.g. assign a deepcopy of
self.original_test_run.current_env_params or the sampled params present on the
runner) before calling get_observation/compute_reward and write_trajectory; this
ensures dict(self.test_run.current_env_params) in the TrajectoryEntry reflects
the real sampled parameters.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cloudai/configurator/base_agent.py`:
- Around line 116-149: In run(), replace the eager f-string logging calls with
%-style deferred logging so formatting happens only if the message will be
emitted; specifically update the two logging.info calls inside the run method
(the "Running step ..." and "Step {step}: Observation..." messages) to use
logging.info("Running step %s (of %s) with action %s", step, self.max_steps,
action) and logging.info("Step %s: Observation: %s, Reward: %.4f", step, ...,
reward) respectively, ensuring you format the observation list without using
f-strings so the logging framework can lazily interpolate.

In `@src/cloudai/configurator/env_params.py`:
- Around line 1-2: The file header years in
src/cloudai/configurator/env_params.py are incorrect for the touched year;
update the two top lines (the SPDX copyright line and the subsequent Copyright
line) by replacing "2024-2026" with the exact year string the repo's header test
expects for this file (e.g., "2026") so the header matches
tests/test_check_copyright_headers.py.

In `@src/cloudai/configurator/gymnasium_adapter.py`:
- Around line 133-165: Add a public close(self) -> None method to
GymnasiumAdapter that safely delegates to the wrapped environment's close
method: check for an attribute (e.g. getattr(self._env, "close", None)), call it
if callable, and return None; do not assume inheritance from gymnasium.Env. Add
the method near other lifecycle helpers (e.g. alongside
_sync_underlying_step_counter, _step_with_params) so callers that call
env.close() won't raise AttributeError.
- Around line 77-80: GymnasiumAdapter.__init__ currently filters action space
lists into _tunable_params (len>1) and _fixed_params (len==1) but silently drops
any keys with empty lists; modify GymnasiumAdapter.__init__ to detect any entry
in the raw_action_space where len(v) == 0 and raise a ValueError identifying the
offending key(s) so callers (e.g., TestRun.param_space) fail fast; add a unit
test that constructs a TestRun or raw_action_space containing an empty list for
a cmd_args key and asserts that GymnasiumAdapter(...) raises ValueError with a
message mentioning the empty key(s).

In `@tests/test_env_params.py`:
- Around line 86-87: The test uses a broad pytest.raises(ValueError) when
calling sink.write(0, {"drop_rate": 0.0}); narrow it by adding a match parameter
to pytest.raises that matches the specific ValueError message expected (e.g.,
the message about invalid drop_rate or index) so the test asserts the exact
error condition raised by sink.write; update the pytest.raises invocation around
sink.write to include match="expected message substring" referencing the
sink.write call and pytest.raises.
- Around line 1-2: The copyright header in the file contains the range
"2025-2026" which fails CI; update the header lines (the SPDX line and the
Copyright line that currently reads "2025-2026 NVIDIA CORPORATION & AFFILIATES")
to use a single year "2026" so they match the expected format for new 2026
files.

In `@tests/test_handlers.py`:
- Around line 287-288: The override of select_action in tests/test_handlers.py
has the wrong signature; change the function declaration for select_action in
that file to match the base class signature BaseAgent.select_action(self,
observation: list[float] | None = None) (i.e., add the observation parameter
with the same type and default) and keep the existing body (raising
AssertionError) so pyright/CI type-checking passes while preserving the test
intent.

---

Outside diff comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 143-171: The early return on constraint_check leaves an env row
without a matching trajectory; to fix, when
test_run.test.constraint_check(test_run, runner.system) returns False, write a
TrajectoryEntry (use self.write_trajectory with step=self.test_run.step,
action=action, reward=self.rewards.constraint_failure, observation=[-1.0],
env_params=dict(self.test_run.current_env_params)) and call each
observer.after_step(self.test_run, observation, reward) before returning the
same quadruple (observation, reward, True, {}); this mirrors the cached-result
path and preserves the 1:1 env.csv ↔ trajectory.csv contract.
- Around line 186-202: When rebuilding self.test_run in the fallback branch
(inside cloudai_gym.py) you must preserve the sampled env params so the
trajectory/cache keys match the actual env.csv; after creating the deep-copied
self.test_run from self.original_test_run and after setting step/output_path,
copy the sampled params into self.test_run.current_env_params (e.g. assign a
deepcopy of self.original_test_run.current_env_params or the sampled params
present on the runner) before calling get_observation/compute_reward and
write_trajectory; this ensures dict(self.test_run.current_env_params) in the
TrajectoryEntry reflects the real sampled parameters.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 0097c65c-5ab7-4bd2-8683-6e1ea26cc182

📥 Commits

Reviewing files that changed from the base of the PR and between 18ce72e and d076095.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (14)
  • pyproject.toml
  • src/cloudai/_core/test_scenario.py
  • src/cloudai/cli/handlers.py
  • src/cloudai/configurator/__init__.py
  • src/cloudai/configurator/base_agent.py
  • src/cloudai/configurator/cloudai_gym.py
  • src/cloudai/configurator/env_params.py
  • src/cloudai/configurator/gymnasium_adapter.py
  • src/cloudai/models/workload.py
  • tests/test_cloudaigym.py
  • tests/test_env_params.py
  • tests/test_gymnasium_adapter.py
  • tests/test_handlers.py
  • tests/test_test_scenario.py

Comment thread src/cloudai/configurator/base_agent.py
Comment thread src/cloudai/configurator/env_params.py Outdated
Comment thread src/cloudai/configurator/gymnasium_adapter.py Outdated
Comment thread src/cloudai/configurator/gymnasium_adapter.py Outdated
Comment thread tests/test_env_params.py Outdated
Comment thread tests/test_env_params.py Outdated
Comment thread tests/test_handlers.py Outdated
@rutayan-nv rutayan-nv force-pushed the rpatro/env-params-first-class branch from d076095 to 5c85e08 Compare June 15, 2026 17:39

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/cloudai/configurator/cloudai_gym.py (1)

169-171: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve env/traj alignment and env_params identity on non-happy paths.

Line 170 currently returns on constraint failure after before_step() has already persisted env.csv, so env.csv and trajectory.csv can diverge.
Also, in Lines 189-203, replacing self.test_run with original_test_run can drop current_env_params, so the trajectory row is written with empty env params and weakens (action, env_params) cache identity.

Proposed fix
         if not self.test_run.test.constraint_check(self.test_run, self.runner.system):
             logging.info("Constraint check failed. Skipping step.")
-            return [-1.0], self.rewards.constraint_failure, True, {}
+            observation = [-1.0]
+            reward = self.rewards.constraint_failure
+            self.write_trajectory(
+                TrajectoryEntry(
+                    step=self.test_run.step,
+                    action=action,
+                    reward=reward,
+                    observation=observation,
+                    env_params=dict(self.test_run.current_env_params),
+                )
+            )
+            for observer in self.observers:
+                observer.after_step(self.test_run, observation, reward)
+            return observation, reward, True, {}
@@
         if self.runner.test_scenario.test_runs and self.runner.test_scenario.test_runs[0].output_path.exists():
             self.test_run = self.runner.test_scenario.test_runs[0]
         else:
+            preserved_env_params = dict(new_tr.current_env_params)
             self.test_run = copy.deepcopy(self.original_test_run)
             self.test_run.step = new_tr.step
             self.test_run.output_path = new_tr.output_path
+            self.test_run.current_env_params = preserved_env_params

Also applies to: 189-203

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/configurator/cloudai_gym.py` around lines 169 - 171, The
constraint check at lines 169-171 returns after before_step() has already
persisted env.csv, causing env.csv and trajectory.csv to diverge. Move the
constraint check before the before_step() call so that early returns don't
create misalignment between the two files. Additionally, at lines 189-203,
ensure that self.test_run is used instead of original_test_run to preserve
current_env_params in the trajectory row, maintaining proper action-env_params
cache identity.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cloudai/configurator/gymnasium_adapter.py`:
- Around line 151-163: The _sync_underlying_step_counter method has an
off-by-one error where it sets test_run.step to self._step_count + 1, but
CloudAIGymEnv.step() also increments test_run.step on entry, causing a double
increment. Fix this by changing the assignment to test_run.step =
self._step_count instead, since the increment will be handled by
CloudAIGymEnv.step() when it is called, ensuring the first adapter-driven step
is recorded as trial 1 rather than trial 2.

---

Outside diff comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 169-171: The constraint check at lines 169-171 returns after
before_step() has already persisted env.csv, causing env.csv and trajectory.csv
to diverge. Move the constraint check before the before_step() call so that
early returns don't create misalignment between the two files. Additionally, at
lines 189-203, ensure that self.test_run is used instead of original_test_run to
preserve current_env_params in the trajectory row, maintaining proper
action-env_params cache identity.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: ea1d6771-30cc-49dd-bc66-394901097b13

📥 Commits

Reviewing files that changed from the base of the PR and between d076095 and 5c85e08.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (12)
  • pyproject.toml
  • src/cloudai/_core/test_scenario.py
  • src/cloudai/cli/handlers.py
  • src/cloudai/configurator/__init__.py
  • src/cloudai/configurator/cloudai_gym.py
  • src/cloudai/configurator/env_params.py
  • src/cloudai/configurator/gymnasium_adapter.py
  • src/cloudai/models/workload.py
  • tests/test_cloudaigym.py
  • tests/test_env_params.py
  • tests/test_gymnasium_adapter.py
  • tests/test_handlers.py

Comment thread src/cloudai/configurator/gymnasium_adapter.py Outdated
@rutayan-nv rutayan-nv force-pushed the rpatro/env-params-first-class branch from 5c85e08 to 0efcd11 Compare June 15, 2026 22:49

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/cloudai/configurator/cloudai_gym.py (1)

146-171: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep env.csv and trajectory.csv aligned on constraint-failure steps.

before_step() runs before the constraint check, so env.csv is already appended. The early return on Line 171 skips write_trajectory(), creating step drift between artifacts when constraints fail.

Proposed fix
         if not self.test_run.test.constraint_check(self.test_run, self.runner.system):
             logging.info("Constraint check failed. Skipping step.")
-            return [-1.0], self.rewards.constraint_failure, True, {}
+            observation = [-1.0]
+            reward = self.rewards.constraint_failure
+            self.write_trajectory(
+                TrajectoryEntry(
+                    step=self.test_run.step,
+                    action=action,
+                    reward=reward,
+                    observation=observation,
+                    env_params=dict(self.test_run.current_env_params),
+                )
+            )
+            for observer in self.observers:
+                observer.after_step(self.test_run, observation, reward)
+            return observation, reward, True, {}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/configurator/cloudai_gym.py` around lines 146 - 171, The
constraint check failure on line 171 returns early without calling
write_trajectory(), while before_step() has already been called on line 146,
causing env.csv and trajectory.csv to become misaligned. Add a
write_trajectory() call before the early return on constraint failure, similar
to what is done in the cached result case, to write a TrajectoryEntry with the
current step, action, constraint_failure reward, an appropriate observation, and
current environment parameters to keep the CSV artifacts synchronized.
tests/test_cloudaigym.py (1)

165-194: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add an env_params-enabled constraint-failure regression test.

Current constraint-failure coverage does not exercise observer-enabled runs. Add a variant with tdef.env_params declared and assert failed steps keep env.csv and trajectory.csv step-aligned.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_cloudaigym.py` around lines 165 - 194, The test_constraint_failure
function needs to be expanded to cover observer-enabled runs. Add a new test
variant by either parameterizing the existing test or creating an additional
test case where tdef.env_params is declared before creating the CloudAIGymEnv.
After the env.step() call, add assertions to verify that the env.csv and
trajectory.csv output files are kept step-aligned following the constraint
failure, ensuring that observer-enabled runs handle failed steps correctly
without misaligning the step records across output files.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 146-171: The constraint check failure on line 171 returns early
without calling write_trajectory(), while before_step() has already been called
on line 146, causing env.csv and trajectory.csv to become misaligned. Add a
write_trajectory() call before the early return on constraint failure, similar
to what is done in the cached result case, to write a TrajectoryEntry with the
current step, action, constraint_failure reward, an appropriate observation, and
current environment parameters to keep the CSV artifacts synchronized.

In `@tests/test_cloudaigym.py`:
- Around line 165-194: The test_constraint_failure function needs to be expanded
to cover observer-enabled runs. Add a new test variant by either parameterizing
the existing test or creating an additional test case where tdef.env_params is
declared before creating the CloudAIGymEnv. After the env.step() call, add
assertions to verify that the env.csv and trajectory.csv output files are kept
step-aligned following the constraint failure, ensuring that observer-enabled
runs handle failed steps correctly without misaligning the step records across
output files.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 804b42a1-6f0a-462b-b3b1-291ec8eea021

📥 Commits

Reviewing files that changed from the base of the PR and between 5c85e08 and 0efcd11.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (12)
  • pyproject.toml
  • src/cloudai/_core/test_scenario.py
  • src/cloudai/configurator/__init__.py
  • src/cloudai/configurator/base_agent.py
  • src/cloudai/configurator/cloudai_gym.py
  • src/cloudai/configurator/env_params.py
  • src/cloudai/configurator/gymnasium_adapter.py
  • src/cloudai/models/workload.py
  • tests/test_cloudaigym.py
  • tests/test_env_params.py
  • tests/test_gymnasium_adapter.py
  • tests/test_handlers.py

@rutayan-nv rutayan-nv force-pushed the rpatro/env-params-first-class branch from 0efcd11 to b3adf41 Compare June 15, 2026 23:20

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 146-148: The constraint-failure early return in the code is
preventing `trajectory.csv` from being written and `after_step()` hooks from
being called, which breaks trial artifact consistency. Additionally, the early
return provides a fixed one-element observation instead of one matching the
configured observation space. Restructure the constraint-failure handling at
both affected locations to ensure that after encountering a constraint failure,
the code still executes the necessary finalization steps including writing
`trajectory.csv` and calling `after_step()` on all observers, while also
returning an observation that matches the shape of the configured observation
space rather than a hardcoded single-element observation. This applies at
src/cloudai/configurator/cloudai_gym.py lines 146-148 (anchor) and lines 169-172
(sibling).

In `@tests/test_cloudaigym.py`:
- Around line 567-615: Add a new test function to validate step alignment when
constraint_check fails during the env_params flow. Create a test similar to
test_env_csv_is_step_aligned_with_trajectory but configure the environment and
test scenario to trigger a constraint violation during env.step() calls. The
test must verify that despite the constraint failure: (1) env.csv and
trajectory.csv maintain 1:1 step row alignment (same step values in both files),
and (2) the observation returned from env.step() has length matching the number
of entries in agent_metrics. Use the same CloudAIGymEnv, TestRun, TestScenario,
and BaseRunner mocking patterns as the existing alignment test, but introduce
constraint conditions that will cause failures and ensure the alignment
verification assertions apply to the failure case.

In `@tests/test_gymnasium_adapter.py`:
- Around line 70-76: The `_GymWithTestRun` class initializes a `test_run` object
with a step counter but never increments it in the `step()` method, unlike the
real `CloudAIGymEnv.step()`. This means test assertions won't catch counter-sync
regressions in the adapter. Override or update the `step()` method in
`_GymWithTestRun` to increment `self.test_run.step` each time the method is
called, ensuring the test model accurately reflects the behavior of
`CloudAIGymEnv` and allows the adapter's pre-step sync assertions to detect
regressions.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 0af1c210-2370-43d6-90f0-330aa01c925f

📥 Commits

Reviewing files that changed from the base of the PR and between 0efcd11 and b3adf41.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (13)
  • pyproject.toml
  • src/cloudai/_core/test_scenario.py
  • src/cloudai/cli/handlers.py
  • src/cloudai/configurator/__init__.py
  • src/cloudai/configurator/base_agent.py
  • src/cloudai/configurator/cloudai_gym.py
  • src/cloudai/configurator/env_params.py
  • src/cloudai/configurator/gymnasium_adapter.py
  • src/cloudai/models/workload.py
  • tests/test_cloudaigym.py
  • tests/test_env_params.py
  • tests/test_gymnasium_adapter.py
  • tests/test_handlers.py

Comment thread src/cloudai/configurator/cloudai_gym.py Outdated
Comment thread tests/test_cloudaigym.py Outdated
Comment thread tests/test_gymnasium_adapter.py Outdated
@rutayan-nv

Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/cloudai/configurator/cloudai_gym.py (1)

168-170: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Constraint-failure early return still breaks env.csv/trajectory.csv alignment.

When constraint_check fails, before_step() has already executed (line 146), writing to env.csv. But this early return skips both write_trajectory() and after_step() calls. This creates misalignment: env.csv has step N while trajectory.csv does not.

The past review flagged this and was marked "Addressed", but the current code still shows the bare early return. Either the fix was incomplete or reverted.

🛠️ Proposed fix
         if not self.test_run.test.constraint_check(self.test_run, self.runner.system):
             logging.info("Constraint check failed. Skipping step.")
-            return [-1.0], self.rewards.constraint_failure, True, {}
+            failure_obs = [-1.0]
+            failure_reward = self.rewards.constraint_failure
+            self.write_trajectory(
+                TrajectoryEntry(
+                    step=self.test_run.step,
+                    action=action,
+                    reward=failure_reward,
+                    observation=failure_obs,
+                    env_params=dict(self.test_run.current_env_params),
+                )
+            )
+            for observer in self.observers:
+                observer.after_step(self.test_run, failure_obs, failure_reward)
+            return failure_obs, failure_reward, True, {}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/configurator/cloudai_gym.py` around lines 168 - 170, The
constraint_check failure early return at line 168-170 skips both
write_trajectory() and after_step() calls while before_step() has already
executed and written to env.csv, causing misalignment between env.csv and
trajectory.csv. Move the write_trajectory() and after_step() method calls to
execute before this early return statement to ensure both files stay
synchronized, so that whenever a step is written to env.csv, the corresponding
trajectory is written to trajectory.csv.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 168-170: The constraint_check failure early return at line 168-170
skips both write_trajectory() and after_step() calls while before_step() has
already executed and written to env.csv, causing misalignment between env.csv
and trajectory.csv. Move the write_trajectory() and after_step() method calls to
execute before this early return statement to ensure both files stay
synchronized, so that whenever a step is written to env.csv, the corresponding
trajectory is written to trajectory.csv.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 3128d272-6f62-499c-b04a-76f177dbb799

📥 Commits

Reviewing files that changed from the base of the PR and between b3adf41 and 2411f39.

📒 Files selected for processing (10)
  • src/cloudai/_core/test_scenario.py
  • src/cloudai/cli/handlers.py
  • src/cloudai/configurator/base_agent.py
  • src/cloudai/configurator/cloudai_gym.py
  • src/cloudai/configurator/env_params.py
  • src/cloudai/models/workload.py
  • tests/test_cloudaigym.py
  • tests/test_env_params.py
  • tests/test_handlers.py
  • tests/test_test_scenario.py

@rutayan-nv rutayan-nv force-pushed the rpatro/env-params-first-class branch from 2411f39 to c53596d Compare June 16, 2026 15:06
@rutayan-nv

Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
src/cloudai/configurator/cloudai_gym.py (1)

168-170: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve step finalization on constraint failure to keep artifacts aligned.

At Line 168, this early return skips both trajectory persistence and observer after_step() even though before_step() already ran. That can leave env.csv and trajectory.csv out of sync by step and breaks the observer lifecycle contract on failure paths.

💡 Proposed fix
         if not self.test_run.test.constraint_check(self.test_run, self.runner.system):
             logging.info("Constraint check failed. Skipping step.")
-            return [-1.0], self.rewards.constraint_failure, True, {}
+            failure_observation = [-1.0] * max(len(self.test_run.test.agent_metrics), 1)
+            failure_reward = self.rewards.constraint_failure
+            self.write_trajectory(
+                TrajectoryEntry(
+                    step=self.test_run.step,
+                    action=action,
+                    reward=failure_reward,
+                    observation=failure_observation,
+                    env_params=dict(self.test_run.current_env_params),
+                )
+            )
+            for observer in self.observers:
+                observer.after_step(self.test_run, failure_observation, failure_reward)
+            return failure_observation, failure_reward, True, {}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/configurator/cloudai_gym.py` around lines 168 - 170, The
constraint check failure at line 168 returns early before executing trajectory
persistence and the observer after_step() method, breaking the observer
lifecycle since before_step() already executed. Refactor this code so that the
finalization steps (trajectory persistence and observer after_step() call) run
even when constraint_check() fails. Move the early return to after all
finalization code completes, ensuring env.csv and trajectory.csv remain
synchronized and the observer contract is maintained on all code paths including
constraint failures.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 168-170: The constraint check failure at line 168 returns early
before executing trajectory persistence and the observer after_step() method,
breaking the observer lifecycle since before_step() already executed. Refactor
this code so that the finalization steps (trajectory persistence and observer
after_step() call) run even when constraint_check() fails. Move the early return
to after all finalization code completes, ensuring env.csv and trajectory.csv
remain synchronized and the observer contract is maintained on all code paths
including constraint failures.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: e0a47585-a77f-469b-ad79-79b222c1b20a

📥 Commits

Reviewing files that changed from the base of the PR and between 2411f39 and c53596d.

📒 Files selected for processing (6)
  • src/cloudai/_core/test_scenario.py
  • src/cloudai/configurator/cloudai_gym.py
  • src/cloudai/configurator/env_params.py
  • src/cloudai/models/workload.py
  • tests/test_cloudaigym.py
  • tests/test_env_params.py

@rutayan-nv rutayan-nv force-pushed the rpatro/env-params-first-class branch 2 times, most recently from 1298463 to 9e379d1 Compare June 17, 2026 18:33
Comment thread src/cloudai/_core/test_scenario.py Outdated
Comment thread src/cloudai/_core/test_scenario.py Outdated
Comment thread src/cloudai/configurator/cloudai_gym.py Outdated
Comment thread src/cloudai/configurator/cloudai_gym.py Outdated
Comment thread src/cloudai/configurator/cloudai_gym.py Outdated
Comment thread src/cloudai/configurator/env_params.py Outdated
Comment thread src/cloudai/models/workload.py Outdated
Comment thread src/cloudai/configurator/env_params.py Outdated
Comment thread src/cloudai/configurator/env_params.py Outdated
@rutayan-nv

Copy link
Copy Markdown
Contributor Author

Follow-up gap: single-candidate ([1]) env_params still slip through

This PR rejects a scalar env_params target ("nothing to sample"), but a single-element list target (cmd_args.osl = [1] + [env_params.osl]) currently passes: validate_env_params only checks "is a non-empty list", not "≥ 2 candidates".

Semantically [1] is "sample from {1}" = always 1 = a constant draw — exactly the same no-op this PR is meant to kill, just spelled as a list. So it's the same class of bug leaking through.

Consistent fix: require an env_params target to be a candidate list of ≥ 2 values (a single candidate is a no-op, like a scalar). This also lines up with the parse-time canonicalization proposed in #946 (collapse non-env [x]x): once that exists, a [1] env_param naturally reduces to the already-rejected scalar case.

Filing as a follow-up rather than expanding this PR's scope.

@rutayan-nv

Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cloudai/models/workload.py`:
- Around line 264-272: The validation in workload.py’s env-param candidate
handling is too permissive for singleton lists. Update the check around the
env-sampled field path in the candidate validation logic so any cmd_args.<name>
backed by EnvParams requires at least two candidates, not just when spec.weights
is set. Keep the empty-list error, and extend the existing len(value) < 2 guard
in the same validation block to reject [x] for all env-sampled parameters.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: ec13caed-ab2e-49c9-81c0-e5479b253080

📥 Commits

Reviewing files that changed from the base of the PR and between 7fec7f5 and a067e35.

📒 Files selected for processing (8)
  • src/cloudai/_core/test_scenario.py
  • src/cloudai/cli/handlers.py
  • src/cloudai/configurator/cloudai_gym.py
  • src/cloudai/configurator/env_params.py
  • src/cloudai/models/workload.py
  • tests/test_cloudaigym.py
  • tests/test_env_params.py
  • tests/test_handlers.py

Comment thread src/cloudai/models/workload.py Outdated
@rutayan-nv

Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@rutayan-nv

Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@rutayan-nv rutayan-nv requested a review from podkidyshev June 29, 2026 18:44
Make env_params a first-class part of CloudAIGymEnv trial identity so the
trajectory cache keys on (action, env_params) rather than action alone,
fixing the domain-randomization correctness bug where the same action under
a different env_params sample returned a stale reward.

- Cache key now includes env_params; cache-key tests pin the contract
  (formerly the TDD-red specs of NVIDIA#900, folded in here).
- Keep env.csv and trajectory.csv 1:1 step-aligned: a single TrajectoryEntry
  sinks both files coherently, including on constraint failure.
- Reject env_params on non-DSE jobs; reject non-finite / negative weights.
- Add cache-hit + declared-env_params integration coverage.

Folds the test-only PR NVIDIA#900 (cache-key TDD) into this PR so the stack has no
permanently-red standalone PR.
…rom search space

Make env_params a thin annotation over cmd_args fields instead of a holder of
candidate values. Candidate values live in cmd_args (the single source of truth,
exactly like an action-space dimension); env_params.<name> only marks a field as
env-sampled and carries optional sampling weights, never the values.

- EnvParamSpec drops `values`; validates weights (finite, non-negative, sum=1.0).
- Sampler/observer resolve candidate lists from cmd_args; scalar knobs are no-ops.
- TestDefinition.validate_env_params cross-checks annotations against cmd_args
  (key must be a real field; weights require a list and must match its length).
- Exclude env_params keys from both param_space and is_dse_job: an env-sampled
  list is not a search dimension, so an env-params-only workload is not a DSE job.
- validate_dse_env_params rejects env_params on non-DSE runs and on grid_search
  (exhaustive search cannot exploit per-trial randomization).
- Scrub private-implementation references from public docstrings.
- Unit tests use generic Atari Breakout semantics (ball_speed / paddle_width).
…pyright

- validate_env_params: reject structured (non-leaf) cmd_args targets. The
  observer cannot sample them, yet param_space/is_dse_job exclude the whole
  key, which would silently drop nested action dimensions.
- CloudAIGymEnv.write_trajectory: rebind the env.csv sink to the current
  iteration path before each write, so env.csv stays 1:1 aligned with
  trajectory.csv when the env instance is reused across iterations.
- test_env_params: assert the unknown-field rejection via model_validate so the
  negative test no longer trips pyright's call-arg check (CI Linting fix); add a
  structured-target rejection test.
An unweighted env_params spec skipped the candidate-list check, so an empty cmd_args list
(e.g. ball_speed = []) passed validation and only failed later in EnvParamsSampler.sample()
via rng.choice([]) (IndexError). Guard against an empty candidate list in validate_env_params
so the error surfaces at TestDefinition build time. Addresses CodeRabbit feedback.
…ms value objects

Replace the EnvParamsSampler class and the StepObserver/EnvParamsObserver
indirection with two frozen dataclasses: EnvParam (one resolved knob: candidates,
optional weights, single draw) and EnvParams (per-run knobs + seed, built via
from_test, sampled per trial). The sampling RNG lives in the env: step() draws
this trial's values and hands concrete values to TestRun.apply_params_set(action,
env_params=...), which overlays action and sample through one deterministic path.

Centralize the cmd_args -> env_params lookup in TestDefinition.is_env_sampled and
access current_env_params directly. Expand EnvParam/EnvParams unit tests to cover
draw, from_test, sample, and immutability.
Drop the EnvParamsSink Protocol + CsvSink pair (and runtime_checkable) for a
single concrete EnvParamsSink, built unconditionally in CloudAIGymEnv. The
sink is now stateless: write() takes the record path per call and skips empty
samples, so non-DR runs write nothing and write_trajectory needs no branch.

Derive both records from a new iteration_dir property and expose the env
record via the env_params_record_path property (was _env_csv_path), keeping
env.csv and trajectory.csv step-aligned without coupling the name to CSV.
…ty flag

Replace the hardcoded `agent == "grid_search"` check with a BaseAgent.samples_env_params
capability flag (opt-in, defaults False). Only agents whose search consumes per-trial
env_params sampling set it True; enumerating/surrogate agents leave it False, so a config
that declares env_params for an agent that would ignore them is rejected up front instead
of silently no-op'ing. New agents answer for themselves with no string to maintain.

Relocate validate_dse_env_params out of the CLI handlers into configurator/env_params.py
next to the logic it guards, looking the agent up via the Registry. Unknown agents are
deferred to the dedicated agent-resolution error rather than masked here.

Keep all public-facing comments, docstrings, and the error message generic (no internal
agent names). Cover the full validator matrix, including the unknown-agent deferral.
Compress multi-line inline comments down to the single non-obvious rationale (or drop
them where the code already speaks), per the self-documenting-code principle. Public API
docstrings and test intent comments are left intact.
apply_params_set overlays sampled scalar draws onto cmd_args, then
reconstructs the TestDefinition to validate the applied action values.
That pass re-ran validate_env_params, which rejects a weighted env_param
whose cmd_args target is no longer a candidate list - exactly what the
overlay produces. env_params is already validated at parse time, so drop
it from the validation-only dump. Adds a regression test covering a
weighted env_param's scalar draw.
An env_params entry only reclassifies a list-valued cmd_args sweep as
env-sampled; a scalar is already fixed, so annotating it is a meaningless
label. Previously such an annotation was tolerated as a silent no-op, which
let it slip through parse-time validation and inconsistently trip (or not)
the downstream "no agent will sample them" check depending on run mode.

Reject it where the contract lives - TestDefinition.validate_env_params -
so the failure is immediate and mode-independent. EnvParams.from_test's
non-list guard becomes defensive (parse-time now guarantees lists); the
post-overlay path already drops env_params before re-validating, so concrete
scalar draws are unaffected.

Extract the per-field checks into a helper to keep the validator under the
complexity limit, and update tests: scalar annotations now assert rejection
instead of no-op tolerance.
…edicates

Drop TestRun.current_env_params and the restore line in CloudAIGymEnv.step;
the per-trial draw is now a local threaded explicitly into the cache lookup
and TrajectoryEntry, mirroring action. Adds TestDefinition
.is_domain_randomization_enabled (declared?) and TestRun
.is_domain_randomization_active (will actually sample?), and renames
validate_dse_env_params -> validate_domain_randomization_active to flag
declared-but-inactive domain randomization. Also collapses EnvParamsSink
into a write_env_params function and tidies apply_params_set.
registry.py has no runtime cloudai imports (all under TYPE_CHECKING), so the
function-local import in is_domain_randomization_active was unnecessary. Move
it to the top with the other _core imports.
A single-element list (cmd_args.<name> == [x]) has the same degenerate
semantics as the scalar case already rejected in 8f91664: EnvParams.sample()
can only ever return x, while the rest of the flow treats the run as
domain-randomized. Move the >= 2 candidate guard ahead of the weights
short-circuit so it applies to every env-sampled field, not just weighted ones.
@rutayan-nv rutayan-nv force-pushed the rpatro/env-params-first-class branch from 7462135 to d02769e Compare June 29, 2026 22:08
…rts_variable_environment

The flag declares that an agent operates over a variable (non-stationary)
environment that changes per trial. "sampling" was an implementation detail
(env_params domain randomization); curriculum learning also varies the env
without sampling. The broader name keeps the same infra usable for both. No
behavior change - pure rename of the opt-in capability flag and its references.
@podkidyshev podkidyshev merged commit e160386 into NVIDIA:main Jun 30, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants