MAINT: Move include_baseline from Scenario constructor to initi…#1700
…alize_async Treats include_baseline like every other common runtime parameter on initialize_async. Subclasses control behavior via two ClassVar flags: SUPPORTS_DEFAULT_BASELINE (capability) and DEFAULT_INCLUDE_BASELINE (default when caller doesn't specify). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
This PR refactors baseline handling for scenarios by moving include_baseline from Scenario/subclass constructors to a runtime, per-run keyword-only argument on Scenario.initialize_async. This aligns baseline inclusion with other runtime initialization inputs (target, strategies, datasets, concurrency, etc.) and adds class-level capability/default controls.
Changes:
- Removes `include_default_baseline` from `Scenario.__init__` and all scenario subclass constructors; adds `include_baseline: bool | None = None` to `Scenario.initialize_async`.
- Introduces `Scenario.SUPPORTS_DEFAULT_BASELINE` and `Scenario.DEFAULT_INCLUDE_BASELINE` to control baseline capability and default inclusion per scenario type.
- Updates unit/integration tests and scenario documentation/notebooks to reflect the new initialization-time baseline configuration.
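As a rough illustration of how the two flags and the tri-state `include_baseline` argument could interact, here is a minimal sketch. `SketchScenario`, its subclasses, and `resolve_include_baseline` are hypothetical names made up for this example, and raising `ValueError` for an unsupported baseline is an assumption; none of this is the actual PyRIT code.

```python
from __future__ import annotations

from typing import ClassVar


class SketchScenario:
    """Hypothetical stand-in for Scenario; not the PyRIT implementation."""

    # Capability: whether this scenario type can emit a default baseline at all.
    SUPPORTS_DEFAULT_BASELINE: ClassVar[bool] = True
    # Default used when the caller leaves include_baseline as None.
    DEFAULT_INCLUDE_BASELINE: ClassVar[bool] = True

    @classmethod
    def resolve_include_baseline(cls, include_baseline: bool | None = None) -> bool:
        if include_baseline is None:
            # Caller did not specify: fall back to the class default,
            # but never enable a baseline the class cannot support.
            return cls.SUPPORTS_DEFAULT_BASELINE and cls.DEFAULT_INCLUDE_BASELINE
        if include_baseline and not cls.SUPPORTS_DEFAULT_BASELINE:
            # Assumed behavior: an explicit request for an unsupported baseline fails.
            raise ValueError(f"{cls.__name__} does not support a default baseline")
        return include_baseline


class JailbreakLike(SketchScenario):
    # Baseline is supported but off unless the caller opts in.
    DEFAULT_INCLUDE_BASELINE = False


class AdversarialLike(SketchScenario):
    # Baseline is not supported for this scenario type.
    SUPPORTS_DEFAULT_BASELINE = False
```

With this shape, `JailbreakLike.resolve_include_baseline()` is off by default but can be switched on per run, while `AdversarialLike` rejects an explicit `True`.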
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tests/unit/scenario/test_scenario.py | Updates test scaffolding to use class-level baseline capability flags and removes constructor baseline args. |
| tests/unit/scenario/test_scenario_retry.py | Updates retry tests’ concrete scenario fixture to disable default baseline via class flag. |
| tests/unit/scenario/test_scenario_partial_results.py | Updates partial-results scenario fixture to disable default baseline via class flag. |
| tests/unit/scenario/test_scenario_parameters.py | Updates parameter tests to disable baseline via class flag and removes constructor baseline arg. |
| tests/unit/scenario/test_leakage_scenario.py | Adjusts Leakage baseline-related test to align with the new class-level baseline controls. |
| tests/unit/scenario/test_jailbreak.py | Adds coverage for Jailbreak’s baseline-default-off behavior and explicit override via initialize_async. |
| tests/unit/scenario/test_foundry.py | Moves RedTeamAgent baseline disabling from constructor to initialize_async in tests. |
| tests/unit/scenario/test_adversarial.py | Adds baseline capability/validation tests for AdversarialBenchmark (baseline forbidden). |
| tests/integration/datasets/test_seed_dataset_provider_integration.py | Updates integration test to pass include_baseline at initialize time. |
| pyrit/scenario/scenarios/garak/encoding.py | Removes constructor baseline parameter and stops forwarding baseline into `Scenario.__init__`. |
| pyrit/scenario/scenarios/foundry/red_team_agent.py | Removes constructor baseline parameter; widens initialize_async signature to accept include_baseline and forwards it to base. |
| pyrit/scenario/scenarios/benchmark/adversarial.py | Marks AdversarialBenchmark as not supporting the default baseline and removes constructor baseline wiring. |
| pyrit/scenario/scenarios/airt/scam.py | Removes constructor baseline parameter and stops forwarding baseline into `Scenario.__init__`. |
| pyrit/scenario/scenarios/airt/psychosocial.py | Marks Psychosocial as not supporting the default baseline; removes constructor baseline wiring. |
| pyrit/scenario/scenarios/airt/leakage.py | Removes constructor baseline wiring (falls back to base defaults). |
| pyrit/scenario/scenarios/airt/jailbreak.py | Sets Jailbreak’s default baseline inclusion to off; removes constructor baseline parameter/wiring. |
| pyrit/scenario/scenarios/airt/cyber.py | Removes constructor baseline parameter and stops forwarding baseline into `Scenario.__init__`. |
| pyrit/scenario/core/scenario.py | Implements class-level baseline capability/default flags and resolves baseline inclusion inside initialize_async. |
| doc/code/scenarios/1_common_scenario_parameters.py | Updates examples/instructions to configure baseline via initialize_async instead of constructors. |
| doc/code/scenarios/1_common_scenario_parameters.ipynb | Regenerated notebook reflecting baseline configuration via initialize_async. |
| doc/code/scenarios/0_scenarios.py | Updates baseline parameter documentation to reflect initialize_async-based configuration. |
| doc/code/scenarios/0_scenarios.ipynb | Regenerated notebook reflecting baseline parameter documentation changes. |
…e-async Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This looks good, but holistically you may want to update the baseline behavior here as well.
…async Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…OLICY to BASELINE_POLICY Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…nstructions Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…v0.16.0) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…e-baseline-on-initialize-async
rlundeen2 left a comment
The deprecation paths are heavy, but I'm not too concerned since we're removing them. Here are some minor inconsistencies Copilot found:
- Leakage and Psychosocial don't have the deprecated `include_baseline` constructor kwarg, while every other subclass does. Leakage silently uses `BaselinePolicy.Enabled` (the default), so its constructor is already "migrated" to the new style. Fine, but it's inconsistent with the other scenarios, which all carry the deprecation shim. If a user was passing `include_baseline=True` to `Leakage()`, they'll get a `TypeError` instead of a deprecation warning.
- `Encoding._get_atomic_attacks_async` (lines 245-246) passes `self._resolved_seed_groups or []` to `_build_baseline_atomic_attack`. The `or []` fallback is defensive but suspicious: if `_resolved_seed_groups` is None at that point, it means `_resolve_seed_groups()` (line 241) returned something truthy but then got cleared, which can't happen. The `or []` masks a deeper invariant violation. Same pattern in Scam (line 287) and Jailbreak (line 328).
- `initialize_async` line 692 (the rescue path): when baseline is needed but the override didn't emit it, the rescue re-calls `self._dataset_config.get_all_seed_attack_groups()`. Under `max_dataset_size`, this triggers a second `random.sample`, which is exactly the ADO 9012 bug this PR is supposed to fix. The rescue path itself is vulnerable to the very bug it's working around. This is only for legacy/unmigrated overrides, and it's deprecated, but it's ironic and worth noting.
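To make the double-sampling concern in the last point concrete, here is a small self-contained sketch. `CountingSampler`, `legacy_run`, and `structural_run` are hypothetical names invented for illustration and do not correspond to PyRIT code; the sketch only shows why resolving seeds once and sharing the result keeps baseline and strategies on the same population.

```python
import random


class CountingSampler:
    """Hypothetical wrapper that counts how many times sampling happens."""

    def __init__(self, seed: int) -> None:
        self._rng = random.Random(seed)
        self.calls = 0

    def sample(self, population: list, k: int) -> list:
        self.calls += 1
        return self._rng.sample(population, k)


def legacy_run(groups: list, k: int, sampler: CountingSampler) -> tuple:
    # Strategies and baseline each resolve seed groups on their own,
    # so the population is sampled twice and the two draws can differ.
    strategy_seeds = sampler.sample(groups, k)
    baseline_seeds = sampler.sample(groups, k)
    return strategy_seeds, baseline_seeds


def structural_run(groups: list, k: int, sampler: CountingSampler) -> tuple:
    # Resolve once, then hand the same list to both consumers.
    seeds = sampler.sample(groups, k)
    return seeds, seeds


groups = [f"objective-{i}" for i in range(100)]

legacy_sampler = CountingSampler(seed=0)
legacy_run(groups, 5, legacy_sampler)

structural_sampler = CountingSampler(seed=0)
strategy_seeds, baseline_seeds = structural_run(groups, 5, structural_sampler)
```

In the legacy shape the sampler is hit twice per run; in the structural shape it is hit once, and the baseline seeds equal the strategy seeds by construction.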
…add Psychosocial deprecation shim Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Description
Two related changes to scenario baseline handling.

Part 1: Move `include_baseline` to `initialize_async`. Drops `include_default_baseline` from `Scenario.__init__` and every subclass constructor. Adds `include_baseline: bool | None = None` as a runtime kwarg on `Scenario.initialize_async` so baseline can be overridden per run from config, CLI, or frontend without rebuilding the scenario.

Part 2: Structural fix for ADO 9012. Folds the fix from #1697 into this PR per @rlundeen2's review there. Replaces the cache approach with a structural one. Baseline emission moves inside `_get_atomic_attacks_async` so exactly one seed-group resolution call happens per run, and baseline and strategies share the same sampled population by construction. Closes #1697 on merge.

Class-level controls
`BaselinePolicy` ClassVar (`Enabled`, `Disabled`, `Forbidden`) declares per-scenario baseline default. `Forbidden` rejects explicit `include_baseline=True`.

Migration
The constructor `include_baseline` kwarg and the implicit baseline-injection behavior for `_get_atomic_attacks_async` overrides both emit a `DeprecationWarning` and continue to work until 0.16.0. Override authors should emit baseline themselves via `self._build_baseline_atomic_attack(seed_groups=...)` using the same seeds passed to strategy attacks.

Testing
7657 unit tests pass. Ruff and pre-commit clean. New regression coverage asserts exactly one `random.sample` call per run and verifies baseline objectives match strategy objectives under `max_dataset_size`, validated by a bug-simulation experiment.
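For readers skimming the policy semantics, the three `BaselinePolicy` states could resolve against the per-run `include_baseline` argument roughly as follows. This is a sketch under assumptions: the member names come from the description above, but the enum values, the `resolve_baseline` helper, and the choice of `ValueError` are hypothetical and not taken from the PyRIT source.

```python
from __future__ import annotations

from enum import Enum


class BaselinePolicy(Enum):
    # Member names follow the PR description; the string values are placeholders.
    Enabled = "enabled"
    Disabled = "disabled"
    Forbidden = "forbidden"


def resolve_baseline(policy: BaselinePolicy, include_baseline: bool | None = None) -> bool:
    """Hypothetical helper: combine the class policy with the per-run argument."""
    if policy is BaselinePolicy.Forbidden:
        if include_baseline:
            # Assumed exception type; the description only says the request is rejected.
            raise ValueError("baseline is forbidden for this scenario")
        return False
    if include_baseline is None:
        # No per-run override: the policy supplies the default.
        return policy is BaselinePolicy.Enabled
    # Enabled or Disabled policies can both be overridden per run.
    return include_baseline
```

Under this reading, `Enabled` and `Disabled` only set the default, while `Forbidden` is the one state the caller cannot override.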