MAINT: Move include_baseline from Scenario constructor to initi…#1700

Merged
adrian-gavrila merged 23 commits into
microsoft:mainfrom
adrian-gavrila:adrian-gavrila/include-baseline-on-initialize-async
May 13, 2026
@adrian-gavrila adrian-gavrila commented May 8, 2026

Description

Two related changes to scenario baseline handling.

Part 1: Move include_baseline to initialize_async. Drops include_default_baseline from Scenario.__init__ and every subclass constructor. Adds include_baseline: bool | None = None as a runtime kwarg on Scenario.initialize_async so baseline can be overridden per run from config, CLI, or frontend without rebuilding the scenario.

Part 2: Structural fix for ADO 9012. Folds the fix from #1697 into this PR per @rlundeen2's review there. Replaces the cache approach with a structural one. Baseline emission moves inside _get_atomic_attacks_async so exactly one seed-group resolution call happens per run, and baseline and strategies share the same sampled population by construction. Closes #1697 on merge.
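The difference between resolving seeds twice and resolving them once can be illustrated with a toy model. This is a minimal sketch, not PyRIT code: `resolve_seed_groups`, `run_with_two_resolutions`, and `run_with_one_resolution` are hypothetical names, and plain strings stand in for seed groups.

```python
import random


def resolve_seed_groups(pool: list[str], max_dataset_size: int, rng: random.Random) -> list[str]:
    """Hypothetical stand-in for seed-group resolution: draws a random subset."""
    return rng.sample(pool, max_dataset_size)


def run_with_two_resolutions(pool: list[str], max_dataset_size: int, rng: random.Random):
    """Old shape: baseline and strategies each resolve seeds on their own."""
    strategy_seeds = resolve_seed_groups(pool, max_dataset_size, rng)
    baseline_seeds = resolve_seed_groups(pool, max_dataset_size, rng)  # second sample
    return baseline_seeds, strategy_seeds


def run_with_one_resolution(pool: list[str], max_dataset_size: int, rng: random.Random):
    """Structural shape: resolve once and hand the same seeds to both consumers."""
    seeds = resolve_seed_groups(pool, max_dataset_size, rng)
    return seeds, seeds


pool = [f"objective-{i}" for i in range(100)]
baseline, strategies = run_with_one_resolution(pool, 5, random.Random(0))
same_population = baseline == strategies  # holds by construction
```

With two independent resolutions the two lists may differ whenever `max_dataset_size` is smaller than the pool, which is the mismatch the structural fix is meant to rule out.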

Class-level controls

A BaselinePolicy ClassVar (Enabled, Disabled, or Forbidden) declares each scenario's baseline default. Forbidden rejects an explicit include_baseline=True.
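A rough sketch of how the class-level policy and the runtime kwarg might combine. The enum below only mirrors the three names above; `resolve_include_baseline` and the `ValueError` are assumptions made for illustration and may not match the code in pyrit/scenario/core/scenario.py.

```python
from __future__ import annotations

from enum import Enum


class BaselinePolicy(Enum):
    """Toy mirror of the three policy values named in the description."""

    Enabled = "enabled"      # baseline on unless the caller turns it off
    Disabled = "disabled"    # baseline off unless the caller turns it on
    Forbidden = "forbidden"  # baseline never allowed


def resolve_include_baseline(policy: BaselinePolicy, include_baseline: bool | None) -> bool:
    """Hypothetical helper: combine the class policy with the per-run kwarg."""
    if policy is BaselinePolicy.Forbidden:
        if include_baseline:
            # Assumed error type; the real exception may differ.
            raise ValueError("This scenario does not support a baseline attack.")
        return False
    if include_baseline is None:
        # No explicit request: fall back to the class-level default.
        return policy is BaselinePolicy.Enabled
    return include_baseline
```

Under this reading, `None` means "use the scenario's default", so a caller only passes `True` or `False` to override it for a single run.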

Migration

The constructor include_baseline kwarg and the implicit baseline-injection behavior for _get_atomic_attacks_async overrides both emit a DeprecationWarning and continue to work until 0.16.0. Override authors should emit baseline themselves via self._build_baseline_atomic_attack(seed_groups=...) using the same seeds passed to strategy attacks.
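For readers unfamiliar with this kind of shim, a constructor-level deprecation path could look roughly like the following. `ToyScenario` and `effective_include_baseline` are hypothetical, and the precedence shown (runtime value, then deprecated constructor value, then class default) is an assumption rather than something stated in this PR.

```python
from __future__ import annotations

import warnings


class ToyScenario:
    """Hypothetical scenario used only to illustrate the shim; not PyRIT's Scenario."""

    def __init__(self, *, include_baseline: bool | None = None) -> None:
        self._deprecated_include_baseline = include_baseline
        if include_baseline is not None:
            warnings.warn(
                "Passing include_baseline to the constructor is deprecated and will be "
                "removed in 0.16.0; pass it to initialize_async instead.",
                DeprecationWarning,
                stacklevel=2,
            )

    def effective_include_baseline(self, runtime_value: bool | None, default: bool) -> bool:
        """Assumed precedence: runtime kwarg, then constructor kwarg, then default."""
        if runtime_value is not None:
            return runtime_value
        if self._deprecated_include_baseline is not None:
            return self._deprecated_include_baseline
        return default
```

The old call site keeps working and only gains a warning, which gives callers time to move the argument to the initialize call.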

Testing

7657 unit tests pass. Ruff and pre-commit clean. New regression coverage asserts exactly one random.sample call per run and verifies baseline objectives match strategy objectives under max_dataset_size, validated by a bug-simulation experiment.
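The shape of the "exactly one random.sample call" assertion can be sketched with the standard library alone. `build_attacks` and `check_single_sample` are hypothetical stand-ins; the actual regression tests in this PR exercise the scenario classes, not this toy.

```python
import random
from unittest.mock import patch


def build_attacks(pool: list[str], max_dataset_size: int) -> dict[str, list[str]]:
    """Hypothetical builder: samples once and reuses the seeds for both attack kinds."""
    seeds = random.sample(pool, max_dataset_size)
    return {"baseline": list(seeds), "strategy": list(seeds)}


def check_single_sample() -> tuple[int, bool]:
    """Spy on random.sample while still delegating to the real implementation."""
    pool = [f"objective-{i}" for i in range(50)]
    with patch("random.sample", wraps=random.sample) as spy:
        attacks = build_attacks(pool, 5)
    return spy.call_count, attacks["baseline"] == attacks["strategy"]


call_count, objectives_match = check_single_sample()
```

Using `wraps` keeps the sampling behavior intact, so the same test can check both the call count and that the baseline objectives equal the strategy objectives.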

@adrian-gavrila adrian-gavrila force-pushed the adrian-gavrila/include-baseline-on-initialize-async branch from 0d921bb to 206f1b4 Compare May 8, 2026 21:18
…alize_async

Treats include_baseline like every other common runtime parameter on
initialize_async. Subclasses control behavior via two ClassVar flags:
SUPPORTS_DEFAULT_BASELINE (capability) and DEFAULT_INCLUDE_BASELINE
(default when caller doesn't specify).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@adrian-gavrila adrian-gavrila force-pushed the adrian-gavrila/include-baseline-on-initialize-async branch from 206f1b4 to a21dbbb Compare May 8, 2026 21:30
Copilot AI left a comment

Pull request overview

This PR refactors baseline handling for scenarios by moving include_baseline from Scenario/subclass constructors to a runtime, per-run keyword-only argument on Scenario.initialize_async. This aligns baseline inclusion with other runtime initialization inputs (target, strategies, datasets, concurrency, etc.) and adds class-level capability/default controls.

Changes:

  • Removes include_default_baseline from Scenario.__init__ and all scenario subclass constructors; adds include_baseline: bool | None = None to Scenario.initialize_async.
  • Introduces Scenario.SUPPORTS_DEFAULT_BASELINE and Scenario.DEFAULT_INCLUDE_BASELINE to control baseline capability and default inclusion per scenario type.
  • Updates unit/integration tests and scenario documentation/notebooks to reflect the new initialization-time baseline configuration.

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| tests/unit/scenario/test_scenario.py | Updates test scaffolding to use class-level baseline capability flags and removes constructor baseline args. |
| tests/unit/scenario/test_scenario_retry.py | Updates retry tests’ concrete scenario fixture to disable default baseline via class flag. |
| tests/unit/scenario/test_scenario_partial_results.py | Updates partial-results scenario fixture to disable default baseline via class flag. |
| tests/unit/scenario/test_scenario_parameters.py | Updates parameter tests to disable baseline via class flag and removes constructor baseline arg. |
| tests/unit/scenario/test_leakage_scenario.py | Adjusts Leakage baseline-related test to align with the new class-level baseline controls. |
| tests/unit/scenario/test_jailbreak.py | Adds coverage for Jailbreak’s baseline-default-off behavior and explicit override via initialize_async. |
| tests/unit/scenario/test_foundry.py | Moves RedTeamAgent baseline disabling from constructor to initialize_async in tests. |
| tests/unit/scenario/test_adversarial.py | Adds baseline capability/validation tests for AdversarialBenchmark (baseline forbidden). |
| tests/integration/datasets/test_seed_dataset_provider_integration.py | Updates integration test to pass include_baseline at initialize time. |
| pyrit/scenario/scenarios/garak/encoding.py | Removes constructor baseline parameter and stops forwarding baseline into Scenario.__init__. |
| pyrit/scenario/scenarios/foundry/red_team_agent.py | Removes constructor baseline parameter; widens initialize_async signature to accept include_baseline and forwards it to base. |
| pyrit/scenario/scenarios/benchmark/adversarial.py | Marks AdversarialBenchmark as not supporting the default baseline and removes constructor baseline wiring. |
| pyrit/scenario/scenarios/airt/scam.py | Removes constructor baseline parameter and stops forwarding baseline into Scenario.__init__. |
| pyrit/scenario/scenarios/airt/psychosocial.py | Marks Psychosocial as not supporting the default baseline; removes constructor baseline wiring. |
| pyrit/scenario/scenarios/airt/leakage.py | Removes constructor baseline wiring (falls back to base defaults). |
| pyrit/scenario/scenarios/airt/jailbreak.py | Sets Jailbreak’s default baseline inclusion to off; removes constructor baseline parameter/wiring. |
| pyrit/scenario/scenarios/airt/cyber.py | Removes constructor baseline parameter and stops forwarding baseline into Scenario.__init__. |
| pyrit/scenario/core/scenario.py | Implements class-level baseline capability/default flags and resolves baseline inclusion inside initialize_async. |
| doc/code/scenarios/1_common_scenario_parameters.py | Updates examples/instructions to configure baseline via initialize_async instead of constructors. |
| doc/code/scenarios/1_common_scenario_parameters.ipynb | Regenerated notebook reflecting baseline configuration via initialize_async. |
| doc/code/scenarios/0_scenarios.py | Updates baseline parameter documentation to reflect initialize_async-based configuration. |
| doc/code/scenarios/0_scenarios.ipynb | Regenerated notebook reflecting baseline parameter documentation changes. |

@adrian-gavrila adrian-gavrila changed the title MAINT BREAK: Move include_baseline from Scenario constructor to initi… MAINT: Move include_baseline from Scenario constructor to initi… May 12, 2026
@rlundeen2
This looks good, but holistically you may want to update the baseline behavior here as well.

adrian-gavrila and others added 6 commits May 13, 2026 12:17
…async

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…OLICY to BASELINE_POLICY

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
adrian-gavrila and others added 6 commits May 13, 2026 15:21
…nstructions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…v0.16.0)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@rlundeen2 rlundeen2 left a comment

The deprecation paths are heavy, but I'm not too concerned since we're removing them. Here are some minor inconsistencies Copilot found:

  1. Leakage and Psychosocial don't have the deprecated include_baseline constructor kwarg, while every other
    subclass does. Leakage silently uses BaselinePolicy.Enabled (the default) — so its constructor is already "migrated"
    to the new style. Fine, but it's inconsistent with the other scenarios which all carry the deprecation shim. If a
    user was passing include_baseline=True to Leakage(), they'll get a TypeError instead of a deprecation warning.
  2. Encoding._get_atomic_attacks_async (lines 245-246) passes self._resolved_seed_groups or [] to
    _build_baseline_atomic_attack. The or [] fallback is defensive but suspicious — if _resolved_seed_groups is None at
    that point, it means _resolve_seed_groups() (line 241) returned something truthy but then got cleared, which can't
    happen. The or [] masks a deeper invariant violation. Same pattern in Scam (line 287) and Jailbreak (line 328).
  3. initialize_async line 692 (the rescue path): when baseline is needed but the override didn't emit it, the rescue
    re-calls self._dataset_config.get_all_seed_attack_groups(). Under max_dataset_size, this triggers a second
    random.sample — exactly the ADO 9012 bug this PR is supposed to fix. The rescue path itself is vulnerable to the
    very bug it's working around. This is only for legacy/unmigrated overrides, and it's deprecated, but it's ironic and
    worth noting.
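The concern in item 2 can be shown in isolation. This is a hedged sketch of one possible alternative, not the change that was actually made: `seeds_with_fallback` and `seeds_strict` are hypothetical helpers, and the choice of `RuntimeError` is an assumption.

```python
from __future__ import annotations


def seeds_with_fallback(resolved: list[str] | None) -> list[str]:
    """Mirrors the `or []` pattern: a missing value silently becomes an empty list."""
    return resolved or []


def seeds_strict(resolved: list[str] | None) -> list[str]:
    """Alternative: surface the broken invariant instead of hiding it."""
    if resolved is None:
        raise RuntimeError("seed groups were not resolved before building the baseline")
    return resolved
```

With the fallback, an unresolved state would yield a baseline built from zero seeds; the strict variant fails at the point where the invariant is violated.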

adrian-gavrila and others added 2 commits May 13, 2026 18:47
…add Psychosocial deprecation shim

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@adrian-gavrila adrian-gavrila enabled auto-merge May 13, 2026 22:54
@adrian-gavrila adrian-gavrila added this pull request to the merge queue May 13, 2026
Merged via the queue into microsoft:main with commit a9c93dc May 13, 2026
48 checks passed
@adrian-gavrila adrian-gavrila deleted the adrian-gavrila/include-baseline-on-initialize-async branch May 13, 2026 23:21
5 participants