Merged
23 commits
a21dbbb
MAINT BREAK: Move include_baseline from Scenario constructor to initi…
May 8, 2026
7b2f327
adding tests for additional classVar
May 9, 2026
fe9cf55
doc updates
May 9, 2026
0411f86
addressing doc accuracy comments
adrian-gavrila May 11, 2026
56927d9
Adding deprecation, and cleaner enum instead of 2 variable control
adrian-gavrila May 11, 2026
73e6731
Merge branch 'main' into adrian-gavrila/include-baseline-on-initializ…
adrian-gavrila May 12, 2026
b364135
Merge branch 'main' into adrian-gavrila/include-baseline-on-initializ…
adrian-gavrila May 12, 2026
38cf8ab
Rename BaselinePolicy to BaselineDefaultPolicy and add subclass guidance
adrian-gavrila May 13, 2026
c1cd978
Apply pre-commit notebook cleanup
adrian-gavrila May 13, 2026
3b551b9
Emit baseline from base _get_atomic_attacks_async via helper
adrian-gavrila May 13, 2026
bac9d36
Migrate scenario overrides to emit baseline from _get_atomic_attacks_…
adrian-gavrila May 13, 2026
de7d257
Rename BaselineDefaultPolicy to BaselinePolicy and BASELINE_DEFAULT_P…
adrian-gavrila May 13, 2026
7db5f76
Delete legacy _get_baseline and post-hoc baseline insertion
adrian-gavrila May 13, 2026
9677a38
Document override responsibility for baseline emission
adrian-gavrila May 13, 2026
d412ab5
Add baseline uniformity regression tests and helper unit tests
adrian-gavrila May 13, 2026
f16bf17
Document BASELINE_POLICY and baseline emission contract in scenario i…
adrian-gavrila May 13, 2026
1c41a79
Apply pre-commit auto-fixes (ruff format and unused import removal)
adrian-gavrila May 13, 2026
f3c2095
Add deprecation rescue for overrides that don't emit baseline (until …
adrian-gavrila May 13, 2026
768ffe5
Normalize deprecation markers (drop v prefix, fix rescue text)
adrian-gavrila May 13, 2026
c9058c8
Merge remote-tracking branch 'origin/main' into adrian-gavrila/includ…
adrian-gavrila May 13, 2026
3e58bb5
Trim deprecation rescue old_item to fit line-length
adrian-gavrila May 13, 2026
0286320
Address PR review: flip Jailbreak/Psychosocial defaults, fix rescue, …
adrian-gavrila May 13, 2026
dcef627
Merge branch 'main' into adrian-gavrila/include-baseline-on-initializ…
adrian-gavrila May 13, 2026
10 changes: 8 additions & 2 deletions .github/instructions/scenarios.instructions.md
@@ -11,11 +11,15 @@ Scenarios orchestrate multi-attack security testing campaigns. Each scenario gro
All scenarios inherit from `Scenario` (ABC) and must:

1. **Define `VERSION`** as a class constant (increment on breaking changes)
-2. **Implement three abstract methods:**
+2. **Optionally declare `BASELINE_POLICY`** (defaults to `BaselinePolicy.Enabled` — a baseline `PromptSendingAttack` is prepended and callers can opt out per run via `initialize_async(include_baseline=False)`):
+- `BaselinePolicy.Disabled` — baseline supported but off by default (e.g. `Jailbreak`, where templates dominate the run).
+- `BaselinePolicy.Forbidden` — baseline is meaningless for this scenario's comparison axis (e.g. `AdversarialBenchmark`, which compares against gold-standard answers). Explicit `include_baseline=True` raises `ValueError`.
+3. **Implement three abstract methods:**

```python
class MyScenario(Scenario):
    VERSION: int = 1
+    BASELINE_POLICY: ClassVar[BaselinePolicy] = BaselinePolicy.Enabled

    @classmethod
    def get_strategy_class(cls) -> type[ScenarioStrategy]:
@@ -30,7 +34,7 @@ class MyScenario(Scenario):
        return DatasetConfiguration(dataset_names=["my_dataset"])
```

-3. **Optionally override `_get_atomic_attacks_async()`** — the base class provides a default
+4. **Optionally override `_get_atomic_attacks_async()`** — the base class provides a default
that uses the factory/registry pattern (see "AtomicAttack Construction" below).
Only override if your scenario needs custom attack construction logic.
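
The policy rules in item 2 can be read as a small decision table. The sketch below is not the PyRIT implementation: the enum is a stand-in with assumed string values, and `resolve_include_baseline` is a hypothetical helper name. It only encodes what the text above states: `Enabled` and `Disabled` set the default, an explicit `include_baseline` overrides that default, and `Forbidden` rejects an explicit `True`.

```python
from enum import Enum
from typing import Optional


class BaselinePolicy(Enum):
    """Stand-in for the enum named in the diff; the string values are assumptions."""

    Enabled = "enabled"
    Disabled = "disabled"
    Forbidden = "forbidden"


def resolve_include_baseline(
    policy: BaselinePolicy, include_baseline: Optional[bool] = None
) -> bool:
    """Hypothetical helper: combine the class-level policy with the per-run argument."""
    if policy is BaselinePolicy.Forbidden:
        # Only an explicit request for a baseline is an error.
        if include_baseline:
            raise ValueError("include_baseline=True is not allowed for this scenario")
        return False
    if include_baseline is None:
        # No per-run value given: fall back to the policy default.
        return policy is BaselinePolicy.Enabled
    # An explicit per-run value wins over the default.
    return include_baseline
```

With this reading, a `Disabled` scenario still runs a baseline when the caller passes `include_baseline=True`, and an `Enabled` scenario skips it when the caller passes `False`.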

@@ -154,6 +158,8 @@ The default implementation:
Only override when the scenario **cannot** use the factory/registry pattern — e.g., scenarios
with custom composite logic, per-strategy converter stacks, or non-standard attack construction.

+Overrides that want baseline support must emit it themselves by calling `self._build_baseline_atomic_attack(seed_groups=...)` with the same seeds used for the strategy attacks and prepending the result. The base implementation emits baseline automatically; passing freshly resolved seeds reintroduces ADO 9012 (baseline-vs-strategy population divergence under `max_dataset_size`).
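
A toy model may help show why the baseline has to reuse the seeds. It does not use any PyRIT types: `resolve_seeds` and `build_attacks` are hypothetical stand-ins, and the assumption that seed resolution under `max_dataset_size` draws a random sample is ours, not something the diff states.

```python
import random
from typing import Dict, List, Sequence


def resolve_seeds(
    dataset: Sequence[str], max_dataset_size: int, rng: random.Random
) -> List[str]:
    """Hypothetical seed resolution: sample at most max_dataset_size items."""
    return rng.sample(list(dataset), k=min(max_dataset_size, len(dataset)))


def build_attacks(seed_groups: List[str], strategies: Sequence[str]) -> List[Dict]:
    """Prepend a baseline entry built from the same seeds the strategy attacks use."""
    baseline = {"name": "baseline", "seeds": list(seed_groups)}
    attacks = [{"name": s, "seeds": list(seed_groups)} for s in strategies]
    return [baseline, *attacks]


dataset = [f"objective_{i}" for i in range(20)]
rng = random.Random(0)

# Resolve once, then hand the same list to the baseline and every strategy.
seeds = resolve_seeds(dataset, max_dataset_size=4, rng=rng)
attacks = build_attacks(seeds, ["role_play", "crescendo"])
same_population = all(a["seeds"] == seeds for a in attacks)

# Anti-pattern: resolving again just for the baseline draws a second sample,
# which will generally not match the one the strategies ran against.
fresh_seeds = resolve_seeds(dataset, max_dataset_size=4, rng=rng)
```

In this sketch `same_population` is true by construction, which is the property the paragraph above asks overrides to preserve.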

### Manual AtomicAttack construction (for overrides):

```python
113 changes: 70 additions & 43 deletions doc/code/scenarios/0_scenarios.ipynb
@@ -74,7 +74,6 @@
" - `version`: Integer version number\n",
" - `strategy_class`: The strategy enum class for this scenario\n",
" - `objective_scorer_identifier`: Identifier dict for the scoring mechanism (optional)\n",
-" - `include_default_baseline`: Whether to include a baseline attack (default: True)\n",
" - `scenario_result_id`: Optional ID to resume an existing scenario (optional)\n",
"\n",
"5. **Initialization**: Call `await scenario.initialize_async()` to populate atomic attacks:\n",
@@ -83,6 +82,8 @@
" - `max_concurrency`: Number of concurrent operations (default: 1)\n",
" - `max_retries`: Number of retry attempts on failure (default: 0)\n",
" - `memory_labels`: Optional labels for tracking (optional)\n",
+" - `include_baseline`: Whether to prepend a baseline attack (defaults to the scenario type's\n",
+" `BASELINE_POLICY`; most scenarios default it on, `Jailbreak` defaults it off)\n",
"\n",
"### Example Structure\n",
"\n",
@@ -101,9 +102,15 @@
"name": "stdout",
"output_type": "stream",
"text": [
-"Found default environment files: ['./.pyrit/.env', './.pyrit/.env.local']\n",
-"Loaded environment file: ./.pyrit/.env\n",
-"Loaded environment file: ./.pyrit/.env.local\n"
+"Found default environment files: ['./.pyrit/.env']\n",
+"Loaded environment file: ./.pyrit/.env\n"
]
+},
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"No new upgrade operations detected.\n"
+]
}
],
@@ -193,34 +200,13 @@
"name": "stdout",
"output_type": "stream",
"text": [
-"Found default environment files: ['./.pyrit/.env', './.pyrit/.env.local']\n",
+"Loading default configuration file: ./.pyrit/.pyrit_conf\n",
+"Found default environment files: ['./.pyrit/.env']\n",
"Loaded environment file: ./.pyrit/.env\n",
-"Loaded environment file: ./.pyrit/.env.local\n"
-]
-},
-{
-"name": "stdout",
-"output_type": "stream",
-"text": [
"\n",
"Available Scenarios:\n",
"================================================================================\n",
"\u001b[1m\u001b[36m\n",
-" airt.content_harms\u001b[0m\n",
-" Class: ContentHarms\n",
-" Description:\n",
-" Content Harms Scenario implementation for PyRIT. This scenario contains\n",
-" various harm-based checks that you can run to get a quick idea about\n",
-" model behavior with respect to certain harm categories.\n",
-" Aggregate Strategies:\n",
-" - all\n",
-" Available Strategies (7):\n",
-" hate, fairness, violence, sexual, harassment, misinformation, leakage\n",
-" Default Strategy: all\n",
-" Default Datasets (7, max 4 per dataset):\n",
-" airt_hate, airt_fairness, airt_violence, airt_sexual, airt_harassment,\n",
-" airt_misinformation, airt_leakage\n",
-"\u001b[1m\u001b[36m\n",
" airt.cyber\u001b[0m\n",
" Class: Cyber\n",
" Description:\n",
@@ -229,9 +215,9 @@
" Cyber class contains different variations of the malware generation\n",
" techniques.\n",
" Aggregate Strategies:\n",
-" - all\n",
+" - all, single_turn, multi_turn\n",
" Available Strategies (2):\n",
-" single_turn, multi_turn\n",
+" prompt_sending, red_teaming\n",
" Default Strategy: all\n",
" Default Datasets (1, max 4 per dataset):\n",
" airt_malware\n",
@@ -256,14 +242,14 @@
" Description:\n",
" Leakage scenario implementation for PyRIT. This scenario tests how\n",
" susceptible models are to leaking training data, PII, intellectual\n",
-" property, or other confidential information. The Leakage class\n",
-" contains different attack variations designed to extract sensitive\n",
-" information from models.\n",
+" property, or other confidential information. Uses the registry/factory\n",
+" pattern to construct attack techniques.\n",
" Aggregate Strategies:\n",
-" - all, single_turn, multi_turn, ip, sensitive_data\n",
-" Available Strategies (4):\n",
-" first_letter, image, role_play, crescendo\n",
-" Default Strategy: all\n",
+" - all, default, single_turn, multi_turn\n",
+" Available Strategies (9):\n",
+" prompt_sending, role_play, many_shot, tap, crescendo_simulated,\n",
+" red_teaming, context_compliance, first_letter, image\n",
+" Default Strategy: default\n",
" Default Datasets (1, max 4 per dataset):\n",
" airt_leakage\n",
"\u001b[1m\u001b[36m\n",
@@ -296,6 +282,21 @@
" Default Datasets (1, max 4 per dataset):\n",
" airt_imminent_crisis\n",
"\u001b[1m\u001b[36m\n",
+" airt.rapid_response\u001b[0m\n",
+" Class: RapidResponse\n",
+" Description:\n",
+" Rapid Response scenario for content-harms testing. Tests model behavior\n",
+" across multiple harm categories using selectable attack techniques.\n",
+" Aggregate Strategies:\n",
+" - all, default, single_turn, multi_turn\n",
+" Available Strategies (7):\n",
+" prompt_sending, role_play, many_shot, tap, crescendo_simulated,\n",
+" red_teaming, context_compliance\n",
+" Default Strategy: default\n",
+" Default Datasets (7, max 4 per dataset):\n",
+" airt_hate, airt_fairness, airt_violence, airt_sexual, airt_harassment,\n",
+" airt_misinformation, airt_leakage\n",
+"\u001b[1m\u001b[36m\n",
" airt.scam\u001b[0m\n",
" Class: Scam\n",
" Description:\n",
@@ -309,6 +310,21 @@
" Default Strategy: all\n",
" Default Datasets (1, max 4 per dataset):\n",
" airt_scams\n",
+" Supported Parameters:\n",
+" - max_turns (int) [default: 5]: Maximum conversation turns for the persuasive_rta strategy.\n",
+"\u001b[1m\u001b[36m\n",
+" benchmark.adversarial\u001b[0m\n",
+" Class: AdversarialBenchmark\n",
+" Description:\n",
+" Benchmarking scenario that compares the attack success rate (ASR) of\n",
+" several different adversarial models.\n",
+" Aggregate Strategies:\n",
+" - all, default, single_turn, multi_turn, light\n",
+" Available Strategies (4):\n",
+" role_play, tap, red_teaming, context_compliance\n",
+" Default Strategy: light\n",
+" Default Datasets (1, max 8 per dataset):\n",
+" harmbench\n",
"\u001b[1m\u001b[36m\n",
" foundry.red_team_agent\u001b[0m\n",
" Class: RedTeamAgent\n",
@@ -359,7 +375,7 @@
"\n",
"================================================================================\n",
"\n",
-"Total scenarios: 8\n"
+"Total scenarios: 9\n"
]
},
{
@@ -389,11 +405,22 @@
"\n",
"Every scenario can optionally include a **baseline attack** — a `PromptSendingAttack` that sends\n",
"each objective directly to the target without any converters or multi-turn techniques. This is\n",
-"controlled by the `include_default_baseline` parameter (default: `True` for most scenarios).\n",
-"\n",
-"To run *only* the baseline (no attack strategies), create a `RedTeamAgent` with\n",
-"`include_baseline=True` (the default) and pass `scenario_strategies=None`. See\n",
-"[Common Scenario Parameters](./1_common_scenario_parameters.ipynb) for a working example."
+"controlled by the `include_baseline` parameter on `initialize_async`; when omitted, each\n",
+"scenario falls back to its own `BASELINE_POLICY` class attribute (most scenarios default\n",
+"it on; `Jailbreak` defaults it off). See\n",
+"[Common Scenario Parameters](./1_common_scenario_parameters.ipynb) for a worked example.\n",
+"\n",
+"Custom scenarios should choose their `BASELINE_POLICY` based on whether an unmodified\n",
+"prompt is a meaningful comparator for the scenario's strategies:\n",
+"\n",
+"- **`Enabled`** — the baseline is prepended by default and the caller can opt out. Use when an\n",
+" unmodified-prompt run is a meaningful comparison point (most scenarios).\n",
+"- **`Disabled`** — the baseline is supported but omitted by default; the caller must opt in. Use\n",
+" when the scenario is already dominated by a large set of templates/strategies that already\n",
+" exercise the unmodified surface (e.g., `Jailbreak`).\n",
+"- **`Forbidden`** — the baseline is unavailable and passing `include_baseline=True` raises. Use\n",
+" when the scenario's semantics make a single-shot unmodified prompt meaningless as a comparator\n",
+" (e.g., benchmarks comparing across adversarial models, or multi-turn-only scenarios)."
]
},
{
@@ -436,7 +463,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-"version": "3.11.15"
+"version": "3.12.13"
}
},
"nbformat": 4,
24 changes: 18 additions & 6 deletions doc/code/scenarios/0_scenarios.py
@@ -76,7 +76,6 @@
# - `version`: Integer version number
# - `strategy_class`: The strategy enum class for this scenario
# - `objective_scorer_identifier`: Identifier dict for the scoring mechanism (optional)
-# - `include_default_baseline`: Whether to include a baseline attack (default: True)
# - `scenario_result_id`: Optional ID to resume an existing scenario (optional)
#
# 5. **Initialization**: Call `await scenario.initialize_async()` to populate atomic attacks:
@@ -85,6 +84,8 @@
# - `max_concurrency`: Number of concurrent operations (default: 1)
# - `max_retries`: Number of retry attempts on failure (default: 0)
# - `memory_labels`: Optional labels for tracking (optional)
+# - `include_baseline`: Whether to prepend a baseline attack (defaults to the scenario type's
+# `BASELINE_POLICY`; most scenarios default it on, `Jailbreak` defaults it off)
#
# ### Example Structure
#
@@ -174,11 +175,22 @@ def _build_display_group(self, *, technique_name: str, seed_group_name: str) ->
#
# Every scenario can optionally include a **baseline attack** — a `PromptSendingAttack` that sends
# each objective directly to the target without any converters or multi-turn techniques. This is
-# controlled by the `include_default_baseline` parameter (default: `True` for most scenarios).
-#
-# To run *only* the baseline (no attack strategies), create a `RedTeamAgent` with
-# `include_baseline=True` (the default) and pass `scenario_strategies=None`. See
-# [Common Scenario Parameters](./1_common_scenario_parameters.ipynb) for a working example.
+# controlled by the `include_baseline` parameter on `initialize_async`; when omitted, each
+# scenario falls back to its own `BASELINE_POLICY` class attribute (most scenarios default
+# it on; `Jailbreak` defaults it off). See
+# [Common Scenario Parameters](./1_common_scenario_parameters.ipynb) for a worked example.
+#
+# Custom scenarios should choose their `BASELINE_POLICY` based on whether an unmodified
+# prompt is a meaningful comparator for the scenario's strategies:
+#
+# - **`Enabled`** — the baseline is prepended by default and the caller can opt out. Use when an
+# unmodified-prompt run is a meaningful comparison point (most scenarios).
+# - **`Disabled`** — the baseline is supported but omitted by default; the caller must opt in. Use
+# when the scenario is already dominated by a large set of templates/strategies that already
+# exercise the unmodified surface (e.g., `Jailbreak`).
+# - **`Forbidden`** — the baseline is unavailable and passing `include_baseline=True` raises. Use
+# when the scenario's semantics make a single-shot unmodified prompt meaningless as a comparator
+# (e.g., benchmarks comparing across adversarial models, or multi-turn-only scenarios).
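
One way to read "meaningful comparison point" is as the reference row when comparing attack success rates. The sketch below is illustrative only: the function names are hypothetical, it does not touch PyRIT result objects, and the result tuples are invented to make the arithmetic visible.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def attack_success_rates(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Fraction of successful objectives per attack name."""
    totals: Dict[str, int] = defaultdict(int)
    successes: Dict[str, int] = defaultdict(int)
    for name, succeeded in results:
        totals[name] += 1
        successes[name] += int(succeeded)
    return {name: successes[name] / totals[name] for name in totals}


def lift_over_baseline(
    rates: Dict[str, float], baseline_name: str = "baseline"
) -> Dict[str, float]:
    """Difference between each strategy's success rate and the baseline's."""
    base = rates[baseline_name]
    return {name: rate - base for name, rate in rates.items() if name != baseline_name}


# Invented outcomes: four objectives sent unmodified, the same four via one strategy.
results = [
    ("baseline", False), ("baseline", False), ("baseline", True), ("baseline", False),
    ("role_play", True), ("role_play", False), ("role_play", True), ("role_play", True),
]
rates = attack_success_rates(results)
lift = lift_over_baseline(rates)
```

When no unmodified-prompt run exists, as with `Forbidden`, there is no `baseline` row to subtract, so a comparison of this kind would need a different reference.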

# %% [markdown]
#