microsoft · adrian-gavrila · May 13, 2026 · May 8, 2026 · May 9, 2026 · May 9, 2026
diff --git a/.github/instructions/scenarios.instructions.md b/.github/instructions/scenarios.instructions.md
@@ -11,11 +11,15 @@ Scenarios orchestrate multi-attack security testing campaigns. Each scenario gro
 All scenarios inherit from `Scenario` (ABC) and must:
 
 1. **Define `VERSION`** as a class constant (increment on breaking changes)
-2. **Implement three abstract methods:**
+2. **Optionally declare `BASELINE_POLICY`** (defaults to `BaselinePolicy.Enabled` — a baseline `PromptSendingAttack` is prepended and callers can opt out per run via `initialize_async(include_baseline=False)`):
+   - `BaselinePolicy.Disabled`: baseline supported but off by default (e.g., `Jailbreak`, where templates dominate the run).
+   - `BaselinePolicy.Forbidden`: baseline is meaningless for this scenario's comparison axis (e.g., `AdversarialBenchmark`, which compares attack success rates across adversarial models rather than against unmodified prompts). Explicit `include_baseline=True` raises `ValueError`.
+3. **Implement three abstract methods:**
 
 ```python
 class MyScenario(Scenario):
     VERSION: int = 1
+    BASELINE_POLICY: ClassVar[BaselinePolicy] = BaselinePolicy.Enabled
 
     @classmethod
     def get_strategy_class(cls) -> type[ScenarioStrategy]:
@@ -30,7 +34,7 @@ class MyScenario(Scenario):
         return DatasetConfiguration(dataset_names=["my_dataset"])
 ```
 
-3. **Optionally override `_get_atomic_attacks_async()`** — the base class provides a default
+4. **Optionally override `_get_atomic_attacks_async()`** — the base class provides a default
    that uses the factory/registry pattern (see "AtomicAttack Construction" below).
    Only override if your scenario needs custom attack construction logic.
 
@@ -154,6 +158,8 @@ The default implementation:
 Only override when the scenario **cannot** use the factory/registry pattern — e.g., scenarios
 with custom composite logic, per-strategy converter stacks, or non-standard attack construction.
 
+Overrides that want baseline support must emit it themselves: call `self._build_baseline_atomic_attack(seed_groups=...)` with the same seed groups used for the strategy attacks and prepend the result. Only the base implementation emits the baseline automatically. Passing freshly resolved seeds instead reintroduces ADO 9012: under `max_dataset_size`, the baseline and strategy attacks can end up running against different seed populations.
+
 ### Manual AtomicAttack construction (for overrides):
 
 ```python

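A minimal sketch of the seed-sharing rule for overrides, written against toy stand-ins rather than PyRIT itself. `ToyAtomicAttack`, `resolve_seed_groups`, and `build_atomic_attacks` are hypothetical names, and the assumption that `max_dataset_size` caps the population by random sampling is only illustrative. In a real override, the shared seeds would be passed to `self._build_baseline_atomic_attack(seed_groups=...)`.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyAtomicAttack:
    """Hypothetical stand-in for an AtomicAttack: a technique name and its objectives."""

    name: str
    objectives: tuple[str, ...]


def resolve_seed_groups(pool: list[str], max_dataset_size: int, rng: random.Random) -> tuple[str, ...]:
    """Hypothetical seed resolution (assumption): randomly sample at most `max_dataset_size` objectives."""
    if len(pool) <= max_dataset_size:
        return tuple(pool)
    return tuple(rng.sample(pool, max_dataset_size))


def build_atomic_attacks(
    pool: list[str], techniques: list[str], max_dataset_size: int, rng: random.Random
) -> list[ToyAtomicAttack]:
    # Resolve seeds exactly once; every attack, baseline included, reuses this population.
    seeds = resolve_seed_groups(pool, max_dataset_size, rng)
    strategy_attacks = [ToyAtomicAttack(name=t, objectives=seeds) for t in techniques]
    # Build the baseline from the SAME seeds. Calling resolve_seed_groups again here
    # could draw a different sample, which is the divergence the guidance warns about.
    baseline = ToyAtomicAttack(name="baseline", objectives=seeds)
    # Prepend the baseline ahead of the strategy attacks.
    return [baseline, *strategy_attacks]


if __name__ == "__main__":
    pool = [f"objective_{i}" for i in range(20)]
    attacks = build_atomic_attacks(pool, ["role_play", "tap"], max_dataset_size=4, rng=random.Random(0))
    for attack in attacks:
        print(attack.name, attack.objectives)
```

Because the baseline and every strategy attack hold the same `seeds` tuple, their per-objective results stay comparable; resolving seeds a second time just for the baseline is the pattern to avoid.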
diff --git a/doc/code/scenarios/0_scenarios.ipynb b/doc/code/scenarios/0_scenarios.ipynb
@@ -74,7 +74,6 @@
     "   - `version`: Integer version number\n",
     "   - `strategy_class`: The strategy enum class for this scenario\n",
     "   - `objective_scorer_identifier`: Identifier dict for the scoring mechanism (optional)\n",
-    "   - `include_default_baseline`: Whether to include a baseline attack (default: True)\n",
     "   - `scenario_result_id`: Optional ID to resume an existing scenario (optional)\n",
     "\n",
     "5. **Initialization**: Call `await scenario.initialize_async()` to populate atomic attacks:\n",
@@ -83,6 +82,8 @@
     "   - `max_concurrency`: Number of concurrent operations (default: 1)\n",
     "   - `max_retries`: Number of retry attempts on failure (default: 0)\n",
     "   - `memory_labels`: Optional labels for tracking (optional)\n",
+    "   - `include_baseline`: Whether to prepend a baseline attack (defaults to the scenario type's\n",
+    "     `BASELINE_POLICY`; most scenarios default it on, `Jailbreak` defaults it off, and `AdversarialBenchmark` forbids it)\n",
     "\n",
     "### Example Structure\n",
     "\n",
@@ -101,9 +102,15 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "Found default environment files: ['./.pyrit/.env', './.pyrit/.env.local']\n",
-      "Loaded environment file: ./.pyrit/.env\n",
-      "Loaded environment file: ./.pyrit/.env.local\n"
+      "Found default environment files: ['./.pyrit/.env']\n",
+      "Loaded environment file: ./.pyrit/.env\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "No new upgrade operations detected.\n"
      ]
     }
    ],
@@ -193,34 +200,13 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "Found default environment files: ['./.pyrit/.env', './.pyrit/.env.local']\n",
+      "Loading default configuration file: ./.pyrit/.pyrit_conf\n",
+      "Found default environment files: ['./.pyrit/.env']\n",
       "Loaded environment file: ./.pyrit/.env\n",
-      "Loaded environment file: ./.pyrit/.env.local\n"
-     ]
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
       "\n",
       "Available Scenarios:\n",
       "================================================================================\n",
       "\u001b[1m\u001b[36m\n",
-      "  airt.content_harms\u001b[0m\n",
-      "    Class: ContentHarms\n",
-      "    Description:\n",
-      "      Content Harms Scenario implementation for PyRIT. This scenario contains\n",
-      "      various harm-based checks that you can run to get a quick idea about\n",
-      "      model behavior with respect to certain harm categories.\n",
-      "    Aggregate Strategies:\n",
-      "      - all\n",
-      "    Available Strategies (7):\n",
-      "      hate, fairness, violence, sexual, harassment, misinformation, leakage\n",
-      "    Default Strategy: all\n",
-      "    Default Datasets (7, max 4 per dataset):\n",
-      "      airt_hate, airt_fairness, airt_violence, airt_sexual, airt_harassment,\n",
-      "      airt_misinformation, airt_leakage\n",
-      "\u001b[1m\u001b[36m\n",
       "  airt.cyber\u001b[0m\n",
       "    Class: Cyber\n",
       "    Description:\n",
@@ -229,9 +215,9 @@
       "      Cyber class contains different variations of the malware generation\n",
       "      techniques.\n",
       "    Aggregate Strategies:\n",
-      "      - all\n",
+      "      - all, single_turn, multi_turn\n",
       "    Available Strategies (2):\n",
-      "      single_turn, multi_turn\n",
+      "      prompt_sending, red_teaming\n",
       "    Default Strategy: all\n",
       "    Default Datasets (1, max 4 per dataset):\n",
       "      airt_malware\n",
@@ -256,14 +242,14 @@
       "    Description:\n",
       "      Leakage scenario implementation for PyRIT. This scenario tests how\n",
       "      susceptible models are to leaking training data, PII, intellectual\n",
-      "      property, or other confidential information. The Leakage class\n",
-      "      contains different attack variations designed to extract sensitive\n",
-      "      information from models.\n",
+      "      property, or other confidential information. Uses the registry/factory\n",
+      "      pattern to construct attack techniques.\n",
       "    Aggregate Strategies:\n",
-      "      - all, single_turn, multi_turn, ip, sensitive_data\n",
-      "    Available Strategies (4):\n",
-      "      first_letter, image, role_play, crescendo\n",
-      "    Default Strategy: all\n",
+      "      - all, default, single_turn, multi_turn\n",
+      "    Available Strategies (9):\n",
+      "      prompt_sending, role_play, many_shot, tap, crescendo_simulated,\n",
+      "      red_teaming, context_compliance, first_letter, image\n",
+      "    Default Strategy: default\n",
       "    Default Datasets (1, max 4 per dataset):\n",
       "      airt_leakage\n",
       "\u001b[1m\u001b[36m\n",
@@ -296,6 +282,21 @@
       "    Default Datasets (1, max 4 per dataset):\n",
       "      airt_imminent_crisis\n",
       "\u001b[1m\u001b[36m\n",
+      "  airt.rapid_response\u001b[0m\n",
+      "    Class: RapidResponse\n",
+      "    Description:\n",
+      "      Rapid Response scenario for content-harms testing. Tests model behavior\n",
+      "      across multiple harm categories using selectable attack techniques.\n",
+      "    Aggregate Strategies:\n",
+      "      - all, default, single_turn, multi_turn\n",
+      "    Available Strategies (7):\n",
+      "      prompt_sending, role_play, many_shot, tap, crescendo_simulated,\n",
+      "      red_teaming, context_compliance\n",
+      "    Default Strategy: default\n",
+      "    Default Datasets (7, max 4 per dataset):\n",
+      "      airt_hate, airt_fairness, airt_violence, airt_sexual, airt_harassment,\n",
+      "      airt_misinformation, airt_leakage\n",
+      "\u001b[1m\u001b[36m\n",
       "  airt.scam\u001b[0m\n",
       "    Class: Scam\n",
       "    Description:\n",
@@ -309,6 +310,21 @@
       "    Default Strategy: all\n",
       "    Default Datasets (1, max 4 per dataset):\n",
       "      airt_scams\n",
+      "    Supported Parameters:\n",
+      "      - max_turns (int) [default: 5]: Maximum conversation turns for the persuasive_rta strategy.\n",
+      "\u001b[1m\u001b[36m\n",
+      "  benchmark.adversarial\u001b[0m\n",
+      "    Class: AdversarialBenchmark\n",
+      "    Description:\n",
+      "      Benchmarking scenario that compares the attack success rate (ASR) of\n",
+      "      several different adversarial models.\n",
+      "    Aggregate Strategies:\n",
+      "      - all, default, single_turn, multi_turn, light\n",
+      "    Available Strategies (4):\n",
+      "      role_play, tap, red_teaming, context_compliance\n",
+      "    Default Strategy: light\n",
+      "    Default Datasets (1, max 8 per dataset):\n",
+      "      harmbench\n",
       "\u001b[1m\u001b[36m\n",
       "  foundry.red_team_agent\u001b[0m\n",
       "    Class: RedTeamAgent\n",
@@ -359,7 +375,7 @@
       "\n",
       "================================================================================\n",
       "\n",
-      "Total scenarios: 8\n"
+      "Total scenarios: 9\n"
      ]
     },
     {
@@ -389,11 +405,22 @@
     "\n",
     "Every scenario can optionally include a **baseline attack** — a `PromptSendingAttack` that sends\n",
     "each objective directly to the target without any converters or multi-turn techniques. This is\n",
-    "controlled by the `include_default_baseline` parameter (default: `True` for most scenarios).\n",
-    "\n",
-    "To run *only* the baseline (no attack strategies), create a `RedTeamAgent` with\n",
-    "`include_baseline=True` (the default) and pass `scenario_strategies=None`. See\n",
-    "[Common Scenario Parameters](./1_common_scenario_parameters.ipynb) for a working example."
+    "controlled by the `include_baseline` parameter on `initialize_async`; when omitted, each\n",
+    "scenario falls back to its own `BASELINE_POLICY` class attribute (most scenarios default\n",
+    "it on; `Jailbreak` defaults it off). See\n",
+    "[Common Scenario Parameters](./1_common_scenario_parameters.ipynb) for a worked example.\n",
+    "\n",
+    "Custom scenarios should choose their `BASELINE_POLICY` based on whether an unmodified\n",
+    "prompt is a meaningful comparator for the scenario's strategies:\n",
+    "\n",
+    "- **`Enabled`**: the baseline is prepended by default and the caller can opt out. Use when an\n",
+    "  unmodified-prompt run is a meaningful comparison point (most scenarios).\n",
+    "- **`Disabled`**: the baseline is supported but omitted by default; the caller must opt in. Use\n",
+    "  when the scenario is dominated by a large set of templates/strategies that already\n",
+    "  exercise the unmodified surface (e.g., `Jailbreak`).\n",
+    "- **`Forbidden`**: the baseline is unavailable and passing `include_baseline=True` raises `ValueError`.\n",
+    "  Use when the scenario's semantics make a single-shot unmodified prompt meaningless as a comparator\n",
+    "  (e.g., benchmarks comparing across adversarial models, or multi-turn-only scenarios)."
    ]
   },
   {
@@ -436,7 +463,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.11.15"
+   "version": "3.12.13"
   }
  },
  "nbformat": 4,

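A minimal sketch of how a class-level `BASELINE_POLICY` and a per-run `include_baseline` could combine. `ToyScenario`, `planned_attacks`, `TemplateHeavyScenario`, and `ModelComparisonScenario` are hypothetical; the `BaselinePolicy` enum is a local stand-in (member names follow the docs, values are invented); and treating an explicit `include_baseline=False` on a `Forbidden` scenario as a no-op is an assumption, since the docs only state that `True` raises.

```python
from enum import Enum
from typing import ClassVar, Optional


class BaselinePolicy(Enum):
    """Local stand-in for the policy enum: member names follow the docs, values are made up."""

    Enabled = "enabled"
    Disabled = "disabled"
    Forbidden = "forbidden"


class ToyScenario:
    """Hypothetical base class; subclasses pick a policy by overriding the ClassVar."""

    BASELINE_POLICY: ClassVar[BaselinePolicy] = BaselinePolicy.Enabled

    def techniques(self) -> list[str]:
        return ["prompt_sending", "role_play"]

    def planned_attacks(self, include_baseline: Optional[bool] = None) -> list[str]:
        policy = type(self).BASELINE_POLICY
        if include_baseline and policy is BaselinePolicy.Forbidden:
            # Forbidden: an explicit opt-in is rejected rather than silently ignored.
            raise ValueError(f"{type(self).__name__} does not support a baseline attack")
        if include_baseline is None:
            # No per-run choice: fall back to the class-level default.
            include_baseline = policy is BaselinePolicy.Enabled
        # Assumption: include_baseline=False on a Forbidden scenario is accepted as a no-op.
        # When included, the baseline always comes first.
        return (["baseline"] if include_baseline else []) + self.techniques()


class TemplateHeavyScenario(ToyScenario):
    # Jailbreak-like: baseline supported but opt-in.
    BASELINE_POLICY: ClassVar[BaselinePolicy] = BaselinePolicy.Disabled


class ModelComparisonScenario(ToyScenario):
    # AdversarialBenchmark-like: the comparison axis is the adversarial model, so no baseline.
    BASELINE_POLICY: ClassVar[BaselinePolicy] = BaselinePolicy.Forbidden
```

With this sketch, `ToyScenario().planned_attacks()` lists the baseline first, `TemplateHeavyScenario().planned_attacks()` omits it unless called with `include_baseline=True`, and `ModelComparisonScenario().planned_attacks(include_baseline=True)` raises `ValueError`.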
diff --git a/doc/code/scenarios/0_scenarios.py b/doc/code/scenarios/0_scenarios.py
@@ -76,7 +76,6 @@
 #    - `version`: Integer version number
 #    - `strategy_class`: The strategy enum class for this scenario
 #    - `objective_scorer_identifier`: Identifier dict for the scoring mechanism (optional)
-#    - `include_default_baseline`: Whether to include a baseline attack (default: True)
 #    - `scenario_result_id`: Optional ID to resume an existing scenario (optional)
 #
 # 5. **Initialization**: Call `await scenario.initialize_async()` to populate atomic attacks:
@@ -85,6 +84,8 @@
 #    - `max_concurrency`: Number of concurrent operations (default: 1)
 #    - `max_retries`: Number of retry attempts on failure (default: 0)
 #    - `memory_labels`: Optional labels for tracking (optional)
+#    - `include_baseline`: Whether to prepend a baseline attack (defaults to the scenario type's
+#      `BASELINE_POLICY`; most scenarios default it on, `Jailbreak` defaults it off, and `AdversarialBenchmark` forbids it)
 #
 # ### Example Structure
 #
@@ -174,11 +175,22 @@ def _build_display_group(self, *, technique_name: str, seed_group_name: str) ->
 #
 # Every scenario can optionally include a **baseline attack** — a `PromptSendingAttack` that sends
 # each objective directly to the target without any converters or multi-turn techniques. This is
-# controlled by the `include_default_baseline` parameter (default: `True` for most scenarios).
-#
-# To run *only* the baseline (no attack strategies), create a `RedTeamAgent` with
-# `include_baseline=True` (the default) and pass `scenario_strategies=None`. See
-# [Common Scenario Parameters](./1_common_scenario_parameters.ipynb) for a working example.
+# controlled by the `include_baseline` parameter on `initialize_async`; when omitted, each
+# scenario falls back to its own `BASELINE_POLICY` class attribute (most scenarios default
+# it on; `Jailbreak` defaults it off). See
+# [Common Scenario Parameters](./1_common_scenario_parameters.ipynb) for a worked example.
+#
+# Custom scenarios should choose their `BASELINE_POLICY` based on whether an unmodified
+# prompt is a meaningful comparator for the scenario's strategies:
+#
+# - **`Enabled`**: the baseline is prepended by default and the caller can opt out. Use when an
+#   unmodified-prompt run is a meaningful comparison point (most scenarios).
+# - **`Disabled`**: the baseline is supported but omitted by default; the caller must opt in. Use
+#   when the scenario is dominated by a large set of templates/strategies that already
+#   exercise the unmodified surface (e.g., `Jailbreak`).
+# - **`Forbidden`**: the baseline is unavailable and passing `include_baseline=True` raises `ValueError`.
+#   Use when the scenario's semantics make a single-shot unmodified prompt meaningless as a comparator
+#   (e.g., benchmarks comparing across adversarial models, or multi-turn-only scenarios).
 
 # %% [markdown]
 #