Closed
4 changes: 3 additions & 1 deletion DOCUMENTATION_INDEX.md
@@ -48,7 +48,7 @@ Package root: `src/conclave/` (installed as the `conclave` package; console scri
| Gemini adapter | [`src/conclave/adapters/gemini.py`](src/conclave/adapters/gemini.py) | `GeminiAdapter` — native `generateContent`, OpenAI-role mapping, `usageMetadata`. |
| Registry | [`src/conclave/registry.py`](src/conclave/registry.py) | Friendly-name → model-id defaults; provider → env-var mapping; key **presence** logic (never values). |
| Config | [`src/conclave/config.py`](src/conclave/config.py) | Loads/merges `~/.conclave/config.yml` over defaults; resolves model ids and named/CSV councils; parses the `endpoints:` section (custom OpenAI-compatible providers). |
| Models | [`src/conclave/models.py`](src/conclave/models.py) | Pydantic result contract: `TokenUsage`, `ModelAnswer`, `StreamEvent`, `DebateRound`, `AdversarialResult`, `CouncilResult` (`mode`/`rounds`/`adversarial`). Stable downstream surface. |
| Models | [`src/conclave/models.py`](src/conclave/models.py) | Pydantic result contract: `TokenUsage`, `ModelAnswer`, `StreamEvent`, `DebateRound`, `AdversarialResult`, `CouncilResult` (`mode`/`rounds`/`adversarial`/`synthesis_error`/`prompt_version`). Stable downstream surface. |
| CLI | [`src/conclave/cli.py`](src/conclave/cli.py) | `conclave ask` (synthesize/raw/debate/adversarial; `--rounds`/`--proposer`/`--stream`) + `conclave providers`; rich panels, live `--stream` output, and `--json`; never prints key values. |
| Logging | [`src/conclave/logging.py`](src/conclave/logging.py) | Logger factory; stderr; verbosity via `CONCLAVE_LOG_LEVEL` (default `WARNING`). |

@@ -57,6 +57,7 @@ Package root: `src/conclave/` (installed as the `conclave` package; console scri
| File | Path | Covers |
|------|------|--------|
| Council tests | [`tests/test_council.py`](tests/test_council.py) | Fan-out, partial failure, synthesis behavior. |
| Synthesizer tests | [`tests/test_synthesizer.py`](tests/test_synthesizer.py) | Pins the synthesizer/judge contract: default + configurable (arg/config/CLI `--synthesizer`) selection; observable degradation (unkeyed/failed → `synthesis_error`/`verdict_error`, never silent) for synthesize, debate, and the adversarial judge; versioned synthesis prompt (`SYNTHESIS_PROMPT_VERSION` + `result.prompt_version`) with prompt-text + version pins. |
| Modes tests | [`tests/test_modes.py`](tests/test_modes.py) | Debate multi-round flow, mid-round drop-out, peer anonymization; adversarial proposer/critic/verdict, proposal/critic failure paths, no-key judge, sync wrappers. |
| Adapter tests | [`tests/test_adapters.py`](tests/test_adapters.py) | Per-adapter `build_request` + `parse_response` for openai-compat/anthropic/gemini: system-hoist, max_tokens, role mapping, usage parsing, empty/malformed/error-status raises. |
| Provider highway tests | [`tests/test_providers.py`](tests/test_providers.py) | `resolve_adapter` (built-in prefixes, per-provider URLs, custom endpoints, unknown-prefix raise), end-to-end `call_model`, and `redact()` (bearer/`sk-`/env-var-value/`x-api-key` scrubbing; pre-redacted provider errors). |
@@ -91,6 +92,7 @@ Run: `pytest` (config in `pyproject.toml`, `asyncio_mode = "auto"`).

| Date | Change |
|------|--------|
| 2026-06-14 | Documented + tested synthesizer behavior (v1.0 readiness must-do #5): README "Synthesizer behavior" section (selection precedence, observable degradation, versioned prompt); synthesis prompt set now versioned via `conclave.prompts.SYNTHESIS_PROMPT_VERSION`, stamped onto every `CouncilResult.prompt_version`; confirmed that degradation is observable (never silent) across the synthesize/debate/adversarial-judge paths; new `tests/test_synthesizer.py` (21 tests). No non-synthesis behavior changed. |
| 2026-06-09 | Roadmap features shipped: adversarial proposer resilience (#9), optional result cache (#6), debate convergence early-stop (#4), 4 first-class providers groq/deepseek/mistral/together (#5), streaming for synthesize/raw (#7); tests 121→191. #8 local-server-mode spike evaluated (no-go on HTTP). Doc sync: System Context diagram now shows all 9 providers; PDD §12 resolved questions archived to `docs/archive/pdd-resolved-questions-2026-06-09.md` (PDD back under 500 lines); `config.example.yml` stale "LiteLLM" comment fixed. |
| 2026-06-08 | v0.3.0 version bump; CI foundation (Actions matrix, ruff, coverage floor, gitleaks, branch protection); redact() custom-endpoint key-leak fix (#14); status_error consolidation + conditional temperature (#16/#22); provider-metadata single-source + import-time drift guard + config memoization (#19/#15); CLI exit-code contract + httpx client lifecycle (#17/#20); transport/cli/logging test backfill (#18); public release + community files. |
| 2026-06-08 | PDD §11 repositioned vs. new direct peers (`llm-council-core`, `the-llm-council`); §12 Q1/Q3/Q4/Q5 resolved. Index Tests table updated for the PR #2 split (`test_adapters.py`, `test_providers.py`). |
39 changes: 39 additions & 0 deletions README.md
@@ -199,6 +199,45 @@ print("VERDICT:\n", adv.adversarial.verdict) # also mirrored to adv.synthesis
critiques populate `answers` and the verdict mirrors into `synthesis` — so code
written against the v0.1 surface keeps working across every mode.

## Synthesizer behavior

The synthesizer is the single model that merges the council's answers (and is the
**judge** in `adversarial` mode and the final consolidator in `debate`). It is
chosen by this precedence, highest first:

1. the `synthesizer=` argument to `Council` (CLI: `--synthesizer/-s`);
2. the `synthesizer:` key in `~/.conclave/config.yml`;
3. the built-in default — **`claude`** (`anthropic/claude-sonnet-4-6`).

```bash
conclave ask "..." --council grok,gemini --synthesizer openai # override per run
```
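
The precedence above can be sketched as a small resolver. This is a minimal illustration, not conclave's actual code: `resolve_synthesizer` and the dict-shaped `config` are hypothetical stand-ins, and only the ordering (argument, then config key, then the `claude` default) is taken from the text.

```python
from __future__ import annotations

# Hypothetical stand-in for the built-in default named above.
DEFAULT_SYNTHESIZER = "claude"


def resolve_synthesizer(arg: str | None, config: dict[str, str] | None = None) -> str:
    """Pick a synthesizer name: explicit argument, else config key, else default."""
    if arg:
        return arg
    if config and config.get("synthesizer"):
        return config["synthesizer"]
    return DEFAULT_SYNTHESIZER


# The explicit argument beats the config value.
print(resolve_synthesizer("openai", {"synthesizer": "gemini"}))
# With no argument, the config value is used.
print(resolve_synthesizer(None, {"synthesizer": "gemini"}))
# With neither, the built-in default applies.
print(resolve_synthesizer(None))
```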

**Degradation is observable, never silent.** Synthesis is skipped — and the
reason is always surfaced on the result — in three cases:

| Situation | What happens |
|---|---|
| No usable member answers (all errored/skipped) | `synthesis = None`, `synthesis_error = "no successful member answers…"` |
| Synthesizer has no API key | `synthesis = None`, `synthesis_error = "…has no API key; returning raw answers only"`; member answers preserved |
| Synthesizer call fails | `synthesis = None`, `synthesis_error =` the provider error |

In every case the member answers are returned intact and a warning is logged, so
a caller can reliably detect that synthesis did not run by checking
`result.synthesis is None and result.synthesis_error is not None`. There is **no
path** where concatenated or partial output is silently returned as if it were a
synthesis. In `adversarial` mode the same signal lands on
`adversarial.verdict_error` (mirrored to `synthesis_error`).
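
A caller-side check might look like the sketch below. `ResultStub` is a hypothetical stand-in that carries only the fields the check needs (it is not the real `CouncilResult`), and `best_output` is an illustrative helper, not part of the package.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ResultStub:
    """Hypothetical stand-in exposing only the fields the check reads."""

    synthesis: str | None = None
    synthesis_error: str | None = None
    answers: list[str] = field(default_factory=list)


def synthesis_skipped(result: ResultStub) -> bool:
    """True when synthesis did not run and a reason was recorded."""
    return result.synthesis is None and result.synthesis_error is not None


def best_output(result: ResultStub) -> str:
    """Return the synthesis if present; otherwise the raw answers, labeled as such."""
    if synthesis_skipped(result):
        joined = "\n---\n".join(result.answers)
        return f"[no synthesis: {result.synthesis_error}]\n{joined}"
    return result.synthesis or ""


degraded = ResultStub(synthesis_error="synthesizer has no API key", answers=["a1", "a2"])
print(best_output(degraded))
```

Labeling the fallback explicitly keeps raw member answers from being mistaken for a merged answer further downstream.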

**The synthesis prompt is a versioned constant.** The synthesize-mode system
prompt is fixed in code (not built per call); the debate/judge prompts live in
`conclave.prompts`. The whole prompt set carries a version tag,
`conclave.prompts.SYNTHESIS_PROMPT_VERSION`, stamped onto **every**
`CouncilResult` as `result.prompt_version`. A downstream eval or regression suite
can compare it across runs to detect that the synthesis wording changed, instead
of silently attributing the shift to model drift. The test suite pins both the
prompt text and the version, so changing one without the other fails CI.
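
The pinning idea can be sketched as follows. The pinned values, `pin_violations`, and `prompt_changed` are hypothetical names used for illustration; a real suite would pin the library's actual prompt text and version tag.

```python
from __future__ import annotations

# Hypothetical pinned pair; both must be updated together when the prompt changes.
PINNED_VERSION = "v1"
PINNED_TEXT = "You are the synthesizer."


def pin_violations(live_version: str, live_text: str) -> list[str]:
    """Report every mismatch between the live prompt and the pinned pair."""
    problems = []
    if live_text != PINNED_TEXT:
        problems.append("prompt text changed")
    if live_version != PINNED_VERSION:
        problems.append("prompt version changed")
    return problems


def prompt_changed(baseline_version: str, current_version: str) -> bool:
    """Compare two runs' version tags; only equality is inspected."""
    return baseline_version != current_version


# Editing the text without touching the pins is flagged.
print(pin_violations("v1", "You are the synthesizer, revised."))
# Two runs stamped with the same tag used the same prompt set.
print(prompt_changed("2026-06-14", "2026-06-14"))
```

Because an edit to either the text or the version trips its own pin, a passing suite implies that both pins were updated deliberately in the same change.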

## Config (optional)

Create `~/.conclave/config.yml` to add models, define named councils, and set a
78 changes: 77 additions & 1 deletion src/conclave/council.py
@@ -8,6 +8,45 @@
The deliberation modes (``debate``, ``adversarial``) live in :mod:`conclave.modes`
and reuse this class's :meth:`Council.fan_out` primitive so the partial-failure
handling is written exactly once.

Synthesizer selection and degradation (the "council" value prop)
----------------------------------------------------------------

**Which model synthesizes.** Synthesis is performed by one *synthesizer* model,
separate from the council members (though a member may also be the synthesizer).
Selection precedence, highest first:

1. the ``synthesizer=`` argument to :class:`Council` (the CLI ``--synthesizer/-s``
flag wires straight through to this);
2. the ``synthesizer:`` key in ``~/.conclave/config.yml``;
3. the built-in default :data:`conclave.registry.DEFAULT_SYNTHESIZER` (``"claude"``,
i.e. ``anthropic/claude-sonnet-4-6``).

The same model is the **judge** in ``adversarial`` mode and the final
consolidator in ``debate`` mode -- one selection drives all three.

**The fallback / degraded path is OBSERVABLE, never silent.** Synthesis can fail
to run for three reasons, and each one is signaled on the result rather than
silently swallowed:

* *No usable member answers* (every member errored/skipped) -- nothing to merge;
* *The synthesizer has no API key* in the environment;
* *The synthesizer call itself fails* (provider error/timeout).

In all three cases ``CouncilResult.synthesis`` stays ``None``, the member answers
are still returned intact, a warning is logged, and an actionable reason is set
on ``CouncilResult.synthesis_error`` (in ``adversarial`` mode the analogous
``AdversarialResult.verdict_error``, mirrored to ``synthesis_error``). A caller
can therefore always tell synthesis did **not** happen as expected by checking
``synthesis is None and synthesis_error is not None`` -- there is no path where
the council quietly returns concatenated/partial output dressed up as a synthesis.

**The synthesis prompt is a versioned constant.** The synthesize-mode system
prompt is :data:`_SYNTH_SYSTEM` (the debate/judge prompts live in
:mod:`conclave.prompts`); the prompt *set* carries the version tag
:data:`conclave.prompts.SYNTHESIS_PROMPT_VERSION`, stamped onto every
:class:`~conclave.models.CouncilResult` as ``prompt_version`` so a prompt change
is detectable downstream instead of being silently absorbed as model drift.
"""

from __future__ import annotations
@@ -20,6 +59,7 @@
from .config import ConclaveConfig, load_config
from .logging import get_logger
from .models import CouncilResult, ModelAnswer, StreamEvent
from .prompts import SYNTHESIS_PROMPT_VERSION
from .providers import call_model
from .registry import key_present

@@ -30,6 +70,14 @@
# per member while sharing Council.fan_out's concurrency + partial-failure code.
MessagesFor = Callable[[str, str], list[dict[str, str]]]

# The synthesize-mode system prompt. It is a stable module constant -- never
# built per-call -- so the wording the council synthesizes under is auditable and
# diffable. Any change to it (or to the debate/judge prompts in
# :mod:`conclave.prompts`) MUST be paired with a bump of
# :data:`conclave.prompts.SYNTHESIS_PROMPT_VERSION`, which is stamped onto every
# :class:`~conclave.models.CouncilResult` as ``prompt_version`` so a downstream
# eval can detect the change rather than silently absorb it. ``test_synthesizer``
# pins both this text and the version, so editing one without the other fails CI.
_SYNTH_SYSTEM = (
    "You are the synthesizer of a council of AI models. You are given the same "
    "user prompt that was posed to several models, plus each model's answer. "
@@ -38,6 +86,9 @@
    "Do not invent a model's position; rely only on the answers provided."
)

# Re-exported for callers that want the version without importing prompts.
__all__ = ["Council", "SYNTHESIS_PROMPT_VERSION"]


class Council:
    """A council of foundation models with an optional synthesizer.
@@ -358,7 +409,32 @@ def _replay_cached(result: CouncilResult) -> list[StreamEvent]:
        return events

    async def _synthesize(self, result: CouncilResult) -> None:
        """Run the synthesizer over the successful answers, mutating ``result``."""
        """Run the synthesizer over the successful answers, mutating ``result``.

        This is the buffered (non-streaming) synthesize path; the streaming
        counterpart :func:`conclave.streaming._stream_synthesis` mirrors it
        short-circuit for short-circuit. The synthesizer model is
        ``self.synthesizer`` (resolved per the precedence documented in the module
        docstring: constructor arg, else config, else the ``"claude"`` default).

        Every degraded outcome is made observable on ``result`` -- none is
        silent. On success ``result.synthesis`` holds the merged answer; on any
        of the three short-circuits ``result.synthesis`` stays ``None`` and
        ``result.synthesis_error`` carries the reason:

        * **no usable answers** -- every member failed/was skipped, so there is
          nothing to merge;
        * **synthesizer unkeyed** -- ``self.synthesizer``'s API key is absent, so
          the raw member answers are returned with an explanatory error;
        * **synthesizer call failed** -- the synthesizer provider errored, and its
          error text is surfaced verbatim.

        The synthesizer identity (``synthesizer`` / ``synthesizer_model_id``) is
        recorded on ``result`` before the key check so a consumer can see *which*
        model was selected even when it could not run. The prompt used is the
        versioned :data:`_SYNTH_SYSTEM`; the version tag already lives on
        ``result.prompt_version``.
        """
        usable = result.successful_answers
        if not usable:
            result.synthesis_error = "no successful member answers to synthesize"
21 changes: 21 additions & 0 deletions src/conclave/models.py
@@ -9,6 +9,19 @@
from pydantic import BaseModel, Field


def _default_prompt_version() -> str:
    """Resolve the current synthesis-prompt version without an import cycle.

    ``conclave.prompts`` imports this module, so importing it at module load
    would be circular. The import is deferred into this factory (run only when a
    ``CouncilResult`` is constructed, by which point both modules are loaded), so
    every result defaults to the live :data:`conclave.prompts.SYNTHESIS_PROMPT_VERSION`.
    """
    from .prompts import SYNTHESIS_PROMPT_VERSION

    return SYNTHESIS_PROMPT_VERSION


class TokenUsage(BaseModel):
    """Token accounting for a single model call."""

@@ -164,6 +177,13 @@ class CouncilResult(BaseModel):
        convergence_score: The convergence score (0.0--1.0) of the round that
            triggered an early stop, or ``None`` when no early stop occurred.
            Higher means more stable round-over-round (more converged).
        prompt_version: The version tag of the synthesizer/judge prompt set used
            for this run (:data:`conclave.prompts.SYNTHESIS_PROMPT_VERSION`).
            Stamped on **every** result regardless of mode or whether synthesis
            actually ran, so a downstream eval/regression suite can detect that
            the synthesis prompt wording changed between two runs instead of
            silently attributing the shift to model drift. Opaque string; only
            equality is meaningful.
    """

    prompt: str
@@ -179,6 +199,7 @@ class CouncilResult(BaseModel):
    cached: bool = False
    converged: bool = False
    convergence_score: float | None = None
    prompt_version: str = Field(default_factory=_default_prompt_version)

    @property
    def successful_answers(self) -> list[ModelAnswer]:
11 changes: 11 additions & 0 deletions src/conclave/prompts.py
@@ -10,6 +10,17 @@

from .models import ModelAnswer

# Version identifier for the synthesis/judge prompt *set*. Bump this string
# whenever ANY synthesizer-facing prompt changes -- the synthesize-mode system
# prompt (``conclave.council._SYNTH_SYSTEM``), the debate consolidation prompt
# (:data:`DEBATE_FINAL_SYSTEM`), or the adversarial judge prompt
# (:data:`JUDGE_SYSTEM`). It is surfaced on :class:`conclave.models.CouncilResult`
# (the ``prompt_version`` field) so a downstream eval or regression suite can
# detect that the wording the synthesis was produced under has shifted, rather
# than silently absorbing a prompt change as a quality regression. The value is
# opaque (a date-stamped tag); only equality/inequality is meaningful.
SYNTHESIS_PROMPT_VERSION = "2026-06-14"

# Stable position-based labels used to anonymize peers in debate rounds 2..N.
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

24 changes: 20 additions & 4 deletions tests/test_logging.py
@@ -19,6 +19,21 @@
from conclave.logging import get_logger


def _own_handlers(logger: logging.Logger) -> list[logging.Handler]:
    """Return only the handlers conclave installs, excluding pytest's capture ones.

    ``get_logger`` installs exactly one plain ``logging.StreamHandler`` on the
    ``conclave`` root. Because that logger sets ``propagate = False``, pytest's
    log-capture machinery attaches its own handlers (``LogCaptureHandler``, a
    *subclass* of ``StreamHandler``) directly to it during a run -- the count of
    which varies by pytest version. Selecting by exact type (``type(h) is
    StreamHandler``) counts conclave's handler alone and ignores any injected
    capture handler, so the one-shot-configuration assertions stay precise and
    robust across pytest versions (pytest 9.x attaches more than older lines did).
    """
    return [h for h in logger.handlers if type(h) is logging.StreamHandler]


@pytest.fixture
def fresh_logging(monkeypatch):
    """Reset the one-shot logger config so a fresh get_logger() reconfigures.
@@ -54,8 +69,9 @@ def test_default_level_is_warning_when_env_unset(fresh_logging, monkeypatch):
    assert logger.name == "conclave"
    assert logger.level == logging.WARNING
    assert logger.propagate is False
    assert len(logger.handlers) == 1
    assert isinstance(logger.handlers[0], logging.StreamHandler)
    own = _own_handlers(logger)
    assert len(own) == 1
    assert isinstance(own[0], logging.StreamHandler)


def test_env_var_sets_level_case_insensitively(fresh_logging, monkeypatch):
@@ -95,13 +111,13 @@ def test_configuration_happens_once(fresh_logging, monkeypatch):
    monkeypatch.setenv("CONCLAVE_LOG_LEVEL", "ERROR")

    first = get_logger()
    assert len(first.handlers) == 1
    assert len(_own_handlers(first)) == 1
    assert logging_mod._CONFIGURED is True

    # Changing the env now must have no effect -- the guard short-circuits.
    monkeypatch.setenv("CONCLAVE_LOG_LEVEL", "DEBUG")
    second = get_logger()

    assert second is first
    assert len(second.handlers) == 1  # not duplicated
    assert len(_own_handlers(second)) == 1  # not duplicated
    assert second.level == logging.ERROR  # unchanged from first config