Add policyengine.derivations for per-variable computation explanations by MaxGhenis · Pull Request #365 · PolicyEngine/policyengine.py

MaxGhenis · 2026-05-17T19:29:29Z

Summary

Adds a new policyengine.derivations module so any caller can ask "how did PolicyEngine arrive at this value?" for a single household and get either a structured object or a plain-prose walkthrough.

from policyengine.derivations import derive, narrate

derivation = derive(simulation, "income_tax_before_credits", 2026)
print(derivation.value)                       # 3220.0
print(derivation.trace_text(max_depth=4))     # indented dependency tree
print(derivation.top_level_contributions())   # [(name, value), ...]
narrative = narrate(derivation, country="us", household_summary="TX, joint")

What's in it

Derivation and TraceNode — frozen dataclasses for the pruned computation tree (deterministic).
derive(simulation, variable, period) — runs the calculation with simulation.trace = True, captures the tree, returns a Derivation.
top_level_contributions(derivation) — the immediate dependencies of the root, useful for "breakdown by component" tables.
narrate(derivation, ...) / narrate_async(...) — optional LLM narration via LiteLLM. Lazy-imports litellm so the deterministic path has no LLM dependency.

Why this is upstream

policybench needed per-cell "PolicyEngine derivation" panels in its prediction-detail modal. The deterministic primitives are general — calculators, dashboards, and papers can use them too — so they belong in policyengine.py rather than being baked into a downstream consumer. policybench will be refactored to consume this module.

Test plan

Hermetic tests for is_zero_value, zero-subtree pruning, and narrate/narrate_async (with a fake LiteLLM client). The PolicyEngine-US integration tests in tests/test_derivations.py exercise derive against a real one-adult household.
ruff check + ruff format clean.

🤖 Generated with Claude Code

…ions For any variable on any Simulation, derive(simulation, variable, period) returns a Derivation with the pruned dependency tree and the same scalar the simulation would have returned. The Derivation can be: - rendered as indented text (trace_text()) - walked programmatically (TraceNode is a stable, frozen dataclass) - summarized by the deterministic top_level_contributions() helper - handed to narrate(derivation) for an optional LLM-generated narrative The narration path is in its own submodule with a lazy LiteLLM import, so importing policyengine.derivations costs nothing if you only want the deterministic structured tree. Motivation: policybench needed per-cell "how PolicyEngine derived this value" walkthroughs for its leaderboard's prediction-detail modal. The deterministic primitives are useful for any caller wanting "explain this result" surfaces (calculators, dashboards, papers), so they belong here rather than baked into a downstream consumer.

MaxGhenis · 2026-05-17T19:33:48Z

CI failure (Test 3.13/3.14) is a pre-existing upstream issue: policyengine_core's new check_computation_modes rejects policyengine_us.self_employment_income because it uses both adds and uprating. The same failure reproduces on a stale main checkout because the test workflow installs the latest published policyengine (which transitively pulls the broken combo), and it is independent of this branch's changes — the new code only adds the derivations module.

The hermetic unit tests for derivations pass; the integration test that exercises derive on a real US Simulation also passes locally with a working policyengine_us install.

The previous implementation took the first element of OpenFisca's vectorised result for every node, which silently dropped every other entity's contribution. For a joint household with $45k self-employment income (head) and $40k wages (spouse), the narrative would report "the household's only income is $45,000 of self-employment income" because irs_gross_income's [45000, 40000] array was truncated to [45000]. Switch _capture to: - collapse length-1 arrays to a scalar (the common case for tax-unit / household variables), - preserve multi-entity arrays as tuples (numeric or boolean). Update _format_value to render numeric tuples as ``sum (per entity: a, b, ...)`` so summarisers see both the total and the per-person decomposition; boolean tuples render as ``[True, False]``. Update is_zero_value to recurse into tuples (every entry must be zero). Add tests covering the multi-entity render and zero check.

MaxGhenis mentioned this pull request May 17, 2026

Fix multi-entity trace truncation in PE derivation narratives PolicyEngine/policybench#37

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add policyengine.derivations for per-variable computation explanations#365

Add policyengine.derivations for per-variable computation explanations#365
MaxGhenis wants to merge 2 commits into
mainfrom
derivations-narrative

MaxGhenis commented May 17, 2026

Uh oh!

MaxGhenis commented May 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

MaxGhenis commented May 17, 2026

Summary

What's in it

Why this is upstream

Test plan

Uh oh!

MaxGhenis commented May 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant