Skip to content

Add policyengine.derivations for per-variable computation explanations#365

Open
MaxGhenis wants to merge 2 commits into
mainfrom
derivations-narrative
Open

Add policyengine.derivations for per-variable computation explanations#365
MaxGhenis wants to merge 2 commits into
mainfrom
derivations-narrative

Conversation

@MaxGhenis
Copy link
Copy Markdown
Contributor

Summary

Adds a new policyengine.derivations module so any caller can ask "how did PolicyEngine arrive at this value?" for a single household and get either a structured object or a plain-prose walkthrough.

from policyengine.derivations import derive, narrate

derivation = derive(simulation, "income_tax_before_credits", 2026)
print(derivation.value)                       # 3220.0
print(derivation.trace_text(max_depth=4))     # indented dependency tree
print(derivation.top_level_contributions())   # [(name, value), ...]
narrative = narrate(derivation, country="us", household_summary="TX, joint")

What's in it

  • Derivation and TraceNode — frozen dataclasses for the pruned computation tree (deterministic).
  • derive(simulation, variable, period) — runs the calculation with simulation.trace = True, captures the tree, returns a Derivation.
  • top_level_contributions(derivation) — the immediate dependencies of the root, useful for "breakdown by component" tables.
  • narrate(derivation, ...) / narrate_async(...) — optional LLM narration via LiteLLM. Lazy-imports litellm so the deterministic path has no LLM dependency.

Why this is upstream

policybench needed per-cell "PolicyEngine derivation" panels in its prediction-detail modal. The deterministic primitives are general — calculators, dashboards, and papers can use them too — so they belong in policyengine.py rather than being baked into a downstream consumer. policybench will be refactored to consume this module.

Test plan

  • Hermetic tests for is_zero_value, zero-subtree pruning, and narrate/narrate_async (with a fake LiteLLM client). The PolicyEngine-US integration tests in tests/test_derivations.py exercise derive against a real one-adult household.
  • ruff check + ruff format clean.

🤖 Generated with Claude Code

…ions

For any variable on any Simulation, derive(simulation, variable, period)
returns a Derivation with the pruned dependency tree and the same scalar
the simulation would have returned. The Derivation can be:

- rendered as indented text (trace_text())
- walked programmatically (TraceNode is a stable, frozen dataclass)
- summarized by the deterministic top_level_contributions() helper
- handed to narrate(derivation) for an optional LLM-generated narrative

The narration path is in its own submodule with a lazy LiteLLM import, so
importing policyengine.derivations costs nothing if you only want the
deterministic structured tree.

Motivation: policybench needed per-cell "how PolicyEngine derived this
value" walkthroughs for its leaderboard's prediction-detail modal. The
deterministic primitives are useful for any caller wanting "explain this
result" surfaces (calculators, dashboards, papers), so they belong here
rather than baked into a downstream consumer.
@MaxGhenis
Copy link
Copy Markdown
Contributor Author

CI failure (Test 3.13/3.14) is a pre-existing upstream issue: policyengine_core's new check_computation_modes rejects policyengine_us.self_employment_income because it uses both adds and uprating. The same failure reproduces on a stale main checkout because the test workflow installs the latest published policyengine (which transitively pulls the broken combo), and it is independent of this branch's changes — the new code only adds the derivations module.

The hermetic unit tests for derivations pass; the integration test that exercises derive on a real US Simulation also passes locally with a working policyengine_us install.

The previous implementation took the first element of OpenFisca's
vectorised result for every node, which silently dropped every other
entity's contribution. For a joint household with $45k self-employment
income (head) and $40k wages (spouse), the narrative would report
"the household's only income is $45,000 of self-employment income"
because irs_gross_income's [45000, 40000] array was truncated to
[45000].

Switch _capture to:
- collapse length-1 arrays to a scalar (the common case for tax-unit /
  household variables),
- preserve multi-entity arrays as tuples (numeric or boolean).

Update _format_value to render numeric tuples as
``sum (per entity: a, b, ...)`` so summarisers see both the total and
the per-person decomposition; boolean tuples render as ``[True, False]``.
Update is_zero_value to recurse into tuples (every entry must be zero).

Add tests covering the multi-entity render and zero check.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant