Add policyengine.derivations for per-variable computation explanations #365
Open
MaxGhenis wants to merge 2 commits into
…ions

For any variable on any Simulation, derive(simulation, variable, period) returns a Derivation with the pruned dependency tree and the same scalar the simulation would have returned. The Derivation can be:

- rendered as indented text (trace_text())
- walked programmatically (TraceNode is a stable, frozen dataclass)
- summarized by the deterministic top_level_contributions() helper
- handed to narrate(derivation) for an optional LLM-generated narrative

The narration path is in its own submodule with a lazy LiteLLM import, so importing policyengine.derivations costs nothing if you only want the deterministic structured tree.

Motivation: policybench needed per-cell "how PolicyEngine derived this value" walkthroughs for its leaderboard's prediction-detail modal. The deterministic primitives are useful for any caller wanting "explain this result" surfaces (calculators, dashboards, papers), so they belong here rather than baked into a downstream consumer.
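To make the shape of the deterministic path concrete, here is a minimal, self-contained sketch of a frozen tree node with an indented-text renderer and an "immediate dependencies" helper. It is not the module's code: `SketchNode`, `render_text`, `immediate_contributions`, the field names, and the child variable names are hypothetical stand-ins for `TraceNode`, `trace_text()`, and `top_level_contributions()`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SketchNode:
    """Hypothetical stand-in for a trace node; field names are assumptions."""

    variable: str
    value: float
    children: tuple[SketchNode, ...] = ()


def render_text(node: SketchNode, indent: int = 0) -> str:
    """Render the tree as indented text, one variable per line."""
    lines = [f"{'  ' * indent}{node.variable} = {node.value:,.0f}"]
    for child in node.children:
        lines.append(render_text(child, indent + 1))
    return "\n".join(lines)


def immediate_contributions(root: SketchNode) -> list[tuple[str, float]]:
    """Return (variable, value) pairs for the root's direct dependencies."""
    return [(child.variable, child.value) for child in root.children]


# Illustrative tree; the child variable names are made up for the example.
root = SketchNode(
    "irs_gross_income",
    85_000,
    (
        SketchNode("self_employment_income", 45_000),
        SketchNode("employment_income", 40_000),
    ),
)
print(render_text(root))
```

Because the dataclass is frozen and its children are a tuple, a caller can walk or cache the tree without worrying about it being mutated after capture.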
CI failure (Test 3.13/3.14) is a pre-existing upstream issue: The hermetic unit tests for |
The previous implementation took the first element of OpenFisca's vectorised result for every node, which silently dropped every other entity's contribution. For a joint household with $45k self-employment income (head) and $40k wages (spouse), the narrative would report "the household's only income is $45,000 of self-employment income" because irs_gross_income's [45000, 40000] array was truncated to [45000].

Switch _capture to:

- collapse length-1 arrays to a scalar (the common case for tax-unit / household variables),
- preserve multi-entity arrays as tuples (numeric or boolean).

Update _format_value to render numeric tuples as ``sum (per entity: a, b, ...)`` so summarisers see both the total and the per-person decomposition; boolean tuples render as ``[True, False]``.

Update is_zero_value to recurse into tuples (every entry must be zero).

Add tests covering the multi-entity render and zero check.
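The behaviour described in this commit can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the actual `_capture` / `_format_value` code: it takes any Python sequence rather than a NumPy array, the function names without leading underscores are hypothetical, the number formatting is a guess, and treating `False` as zero is an assumption.

```python
from typing import Sequence, Union

Scalar = Union[bool, float]
Captured = Union[Scalar, tuple]


def capture(values: Sequence) -> Captured:
    """Collapse a length-1 result to a scalar; keep longer results as a tuple."""
    # Assumption: plain Python values; NumPy scalar types would need extra handling.
    items = [v if isinstance(v, bool) else float(v) for v in values]
    if len(items) == 1:
        return items[0]
    return tuple(items)


def format_value(value: Captured) -> str:
    """Render booleans as a list and numeric tuples as a sum plus per-entity parts."""
    if isinstance(value, tuple):
        if all(isinstance(v, bool) for v in value):
            return "[" + ", ".join(str(v) for v in value) + "]"
        parts = ", ".join(f"{v:,.0f}" for v in value)
        return f"{sum(value):,.0f} (per entity: {parts})"
    if isinstance(value, bool):
        return str(value)
    return f"{value:,.0f}"


def is_zero_value(value: Captured) -> bool:
    """A tuple is zero only if every entry is zero."""
    if isinstance(value, tuple):
        return all(is_zero_value(v) for v in value)
    return value == 0  # assumption: False counts as zero


# The joint-household case from the commit message: head 45k, spouse 40k.
gross = capture([45000, 40000])
print(format_value(gross))
```

Keeping the per-entity values alongside the sum is what lets a summariser report both figures instead of silently dropping the second person's income.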
Summary
Adds a new `policyengine.derivations` module so any caller can ask "how did PolicyEngine arrive at this value?" for a single household and get either a structured object or a plain-prose walkthrough.

What's in it
- `Derivation` and `TraceNode`: frozen dataclasses for the pruned computation tree (deterministic).
- `derive(simulation, variable, period)`: runs the calculation with `simulation.trace = True`, captures the tree, returns a `Derivation`.
- `top_level_contributions(derivation)`: the immediate dependencies of the root, useful for "breakdown by component" tables.
- `narrate(derivation, ...)` / `narrate_async(...)`: optional LLM narration via LiteLLM. Lazy-imports `litellm` so the deterministic path has no LLM dependency.

Why this is upstream
policybench needed per-cell "PolicyEngine derivation" panels in its prediction-detail modal. The deterministic primitives are general (calculators, dashboards, and papers can use them too), so they belong in `policyengine.py` rather than being baked into a downstream consumer. policybench will be refactored to consume this module.

Test plan
- `is_zero_value`, zero-subtree pruning, and `narrate`/`narrate_async` (with a fake LiteLLM client).
- The PolicyEngine-US integration tests in `tests/test_derivations.py` exercise `derive` against a real one-adult household.
- `ruff check` + `ruff format` clean.

🤖 Generated with Claude Code