RFC: Migrate quirks out of zigpy and provide direct entity access#762
RFC: Migrate quirks out of zigpy and provide direct entity access#762puddly wants to merge 7 commits into
Conversation
|
Here's the zigpy side: zigpy/zigpy@dev...puddly:puddly/decouple-zigpy I wouldn't recommend it but you can run ZHA with these three library branches in tandem. Nothing is visibly broken 😅 |
zigpy-review-bot
left a comment
There was a problem hiding this comment.
RFC engagement — click to expand (architectural notes: inversion direction, three matching mechanisms, public/internal API line, dispatch duplication)
Engaging with this as an RFC rather than a code review — happy this is being explored. The user-facing API (a Device subclass + DeviceMatch + Replace tuple + discover_entities override) reads well; reaching for it from the Hue native-control case shows it can carry weight that v2's declarative builder can't.
A few architectural things to push on before promoting from draft:
1. The inversion (and what the title promises) needs the companion zigpy branch to land further
The PR title is "Migrate quirks out of zigpy." The diff on https://github.com/puddly/zigpy/tree/puddly/decouple-zigpy doesn't actually remove anything from zigpy — it adds a _device_resolver callable (plus register_device_resolver and a device_resolver= kwarg to ControllerApplication.new) and routes _finalize_device / appdb._load_attributes through it. The legacy zigpy.quirks.get_device and the v1/v2 registry are unchanged, and zha.zigbee.device.resolve_device still falls back to zigpy.quirks.get_device(...) for unmatched devices.
In the meantime ZHA gains a duplicate API surface that has to be kept in sync with zigpy's: zha.application.EntityType / EntityPlatform shadowing zigpy.quirks.v2.EntityType / EntityPlatform, full BinarySensor/Sensor/Number device-class enums copied into zha/application/platforms/*/device_class.py (~950 LOC), and unit constants in zha.units expanded to mirror more of homeassistant.const. The new tools/compare_constants.py exists specifically to catch drift between ZHA's copies and HA's canonical definitions — useful, but the fact it's needed is a signal we're maintaining three copies (HA Core, zigpy quirks v2, ZHA), not eliminating one.
What would actually deliver on the title: a corresponding zigpy PR that removes zigpy.quirks.v2.homeassistant.* (the enums/units now mirrored in ZHA), and ideally moves EntityMetadata/QuirksV2RegistryEntry toward their final home. Tests in this PR still from zigpy.quirks.v2.homeassistant import EntityPlatform, EntityType; production code still does from zigpy.quirks.v2 import EntityMetadata, ZCLEnumMetadata, … in 10 places across zha/. For the inversion to be real, those should be importing from zha.* and the zigpy module should be the one missing.
Worth pinning down in the RFC: is the target end-state "zigpy knows nothing about HA semantics, ZHA owns all of quirks-v2," or just "ZHA can register a resolver"? They're very different scopes.
2. Three matching mechanisms now (CustomDevice / QuirkBuilder / @register_device) — deprecation story?
The PR body says the functional builder remains for simple cases and class-based is for complex devices, but the cut is fuzzy: a QuirkBuilder with one .replaces() and one .switch() and a @register_device Device subclass with one Replace operation produce almost the same surface. The Hue example shows what's uniquely possible (subscribing to manufacturer cluster commands inside the entity, applying multi-attribute frames atomically) — but that uniqueness comes from subclassing Light and overriding _async_turn_on_impl, not from the Device subclass itself. A QuirkBuilder that could attach a custom entity class via something like .add_entity_class(HueNativeLight, on=HueLightCluster) would cover the same use case without a third matching path.
Open questions worth answering before this stabilizes:
- Is
CustomDevicegoing away on a timeline? It's described as "Not recommended" in the PR body but isn't touched here. - Will existing v2 quirks migrate, or just coexist?
zha-device-handlershas a lot of v2 quirks, and a migration script vs. a coexistence story have very different costs. - If
QuirkBuilderis "inverted to be a functional API built on top of this one" (PR body), what does that desugaring look like? That's the answer that makes "three becomes two becomes one" feel achievable rather than aspirational.
3. Device.discover_entities as a public override point — stability cost
Quirks now subclass the ZHA Device and override discover_entities, with direct access to endpoint.all_cluster_handlers, Light.__init__'s positional kwargs, self.cluster_handlers[…], maybe_emit_state_changed_event, etc. That freezes a lot of ZHA-internal shape that's churned recently — #657 just reworked cluster handlers / entity attributes, and #716/#763 are actively touching _is_entity_removed_by_quirk and adjacent surfaces. Once a non-trivial number of quirks reach into Light internals (or any other entity class), refactors like #657 become breaking changes for an out-of-tree package.
The Hue example reaches into:
endpoint.all_cluster_handlers(adictkeyed by"{ep_id}:0x{cluster_id:04x}"strings — a format choice that's an implementation detail right now)self.cluster_handlers[HueLightCluster.ep_attribute](also a dict; key shape differs)Light._state/Light._brightness/Light._xy_color/Light._color_temp/Light._effect(underscored attrs, currently free to rename)self._on_off_cluster_handler(presumably ZHA'sOnOffcluster handler — name shape is informal)- positional
cluster_handlers=,endpoint=,device=onLight.__init__
Which of these are intended as public quirks API and which were stable-by-accident? A small Device API doc — or even just # Public: / # Internal: annotations on the methods/attrs quirks may touch — would let ZHA refactor freely below the line while keeping the contract intact above it. Otherwise the next #657-scale refactor lands as "please update your quirks" tickets in zha-device-handlers.
The prevent_default_entity_creation path (added in #716, refined in #763) and the in-iterator if entity.PLATFORM == Platform.LIGHT: continue filter in the Hue example do overlapping work. Worth clarifying which path quirks-v3 is supposed to use, since the example chose the manual filter.
4. Subclass dispatch happens at two layers — easy to drift
Both resolve_device (zigpy-time, applies _operations) and Device.new (ZHA-time, picks subclass) iterate DEVICE_QUIRKS and call cls.matches(zigpy_device). They could disagree if matching ever depended on mutable state (e.g. firmware version changed between the two calls). Caching the matched class on the zigpy device in resolve_device and reading it in Device.new would remove that possibility — and the matching cost.
Related, CoordinatorDevice is chosen by ieee equality in Device.new but never registered via @register_device, so it doesn't go through resolve_device (no operations to apply — fine) but the asymmetry between "hardcoded subclass for coordinator" and "registry-driven subclass for everything else" is worth a comment.
Companion-branches reality check
For the record: I checked out the puddly/decouple-zigpy branches on both zigpy and zha-device-handlers. The zigpy side is one WIP commit (the resolver hook, no quirks-API removal). The quirks side is one WIP commit adding only zhaquirks/philips/{bifrost,hue_native}.py (the example from the PR body). The trio is consistent if all three are pinned together. "Nothing visibly broken" reads roughly right for the happy path; current CI failures (_FakeApp mocks missing _device_resolver) are just version skew against unbumped zigpy and will resolve once that lands.
Roadmap context
OpenHomeFoundation/roadmap#93 names "improve APIs to make adding new devices simpler" as in-scope. This PR is a credible early step there — the API ergonomics for a complex device end up much closer to ad-hoc Python than the current v2 builder lets you get. The Tuya-IR-module use case the PR mentions is exactly where v2's declarative model breaks down, and a class-based shape is the right shape for that.
Small things noted but not blockers (RFC stage)
Quickly, since this is RFC and ergonomics matter more than polish right now: Replace.apply() pops the existing cluster before constructing the replacement (loses the original on construction failure); resolve_device logs at warning level on every matched join (should be debug once de-experimentalized); the broad except Exception in _discover_new_entities will silently swallow real construction bugs and could use the entity / endpoint / cluster handler in the log message; pyproject.toml is now zha-quirks>=1.2.0, which until the first quirks release that imports ZHA will pull in versions that don't actually use the new API. None of these change the design discussion.
What would let me sign off later: clarity on items 1 (target end-state for zigpy) and 2 (deprecation story for the three mechanisms), and a small thinking-out-loud doc on item 3 (which Device/Light surfaces are public quirks API). The rest is concrete code asks that flow naturally once those are settled.
| for op in cls._operations: | ||
| op.apply(zigpy_device) | ||
| return zigpy_device | ||
| return zigpy.quirks.get_device(zigpy_device) |
There was a problem hiding this comment.
The fallback to zigpy.quirks.get_device(zigpy_device) keeps the v1/v2 registry load-bearing — the resolver layers on top rather than replacing it. That's a reasonable migration step, but the PR title "Migrate quirks out of zigpy" reads as if zigpy's quirks API were going away, and this line says it isn't. Worth being explicit in the RFC about the expected timeline for zigpy.quirks.get_device to lose this role and the legacy registry to be moved (or pruned) entirely.
| firmware_version_allow_missing: bool = True | ||
|
|
||
|
|
||
| DEVICE_QUIRKS: list[type[Device]] = [] |
There was a problem hiding this comment.
Architectural: subclass dispatch is duplicated between resolve_device (here in spirit — applies _operations at zigpy time) and Device.new (instantiates the right subclass at ZHA time). Both walk DEVICE_QUIRKS independently. Consider caching the matched class on the zigpy device in resolve_device and reusing it in Device.new — removes the possibility of disagreement if matching ever grows mutable-state dependencies, and halves the matching cost on join.
Also: no way to unregister or replace a registration. If the long-running HA process ever hot-reloads quirks (rare but not impossible), repeated registration of the same subclass will duplicate matches. An idempotence guard or a dict keyed by class qualname would harden against that.
| """Yield entities for this device.""" | ||
| # TODO: purge old coordinator entities | ||
| if self.is_coordinator: | ||
| return |
There was a problem hiding this comment.
Subtle interaction: this if self.is_coordinator: return guards the base Device.discover_entities, but CoordinatorDevice.discover_entities (added at the end of this file) overrides it and yields counters unconditionally without checking is_active_coordinator. The prior discover_coordinator_device_entities only ran for active coordinator devices. Device.new currently picks CoordinatorDevice for any ieee equal to gateway.state.node_info.ieee — which is the active coordinator in practice, but the explicit name and the prior is_active_coordinator check made the intent clearer. Worth either gating on is_active_coordinator in CoordinatorDevice.discover_entities or documenting that gateway.state.node_info.ieee is canonically the active coordinator's ieee.
| NUMBER = "number" | ||
| SENSOR = "sensor" | ||
| SELECT = "select" | ||
| SWITCH = "switch" |
There was a problem hiding this comment.
EntityPlatform and EntityType are defined here, but tests still from zigpy.quirks.v2.homeassistant import EntityPlatform, EntityType (e.g. tests/test_sensor.py:18, tests/test_device.py:19) and production from zigpy.quirks.v2 import EntityMetadata, ZCLEnumMetadata, … in ~10 places. Coexistence during migration is fine; flagging so the cleanup pass doesn't get lost when the companion zigpy PR drops the zigpy copies. The goal-state is presumably that zha.application.EntityPlatform is the only definition and zigpy.quirks.v2.homeassistant is gone.
This is a test PR for inverting the quirks dependency chain. Currently, zigpy is responsible for defining the quirks API, ZHA defines the base entity models and ZCL autodiscovery, and quirks use the zigpy APIs to provide ZHA context on how to create custom entities. This PR inverts it: all quirks constants and definitions will move out of zigpy and into ZHA, quirks will import ZHA itself, and ZHA will unpin quirks as an
==dependency.Currently, we have a few APIs for matching devices for the purposes of modifying their clusters at runtime:
CustomDevice: this matches the full device signature exactly and provides a way to replace the device object directly. Not recommended.QuirkBuilder: matches by model and manufacturer, plus a firmware version filter. Also aggregates small operations for modifying clusters and endpoints but also provides a simplified API for exposing entities derived from ZCL attributes.@register_device: it sort of aggregates the two by having the ZHADevicesubclass "be" theQuirkBuilder. We'd need an API to also allow replacing the zigpyDeviceobject as well. This has been yanked out of zigpy by creating a "device resolver" callable that ZHA can provide. It replaceszigpy.quirks.get_deviceby allowing ZHA to itself provide a replacement device, avoiding the zigpy quirks registry entirely.As a test implementation, I've added support for the manufacturer-specific Hue lighting frame format in a fork of quirks. It look something like this (100% slop that manages to run):
I think we can greatly minimize the Python code required to implement custom entities with a better API. This has a few benefits:
This is part of / a requirement for: