Skip to content

RFC: Migrate quirks out of zigpy and provide direct entity access#762

Draft
puddly wants to merge 7 commits into
zigpy:devfrom
puddly:puddly/decouple-zigpy
Draft

RFC: Migrate quirks out of zigpy and provide direct entity access#762
puddly wants to merge 7 commits into
zigpy:devfrom
puddly:puddly/decouple-zigpy

Conversation

@puddly
Copy link
Copy Markdown
Contributor

@puddly puddly commented May 11, 2026

This is a test PR for inverting the quirks dependency chain. Currently, zigpy is responsible for defining the quirks API, ZHA defines the base entity models and ZCL autodiscovery, and quirks use the zigpy APIs to provide ZHA context on how to create custom entities. This PR inverts it: all quirks constants and definitions will move out of zigpy and into ZHA, quirks will import ZHA itself, and ZHA will unpin quirks as an == dependency.

Currently, we have a few APIs for matching devices for the purposes of modifying their clusters at runtime:

  1. CustomDevice: this matches the full device signature exactly and provides a way to replace the device object directly. Not recommended.
  2. QuirkBuilder: matches by model and manufacturer, plus a firmware version filter. Also aggregates small operations for modifying clusters and endpoints but also provides a simplified API for exposing entities derived from ZCL attributes.
  3. (proposed) @register_device: it sort of aggregates the two by having the ZHA Device subclass "be" the QuirkBuilder. We'd need an API to also allow replacing the zigpy Device object as well. This has been yanked out of zigpy by creating a "device resolver" callable that ZHA can provide. It replaces zigpy.quirks.get_device by allowing ZHA to itself provide a replacement device, avoiding the zigpy quirks registry entirely.

As a test implementation, I've added support for the manufacturer-specific Hue lighting frame format in a fork of quirks. It look something like this (100% slop that manages to run):

"""Philips Hue native control via cluster 0xFC03 — v3 device-level quirk."""

from __future__ import annotations

import asyncio
import logging
from typing import Any, Final

from zigpy.quirks import CustomCluster
import zigpy.types as t
from zigpy.zcl import ClusterType
from zigpy.zcl.clusters.general import LevelControl, OnOff
from zigpy.zcl.clusters.lighting import Color
from zigpy.zcl.foundation import BaseCommandDefs, ZCLCommandDef

from zha.application import Platform
from zha.application.platforms.light import Light
from zha.zigbee.device import Device, DeviceMatch, Replace, register_device

class HueNativeLight(Light):
    """Light driven by atomic Bifrost frames on cluster 0xFC03."""

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        super().__init__(*args, **kwargs)
        self._hue_cluster_handler = self.cluster_handlers[HueLightCluster.ep_attribute]

    @property
    def effect_list(self) -> list[str] | None:
        return list(EFFECT_NAME_TO_HUE)

    def on_add(self) -> None:
        super().on_add()
        self._hue_cluster_handler.cluster.add_listener(self)

    def cluster_command(self, tsn: int, command_id: int, args: list[Any]) -> None:
        """Handle a server-issued cluster command on 0xFC03."""
        if command_id != HUE_NATIVE_COMMAND_ID:
            return
        payload: t.SerializableBytes = args[0]
        try:
            frame = decode(bytes(payload.serialize()))
        except ValueError:
            _LOGGER.exception("Failed to decode Hue native frame: %s", payload)
            return
        self._apply_frame(frame)

    def _apply_frame(self, frame: HueFrame) -> None:
        if frame.on_off is not None:
            self._state = frame.on_off
        if frame.brightness is not None:
            self._brightness = frame.brightness
        if frame.color_xy is not None:
            self._xy_color = frame.color_xy
        if frame.color_mirek is not None:
            self._color_temp = frame.color_mirek
        if frame.effect_type is not None:
            self._effect = HUE_TO_EFFECT_NAME.get(frame.effect_type, "none")
        self.maybe_emit_state_changed_event()

    async def _async_turn_on_impl(
        self,
        *,
        transition: float | None,
        brightness: int | None,
        effect: str | None,
        flash: Any,
        color_temp: int | None,
        xy_color: tuple[float, float] | None,
        duration: float,
        **_unused: Any,
    ) -> None:
        del flash, _unused  # Flash uses the Identify cluster, not Bifrost.
        frame = HueFrame(
            on_off=True,
            brightness=_clamp_brightness(brightness),
            color_mirek=color_temp,
            color_xy=xy_color,
            fade_speed=_seconds_to_fade_speed(transition or duration),
            effect_type=EFFECT_NAME_TO_HUE.get(effect) if effect else None,
        )
        await self._send_frame(frame)
        if effect and EFFECT_NAME_TO_HUE.get(effect) not in (None, HueEffect.NO_EFFECT):
            asyncio.get_running_loop().call_later(
                _POST_EFFECT_REFRESH_DELAY,
                lambda: asyncio.create_task(self._refresh_state()),
            )

    async def async_turn_off(self, *, transition: float | None = None) -> None:
        await self._send_frame(
            HueFrame(
                on_off=False,
                fade_speed=_seconds_to_fade_speed(transition),
            )
        )

    async def _send_frame(self, frame: HueFrame) -> None:
        payload = encode(frame)
        _LOGGER.debug("Hue native frame: %s", payload.hex())
        await self._hue_cluster_handler.cluster.hue_native(t.SerializableBytes(payload))

    async def _refresh_state(self) -> None:
        # OnOff/Level/Color report standard attributes; trigger a re-poll.
        await self._on_off_cluster_handler.cluster.read_attributes(
            ["on_off"], allow_cache=False
        )

@register_device
class HueLcx004(Device):
    """Hue Play gradient lightstrip (LCX004) — Bifrost-native control."""

    _device_match = DeviceMatch(
        manufacturers=frozenset({PHILIPS, SIGNIFY}),
        models=frozenset({"LCX004"}),
    )
    _operations = (
        Replace(
            endpoint_id=11,
            cluster_id=HUE_FC03_CLUSTER_ID,
            cluster_type=ClusterType.Server,
            replacement=HueLightCluster,
        ),
    )

    def discover_entities(self):
        for entity in super().discover_entities():
            # Skip existing Light entities
            if entity.PLATFORM == Platform.LIGHT:
                continue
            yield entity

        # Register new HueNativeLight entities
        for ep_id, endpoint in self.endpoints.items():
            if ep_id == 0:
                continue

            handlers = endpoint.all_cluster_handlers
            on_off = handlers.get(f"{ep_id}:0x{OnOff.cluster_id:04x}")
            fc03 = handlers.get(f"{ep_id}:0x{HUE_FC03_CLUSTER_ID:04x}")
            if on_off is None or fc03 is None:
                continue

            # Will be cleaned up once device handlers are removed
            cluster_handlers = [on_off, fc03]
            for opt_id in (LevelControl.cluster_id, Color.cluster_id):
                ch = handlers.get(f"{ep_id}:0x{opt_id:04x}")
                if ch is not None:
                    cluster_handlers.append(ch)

            yield HueNativeLight(
                cluster_handlers=cluster_handlers,
                endpoint=endpoint,
                device=self,
            )

I think we can greatly minimize the Python code required to implement custom entities with a better API. This has a few benefits:

  1. Complex devices can be supported via quirks directly. I originally planned this for a Tuya IR module and it would work the same.
  2. Quirks v2 can essentially be inverted to be a functional API built on top of this one. It would still be the preferred way to write quirks for simple devices. The only use case for this API is complex devices that have cross-entity interaction, hook into ZHA internals, etc.
  3. Zigpy can slowly shed the quirks APIs that (IMO) don't belong in it.

This is part of / a requirement for:

@puddly
Copy link
Copy Markdown
Contributor Author

puddly commented May 11, 2026

Here's the zigpy side: zigpy/zigpy@dev...puddly:puddly/decouple-zigpy
And quirks: zigpy/zha-device-handlers@dev...puddly:puddly/decouple-zigpy

I wouldn't recommend it but you can run ZHA with these three library branches in tandem. Nothing is visibly broken 😅

Copy link
Copy Markdown
Collaborator

@zigpy-review-bot zigpy-review-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RFC engagement — click to expand (architectural notes: inversion direction, three matching mechanisms, public/internal API line, dispatch duplication)

Engaging with this as an RFC rather than a code review — happy this is being explored. The user-facing API (a Device subclass + DeviceMatch + Replace tuple + discover_entities override) reads well; reaching for it from the Hue native-control case shows it can carry weight that v2's declarative builder can't.

A few architectural things to push on before promoting from draft:

1. The inversion (and what the title promises) needs the companion zigpy branch to land further

The PR title is "Migrate quirks out of zigpy." The diff on https://github.com/puddly/zigpy/tree/puddly/decouple-zigpy doesn't actually remove anything from zigpy — it adds a _device_resolver callable (plus register_device_resolver and a device_resolver= kwarg to ControllerApplication.new) and routes _finalize_device / appdb._load_attributes through it. The legacy zigpy.quirks.get_device and the v1/v2 registry are unchanged, and zha.zigbee.device.resolve_device still falls back to zigpy.quirks.get_device(...) for unmatched devices.

In the meantime ZHA gains a duplicate API surface that has to be kept in sync with zigpy's: zha.application.EntityType / EntityPlatform shadowing zigpy.quirks.v2.EntityType / EntityPlatform, full BinarySensor/Sensor/Number device-class enums copied into zha/application/platforms/*/device_class.py (~950 LOC), and unit constants in zha.units expanded to mirror more of homeassistant.const. The new tools/compare_constants.py exists specifically to catch drift between ZHA's copies and HA's canonical definitions — useful, but the fact it's needed is a signal we're maintaining three copies (HA Core, zigpy quirks v2, ZHA), not eliminating one.

What would actually deliver on the title: a corresponding zigpy PR that removes zigpy.quirks.v2.homeassistant.* (the enums/units now mirrored in ZHA), and ideally moves EntityMetadata/QuirksV2RegistryEntry toward their final home. Tests in this PR still from zigpy.quirks.v2.homeassistant import EntityPlatform, EntityType; production code still does from zigpy.quirks.v2 import EntityMetadata, ZCLEnumMetadata, … in 10 places across zha/. For the inversion to be real, those should be importing from zha.* and the zigpy module should be the one missing.

Worth pinning down in the RFC: is the target end-state "zigpy knows nothing about HA semantics, ZHA owns all of quirks-v2," or just "ZHA can register a resolver"? They're very different scopes.

2. Three matching mechanisms now (CustomDevice / QuirkBuilder / @register_device) — deprecation story?

The PR body says the functional builder remains for simple cases and class-based is for complex devices, but the cut is fuzzy: a QuirkBuilder with one .replaces() and one .switch() and a @register_device Device subclass with one Replace operation produce almost the same surface. The Hue example shows what's uniquely possible (subscribing to manufacturer cluster commands inside the entity, applying multi-attribute frames atomically) — but that uniqueness comes from subclassing Light and overriding _async_turn_on_impl, not from the Device subclass itself. A QuirkBuilder that could attach a custom entity class via something like .add_entity_class(HueNativeLight, on=HueLightCluster) would cover the same use case without a third matching path.

Open questions worth answering before this stabilizes:

  • Is CustomDevice going away on a timeline? It's described as "Not recommended" in the PR body but isn't touched here.
  • Will existing v2 quirks migrate, or just coexist? zha-device-handlers has a lot of v2 quirks, and a migration script vs. a coexistence story have very different costs.
  • If QuirkBuilder is "inverted to be a functional API built on top of this one" (PR body), what does that desugaring look like? That's the answer that makes "three becomes two becomes one" feel achievable rather than aspirational.

3. Device.discover_entities as a public override point — stability cost

Quirks now subclass the ZHA Device and override discover_entities, with direct access to endpoint.all_cluster_handlers, Light.__init__'s positional kwargs, self.cluster_handlers[…], maybe_emit_state_changed_event, etc. That freezes a lot of ZHA-internal shape that's churned recently — #657 just reworked cluster handlers / entity attributes, and #716/#763 are actively touching _is_entity_removed_by_quirk and adjacent surfaces. Once a non-trivial number of quirks reach into Light internals (or any other entity class), refactors like #657 become breaking changes for an out-of-tree package.

The Hue example reaches into:

  • endpoint.all_cluster_handlers (a dict keyed by "{ep_id}:0x{cluster_id:04x}" strings — a format choice that's an implementation detail right now)
  • self.cluster_handlers[HueLightCluster.ep_attribute] (also a dict; key shape differs)
  • Light._state / Light._brightness / Light._xy_color / Light._color_temp / Light._effect (underscored attrs, currently free to rename)
  • self._on_off_cluster_handler (presumably ZHA's OnOff cluster handler — name shape is informal)
  • positional cluster_handlers=, endpoint=, device= on Light.__init__

Which of these are intended as public quirks API and which were stable-by-accident? A small Device API doc — or even just # Public: / # Internal: annotations on the methods/attrs quirks may touch — would let ZHA refactor freely below the line while keeping the contract intact above it. Otherwise the next #657-scale refactor lands as "please update your quirks" tickets in zha-device-handlers.

The prevent_default_entity_creation path (added in #716, refined in #763) and the in-iterator if entity.PLATFORM == Platform.LIGHT: continue filter in the Hue example do overlapping work. Worth clarifying which path quirks-v3 is supposed to use, since the example chose the manual filter.

4. Subclass dispatch happens at two layers — easy to drift

Both resolve_device (zigpy-time, applies _operations) and Device.new (ZHA-time, picks subclass) iterate DEVICE_QUIRKS and call cls.matches(zigpy_device). They could disagree if matching ever depended on mutable state (e.g. firmware version changed between the two calls). Caching the matched class on the zigpy device in resolve_device and reading it in Device.new would remove that possibility — and the matching cost.

Related, CoordinatorDevice is chosen by ieee equality in Device.new but never registered via @register_device, so it doesn't go through resolve_device (no operations to apply — fine) but the asymmetry between "hardcoded subclass for coordinator" and "registry-driven subclass for everything else" is worth a comment.

Companion-branches reality check

For the record: I checked out the puddly/decouple-zigpy branches on both zigpy and zha-device-handlers. The zigpy side is one WIP commit (the resolver hook, no quirks-API removal). The quirks side is one WIP commit adding only zhaquirks/philips/{bifrost,hue_native}.py (the example from the PR body). The trio is consistent if all three are pinned together. "Nothing visibly broken" reads roughly right for the happy path; current CI failures (_FakeApp mocks missing _device_resolver) are just version skew against unbumped zigpy and will resolve once that lands.

Roadmap context

OpenHomeFoundation/roadmap#93 names "improve APIs to make adding new devices simpler" as in-scope. This PR is a credible early step there — the API ergonomics for a complex device end up much closer to ad-hoc Python than the current v2 builder lets you get. The Tuya-IR-module use case the PR mentions is exactly where v2's declarative model breaks down, and a class-based shape is the right shape for that.

Small things noted but not blockers (RFC stage)

Quickly, since this is RFC and ergonomics matter more than polish right now: Replace.apply() pops the existing cluster before constructing the replacement (loses the original on construction failure); resolve_device logs at warning level on every matched join (should be debug once de-experimentalized); the broad except Exception in _discover_new_entities will silently swallow real construction bugs and could use the entity / endpoint / cluster handler in the log message; pyproject.toml is now zha-quirks>=1.2.0, which until the first quirks release that imports ZHA will pull in versions that don't actually use the new API. None of these change the design discussion.

What would let me sign off later: clarity on items 1 (target end-state for zigpy) and 2 (deprecation story for the three mechanisms), and a small thinking-out-loud doc on item 3 (which Device/Light surfaces are public quirks API). The rest is concrete code asks that flow naturally once those are settled.

Comment thread zha/zigbee/device.py
for op in cls._operations:
op.apply(zigpy_device)
return zigpy_device
return zigpy.quirks.get_device(zigpy_device)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fallback to zigpy.quirks.get_device(zigpy_device) keeps the v1/v2 registry load-bearing — the resolver layers on top rather than replacing it. That's a reasonable migration step, but the PR title "Migrate quirks out of zigpy" reads as if zigpy's quirks API were going away, and this line says it isn't. Worth being explicit in the RFC about the expected timeline for zigpy.quirks.get_device to lose this role and the legacy registry to be moved (or pruned) entirely.

Comment thread zha/zigbee/device.py
firmware_version_allow_missing: bool = True


DEVICE_QUIRKS: list[type[Device]] = []
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Architectural: subclass dispatch is duplicated between resolve_device (here in spirit — applies _operations at zigpy time) and Device.new (instantiates the right subclass at ZHA time). Both walk DEVICE_QUIRKS independently. Consider caching the matched class on the zigpy device in resolve_device and reusing it in Device.new — removes the possibility of disagreement if matching ever grows mutable-state dependencies, and halves the matching cost on join.

Also: no way to unregister or replace a registration. If the long-running HA process ever hot-reloads quirks (rare but not impossible), repeated registration of the same subclass will duplicate matches. An idempotence guard or a dict keyed by class qualname would harden against that.

Comment thread zha/zigbee/device.py
"""Yield entities for this device."""
# TODO: purge old coordinator entities
if self.is_coordinator:
return
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Subtle interaction: this if self.is_coordinator: return guards the base Device.discover_entities, but CoordinatorDevice.discover_entities (added at the end of this file) overrides it and yields counters unconditionally without checking is_active_coordinator. The prior discover_coordinator_device_entities only ran for active coordinator devices. Device.new currently picks CoordinatorDevice for any ieee equal to gateway.state.node_info.ieee — which is the active coordinator in practice, but the explicit name and the prior is_active_coordinator check made the intent clearer. Worth either gating on is_active_coordinator in CoordinatorDevice.discover_entities or documenting that gateway.state.node_info.ieee is canonically the active coordinator's ieee.

NUMBER = "number"
SENSOR = "sensor"
SELECT = "select"
SWITCH = "switch"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EntityPlatform and EntityType are defined here, but tests still from zigpy.quirks.v2.homeassistant import EntityPlatform, EntityType (e.g. tests/test_sensor.py:18, tests/test_device.py:19) and production from zigpy.quirks.v2 import EntityMetadata, ZCLEnumMetadata, … in ~10 places. Coexistence during migration is fine; flagging so the cleanup pass doesn't get lost when the companion zigpy PR drops the zigpy copies. The goal-state is presumably that zha.application.EntityPlatform is the only definition and zigpy.quirks.v2.homeassistant is gone.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Further shape Zigbee support by implementing missing features in ZHA while retaining choice

2 participants