Add map_blocks and some testing support for array query expressions#11398
Add map_blocks and some testing support for array query expressions#11398mrocklin wants to merge 2 commits into
Conversation
Add a --use-dask-array pytest mode that registers the dask-array chunk manager and runs the existing suite through that backend. Generalize dask-specific tests around the active chunk manager and add dask-array expression support for xarray map_blocks.
Avoid importing the dask chunk manager in bare-minimum test environments and tighten datetime accessor types for mypy/stubtest.
|
Gentle ping. |
| ) | ||
| parser.addoption("--run-mypy", action="store_true", help="runs mypy tests") | ||
| parser.addoption( | ||
| "--use-dask-array", |
There was a problem hiding this comment.
nit: let's call this --use-dask-array-with-expr to avoid confusion.
| ) | ||
| from xarray.tests.test_namedarray import NamedArraySubclassobjects | ||
|
|
||
| dask_array_type = array_type("dask") |
There was a problem hiding this comment.
For readability can we do
dask_array_type = get_dask_chunkmanager().array_cls
(with failure handling when dask is not installed)
| ).chunk() | ||
| if Version(sparse.__version__) >= Version("0.16.0"): | ||
| meta = "sparse.numba_backend._coo.core.COO" | ||
| if get_dask_chunkmanager().array_cls.__module__.startswith("dask_array"): |
There was a problem hiding this comment.
Helper for get_dask_chunkmanager().array_cls.__module__.startswith("dask_array") would be appreciated.
There was a problem hiding this comment.
i wonder if
_, has_dask_array = _importorskip("dask_array")
here would work. Then these conditions can become if has_dask_array: That pattern is used extensively (we should add it to tests/CLAUDE.md)
| requires_scipy, | ||
| ) | ||
|
|
||
| dask_array_type = array_type("dask") |
There was a problem hiding this comment.
same here; we could add it to tests/__init__.py and just import dask_array_type everywhere.
| @requires_dask | ||
| def test_lazy_int_bins_error() -> None: | ||
| import dask.array | ||
| da = get_dask_chunkmanager().array_api |
There was a problem hiding this comment.
from xarray.tests import dask_array_api would be a useful thing everywhere.
| ) | ||
|
|
||
|
|
||
| def _using_dask_array_chunkmanager() -> bool: |
There was a problem hiding this comment.
this would be has_dask_array (https://github.com/pydata/xarray/pull/11398/changes#r3467914554)
| name == "dask" and manager is chunkmanager | ||
| for name, manager in list_chunkmanagers().items() | ||
| ) | ||
| if isinstance(chunkmanager, DaskManager) or registered_as_dask: |
There was a problem hiding this comment.
| if isinstance(chunkmanager, DaskManager) or registered_as_dask: | |
| if registered_as_dask: |
?
|
|
||
| array_cls: type[T_ChunkedArray] | ||
| available: bool = True | ||
| vectorized_indexing_returns_numpy_order: bool = False |
There was a problem hiding this comment.
what's this for? Seems like a bug if its needed
| name == "dask" and manager is chunked_array_type | ||
| for name, manager in list_chunkmanagers().items() | ||
| ) | ||
| if isinstance(chunked_array_type, DaskManager) or registered_as_dask: |
There was a problem hiding this comment.
| if isinstance(chunked_array_type, DaskManager) or registered_as_dask: | |
| if registered_as_dask: |
| return has, func | ||
|
|
||
|
|
||
| def get_dask_chunkmanager(): |
There was a problem hiding this comment.
i left comments below but it would be cleaner to add dask_array_type, dask_array_cls, has_dask_array_expr, dask_array_api here and use that in the test suite.
(the code style has changed dramatically since some of those tests were written ;) )
| metadata into a private ``dask_array`` multi-output map expression. Each output | ||
| variable is still a normal ``dask_array.Array`` child expression, so Dask can | ||
| group the children with the composite-collection protocol and ``dask_array`` can | ||
| optimize, cull, persist, and compute those arrays. |
There was a problem hiding this comment.
Can we add a test for culling please? (e.g. ds.pipe(xr.map_blocks(...)).sel(...) ≡ s.sel(...).pipe(xr.map_blocks, ...) )? Or... I guess that's slice pushdown? Do we gain that?
This adds a
--use-dask-arraypytest mode that registers the dask-array chunk manager and runs the existing suite through that backend. Most of the work here is making tests a bit more dask.array/dask-array agnostic. I also brought back the xarray map_blocks implementation.I haven't reviewed this thoroughly yet, and it's not complete (there are still sections of tests that would fail if they weren't marked to xfail under dask-array (flox and masked arrays are good examples), but I wanted to push up something before pushing much further forward on this so that this could get some early feedback.
cc @shoyer @dcherian
AI Disclosure
Tools: Claude, Codex