Skip to content

Add map_blocks and some testing support for array query expressions#11398

Open
mrocklin wants to merge 2 commits into
pydata:mainfrom
mrocklin:codex/dask-array-xarray-suite
Open

Add map_blocks and some testing support for array query expressions#11398
mrocklin wants to merge 2 commits into
pydata:mainfrom
mrocklin:codex/dask-array-xarray-suite

Conversation

@mrocklin

Copy link
Copy Markdown
Contributor

This adds a --use-dask-array pytest mode that registers the dask-array chunk manager and runs the existing suite through that backend. Most of the work here is making tests a bit more dask.array/dask-array agnostic. I also brought back the xarray map_blocks implementation.

I haven't reviewed this thoroughly yet, and it's not complete (there are still sections of tests that would fail if they weren't marked to xfail under dask-array (flox and masked arrays are good examples), but I wanted to push up something before pushing much further forward on this so that this could get some early feedback.

cc @shoyer @dcherian

AI Disclosure

  • This PR contains AI-generated content.
  • I have tested any AI-generated content in my PR.
  • I take responsibility for any AI-generated content in my PR.

Tools: Claude, Codex

Add a --use-dask-array pytest mode that registers the dask-array chunk manager and runs the existing suite through that backend.

Generalize dask-specific tests around the active chunk manager and add dask-array expression support for xarray map_blocks.
@github-actions github-actions Bot added topic-backends topic-dask topic-testing topic-arrays related to flexible array support io topic-NamedArray Lightweight version of Variable labels Jun 22, 2026
Avoid importing the dask chunk manager in bare-minimum test environments and tighten datetime accessor types for mypy/stubtest.
@mrocklin

Copy link
Copy Markdown
Contributor Author

Gentle ping.

Comment thread conftest.py
)
parser.addoption("--run-mypy", action="store_true", help="runs mypy tests")
parser.addoption(
"--use-dask-array",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: let's call this --use-dask-array-with-expr to avoid confusion.

)
from xarray.tests.test_namedarray import NamedArraySubclassobjects

dask_array_type = array_type("dask")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For readability can we do

dask_array_type = get_dask_chunkmanager().array_cls

(with failure handling when dask is not installed)

).chunk()
if Version(sparse.__version__) >= Version("0.16.0"):
meta = "sparse.numba_backend._coo.core.COO"
if get_dask_chunkmanager().array_cls.__module__.startswith("dask_array"):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Helper for get_dask_chunkmanager().array_cls.__module__.startswith("dask_array") would be appreciated.

@dcherian dcherian Jun 24, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wonder if

_, has_dask_array = _importorskip("dask_array")

here would work. Then these conditions can become if has_dask_array: That pattern is used extensively (we should add it to tests/CLAUDE.md)

requires_scipy,
)

dask_array_type = array_type("dask")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here; we could add it to tests/__init__.py and just import dask_array_type everywhere.

@requires_dask
def test_lazy_int_bins_error() -> None:
import dask.array
da = get_dask_chunkmanager().array_api

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from xarray.tests import dask_array_api would be a useful thing everywhere.

)


def _using_dask_array_chunkmanager() -> bool:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread xarray/backends/api.py
name == "dask" and manager is chunkmanager
for name, manager in list_chunkmanagers().items()
)
if isinstance(chunkmanager, DaskManager) or registered_as_dask:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if isinstance(chunkmanager, DaskManager) or registered_as_dask:
if registered_as_dask:

?


array_cls: type[T_ChunkedArray]
available: bool = True
vectorized_indexing_returns_numpy_order: bool = False

@dcherian dcherian Jun 24, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's this for? Seems like a bug if its needed

name == "dask" and manager is chunked_array_type
for name, manager in list_chunkmanagers().items()
)
if isinstance(chunked_array_type, DaskManager) or registered_as_dask:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if isinstance(chunked_array_type, DaskManager) or registered_as_dask:
if registered_as_dask:

Comment thread xarray/tests/__init__.py
return has, func


def get_dask_chunkmanager():

@dcherian dcherian Jun 24, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i left comments below but it would be cleaner to add dask_array_type, dask_array_cls, has_dask_array_expr, dask_array_api here and use that in the test suite.

(the code style has changed dramatically since some of those tests were written ;) )

metadata into a private ``dask_array`` multi-output map expression. Each output
variable is still a normal ``dask_array.Array`` child expression, so Dask can
group the children with the composite-collection protocol and ``dask_array`` can
optimize, cull, persist, and compute those arrays.

@dcherian dcherian Jun 24, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test for culling please? (e.g. ds.pipe(xr.map_blocks(...)).sel(...) ≡ s.sel(...).pipe(xr.map_blocks, ...) )? Or... I guess that's slice pushdown? Do we gain that?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

io topic-arrays related to flexible array support topic-backends topic-dask topic-NamedArray Lightweight version of Variable topic-testing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants