Open
42 commits
7039c00
build: change supported dask versions
melonora Mar 30, 2026
63fbe9a
feat[config]: allow for persisting config
melonora Mar 30, 2026
b7e98b9
test: add config fixture
melonora Apr 7, 2026
558fe97
test: add tests for config
melonora Apr 7, 2026
8eebdb0
add raster write kwargs to api
melonora Apr 14, 2026
790be0c
add tests for raster API
melonora Apr 14, 2026
1a1c673
build: add zarrs-python for improved shard io
melonora Apr 14, 2026
471f72c
CI: change lowerbound dask version
melonora Apr 14, 2026
cd48574
build: correct zarrs
melonora Apr 14, 2026
0187fe9
build: change distributed version constraint
melonora Apr 14, 2026
b629de0
build: support dask and distributed >=2026.3.0
melonora Apr 14, 2026
c2b1375
CI: change lowerbound test version of dask
melonora Apr 14, 2026
bf5b910
fix: pre-commit error due to incorrect typehint
melonora Apr 14, 2026
49441fe
build: include zarrs as dependency
melonora Apr 14, 2026
6334fa8
make zarrs codec default
melonora Apr 14, 2026
73ca72a
config: change chunks and shards to accomodate for raster and table
melonora Apr 16, 2026
c6041bb
chore: adjust raster_write to new config fields
melonora Apr 16, 2026
278606a
docs: add docstring
melonora Apr 16, 2026
f6c0a37
change: add support for providing storage options as list
melonora Apr 16, 2026
36e2271
docs: make docstring for raster write kwargs more clear
melonora Apr 16, 2026
50cb6bb
tests: complete and refactor sharding tests
melonora Apr 16, 2026
6257096
docs: adjust docstring config
melonora Apr 16, 2026
5525fbf
fix: correct parsing chunks, shards argument
melonora Apr 16, 2026
44414c1
test: add testing for adjusting chunks with env variable
melonora Apr 16, 2026
e974647
test: write using settings raster_chunks
melonora Apr 16, 2026
bd6249e
feat: add raster_write_kwargs to write_element
melonora Apr 16, 2026
2600237
test: add test writing multiple elements with raster_kwargs
melonora Apr 16, 2026
bbb4bb6
fix: handle case of element name other than current element
melonora Apr 27, 2026
b95c710
fix: add contextmanager preventing settings leakage between tests
melonora Apr 27, 2026
03aafc9
build: remove sharding dependency group
melonora Apr 27, 2026
f44221a
Merge branch 'main' into support_sharding
LucaMarconato May 12, 2026
f77cff4
chore: use JSONdict
melonora May 20, 2026
d614519
Merge branch 'support_sharding' of github.com:melonora/spatialdata in…
melonora May 20, 2026
10ba46b
chore: use JSONDict
melonora May 20, 2026
c036d49
docs: add comment passing by create_raster_element_kwargs
melonora May 20, 2026
e65c0fb
docs: consistent arrow format and comment renaming chunks/shards
melonora May 20, 2026
26cb40b
refactor: create base_options before storage_options
melonora May 20, 2026
2642b74
build: update dependencies
melonora May 20, 2026
302fd1a
Merge branch 'main' into support_sharding
LucaMarconato May 21, 2026
1ce8d08
Merge branch 'main' into support_sharding
melonora May 21, 2026
7851d94
build: make zarrs optional and test dependency
melonora May 22, 2026
1cd0eb3
change: don't enforce zarrs codec globally
melonora May 22, 2026
6 changes: 5 additions & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,6 @@ dependencies = [
"click",
"dask-image",
"dask>=2026.3.0",
"distributed>=2026.3.0",
"datashader",
"fsspec[s3,http]",
"geopandas>=0.14",
Expand All @@ -37,6 +36,7 @@ dependencies = [
"numpy",
"ome_zarr>=0.16.0",
"pandas",
"platformdirs",
"pooch",
"pyarrow",
"rich",
Expand All @@ -60,6 +60,9 @@ extra = [
"spatialdata-plot",
"spatialdata-io",
]
zarrs = [
"zarrs"
]

[dependency-groups]
dev = [
Expand All @@ -71,6 +74,7 @@ test = [
"pytest-mock",
"pytest-xdist",
"torch",
"zarrs",
]
docs = [
"sphinx>=4.5",
Expand Down
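Since `zarrs` moves to an optional extra and a test dependency here, and the last commit stops enforcing the zarrs codec globally, callers would presumably opt in themselves. The following is a minimal sketch of such an opt-in guard, not code from this PR: the helper names `zarrs_available` and `enable_zarrs_pipeline` are hypothetical, and the configuration key is the one documented by zarrs-python. How spatialdata itself selects the pipeline is not shown in this diff.

```python
import importlib.util


def zarrs_available() -> bool:
    """Return True when the optional `zarrs` package can be imported."""
    return importlib.util.find_spec("zarrs") is not None


def enable_zarrs_pipeline() -> bool:
    """Opt in to the zarrs codec pipeline; return False if zarrs is missing."""
    if not zarrs_available():
        return False
    import zarr
    import zarrs  # noqa: F401  (imported so the pipeline class is importable by zarr)

    # Set explicitly by the caller instead of at library import time,
    # so users without zarrs keep zarr's default pipeline.
    zarr.config.set({"codec_pipeline.path": "zarrs.ZarrsCodecPipeline"})
    return True
```

With this shape, installing the `zarrs` extra only makes the faster pipeline available; nothing changes until the user calls the opt-in helper.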
35 changes: 35 additions & 0 deletions src/spatialdata/_core/_utils.py
@@ -1,8 +1,10 @@
from __future__ import annotations

from collections.abc import Iterable
from typing import Any

from anndata import AnnData
from ome_zarr.types import JSONDict

from spatialdata._core.spatialdata import SpatialData

Expand Down Expand Up @@ -164,3 +166,36 @@ def get_unique_name(name: str, attr: str, is_dataframe_column: bool = False) ->
setattr(sanitized, attr, new_dict)

return None if inplace else sanitized


def create_raster_element_kwargs(
raster_write_kwargs: dict[str, JSONDict | list[JSONDict]] | list[JSONDict],
element_name: str,
element_names: set[str],
) -> dict[str, Any] | list[dict[str, Any]]:
Comment on lines +171 to +175
Member

@LucaMarconato LucaMarconato May 12, 2026

This function would strongly benefit from a docstring, in particular one explaining:

  • what the input types are
  • what the output types are
  • who calls this function
  • who consumes the output of this function

If the input/output types are implicitly defined by a library, e.g. Dask, I suggest stating that this is the format defined by that library.

Member

I see that the input type is defined in the docstring of write().

Collaborator Author

Yeah, I did it there because this is private API and write is public API, but I can add it to the private API as well if preferred.

Collaborator Author

Please reopen if you would prefer that.

Member

I would put some context at least. Right now if a contributor sees that function, it can be quite confusing.

element_raster_write_kwargs = None
if isinstance(raster_write_kwargs, dict) and (kwargs := raster_write_kwargs.get(element_name)):
element_raster_write_kwargs = kwargs

if not element_raster_write_kwargs:
Comment on lines +171 to +180
Member

Why not just else? In general I would try to simplify the branching of this code a bit.

if isinstance(raster_write_kwargs, dict):
for name in element_names:
raster_write_kwargs.pop(name, None)
Member

This seems to be mutating the user's input. We should make a copy instead.

if not raster_write_kwargs:
element_raster_write_kwargs = {}
elif isinstance(raster_write_kwargs, dict) and not all(
isinstance(x, (dict, list)) for x in raster_write_kwargs.values()
):
element_raster_write_kwargs = raster_write_kwargs
elif isinstance(raster_write_kwargs, list):
if not all(isinstance(x, dict) for x in raster_write_kwargs):
raise ValueError(
"If passing raster_write_kwargs as list, it is assumed to be the storage "
"options for each scale of a multiscale raster as a dictionary."
)
element_raster_write_kwargs = raster_write_kwargs
else:
raise ValueError(
f"Type of raster_write_kwargs should be either dict or list, got {type(raster_write_kwargs)}."
)
return element_raster_write_kwargs
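The review points above (missing docstring, nested branching, mutation of the caller's dictionary) could be addressed along the following lines. This is only a sketch under stated assumptions, not the PR's implementation: `resolve_raster_kwargs` is a hypothetical name, and the sketch is not behavior-identical to `create_raster_element_kwargs`. In particular, it builds a new dictionary instead of calling `pop` on the input, and it does not reproduce the check on the value types of the remaining global options.

```python
from __future__ import annotations

from typing import Any

Options = dict[str, Any]


def resolve_raster_kwargs(
    raster_write_kwargs: dict[str, Any] | list[Options] | None,
    element_name: str,
    element_names: set[str],
) -> Options | list[Options]:
    """Pick the storage options that apply to one raster element.

    Input: the user-facing `raster_write_kwargs` (global dict, per-element
    dict, or per-scale list). Output: a dict, or a list of dicts with one
    entry per scale, meant to be handed on as `storage_options`.
    """
    if not raster_write_kwargs:
        return {}
    if isinstance(raster_write_kwargs, list):
        if not all(isinstance(x, dict) for x in raster_write_kwargs):
            raise ValueError("A list must contain one options dictionary per scale.")
        # Shallow copies so later normalisation cannot touch the user's dicts.
        return [dict(x) for x in raster_write_kwargs]
    if not isinstance(raster_write_kwargs, dict):
        raise ValueError(f"Expected dict or list, got {type(raster_write_kwargs)}.")

    per_element = raster_write_kwargs.get(element_name)
    if per_element:
        return per_element
    # Global options: every key that is not the name of a raster element.
    # A comprehension creates a new dict, leaving the input unchanged.
    return {k: v for k, v in raster_write_kwargs.items() if k not in element_names}
```

Because the input is never modified, the same `raster_write_kwargs` object can be reused across several `write()` or `write_element()` calls without entries silently disappearing.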
61 changes: 61 additions & 0 deletions src/spatialdata/_core/spatialdata.py
Expand Up @@ -16,6 +16,7 @@
from dask.dataframe import DataFrame as DaskDataFrame
from dask.dataframe import Scalar
from geopandas import GeoDataFrame
from ome_zarr.types import JSONDict
from shapely import MultiPolygon, Polygon
from upath import UPath
from xarray import DataArray, DataTree
Expand Down Expand Up @@ -1113,6 +1114,7 @@ def write(
update_sdata_path: bool = True,
sdata_formats: SpatialDataFormatType | list[SpatialDataFormatType] | None = None,
shapes_geometry_encoding: Literal["WKB", "geoarrow"] | None = None,
raster_write_kwargs: dict[str, JSONDict | list[JSONDict]] | list[JSONDict] | None = None,
raster_compressor: dict[Literal["lz4", "zstd"], int] | None = None,
) -> None:
"""
Expand Down Expand Up @@ -1161,12 +1163,32 @@ def write(
shapes_geometry_encoding
Whether to use the WKB or geoarrow encoding for GeoParquet. See :meth:`geopandas.GeoDataFrame.to_parquet`
for details. If None, uses the value from :attr:`spatialdata.settings.shapes_geometry_encoding`.
raster_write_kwargs
Member

@LucaMarconato LucaMarconato May 12, 2026

Please replace this with a docstring parameter, since this docstring is repeated later.

@docstring_parameter(min_coordinate_docs=MIN_COORDINATE_DOCS, max_coordinate_docs=MAX_COORDINATE_DOCS)

Storage options for raster elements. These options are passed to the zarr storage backend for writing and
can be provided in several formats:

1. Single dictionary
A dictionary containing all storage options applied globally.
2. Dictionary per raster element
A dictionary where:
- Keys = names of raster elements
- Values = storage options for each element
- For single-scale data: a dictionary
- For multiscale data: a list of dictionaries (one per scale)
3. List of dictionaries (multiscale only)
A list where each dictionary defines the storage options for one scale of a multiscale raster element.

Important Notes
Member

Here I would strongly recommend adding an example for users so that they have a recommendation on what to write, at least for Zarr v3. The tests already contain this information.

Collaborator Author

Do you mean as part of the docstring or actual doc? I would provide a follow up PR with specific docs.

Member

I'd put a quick example of how it's called right in the docstring.
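As an illustration of the three accepted shapes described in the docstring, a quick example might look like the sketch below. The element names (`my_image`, `my_multiscale_labels`), the output path, and the concrete chunk and shard sizes are made up for illustration; `chunks` and `shards` are parameters of `zarr.create_array`, and `shards` assumes the Zarr v3 format.

```python
# 1. Single dictionary: the same options for every raster element.
global_options = {"chunks": (1, 256, 256), "shards": (1, 1024, 1024)}

# 2. Dictionary per raster element: keys are element names.
per_element_options = {
    # Single-scale image: one dictionary.
    "my_image": {"chunks": (1, 256, 256), "shards": (1, 1024, 1024)},
    # Multiscale labels: one dictionary per scale.
    "my_multiscale_labels": [
        {"chunks": (256, 256)},
        {"chunks": (128, 128)},
    ],
}

# 3. List of dictionaries (multiscale only): one entry per scale.
per_scale_options = [
    {"chunks": (1, 256, 256)},
    {"chunks": (1, 128, 128)},
]

# Hypothetical call on an existing SpatialData object `sdata`:
# sdata.write("data.zarr", raster_write_kwargs=per_element_options)
```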

- The available key–value pairs in these dictionaries depend on the Zarr format used for writing.
- For a full list of supported storage options, refer to:
https://zarr.readthedocs.io/en/stable/api/zarr/create/#zarr.create_array
raster_compressor
A length-1 dictionary with as key the type of compression to use for images and labels and as value the
compression level which should be inclusive between 0 and 9. For compression, `lz4` and `zstd` are
supported. If not specified, the compression will be `lz4` with compression level 5. Bytes are automatically
ordered for more efficient compression.
"""
from spatialdata._core._utils import create_raster_element_kwargs
from spatialdata._io._utils import _resolve_zarr_store, _validate_compressor_args
from spatialdata._io.format import _parse_formats

Expand All @@ -1185,6 +1207,13 @@ def write(
store.close()

for element_type, element_name, element in self.gen_elements():
element_raster_write_kwargs = None
Member

Instead of calling this in write() and in write_element(), can we move this code to _write_element() so it appears only once?

if element_type in ("images", "labels") and raster_write_kwargs:
element_names = set(self.images.keys()).union(self.labels.keys())
element_raster_write_kwargs = create_raster_element_kwargs(
raster_write_kwargs, element_name, element_names
)

self._write_element(
element=element,
zarr_container_path=file_path,
Expand All @@ -1193,6 +1222,7 @@ def write(
overwrite=False,
parsed_formats=parsed,
shapes_geometry_encoding=shapes_geometry_encoding,
element_raster_write_kwargs=element_raster_write_kwargs,
raster_compressor=raster_compressor,
)

Expand All @@ -1211,6 +1241,7 @@ def _write_element(
overwrite: bool,
parsed_formats: dict[str, SpatialDataFormatType] | None = None,
shapes_geometry_encoding: Literal["WKB", "geoarrow"] | None = None,
element_raster_write_kwargs: JSONDict | list[JSONDict] | None = None,
raster_compressor: dict[Literal["lz4", "zstd"], int] | None = None,
) -> None:
from spatialdata._io.io_zarr import _get_groups_for_element
Expand Down Expand Up @@ -1250,6 +1281,7 @@ def _write_element(
group=element_group,
name=element_name,
element_format=parsed_formats["raster"],
storage_options=element_raster_write_kwargs,
raster_compressor=raster_compressor,
)
elif element_type == "labels":
Expand All @@ -1258,6 +1290,7 @@ def _write_element(
group=root_group,
name=element_name,
element_format=parsed_formats["raster"],
storage_options=element_raster_write_kwargs,
raster_compressor=raster_compressor,
)
elif element_type == "points":
Expand Down Expand Up @@ -1289,6 +1322,7 @@ def write_element(
overwrite: bool = False,
sdata_formats: SpatialDataFormatType | list[SpatialDataFormatType] | None = None,
shapes_geometry_encoding: Literal["WKB", "geoarrow"] | None = None,
raster_write_kwargs: dict[str, JSONDict | list[JSONDict] | Any] | list[JSONDict] | None = None,
raster_compressor: dict[Literal["lz4", "zstd"], int] | None = None,
) -> None:
"""
Expand All @@ -1308,6 +1342,25 @@ def write_element(
shapes_geometry_encoding
Whether to use the WKB or geoarrow encoding for GeoParquet. See :meth:`geopandas.GeoDataFrame.to_parquet`
for details. If None, uses the value from :attr:`spatialdata.settings.shapes_geometry_encoding`.
raster_write_kwargs
Member

Duplicated docstring. I would rather use the docstring_parameter decorator as shown here

@docstring_parameter(min_coordinate_docs=MIN_COORDINATE_DOCS, max_coordinate_docs=MAX_COORDINATE_DOCS)

Storage options for raster elements. These options are passed to the zarr storage backend for writing and
can be provided in several formats:

1. Single dictionary
A dictionary containing all storage options applied globally.
2. Dictionary per raster element
A dictionary where:
- Keys = names of raster elements
- Values = storage options for each element
- For single-scale data: a dictionary
- For multiscale data: a list of dictionaries (one per scale)
3. List of dictionaries (multiscale only)
A list where each dictionary defines the storage options for one scale of a multiscale raster element.

Important Notes
- The available key–value pairs in these dictionaries depend on the Zarr format used for writing.
- For a full list of supported storage options, refer to:
https://zarr.readthedocs.io/en/stable/api/zarr/create/#zarr.create_array
raster_compressor
A length-1 dictionary with as key the type of compression to use for images and labels and as value the
compression level which should be inclusive between 0 and 9. For compression, `lz4` and `zstd` are
Expand All @@ -1319,6 +1372,7 @@ def write_element(
If you pass a list of names, the elements will be written one by one. If an error occurs during the writing of
an element, the writing of the remaining elements will not be attempted.
"""
from spatialdata._core._utils import create_raster_element_kwargs
from spatialdata._io.format import _parse_formats

parsed_formats = _parse_formats(formats=sdata_formats)
Expand All @@ -1331,6 +1385,7 @@ def write_element(
overwrite=overwrite,
sdata_formats=sdata_formats,
shapes_geometry_encoding=shapes_geometry_encoding,
raster_write_kwargs=raster_write_kwargs,
raster_compressor=raster_compressor,
)
return
Expand Down Expand Up @@ -1359,6 +1414,11 @@ def write_element(

self._check_element_not_on_disk_with_different_type(element_type=element_type, element_name=element_name)

element_raster_write_kwargs = None
if element_type in ("images", "labels") and raster_write_kwargs:
element_names = set(self.images.keys()).union(self.labels.keys())
element_raster_write_kwargs = create_raster_element_kwargs(raster_write_kwargs, element_name, element_names)

self._write_element(
element=element,
zarr_container_path=self.path,
Expand All @@ -1367,6 +1427,7 @@ def write_element(
overwrite=overwrite,
parsed_formats=parsed_formats,
shapes_geometry_encoding=shapes_geometry_encoding,
element_raster_write_kwargs=element_raster_write_kwargs,
raster_compressor=raster_compressor,
)
# After every write, metadata should be consolidated, otherwise this can lead to IO problems like when deleting.
Expand Down
28 changes: 26 additions & 2 deletions src/spatialdata/_io/io_raster.py
Expand Up @@ -148,13 +148,13 @@ def _prepare_storage_options(
return None
if isinstance(storage_options, dict):
prepared = dict(storage_options)
if "chunks" in prepared:
if "chunks" in prepared and prepared["chunks"] is not None:
prepared["chunks"] = _normalize_explicit_chunks(prepared["chunks"])
return prepared

prepared_options = [dict(options) for options in storage_options]
for options in prepared_options:
if "chunks" in options:
if "chunks" in options and options["chunks"] is not None:
options["chunks"] = _normalize_explicit_chunks(options["chunks"])
return prepared_options

Expand Down Expand Up @@ -284,6 +284,19 @@ def _write_raster(
raster_format
The format used to write the raster data.
storage_options
Comment thread
melonora marked this conversation as resolved.
Storage options for raster elements, which have been extracted from a potentially mixed kwargs dict by
`create_raster_element_kwargs`. These options are passed to the zarr storage backend for writing and can be
provided in several formats:

1. Single dictionary
A dictionary containing all storage options applied to the raster, either single or multiscale.
2. List of dictionaries (multiscale only)
A list where each dictionary defines the storage options for one scale of the multiscale raster element.

Important Notes
- The available key–value pairs in these dictionaries depend on the Zarr format used for writing.
- For a full list of supported storage options, refer to:
https://zarr.readthedocs.io/en/stable/api/zarr/create/#zarr.create_array
Additional options for writing the raster data, like chunks and compression.
raster_compressor
Compression settings as a len-1 dictionary with a single key-value {compression: compression level} pair
Expand All @@ -292,6 +305,10 @@ def _write_raster(
metadata
Additional metadata for the raster element
"""
from dataclasses import asdict

from spatialdata import settings

if raster_type not in ["image", "labels"]:
raise ValueError(f"{raster_type} is not a valid raster type. Must be 'image' or 'labels'.")
# "name" and "label_metadata" are only used for labels. "name" is written in write_multiscale_ngff() but ignored in
Expand All @@ -308,6 +325,13 @@ def _write_raster(
for c in channels:
metadata["metadata"]["omero"]["channels"].append({"label": c}) # type: ignore[union-attr, index, call-overload]

base_options = {k.split("_")[1]: v for k, v in asdict(settings).items() if k in ("raster_chunks", "raster_shards")}

if isinstance(storage_options, list):
storage_options = [{**base_options, **x} for x in storage_options]
else:
storage_options = {**base_options, **(storage_options or {})}

if isinstance(raster_data, DataArray):
_write_raster_dataarray(
raster_type,
Expand Down
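The `base_options` merge in the `_write_raster` hunk above can be tried in isolation. The sketch below uses a hypothetical stand-in `Settings` dataclass (the real `spatialdata.settings` object has more fields and is not reproduced here) and a hypothetical helper name `merge_with_defaults`. It shows that the `raster_chunks` and `raster_shards` settings become `chunks` and `shards` defaults, and that explicitly passed storage options take precedence.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass
from typing import Any


@dataclass
class Settings:
    """Hypothetical stand-in for the library's settings object."""

    raster_chunks: tuple[int, ...] | None = None
    raster_shards: tuple[int, ...] | None = None
    shapes_geometry_encoding: str = "WKB"


def merge_with_defaults(
    settings: Settings,
    storage_options: dict[str, Any] | list[dict[str, Any]] | None,
) -> dict[str, Any] | list[dict[str, Any]]:
    # "raster_chunks" -> "chunks", "raster_shards" -> "shards"
    base = {
        k.split("_")[1]: v
        for k, v in asdict(settings).items()
        if k in ("raster_chunks", "raster_shards")
    }
    if isinstance(storage_options, list):
        # Per-scale options: each scale gets the defaults, then its own overrides.
        return [{**base, **opts} for opts in storage_options]
    # Later keys win in a dict merge, so explicit options override the defaults.
    return {**base, **(storage_options or {})}


settings = Settings(raster_chunks=(1, 256, 256))
merged = merge_with_defaults(settings, {"chunks": (1, 64, 64)})
```

Note that unset settings still appear as `None` values in the merged dictionary, which is presumably why `_prepare_storage_options` now skips normalisation when `chunks` is `None`.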