Model-centric refactoring to reduce dataset creation #2646
VeckoTheGecko wants to merge 48 commits into
Conversation
Split out data manipulation and grid creation from the construction of fields
Move all data validation code to the model itself
Ultimately we want the grid to depend only on the SGRID-compliant model data, since it contains all the information needed regarding staggering (we don't need xgcm anymore). I want to update the constructor to remove the xgcm grid object, so I'm adding an adapter for now to help with refactoring (it will be removed at a later date)
Also update calling code in model and field.py
60e7ff9 to
f1799ac
OK, I've now mapped out most of the architectural changes here. Next I'm going to profile performance and memory consumption to see where we stand and how to optimise under the new structure. Once I have results, I will add them to this PR along with a full description of the proposed refactoring. Once we agree on the structure and the performance differences, we can go through the test suite and update it.
For when adding fieldsets together
Fix constant field
Field, VectorField, FieldSet, ParticleSet, XGrid Ahead of Parcels-code#2683 and so that we don't have to refactor too much in this PR
Follow the new API and new way of setting interpolators
caa3432 to
da46ba7
Per review feedback
per review feedback
Yes, that is correct
Will do the Model renamings after lunch and re-request a review
Hmmm.
Thinking more about this, I think the renaming to … I agree that … I think having the term … I think there is a degree to which we need to use language that works best for development (particularly for internal API) even if there is alternative usage in other domains (i.e., oceanography or atmosphere); there are just not enough words. Perhaps a middle ground would be to use …
I was looking at the docs build, and saw an example that adds a second velocity field:

    import matplotlib.pyplot as plt
    import numpy as np
    import xarray as xr

    import parcels
    import parcels.tutorial

    # Load the CopernicusMarine data in the Agulhas region from the example_datasets
    ds_fields = parcels.tutorial.open_dataset("CopernicusMarine_data_for_Argo_tutorial/data")
    ds_fields.load()  # load the dataset into memory

    # Create an idealised wind field and add it to the dataset
    tdim, ydim, xdim = (len(ds_fields.time), len(ds_fields.latitude), len(ds_fields.longitude))
    ds_fields["UWind"] = xr.DataArray(
        data=np.ones((tdim, ydim, xdim)) * np.sin(ds_fields.latitude.values)[None, :, None],
        coords=[ds_fields.time, ds_fields.latitude, ds_fields.longitude],
    )
    ds_fields["VWind"] = xr.DataArray(
        data=np.zeros((tdim, ydim, xdim)),
        coords=[ds_fields.time, ds_fields.latitude, ds_fields.longitude],
    )

    fields = {
        "U": ds_fields["uo"],
        "V": ds_fields["vo"],
        "UWind": ds_fields["UWind"],
        "VWind": ds_fields["VWind"],
    }
    ds_fset = parcels.convert.copernicusmarine_to_sgrid(fields=fields)
    fieldset = parcels.FieldSet.from_sgrid_conventions(ds_fset)

    # Create a VectorField for the wind
    windvector = parcels.VectorField(
        "Wind",
        fieldset.UWind,
        fieldset.VWind,
        interp_method=parcels.interpolators.XLinear_Velocity,
    )
    fieldset.add_field(windvector)

We no longer have …
6b6d8c1 to
4e02d1f
Just chatted f2f with Erik, and decided on vvvvv
Also renamed the subclasses
Description
Overview
The central change is the introduction of a new `Model` abstraction layer between raw xarray/uxarray data and the `Field`/`FieldSet` objects. Previously, `Field` owned its data and grid directly. Now, `Field` is a thin view over a `Model`, and `FieldSet` is a container of `Model` objects rather than `Field` objects.

New file: `src/parcels/_core/model.py`

`Model` (abstract base class)
Abstract class with three required attributes:
- `data: Any`: the underlying dataset
- `grid: BaseGrid`: the grid object
- `field_to_interpolator: dict[str, ScalarInterpolator | VectorInterpolator]`: maps field names to interpolator instances

Abstract methods:
- `construct_fields() -> list[Field | VectorField]`: build field objects from this model
- `scalar_field_names -> list[str]`: names of scalar fields in the data
- `assert_valid_field_data(field_data)`: validate a single field's data

Concrete methods on `Model`:
- `assert_valid_model_data()`: iterates `scalar_field_names` and calls `assert_valid_field_data` on each
- `time_interval -> TimeInterval | None`: computed from `self.data`

`StructuredModel(Model)`
For structured (SGRID) grid data backed by `xr.Dataset`.
Constructor:
`StructuredModel(data: xr.Dataset, mesh: Mesh)`
- `preprocess_sgrid_model_data(data)` to transpose fields to `(t, z, y, x)` order
- `XGrid(data, mesh)` grid
- `field_to_interpolator = {}`
- `assert_valid_model_data()` on construction

`from_sgrid_conventions(cls, ds, mesh=None)` classmethod:
- `FieldSet.from_sgrid_conventions`: handles time axis renaming, mesh type inference
- `XLinear()` on all scalar fields after construction
- `StructuredModel` instance

`construct_fields()`:
- `Field("U", self)`, `Field("V", self)` etc., then wraps them in `VectorField("UV", ...)` if U+V present
- `XLinear_Velocity()` for A-grids, `CGrid_Velocity()` for C-grids

`UnstructuredModel(Model)`
For unstructured (UGRID) grid data backed by `ux.UxDataset`.
Constructor:
`UnstructuredModel(data: ux.UxDataset, grid: UxGrid)`

`from_ugrid_conventions(cls, ds, mesh="spherical")` classmethod:
- … (`time`, `zf`, `zc`)
- `UxGrid`, calls `_discover_ux_U_and_V`, returns instance

`construct_fields()`:
- `_select_uxinterpolator(da)` to pick the appropriate interpolator per field
- `Field(name, model, interp)`: see Field changes below

Helper functions moved from `fieldset.py` to `model.py`
- `_discover_ux_U_and_V(ds)`: unchanged logic
- `_select_uxinterpolator(da)`: unchanged logic
- `_get_mesh_type_from_sgrid_dataset(ds)`: unchanged logic
- `_is_coordinate_in_degrees(da)`: unchanged logic
- `_get_time_interval(data)`: logic adjusted: checks `"time" not in data or data["time"].size == 1` (previously checked `data.shape[0] == 1`)
- `_assert_valid_uxdataarray(data)`: unchanged logic
- `_assert_has_time_coordinate(da)`: new helper extracted from old `Field.__init__`

New helper in `model.py`
- `preprocess_sgrid_model_data(ds)`: transposes all non-grid-topology data vars to `(t, z, y, x)` using `_transpose_xfield_data_to_tzyx`

Changes to `src/parcels/_core/field.py`
`Field.__init__` signature change
Before:
After:
- `data`, `grid`, and `interp_method` are no longer constructor arguments
- `self.name`, `self.model`, and `self.igrid = -1`
- … `__init__`

`Field` properties (delegating to model)
Three new properties proxy into the model:
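As an illustration of this kind of delegation, here is a minimal sketch using hypothetical stand-in classes (`ToyModel`, `ToyField`). These are not the Parcels classes, and the lookup of a field's data by name in the model's dataset is an assumption made for the example:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToyModel:
    """Hypothetical stand-in for a Model: owns the data, the grid and the interpolators."""

    data: dict[str, Any]
    grid: Any
    time_interval: tuple[float, float] | None = None
    field_to_interpolator: dict[str, Any] = field(default_factory=dict)


class ToyField:
    """Hypothetical thin view: stores only a name and a reference to its model."""

    def __init__(self, name: str, model: ToyModel):
        self.name = name
        self.model = model
        self.igrid = -1

    @property
    def data(self) -> Any:
        # Assumption: a field's data is the model variable with the same name.
        return self.model.data[self.name]

    @property
    def grid(self) -> Any:
        # The grid is shared by every field built from the same model.
        return self.model.grid

    @property
    def time_interval(self) -> tuple[float, float] | None:
        return self.model.time_interval


model = ToyModel(
    data={"U": [0.1, 0.2], "V": [0.0, 0.3]},
    grid="toy-grid",
    time_interval=(0.0, 86400.0),
)
u, v = ToyField("U", model), ToyField("V", model)
```

In this sketch both fields read through to the one model, so no field keeps its own copy of the dataset or grid.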
These preserve backward compatibility for code that reads `field.data`, `field.grid`, `field.time_interval`.

`Field.interp_method` property/setter
Before: stored as `self._interp_method`; validated via `assert_same_function_signature` against `ZeroInterpolator`
After: stored in `self.model.field_to_interpolator[self.name]`
- `AttributeError` (not `KeyError`) if no interpolator is set for this field
- `isinstance(value, ScalarInterpolator)` instead of checking function signature

Interpolator call convention change
Before:
`self._interp_method(particle_positions, grid_positions, self)`
After:
`self.interp_method.interp(particle_positions, grid_positions, self)`
Interpolators are now objects with an `.interp(...)` method, not plain callables.

`VectorField` changes
- `interp_method` parameter type annotation changed from `Callable | None` to `VectorInterpolator | None`
- `assert_same_function_signature(...)` to `isinstance(interp_method, VectorInterpolator)`
- `isinstance(method, VectorInterpolator)`
- `self._interp_method.interp(...)` instead of `self._interp_method(...)`

Removed from `field.py`
- `_assert_valid_uxdataarray`: moved to `model.py`
- `_assert_compatible_combination`: removed (validation now handled per-model)
- `_get_time_interval`: moved to `model.py`
- `uxarray`, `xarray`, `Callable`, `TimeInterval`, `ZeroInterpolator`, `ZeroInterpolator_Vector`, `assert_same_function_signature`, `_transpose_xfield_data_to_tzyx`, `assert_all_field_dims_have_axis`

Changes to `src/parcels/_core/fieldset.py`
`FieldSet.__init__` signature change
Before:
`FieldSet(fields: list[Field | VectorField])`
After:
`FieldSet(models: list[Model])`
- `self.models: list[Model]`
- `self.reconstruct_fields()` on init to build `self._fields`
- `assert_compatible_calendars(fields)` call commented out (TODO)

New `FieldSet.fields` property
`_fields` is now the backing store; `fields` is a lazy property that calls `reconstruct_fields()` if `_fields` is `None`.

New `FieldSet.reconstruct_fields()` method
Iterates `self.models`, calls `model.construct_fields()` on each, flattens into `self._fields` dict.

New `FieldSet.__add__` operator

`from_ugrid_conventions` simplified
Before: ~15 lines building grid, discovering U/V, creating Field objects, returning `cls(list(fields.values()))`
After:
`from_sgrid_conventions` simplified
Before: ~50 lines handling time axis, xgcm grid creation, field creation
After:
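A rough sketch of the delegation this implies, using hypothetical stand-in classes rather than the PR's code: the fieldset classmethod hands the dataset to the model's classmethod, and fields are rebuilt from the models. `ToyStructuredModel`, `ToyFieldSet`, and the dict-based dataset are all assumptions made for illustration.

```python
from __future__ import annotations


class ToyStructuredModel:
    """Hypothetical stand-in for StructuredModel; `data` is a plain dict here."""

    def __init__(self, data: dict[str, list[float]]):
        self.data = data

    @classmethod
    def from_sgrid_conventions(cls, ds: dict[str, list[float]]) -> ToyStructuredModel:
        # The real classmethod is described as handling time-axis renaming and
        # mesh type inference; this toy version only copies the mapping.
        return cls(dict(ds))

    def construct_fields(self) -> list[tuple[str, ToyStructuredModel]]:
        # Each "field" is just a (name, owning model) pair in this sketch.
        return [(name, self) for name in self.data]


class ToyFieldSet:
    """Hypothetical container of models; fields are derived from the models."""

    def __init__(self, models: list[ToyStructuredModel]):
        self.models = models
        self._fields: dict[str, ToyStructuredModel] | None = None
        self.reconstruct_fields()

    @classmethod
    def from_sgrid_conventions(cls, ds: dict[str, list[float]]) -> ToyFieldSet:
        # All dataset handling is delegated to the model.
        return cls([ToyStructuredModel.from_sgrid_conventions(ds)])

    def reconstruct_fields(self) -> None:
        # Flatten the fields of every model into one name -> field mapping.
        self._fields = {}
        for model in self.models:
            for name, owner in model.construct_fields():
                self._fields[name] = owner

    @property
    def fields(self) -> dict[str, ToyStructuredModel]:
        # Rebuild only if the backing store has been cleared.
        if self._fields is None:
            self.reconstruct_fields()
        return self._fields


fieldset = ToyFieldSet.from_sgrid_conventions({"U": [0.1], "V": [0.2]})
field_names = sorted(fieldset.fields)
```

Under these assumptions the fieldset classmethod shrinks to a single line, since everything dataset-specific lives on the model.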
`add_field` constant field creation updated
The inline `xgcm.Grid(...)` call when adding a constant scalar field is replaced with constructing `XGrid(ds, mesh=mesh)` directly (after attaching SGRID metadata via `sgrid._attach_sgrid_metadata`).

New module-level function:
`assert_compatible_fieldsets`
Raises `ValueError` if the two fieldsets share any field names or constant names.

Removed from `fieldset.py`
- `xgcm` import
- `UxGrid` import
- `_DEFAULT_XGCM_KWARGS` import
- `logger` import
- `_ds_rename_using_standard_names` import
- … (… `XConstantField` remains)
- `_discover_ux_U_and_V`: moved to `model.py`
- `_select_uxinterpolator`: moved to `model.py`
- `_get_mesh_type_from_sgrid_dataset`: moved to `model.py`
- `_is_coordinate_in_degrees`: moved to `model.py`

Summary of architectural intent
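| Concern | Before | After |
|---|---|---|
| Data and grid ownership | `Field` (held `self.data`, `self.grid`) | `Model` (holds `self.data`, `self.grid`) |
| Interpolator storage | `Field._interp_method` (per-field callable) | `Model.field_to_interpolator` (dict of objects) |
| Interpolator validation | `ZeroInterpolator` signature | `ScalarInterpolator`/`VectorInterpolator` |
| Interpolator call | `interp_method(positions, grid_positions, field)` | `interp_method.interp(positions, grid_positions, field)` |
| `FieldSet` contents | `list[Field \| VectorField]` | `list[Model]` |
| Field construction | `FieldSet.from_*` classmethods | `Model.construct_fields()` |
| `FieldSet` combination | | `fieldset_a + fieldset_b` via `__add__` |

Open questions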
- Is `Model` the best name for this level of abstraction? (This question doesn't have to be answered now. Talking with @erikvansebille, it seemed confusing, but there wasn't a clear better alternative.)

Checklist
- … (`main` for normal development, `v3-support` for v3 support)

AI Disclosure