API Reference

This page provides an auto-generated summary of xcollection’s API. For more details and examples, refer to the relevant chapters in the main part of the documentation.

Collection

xcollection.main.Collection([datasets])

A collection of datasets.

xcollection.main.open_collection(store, **kwargs)

Open a collection stored in a Zarr store.

class xcollection.main.Collection(datasets=None)

A collection of datasets. The keys are the dataset names and the values are the datasets.

Parameters

datasets (dict, optional) – A dictionary of datasets to initialize the collection with.

Examples

>>> import xcollection as xc
>>> import xarray as xr
>>> ds = xr.tutorial.open_dataset('rasm')
>>> c = xc.Collection({'foo': ds.isel(time=0), 'bar': ds.isel(y=0)})
>>> c
<Collection (2 keys)>
🔑 foo
<xarray.Dataset>
Dimensions:  (y: 205, x: 275)
Coordinates:
    time     object 1980-09-16 12:00:00
    xc       (y, x) float64 ...
    yc       (y, x) float64 ...
Dimensions without coordinates: y, x
Data variables:
    Tair     (y, x) float64 ...
🔑 bar
<xarray.Dataset>
Dimensions:  (time: 36, x: 275)
Coordinates:
  * time     (time) object 1980-09-16 12:00:00 ... 1983-08-17 00:00:00
    xc       (x) float64 ...
    yc       (x) float64 ...
Dimensions without coordinates: x
Data variables:
    Tair     (time, x) float64 ...
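
A collection behaves like a dictionary keyed by dataset name, so its mapping behaviour can be sketched in plain Python. This is a minimal sketch under stated assumptions. The class name `MiniCollection` is hypothetical. The values are placeholder strings standing in for `xarray.Dataset` objects, so the sketch runs without xarray. Nothing here is taken from xcollection's source.

```python
from collections.abc import MutableMapping
from typing import Any, Dict, Iterator, Optional


class MiniCollection(MutableMapping):
    """Hypothetical dict-like stand-in for a collection of datasets.

    Keys are dataset names; values are datasets. Placeholder strings are
    used below so the sketch runs without xarray installed.
    """

    def __init__(self, datasets: Optional[Dict[str, Any]] = None) -> None:
        # Copy the input so later edits to the caller's dict do not leak in.
        self._datasets: Dict[str, Any] = dict(datasets or {})

    def __getitem__(self, key: str) -> Any:
        return self._datasets[key]

    def __setitem__(self, key: str, value: Any) -> None:
        self._datasets[key] = value

    def __delitem__(self, key: str) -> None:
        del self._datasets[key]

    def __iter__(self) -> Iterator[str]:
        # Iteration order follows insertion order, as for a plain dict.
        return iter(self._datasets)

    def __len__(self) -> int:
        return len(self._datasets)

    def __repr__(self) -> str:
        # Mimics only the header line of the collection repr.
        return f"<Collection ({len(self)} keys)>"


c = MiniCollection({"foo": "dataset-1", "bar": "dataset-2"})
print(c)               # <Collection (2 keys)>
print(list(c.keys()))  # ['foo', 'bar']
```

Inheriting from `MutableMapping` provides `keys()`, `items()` and `values()` for free. In this sketch they return `KeysView`-style views rather than the `dict_keys` objects shown in the real examples.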

choose(data_vars, *, mode='any')

Return a collection with datasets containing all or any of the specified data variables.

Parameters
  • data_vars (str or list of str) – The data variables to select on.

  • mode (str, optional) – The selection mode. Must be one of ‘all’ or ‘any’. Defaults to ‘any’.

Returns

Collection – A new collection containing only the selected datasets.

Examples

>>> c
<Collection (3 keys)>
🔑 foo
<xarray.Dataset>
Dimensions:  (y: 205, x: 275)
Coordinates:
    time     object 1980-09-16 12:00:00
    xc       (y, x) float64 ...
    yc       (y, x) float64 ...
Dimensions without coordinates: y, x
Data variables:
    Tair     (y, x) float64 ...
🔑 bar
<xarray.Dataset>
Dimensions:  (time: 36, x: 275)
Coordinates:
  * time     (time) object 1980-09-16 12:00:00 ... 1983-08-17 00:00:00
    xc       (x) float64 ...
    yc       (x) float64 ...
Dimensions without coordinates: x
Data variables:
    Tair     (time, x) float64 ...
🔑 baz
<xarray.Dataset>
Dimensions:  ()
Data variables:
    *empty*
>>> len(c)
3
>>> c.keys()
dict_keys(['foo', 'bar', 'baz'])
>>> d = c.choose(data_vars=['Tair'], mode='any')
>>> len(d)
2
>>> d.keys()
dict_keys(['foo', 'bar'])
>>> d = c.choose(data_vars=['Tair'], mode='all')
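
The difference between `mode='any'` and `mode='all'` can be shown with a small, hypothetical helper, `choose_keys`. It works on a dictionary that maps each key to the set of its data-variable names. For an `xarray.Dataset`, such a set could be built with `set(ds.data_vars)`.

This sketches the documented semantics only and is not xcollection's implementation. With a single variable such as `['Tair']`, both modes select the same keys. The sketch therefore also uses a second, made-up variable name, `prcp`.

```python
from typing import Dict, Iterable, List, Set, Union


def choose_keys(
    data_vars_by_key: Dict[str, Set[str]],
    data_vars: Union[str, Iterable[str]],
    mode: str = "any",
) -> List[str]:
    """Return the keys whose data variables satisfy ``mode`` (hypothetical helper)."""
    if mode not in ("any", "all"):
        raise ValueError(f"mode must be 'any' or 'all', got {mode!r}")
    # Accept a single name as well as a list of names, as the docs allow.
    wanted = [data_vars] if isinstance(data_vars, str) else list(data_vars)
    # 'any': at least one requested variable is present; 'all': every one is.
    test = any if mode == "any" else all
    return [
        key
        for key, present in data_vars_by_key.items()
        if test(var in present for var in wanted)
    ]


# Mirrors the example collection: 'baz' is an empty dataset.
data_vars_by_key = {"foo": {"Tair"}, "bar": {"Tair"}, "baz": set()}
print(choose_keys(data_vars_by_key, ["Tair"], mode="any"))          # ['foo', 'bar']
print(choose_keys(data_vars_by_key, ["Tair"], mode="all"))          # ['foo', 'bar']
print(choose_keys(data_vars_by_key, ["Tair", "prcp"], mode="all"))  # []
```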

filter(*, by, func)

Return a collection with datasets that match the filter function.

Parameters
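  • by (str) – Option to filter by. Must be one of ‘key’, ‘value’, or ‘item’.

  • func (callable) – The filter function. Depending on by, it receives each key, each dataset, or each (key, dataset) tuple, and should return a truthy value for the entries to keep.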

Returns

Collection – A new collection containing only the selected datasets.

Examples

>>> c
<Collection (3 keys)>
🔑 foo
<xarray.Dataset>
Dimensions:  (y: 205, x: 275)
Coordinates:
    time     object 1980-09-16 12:00:00
    xc       (y, x) float64 ...
    yc       (y, x) float64 ...
Dimensions without coordinates: y, x
Data variables:
    Tair     (y, x) float64 ...
🔑 bar
<xarray.Dataset>
Dimensions:  (time: 36, x: 275)
Coordinates:
  * time     (time) object 1980-09-16 12:00:00 ... 1983-08-17 00:00:00
    xc       (x) float64 ...
    yc       (x) float64 ...
Dimensions without coordinates: x
Data variables:
    Tair     (time, x) float64 ...
🔑 baz
<xarray.Dataset>
Dimensions:  ()
Data variables:
    *empty*
>>> len(c)
3
>>> c.keys()
dict_keys(['foo', 'bar', 'baz'])
>>> c.filter(by='key', func=lambda key: isinstance(key, str))
>>> c.filter(by='value', func=lambda ds: 'Tair' in ds.data_vars)
>>> c.filter(
...     by='item',
...     func=lambda item: isinstance(item[0], str)
...     and 'time' in item[1].coords
...     and 2014 in item[1].time.dt.year,
... )
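
The three `by` options differ only in what the filter function receives. A hypothetical helper, `filter_collection`, makes this explicit on a plain dictionary. Sets of data-variable names stand in for datasets. This is a sketch of the documented behaviour, not xcollection's code.

```python
from typing import Any, Callable, Dict


def filter_collection(
    mapping: Dict[str, Any], *, by: str, func: Callable[[Any], bool]
) -> Dict[str, Any]:
    """Keep the entries for which ``func`` is truthy (hypothetical helper)."""
    # What ``func`` receives depends on ``by``: the key, the value, or the pair.
    selectors = {
        "key": lambda key, value: key,
        "value": lambda key, value: value,
        "item": lambda key, value: (key, value),
    }
    if by not in selectors:
        raise ValueError(f"by must be 'key', 'value', or 'item', got {by!r}")
    select = selectors[by]
    return {key: value for key, value in mapping.items() if func(select(key, value))}


# Values are sets of data-variable names standing in for datasets.
collection = {"foo": {"Tair"}, "bar": {"Tair"}, "baz": set()}
print(filter_collection(collection, by="value", func=lambda v: "Tair" in v))
# {'foo': {'Tair'}, 'bar': {'Tair'}}
print(
    filter_collection(
        collection,
        by="item",
        func=lambda item: item[0].startswith("b") and "Tair" in item[1],
    )
)
# {'bar': {'Tair'}}
```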

items()

Return the (key, dataset) pairs of the collection.

keymap(func)

Apply a function to each key in the collection.

Parameters

func (callable) – The function to apply to each key.

Returns

Collection – A new collection containing the same datasets under the transformed keys.

Examples

>>> c
<Collection (2 keys)>
🔑 foo
<xarray.Dataset>
Dimensions:  (y: 205, x: 275)
Coordinates:
    time     object 1980-09-16 12:00:00
    xc       (y, x) float64 ...
    yc       (y, x) float64 ...
Dimensions without coordinates: y, x
Data variables:
    Tair     (y, x) float64 ...
🔑 bar
<xarray.Dataset>
Dimensions:  (time: 36, x: 275)
Coordinates:
  * time     (time) object 1980-09-16 12:00:00 ... 1983-08-17 00:00:00
    xc       (x) float64 ...
    yc       (x) float64 ...
Dimensions without coordinates: x
Data variables:
    Tair     (time, x) float64 ...
>>> c.keys()
dict_keys(['foo', 'bar'])
>>> d = c.keymap(lambda x: x.upper())
>>> d.keys()
dict_keys(['FOO', 'BAR'])
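
Renaming keys raises one question the description above leaves open: what happens if two keys map to the same new name? The hypothetical helper below, `keymap`, applies the function to each key of a plain dictionary. It raises instead of silently dropping a dataset when a collision occurs. That collision check is a design choice of this sketch. It is not documented behaviour of xcollection.

```python
from typing import Any, Callable, Dict, Hashable


def keymap(
    mapping: Dict[Hashable, Any], func: Callable[[Hashable], Hashable]
) -> Dict[Hashable, Any]:
    """Return a new dict with ``func`` applied to every key (hypothetical helper)."""
    renamed: Dict[Hashable, Any] = {}
    original_of: Dict[Hashable, Hashable] = {}
    for key, value in mapping.items():
        new_key = func(key)
        # Refuse to silently overwrite a dataset when two keys map to one name.
        if new_key in renamed:
            raise ValueError(
                f"keys {original_of[new_key]!r} and {key!r} both map to {new_key!r}"
            )
        renamed[new_key] = value
        original_of[new_key] = key
    return renamed


print(keymap({"foo": 1, "bar": 2}, str.upper))  # {'FOO': 1, 'BAR': 2}
```

A plain dict comprehension such as `{func(k): v for k, v in mapping.items()}` would be shorter. However, it would keep only the last dataset for any colliding name.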

keys()

Return the keys (dataset names) of the collection.

map(func, args=None, **kwargs)

Apply a function to each dataset in the collection.

Parameters
  • func (callable) – The function to apply to each dataset.

  • args (tuple, optional) – Positional arguments to pass to func in addition to the dataset.

  • kwargs – Additional keyword arguments to pass to func.

Returns

Collection – A new collection containing the results of the function.

Examples

>>> c
<Collection (2 keys)>
🔑 foo
<xarray.Dataset>
Dimensions:  (y: 205, x: 275)
Coordinates:
    time     object 1980-09-16 12:00:00
    xc       (y, x) float64 ...
    yc       (y, x) float64 ...
Dimensions without coordinates: y, x
Data variables:
    Tair     (y, x) float64 ...
🔑 bar
<xarray.Dataset>
Dimensions:  (time: 36, x: 275)
Coordinates:
  * time     (time) object 1980-09-16 12:00:00 ... 1983-08-17 00:00:00
    xc       (x) float64 ...
    yc       (x) float64 ...
Dimensions without coordinates: x
Data variables:
    Tair     (time, x) float64 ...
>>> c.map(func=lambda ds: ds.isel(x=slice(0, 10)))
<Collection (2 keys)>
🔑 foo
<xarray.Dataset>
Dimensions:  (y: 205, x: 10)
Coordinates:
    time     object 1980-09-16 12:00:00
    xc       (y, x) float64 ...
    yc       (y, x) float64 ...
Dimensions without coordinates: y, x
Data variables:
    Tair     (y, x) float64 ...
🔑 bar
<xarray.Dataset>
Dimensions:  (time: 36, x: 10)
Coordinates:
  * time     (time) object 1980-09-16 12:00:00 ... 1983-08-17 00:00:00
    xc       (x) float64 ...
    yc       (x) float64 ...
Dimensions without coordinates: x
Data variables:
    Tair     (time, x) float64 ...
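
The parameter list says that args are passed to func "in addition to the dataset". A natural reading is `func(dataset, *args, **kwargs)`, which is also how xarray's own `Dataset.map` forwards `args` and keyword arguments. The hypothetical helper `map_collection` below assumes that call order and applies it to a plain dictionary. Lists stand in for datasets, and `take` mimics slicing along one dimension.

```python
from typing import Any, Callable, Dict, Iterable, Optional


def map_collection(
    mapping: Dict[str, Any],
    func: Callable[..., Any],
    args: Optional[Iterable[Any]] = None,
    **kwargs: Any,
) -> Dict[str, Any]:
    """Apply ``func(value, *args, **kwargs)`` to every value (hypothetical helper)."""
    extra = tuple(args) if args is not None else ()
    return {key: func(value, *extra, **kwargs) for key, value in mapping.items()}


def take(seq, start, stop, *, step=1):
    # Stand-in for something like ``ds.isel(x=slice(start, stop, step))``.
    return seq[start:stop:step]


data = {"foo": list(range(275)), "bar": list(range(100, 136))}
result = map_collection(data, take, args=(0, 10), step=2)
print(result["foo"])  # [0, 2, 4, 6, 8]
print(result["bar"])  # [100, 102, 104, 106, 108]
```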

to_zarr(store, mode='w', **kwargs)

Write the collection to a Zarr store.

Parameters
  • store (str or pathlib.Path) – Store or path to directory in local or remote file system.

  • mode ({"w", "w-", "a", "r+", None}, optional) – Persistence mode: “w” means create (overwrite if exists); “w-” means create (fail if exists); “a” means override existing variables (create if they do not exist); “r+” means modify existing array values only (raise an error if any metadata or shapes would change). Defaults to “w”. If set to None, the mode follows xarray's defaults: “a” if append_dim is set, otherwise “r+” if region is set, and “w-” otherwise.

  • kwargs – Additional keyword arguments to pass to the xarray to_zarr() method.

Examples

>>> c.to_zarr(store='/tmp/foo.zarr', mode='w')
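
The mode defaults described in the parameter list can be written out as a small decision function. `resolve_zarr_mode` is a hypothetical helper that restates the documented rule; it is not xarray's code. It shows that an explicit mode, including this method's default of "w", is used as given, and that the append_dim/region fallbacks only apply when the mode is None. Recent xarray releases may accept additional modes, so the set below covers only the four listed above.

```python
from typing import Optional

VALID_MODES = {"w", "w-", "a", "r+"}


def resolve_zarr_mode(
    mode: Optional[str] = "w",
    append_dim: Optional[str] = None,
    region: Optional[dict] = None,
) -> str:
    """Return the effective persistence mode (hypothetical helper)."""
    if mode is not None:
        if mode not in VALID_MODES:
            raise ValueError(f"unsupported mode {mode!r}")
        # An explicit mode, including the default "w", is used unchanged.
        return mode
    # mode=None: fall back to the documented defaults.
    if append_dim is not None:
        return "a"
    if region is not None:
        return "r+"
    return "w-"


print(resolve_zarr_mode())                       # w
print(resolve_zarr_mode(None, append_dim="time"))  # a
print(resolve_zarr_mode(None))                   # w-
```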

values()

Return the values (datasets) of the collection.

weighted(weights, **kwargs)

Return a collection with datasets weighted by the given weights.

xcollection.main.open_collection(store, **kwargs)

Open a collection stored in a Zarr store.

Parameters
  • store (str or pathlib.Path) – Store or path to directory in local or remote file system.

  • kwargs – Additional keyword arguments to pass to open_dataset() function.

Returns

Collection – A collection containing the datasets in the Zarr store.

Examples

>>> import xcollection as xc
>>> c = xc.open_collection('/tmp/foo.zarr', decode_times=True, use_cftime=True)