easyclimate.core.stats¶

Submodules¶

Functions¶

`calc_detrend_spatial_fast`(→ xarray.DataArray)	Remove linear trend along time dimension from spatio-temporal data.
`calc_monthly_mean`(data_input[, dim])	Calculate monthly mean.
`calc_monthly_sum`(data_input[, dim])	Calculate monthly sum.
`calc_monthly_std`(data_input[, dim])	Calculate monthly standard deviation.
`calc_monthly_var`(data_input[, dim])	Calculate monthly variance.
`calc_monthly_max`(data_input[, dim])	Calculate monthly maximum.
`calc_monthly_min`(data_input[, dim])	Calculate monthly minimum.
`calc_yearly_mean`(data_input[, dim])	Calculate yearly mean.
`calc_yearly_sum`(data_input[, dim])	Calculate yearly sum.
`calc_yearly_std`(data_input[, dim])	Calculate yearly standard deviation.
`calc_yearly_var`(data_input[, dim])	Calculate yearly variance.
`calc_yearly_max`(data_input[, dim])	Calculate yearly maximum.
`calc_yearly_min`(data_input[, dim])	Calculate yearly minimum.

Package Contents¶

easyclimate.core.stats.calc_detrend_spatial_fast(data_input: xarray.DataArray, time_dim: str = 'time', min_valid_fraction: float = 0.5, method: Literal['scipy_reduce', 'scipy', 'numpy', 'rust', 'rust_chunked', 'rust_flexible', 'auto'] = 'auto', **kwargs) → xarray.DataArray¶

Remove linear trend along time dimension from spatio-temporal data.

Supports multiple computation methods with optional automatic selection.

Parameters¶

data_input: xr.DataArray: The spatio-temporal data to be detrended.
time_dim: str, default “time”: Name of the time dimension.
min_valid_fraction: float, default 0.5: Minimum fraction of valid (finite) values required for detrending. Grid points with fewer valid values will be set to NaN.
method: str, default ‘auto’: Computation method to use:
- ‘scipy_reduce’: Simplified version using scipy.signal.detrend
- ‘scipy’: Optimized version using scipy.signal.detrend
- ‘numpy’: Manual numpy vectorized implementation
- ‘rust’: High-performance Rust backend
- ‘rust_chunked’: Rust backend with chunked processing (for large datasets)
- ‘rust_flexible’: Rust backend with flexible dimension handling
- ‘auto’: Automatically selects the best available method
**kwargs: dict: Additional arguments passed to specific methods:
- chunk_size: int (for ‘rust_chunked’ method)
- use_chunked: bool (for ‘rust_chunked’ method)

Returns¶

xr.DataArray: Detrended data with same shape and coordinates as input.

Raises¶

TypeError: If data_input is not an xarray.DataArray.
ValueError: If input parameters are invalid.
ImportError: If Rust method is selected but Rust backend is not available.

Examples¶

>>> import xarray as xr
>>> import numpy as np
>>> from easyclimate.core.stats import calc_detrend_spatial_fast
>>>
>>> # Create sample data
>>> data = xr.DataArray(
...     np.random.randn(100, 50, 100),
...     dims=['time', 'lat', 'lon']
... )
>>>
>>> # Using scipy method
>>> result1 = calc_detrend_spatial_fast(data, method='scipy')
>>>
>>> # Using numpy method
>>> result2 = calc_detrend_spatial_fast(data, method='numpy')
>>>
>>> # Using Rust method (if available)
>>> try:
...     result3 = calc_detrend_spatial_fast(data, method='rust')
... except ImportError:
...     print("Rust backend not available")
>>>
>>> # Automatic method selection
>>> result4 = calc_detrend_spatial_fast(data, method='auto')

Notes¶

‘scipy_reduce’: Simplest method but less robust with NaN values
‘scipy’: Optimized scipy version with better special value handling
‘numpy’: Manual vectorized implementation, typically 2-3x faster than scipy
‘rust’: High-performance Rust implementation, typically 10-50x faster than numpy
‘rust_chunked’: Rust backend with chunked processing, intended for large, memory-intensive datasets
‘rust_flexible’: Rust version with flexible dimension ordering
‘auto’: Uses ‘rust’ if available, otherwise ‘numpy’
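
To make the operation concrete, the following is a minimal NumPy sketch of removing a least-squares linear trend along the time axis, including a validity threshold like `min_valid_fraction`. It is an illustration under stated assumptions, not the library's implementation: the helper name `detrend_along_time` is hypothetical, and leaving individual non-finite samples as NaN inside an otherwise valid series is an assumption of this sketch.

```python
import numpy as np


def detrend_along_time(values, min_valid_fraction=0.5):
    """Remove a least-squares linear trend along axis 0 (hypothetical helper)."""
    n_time = values.shape[0]
    t = np.arange(n_time, dtype=float)
    flat = values.reshape(n_time, -1).astype(float)
    out = np.full_like(flat, np.nan)
    for j in range(flat.shape[1]):
        series = flat[:, j]
        valid = np.isfinite(series)
        # Leave the grid point as NaN when too few samples are finite
        if valid.mean() < min_valid_fraction or valid.sum() < 2:
            continue
        slope, intercept = np.polyfit(t[valid], series[valid], deg=1)
        # Assumption: non-finite samples in a valid series stay NaN
        out[valid, j] = series[valid] - (slope * t[valid] + intercept)
    return out.reshape(values.shape)


rng = np.random.default_rng(0)
t = np.arange(100.0)
# Synthetic field: a known trend of 0.05 per step plus noise on a 4 x 5 grid
field = 0.05 * t[:, None, None] + rng.standard_normal((100, 4, 5))
field[:80, 0, 0] = np.nan  # only 20% valid, below the 0.5 threshold

detrended = detrend_along_time(field)
```

After detrending, a line fitted to any fully valid grid point has a slope of zero up to floating-point error, and the grid point with only 20% valid data is NaN throughout.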

easyclimate.core.stats.calc_monthly_mean(data_input: xarray.DataArray, dim: str = 'time', **kwargs)¶

Calculate monthly mean.

For every adjacent sequence \(t_1, ..., t_n\) of timesteps of the same month it is:

\[o(t, x) = \mathrm{mean} \left \lbrace i(t', x), t_1 < t' \leqslant t_n \right\rbrace\]

Parameters¶

data_input: xarray.DataArray: The data to aggregate.

Note

The recommended frequency of data_input is daily.
dim: str: Dimension(s) over which to apply the calculation. By default, the time dimension is used.
**kwargs: Additional keyword arguments passed on to the appropriate array function for calculating mean on this object’s data. These could include dask-specific kwargs like split_every.

Returns¶

xarray.DataArray.

Examples¶

>>> import xarray as xr
>>> import numpy as np
>>> import pandas as pd
>>> import easyclimate as ecl
>>> # Create sample data with daily frequency
>>> time_index = pd.date_range('2020-01-01', '2020-03-31', freq='D')
>>> rng = np.random.default_rng(42)
>>> data = rng.random((len(time_index), 3, 3))
>>> da = xr.DataArray(data, dims=['time', 'x', 'y'], coords={'time': time_index})
>>> # Calculate monthly mean
>>> monthly_mean = ecl.calc_monthly_mean(da)
>>> print(monthly_mean)
<xarray.DataArray (time: 3, x: 3, y: 3)> Size: 216B
array([[[0.50635175, 0.45767908, 0.50271707],
        [0.52091523, 0.44830133, 0.44293946],
        [0.47944591, 0.53314083, 0.48073062]],
    [[0.53431127, 0.48259521, 0.47464862],
        [0.41070456, 0.51619935, 0.4872374 ],
        [0.6009132 , 0.43963445, 0.6028882 ]],
    [[0.48363606, 0.60589154, 0.42622008],
        [0.47519641, 0.46989711, 0.45327877],
        [0.44193025, 0.45050389, 0.641573  ]]])
Coordinates:
* time     (time) datetime64[ns] 24B 2020-01-01 2020-02-01 2020-03-01
Dimensions without coordinates: x, y
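
For comparison, a similar month-by-month aggregation can be written with plain xarray. This sketch assumes only xarray, pandas, and NumPy; it is not necessarily how `calc_monthly_mean` is implemented. The `"MS"` (month start) frequency labels each bin with the first day of the month, which matches the time coordinate shown in the output above.

```python
import numpy as np
import pandas as pd
import xarray as xr

time_index = pd.date_range("2020-01-01", "2020-03-31", freq="D")
rng = np.random.default_rng(42)
da = xr.DataArray(
    rng.random((len(time_index), 3, 3)),
    dims=["time", "x", "y"],
    coords={"time": time_index},
)

# "MS" labels each monthly bin with the first day of that month
monthly = da.resample(time="MS").mean()

# Cross-check January against a direct mean over its 31 daily values
january = da.sel(time="2020-01").mean("time")
```

The first entry of `monthly` should agree with `january`, which is a quick way to check any monthly aggregation against a direct calculation.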

easyclimate.core.stats.calc_monthly_sum(data_input: xarray.DataArray, dim: str = 'time', **kwargs)¶

Calculate monthly sum.

For every adjacent sequence \(t_1, ..., t_n\) of timesteps of the same month it is:

\[o(t, x) = \mathrm{sum} \left \lbrace i(t', x), t_1 < t' \leqslant t_n \right\rbrace\]

Parameters¶

data_input: xarray.DataArray: The data to aggregate.

Note

The recommended frequency of data_input is daily.
dim: str: Dimension(s) over which to apply the calculation. By default, the time dimension is used.
**kwargs: Additional keyword arguments passed on to the appropriate array function for calculating sum on this object’s data. These could include dask-specific kwargs like split_every.

Returns¶

xarray.DataArray.

Examples¶

>>> import xarray as xr
>>> import numpy as np
>>> import pandas as pd
>>> import easyclimate as ecl
>>> # Create sample data with daily frequency
>>> time_index = pd.date_range('2020-01-01', '2020-03-31', freq='D')
>>> rng = np.random.default_rng(42)
>>> data = rng.random((len(time_index), 3, 3))
>>> da = xr.DataArray(data, dims=['time', 'x', 'y'], coords={'time': time_index})
>>> # Calculate monthly sum
>>> monthly_sum = ecl.calc_monthly_sum(da)
>>> print(monthly_sum)
<xarray.DataArray (time: 3, x: 3, y: 3)> Size: 216B
array([[[15.69690421, 14.18805154, 15.58422905],
        [16.14837203, 13.89734131, 13.7311233 ],
        [14.86282316, 16.52736588, 14.90264935]],
    [[15.49502686, 13.99526102, 13.76481005],
        [11.91043229, 14.96978113, 14.1298847 ],
        [17.42648281, 12.7493991 , 17.48375789]],
    [[14.99271788, 18.78263759, 13.21282259],
        [14.73108859, 14.56681052, 14.05164186],
        [13.69983774, 13.96562059, 19.88876291]]])
Coordinates:
* time     (time) datetime64[ns] 24B 2020-01-01 2020-02-01 2020-03-01
Dimensions without coordinates: x, y

easyclimate.core.stats.calc_monthly_std(data_input: xarray.DataArray, dim: str = 'time', **kwargs)¶

Calculate monthly standard deviation.

For every adjacent sequence \(t_1, ..., t_n\) of timesteps of the same month it is:

\[o(t, x) = \mathrm{std} \left \lbrace i(t', x), t_1 < t' \leqslant t_n \right\rbrace\]

Parameters¶

data_input: xarray.DataArray: The data to aggregate.

Note

The recommended frequency of data_input is daily.
dim: str: Dimension(s) over which to apply the calculation. By default, the time dimension is used.
**kwargs: Additional keyword arguments passed on to the appropriate array function for calculating std on this object’s data. These could include dask-specific kwargs like split_every.

Returns¶

xarray.DataArray.

Examples¶

>>> import xarray as xr
>>> import numpy as np
>>> import pandas as pd
>>> import easyclimate as ecl
>>> # Create sample data with daily frequency
>>> time_index = pd.date_range('2020-01-01', '2020-03-31', freq='D')
>>> rng = np.random.default_rng(42)
>>> data = rng.random((len(time_index), 3, 3))
>>> da = xr.DataArray(data, dims=['time', 'x', 'y'], coords={'time': time_index})
>>> # Calculate monthly std
>>> monthly_std = ecl.calc_monthly_std(da)
>>> print(monthly_std)
<xarray.DataArray (time: 3, x: 3, y: 3)> Size: 216B
array([[[0.30528844, 0.29905447, 0.26472868],
        [0.23456056, 0.30879525, 0.29333846],
        [0.26139562, 0.306974  , 0.27987361]],
    [[0.30196879, 0.24783961, 0.26078164],
        [0.2708643 , 0.3012602 , 0.29801453],
        [0.24816804, 0.33863555, 0.25623523]],
    [[0.29399346, 0.31595077, 0.30336434],
        [0.31117807, 0.3130123 , 0.28909393],
        [0.27104435, 0.26864038, 0.22912052]]])
Coordinates:
* time     (time) datetime64[ns] 24B 2020-01-01 2020-02-01 2020-03-01
Dimensions without coordinates: x, y

easyclimate.core.stats.calc_monthly_var(data_input: xarray.DataArray, dim: str = 'time', **kwargs)¶

Calculate monthly variance.

For every adjacent sequence \(t_1, ..., t_n\) of timesteps of the same month it is:

\[o(t, x) = \mathrm{var} \left \lbrace i(t', x), t_1 < t' \leqslant t_n \right\rbrace\]

Parameters¶

data_input: xarray.DataArray: The data to aggregate.

Note

The recommended frequency of data_input is daily.
dim: str: Dimension(s) over which to apply the calculation. By default, the time dimension is used.
**kwargs: Additional keyword arguments passed on to the appropriate array function for calculating var on this object’s data. These could include dask-specific kwargs like split_every.

Returns¶

xarray.DataArray.

Examples¶

>>> import xarray as xr
>>> import numpy as np
>>> import pandas as pd
>>> import easyclimate as ecl
>>> # Create sample data with daily frequency
>>> time_index = pd.date_range('2020-01-01', '2020-03-31', freq='D')
>>> rng = np.random.default_rng(42)
>>> data = rng.random((len(time_index), 3, 3))
>>> da = xr.DataArray(data, dims=['time', 'x', 'y'], coords={'time': time_index})
>>> # Calculate monthly var
>>> monthly_var = ecl.calc_monthly_var(da)
>>> print(monthly_var)
<xarray.DataArray (time: 3, x: 3, y: 3)> Size: 216B
array([[[0.09320103, 0.08943358, 0.07008127],
        [0.05501865, 0.09535451, 0.08604745],
        [0.06832767, 0.09423304, 0.07832924]],
    [[0.09118515, 0.06142447, 0.06800706],
        [0.07336747, 0.09075771, 0.08881266],
        [0.06158738, 0.11467404, 0.0656565 ]],
    [[0.08643216, 0.09982489, 0.09202992],
        [0.09683179, 0.0979767 , 0.0835753 ],
        [0.07346504, 0.07216766, 0.05249621]]])
Coordinates:
* time     (time) datetime64[ns] 24B 2020-01-01 2020-02-01 2020-03-01
Dimensions without coordinates: x, y

easyclimate.core.stats.calc_monthly_max(data_input: xarray.DataArray, dim: str = 'time', **kwargs)¶

Calculate monthly maximum.

For every adjacent sequence \(t_1, ..., t_n\) of timesteps of the same month it is:

\[o(t, x) = \mathrm{max} \left \lbrace i(t', x), t_1 < t' \leqslant t_n \right\rbrace\]

Parameters¶

data_input: xarray.DataArray: The data to aggregate.

Note

The recommended frequency of data_input is daily.
dim: str: Dimension(s) over which to apply the calculation. By default, the time dimension is used.
**kwargs: Additional keyword arguments passed on to the appropriate array function for calculating max on this object’s data. These could include dask-specific kwargs like split_every.

Returns¶

xarray.DataArray.

Examples¶

>>> import xarray as xr
>>> import numpy as np
>>> import pandas as pd
>>> import easyclimate as ecl
>>> # Create sample data with daily frequency
>>> time_index = pd.date_range('2020-01-01', '2020-03-31', freq='D')
>>> rng = np.random.default_rng(42)
>>> data = rng.random((len(time_index), 3, 3))
>>> da = xr.DataArray(data, dims=['time', 'x', 'y'], coords={'time': time_index})
>>> # Calculate monthly max
>>> monthly_max = ecl.calc_monthly_max(da)
>>> print(monthly_max)
<xarray.DataArray (time: 3, x: 3, y: 3)> Size: 216B
array([[[0.96189766, 0.95855921, 0.93604357],
        [0.97182643, 0.97069802, 0.97562235],
        [0.99237556, 0.96623191, 0.91601185]],
    [[0.95119466, 0.88414571, 0.85053368],
        [0.94602445, 0.99910473, 0.99546447],
        [0.98002718, 0.98663154, 0.99703466]],
    [[0.989133  , 0.99874337, 0.99308458],
        [0.99032166, 0.98180595, 0.92746046],
        [0.99758004, 0.91879368, 0.99470175]]])
Coordinates:
* time     (time) datetime64[ns] 24B 2020-01-01 2020-02-01 2020-03-01
Dimensions without coordinates: x, y

easyclimate.core.stats.calc_monthly_min(data_input: xarray.DataArray, dim: str = 'time', **kwargs)¶

Calculate monthly minimum.

For every adjacent sequence \(t_1, ..., t_n\) of timesteps of the same month it is:

\[o(t, x) = \mathrm{min} \left \lbrace i(t', x), t_1 < t' \leqslant t_n \right\rbrace\]

Parameters¶

data_input: xarray.DataArray: The data to aggregate.

Note

The recommended frequency of data_input is daily.
dim: str: Dimension(s) over which to apply the calculation. By default, the time dimension is used.
**kwargs: Additional keyword arguments passed on to the appropriate array function for calculating min on this object’s data. These could include dask-specific kwargs like split_every.

Returns¶

xarray.DataArray.

Examples¶

>>> import xarray as xr
>>> import numpy as np
>>> import pandas as pd
>>> import easyclimate as ecl
>>> # Create sample data with daily frequency
>>> time_index = pd.date_range('2020-01-01', '2020-03-31', freq='D')
>>> rng = np.random.default_rng(42)
>>> data = rng.random((len(time_index), 3, 3))
>>> da = xr.DataArray(data, dims=['time', 'x', 'y'], coords={'time': time_index})
>>> # Calculate monthly min
>>> monthly_min = ecl.calc_monthly_min(da)
>>> print(monthly_min)
<xarray.DataArray (time: 3, x: 3, y: 3)> Size: 216B
array([[[0.02280387, 0.02485949, 0.05338193],
        [0.02271207, 0.01783678, 0.02161208],
        [0.00736227, 0.04161417, 0.02114802]],
    [[0.0289995 , 0.04347506, 0.0401513 ],
        [0.01072764, 0.09172101, 0.01468284],
        [0.07205915, 0.01230269, 0.00542983]],
    [[0.0040076 , 0.0165798 , 0.00166071],
        [0.04896371, 0.01903415, 0.00123306],
        [0.04737402, 0.00450012, 0.22825288]]])
Coordinates:
* time     (time) datetime64[ns] 24B 2020-01-01 2020-02-01 2020-03-01
Dimensions without coordinates: x, y

easyclimate.core.stats.calc_yearly_mean(data_input: xarray.DataArray, dim: str = 'time', **kwargs)¶

Calculate yearly mean.

For every adjacent sequence \(t_1, ..., t_n\) of timesteps of the same year it is:

\[o(t, x) = \mathrm{mean} \left \lbrace i(t', x), t_1 < t' \leqslant t_n \right\rbrace\]

Tip

This function uses xarray.DataArray.groupby to implement the calculation. To substantially improve the performance of GroupBy operations, particularly with dask, install the flox package. flox extends xarray’s built-in GroupBy capabilities by allowing grouping by multiple variables and lazy grouping by dask arrays. If flox is installed, xarray uses it by default.
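
Whether flox is importable in the current environment can be checked without importing it. The sketch below uses only the standard library; the variable name `flox_available` is illustrative.

```python
import importlib.util

# find_spec returns None when the package cannot be found
flox_available = importlib.util.find_spec("flox") is not None

if flox_available:
    print("flox found: xarray can use it for groupby reductions")
else:
    print("flox not found: xarray uses its built-in groupby")
```

To compare the two code paths when flox is installed, xarray provides the `use_flox` option, for example `xr.set_options(use_flox=False)`.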

Parameters¶

data_input: xarray.DataArray: The data to aggregate.

Note

The recommended frequency of data_input is monthly.
dim: str: Dimension(s) over which to apply the calculation. By default, the time dimension is used.
**kwargs: Additional keyword arguments passed on to the appropriate array function for calculating mean on this object’s data. These could include dask-specific kwargs like split_every.

Returns¶

xarray.DataArray with time dimension type of numpy.datetime64.
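
Examples¶

The following sketch shows yearly aggregation of monthly data with plain xarray rather than with `calc_yearly_mean` itself, so it illustrates the idea and not the library's exact code path. It contrasts grouping by calendar year, which yields an integer `year` dimension, with resampling at year-start frequency (`"YS"`), which keeps a datetime64 `time` dimension as described under Returns.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two years of monthly data with month-start timestamps
time_index = pd.date_range("2020-01-01", periods=24, freq="MS")
values = np.arange(24, dtype=float)
da = xr.DataArray(values, dims=["time"], coords={"time": time_index})

# Grouping by calendar year yields an integer "year" dimension
by_year = da.groupby("time.year").mean()

# Resampling with "YS" keeps a datetime64 "time" dimension (year-start labels)
yearly = da.resample(time="YS").mean()
```

Both variants give 5.5 for 2020 and 17.5 for 2021. These are unweighted means of the monthly values, so months of different length count equally.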

easyclimate.core.stats.calc_yearly_sum(data_input: xarray.DataArray, dim: str = 'time', **kwargs)¶

Calculate yearly sum.

For every adjacent sequence \(t_1, ..., t_n\) of timesteps of the same year it is:

\[o(t, x) = \mathrm{sum} \left \lbrace i(t', x), t_1 < t' \leqslant t_n \right\rbrace\]

Parameters¶

data_input: xarray.DataArray: The data to aggregate.

Note

The recommended frequency of data_input is monthly.
dim: str: Dimension(s) over which to apply the calculation. By default, the time dimension is used.
**kwargs: Additional keyword arguments passed on to the appropriate array function for calculating sum on this object’s data. These could include dask-specific kwargs like split_every.

Returns¶

xarray.DataArray with time dimension type of numpy.datetime64.
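
Examples¶

Keyword arguments matter most for sums when data are missing. The sketch below uses plain xarray to show the effect of `min_count`, a keyword accepted by xarray's `sum` reduction. Whether `calc_yearly_sum` forwards it unchanged is an assumption based on the `**kwargs` description above.

```python
import numpy as np
import pandas as pd
import xarray as xr

time_index = pd.date_range("2020-01-01", periods=24, freq="MS")
values = np.ones(24)
values[12:] = np.nan  # the whole of 2021 is missing
da = xr.DataArray(values, dims=["time"], coords={"time": time_index})

# Default: NaNs are skipped, so an all-missing year sums to 0.0
default_sum = da.resample(time="YS").sum()

# min_count=1 requires at least one valid value, otherwise the result is NaN
strict_sum = da.resample(time="YS").sum(min_count=1)
```

With the default settings, 2021 is reported as 0.0, which is easy to mistake for a real total. With `min_count=1` it is NaN.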

easyclimate.core.stats.calc_yearly_std(data_input: xarray.DataArray, dim: str = 'time', **kwargs)¶

Calculate yearly standard deviation.

For every adjacent sequence \(t_1, ..., t_n\) of timesteps of the same year it is:

\[o(t, x) = \mathrm{std} \left \lbrace i(t', x), t_1 < t' \leqslant t_n \right\rbrace\]

Parameters¶

data_input: xarray.DataArray: The data to aggregate.

Note

The recommended frequency of data_input is monthly.
dim: str: Dimension(s) over which to apply the calculation. By default, the time dimension is used.
**kwargs: Additional keyword arguments passed on to the appropriate array function for calculating std on this object’s data. These could include dask-specific kwargs like split_every.

Note

The parameter ddof is Delta Degrees of Freedom: the divisor used in the calculation is N - ddof, where N represents the number of elements. To normalize by N - 1, set ddof=1.

Returns¶

xarray.DataArray with time dimension type of numpy.datetime64.
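
Examples¶

The effect of ddof can be seen on a small sample with NumPy alone. This is a sketch of the divisor rule described in the note above and does not call `calc_yearly_std`.

```python
import numpy as np

sample = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Mean is 5.0 and the squared deviations sum to 32.0
population_std = sample.std(ddof=0)  # divisor N = 8      -> sqrt(32 / 8) = 2.0
sample_std = sample.std(ddof=1)      # divisor N - 1 = 7  -> sqrt(32 / 7)

print(population_std, sample_std)
```

Assuming `**kwargs` is forwarded to the underlying std reduction as described, passing `ddof=1` to the yearly function would select the N - 1 divisor in the same way.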

easyclimate.core.stats.calc_yearly_var(data_input: xarray.DataArray, dim: str = 'time', **kwargs)¶

Calculate yearly variance.

For every adjacent sequence \(t_1, ..., t_n\) of timesteps of the same year it is:

\[o(t, x) = \mathrm{var} \left \lbrace i(t', x), t_1 < t' \leqslant t_n \right\rbrace\]

Parameters¶

data_input: xarray.DataArray: The data to aggregate.

Note

The recommended frequency of data_input is monthly.
dim: str: Dimension(s) over which to apply the calculation. By default, the time dimension is used.
**kwargs: Additional keyword arguments passed on to the appropriate array function for calculating var on this object’s data. These could include dask-specific kwargs like split_every.

Note

The parameter ddof is Delta Degrees of Freedom: the divisor used in the calculation is N - ddof, where N represents the number of elements. To normalize by N - 1, set ddof=1.

Returns¶

xarray.DataArray with time dimension type of numpy.datetime64.

easyclimate.core.stats.calc_yearly_max(data_input: xarray.DataArray, dim: str = 'time', **kwargs)¶

Calculate yearly maximum.

For every adjacent sequence \(t_1, ..., t_n\) of timesteps of the same year it is:

\[o(t, x) = \mathrm{max} \left \lbrace i(t', x), t_1 < t' \leqslant t_n \right\rbrace\]

Parameters¶

data_input: xarray.DataArray: The data to aggregate.

Note

The recommended frequency of data_input is monthly.
dim: str: Dimension(s) over which to apply the calculation. By default, the time dimension is used.
**kwargs: Additional keyword arguments passed on to the appropriate array function for calculating max on this object’s data. These could include dask-specific kwargs like split_every.

Returns¶

xarray.DataArray with time dimension type of numpy.datetime64.

easyclimate.core.stats.calc_yearly_min(data_input: xarray.DataArray, dim: str = 'time', **kwargs)¶

Calculate yearly minimum.

For every adjacent sequence \(t_1, ..., t_n\) of timesteps of the same year it is:

\[o(t, x) = \mathrm{min} \left \lbrace i(t', x), t_1 < t' \leqslant t_n \right\rbrace\]

Parameters¶

data_input: xarray.DataArray: The data to aggregate.

Note

The recommended frequency of data_input is monthly.
dim: str: Dimension(s) over which to apply the calculation. By default, the time dimension is used.
**kwargs: Additional keyword arguments passed on to the appropriate array function for calculating min on this object’s data. These could include dask-specific kwargs like split_every.

Returns¶

xarray.DataArray with time dimension type of numpy.datetime64.