easyclimate.core.normalized

Normalized Data

Functions

timeseries_normalize_zscore(da, dim, time_range, ddof)

Perform Z-Score standardization on an xarray time series.

timeseries_normalize_minmax(da, dim, time_range, feature_range)

Perform Min-Max standardization on an xarray time series.

timeseries_normalize_robust(da, dim, time_range, q_low, q_high)

Perform Robust standardization on an xarray time series.

timeseries_normalize_mean(da, dim, time_range)

Perform Mean normalization on an xarray time series.

calc_precip_anomaly_percentage(precip_data, freq, time_range, time_dim)

Calculate the Precipitation Anomaly Percentage (PAP, 降水距平百分率).

Module Contents

easyclimate.core.normalized.timeseries_normalize_zscore(da: xarray.DataArray, dim: str = 'time', time_range: slice = slice(None, None), ddof: int = 1) → xarray.DataArray

Perform Z-Score standardization on an xarray time series.

This function standardizes the input data by transforming it to have a mean of 0 and a standard deviation of 1, using the formula:

\[z = \frac{x - \mu}{\sigma}\]

where \(\mu\) is the mean and \(\sigma\) is the standard deviation.

Parameters

da: xarray.DataArray.

The input time series data to be standardized.

dim: str, default: time.

The dimension along which to compute the mean and standard deviation. By default, standardization is applied over the time dimension.

time_range: slice, default: slice(None, None).

The time range of da to be normalized. The default value is the entire time range.

ddof: int, default: 1.

Delta degrees of freedom for standard deviation calculation. The divisor used in calculations is \(N - \mathrm{ddof}\), where \(N\) is the number of elements.

Returns

xarray.DataArray.

The standardized data with mean 0 and standard deviation 1 along the specified dimension.

Note

  • Applicable Scenarios: Suitable for data that is approximately normally distributed or when comparing variables with different units in machine learning or statistical analysis.

  • Advantages: Retains the relative distribution characteristics of the data and is widely used in algorithms that require standardized inputs.

  • Disadvantages: Sensitive to outliers, which can skew the mean and standard deviation.
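The formula and the role of ddof can be illustrated with a minimal NumPy sketch. This is not the easyclimate implementation: the helper name zscore and the sample values are hypothetical, and the sketch works on a plain 1-D array rather than an xarray.DataArray.

```python
import numpy as np

# Made-up sample values, used only for illustration.
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])


def zscore(values, ddof=1):
    """Return (values - mean) / std, where std divides by N - ddof."""
    mu = values.mean()
    sigma = values.std(ddof=ddof)
    return (values - mu) / sigma


z_sample = zscore(x, ddof=1)      # divisor N - 1, the documented default
z_population = zscore(x, ddof=0)  # divisor N
```

For these values the mean is 5 and the population standard deviation (ddof=0) is 2, so z_population is [-1.5, -0.5, -0.5, -0.5, 0, 0, 1, 2]. With ddof=1 the standard deviation is slightly larger, so the scaled values are slightly closer to zero.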

easyclimate.core.normalized.timeseries_normalize_minmax(da: xarray.DataArray, dim: str = 'time', time_range: slice = slice(None, None), feature_range: tuple[float, float] = (0, 1)) → xarray.DataArray

Perform Min-Max standardization on an xarray time series.

This function linearly scales the data to a specified range (default \([0, 1]\)) using the formula:

\[x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \cdot (b - a) + a\]

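where \(a\) and \(b\) are the lower and upper bounds of the target range (feature_range), and \(x_{\min}\) and \(x_{\max}\) are the minimum and maximum of the data along dim.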

Parameters

da: xarray.DataArray.

The input time series data to be standardized.

dim: str, default: time.

The dimension along which to compute the minimum and maximum values. By default, standardization is applied over the time dimension.

time_range: slice, default: slice(None, None).

The time range of da to be normalized. The default value is the entire time range.

feature_range: tuple[float, float], default: (0, 1).

The target range for scaling the data, specified as (min, max).

Returns

xarray.DataArray.

The standardized data scaled to the specified range.

Note

  • Applicable Scenarios: Ideal for neural network inputs or when data needs to be constrained to a fixed range.

  • Advantages: Simple and intuitive, preserves relative relationships in the data.

  • Disadvantages: Sensitive to outliers, as the range depends on the minimum and maximum values.
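As a worked instance of the Min-Max formula, the following NumPy sketch scales a small array to two different target ranges. The helper minmax_scale and the sample values are hypothetical and only mirror the formula; they are not the library's code.

```python
import numpy as np

# Made-up sample values, used only for illustration.
x = np.array([10.0, 15.0, 20.0, 30.0])


def minmax_scale(values, feature_range=(0.0, 1.0)):
    """Linearly map values so that min -> a and max -> b."""
    a, b = feature_range
    x_min, x_max = values.min(), values.max()
    return (values - x_min) / (x_max - x_min) * (b - a) + a


unit = minmax_scale(x)                                  # target range [0, 1]
symmetric = minmax_scale(x, feature_range=(-1.0, 1.0))  # target range [-1, 1]
```

Here unit is [0, 0.25, 0.5, 1] and symmetric is [-1, -0.5, 0, 1]. Note that the formula is undefined for a constant series, because the denominator \(x_{\max} - x_{\min}\) is then zero.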

easyclimate.core.normalized.timeseries_normalize_robust(da: xarray.DataArray, dim: str = 'time', time_range: slice = slice(None, None), q_low: float = 0.25, q_high: float = 0.75) → xarray.DataArray

Perform Robust standardization on an xarray time series.

This function standardizes the data using the median and interquartile range (IQR), with the formula:

\[x' = \frac{x - \text{median}}{\text{IQR}},\]

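where \(\mathrm{IQR} = Q_3 - Q_1\), and \(Q_1\) and \(Q_3\) are the q_low and q_high quantiles (by default the 25th and 75th percentiles).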

Parameters

da: xarray.DataArray.

The input time series data to be standardized.

dim: str, default: time.

The dimension along which to compute the median and IQR. By default, standardization is applied over the time dimension.

time_range: slice, default: slice(None, None).

The time range of da to be normalized. The default value is the entire time range.

q_low: float, default: 0.25.

The lower quantile for IQR calculation (\(Q_1\)).

q_high: float, default: 0.75.

The upper quantile for IQR calculation (\(Q_3\)).

Returns

xarray.DataArray.

The standardized data based on median and IQR.

Note

  • Applicable Scenarios: Suitable for data with many outliers or non-normal distributions.

  • Advantages: Robust to outliers, providing a more stable standardization for skewed data.

  • Disadvantages: May lose some distribution information compared to Z-Score standardization.
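The robustness to outliers can be seen in a small NumPy sketch of the formula. The helper robust_scale and the sample values are hypothetical, and the quantiles use NumPy's default linear interpolation, which may differ from what the library uses internally.

```python
import numpy as np

# Made-up sample values with one large outlier, used only for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])


def robust_scale(values, q_low=0.25, q_high=0.75):
    """Return (values - median) / (Q_high - Q_low)."""
    median = np.median(values)
    q1, q3 = np.quantile(values, [q_low, q_high])
    return (values - median) / (q3 - q1)


scaled = robust_scale(x)

# Making the outlier ten times larger leaves the other scaled values unchanged,
# because the median and the quartiles do not move.
x_extreme = np.array([1.0, 2.0, 3.0, 4.0, 1000.0])
scaled_extreme = robust_scale(x_extreme)
```

For x the median is 3 and the IQR is 2, so scaled is [-1, -0.5, 0, 0.5, 48.5]. The first four entries of scaled_extreme are identical, and only the outlier itself changes.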

easyclimate.core.normalized.timeseries_normalize_mean(da: xarray.DataArray, dim: str = 'time', time_range: slice = slice(None, None)) → xarray.DataArray

Perform Mean normalization on an xarray time series.

This function centers the data around zero and scales it by the range, using the formula:

\[x' = \frac{x - \mu}{x_{\max} - x_{\min}},\]

where \(\mu\) is the mean.

Parameters

da: xarray.DataArray.

The input time series data to be standardized.

dim: str, default: time.

The dimension along which to compute the mean and range. By default, standardization is applied over the time dimension.

time_range: slice, default: slice(None, None).

The time range of da to be normalized. The default value is the entire time range.

Returns

xarray.DataArray.

The normalized data centered around zero.

Note

  • Applicable Scenarios: Useful when centering data is needed without enforcing a standard deviation of 1.

  • Advantages: Simple, partially preserves data distribution characteristics.

  • Disadvantages: Sensitive to outliers, and the scaling range is not fixed.
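A minimal NumPy sketch of mean normalization follows. The helper mean_normalize and the sample values are hypothetical and only restate the formula for a plain 1-D array.

```python
import numpy as np

# Made-up sample values, used only for illustration.
x = np.array([10.0, 15.0, 20.0, 35.0])


def mean_normalize(values):
    """Return (values - mean) / (max - min)."""
    mu = values.mean()
    value_range = values.max() - values.min()
    return (values - mu) / value_range


normalized = mean_normalize(x)
```

With a mean of 20 and a range of 25, normalized is [-0.4, -0.2, 0, 0.6]. The result has zero mean and spans a total width of 1, but its endpoints depend on where the mean sits between the minimum and maximum, which is why the scaling range is not fixed.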

easyclimate.core.normalized.calc_precip_anomaly_percentage(precip_data: xarray.DataArray, freq: Literal['monthly', 'seasonly', 'yearly'] = 'monthly', time_range: slice = slice(None, None), time_dim: str = 'time') → xarray.DataArray

Calculate the Precipitation Anomaly Percentage (PAP, 降水距平百分率).

\[P _{a} = \frac {P - \bar {P}} {\bar {P}} \times 100 \%\]

where \(P_a\) is the PAP, \(P\) is the precipitation of a given period, \(\bar{P} = \frac{1}{n}\sum_{i=1}^{n} P_{i}\) is the long-term average precipitation of the same period, \(n\) is the number of years in the baseline, and \(i = 1, 2, \cdots, n\) indexes those years.

Parameters

precip_data: xarray.DataArray.

Precipitation data, recommended units: mm/month or mm/day (converted to monthly cumulative). Dimensions must include time, e.g., (time, lat, lon).

Caution

precip_data should be monthly mean precipitation data.

time_range: slice, default: slice(None, None).

The time range of baseline climatology period, e.g., slice('1991-01', '2020-12'). The default value is the entire time range.

time_dim: str, default: time.

The name of the time dimension.

freq: {“monthly”, “seasonly”, “yearly”, or custom seasons}, default: “monthly”.

Time grouping method, options:

  • “monthly”: Calculate climatology by month

  • “seasonly”: Calculate by meteorological seasons (DJF, MAM, JJA, SON)

  • “yearly”: Calculate climatology by year

  • custom seasons: Calculate climatology by custom seasons, e.g., JJAS, OND.

Note

This method is based on xarray.groupers.SeasonResampler for resampling, and you can customize the seasons by referring to the examples.

Returns

pap: xarray.DataArray (%).

Precipitation Anomaly Percentage (PAP).
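The PAP formula with a monthly climatology can be sketched in plain NumPy as follows. This is an illustration under stated assumptions, not the library's implementation: the data are made up, the series is a single 1-D monthly record, and the baseline is taken as the first two years to mimic the role of time_range.

```python
import numpy as np

n_years = 3
# Made-up monthly totals (mm/month): every month of year 1 is 50,
# every month of year 2 is 100, and every month of year 3 is 150.
precip = np.repeat([50.0, 100.0, 150.0], 12)  # shape (36,), Jan..Dec per year

# Rows are years, columns are calendar months.
by_year_month = precip.reshape(n_years, 12)

# Baseline climatology from the first two years only (plays the role of time_range):
# one long-term mean per calendar month.
climatology = by_year_month[:2].mean(axis=0)  # shape (12,)

# PAP in percent, relative to the climatology of the matching calendar month.
pap = (by_year_month - climatology) / climatology * 100.0
```

The baseline mean is 75 mm for every calendar month, so the third year, at 150 mm, has a PAP of 100 % in every month, while the first two years sit at about -33.3 % and +33.3 %. Where the climatological mean is zero, the ratio is undefined.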

Reference

  • Zhai, P., Zhang, X., Wan, H., & Pan, X. (2005). Trends in total precipitation and frequency of daily precipitation extremes over China. Journal of Climate, 18(7), 1096–1108. https://doi.org/10.1175/JCLI-3318.1

  • Zou, Y., Wu, H., Lin, X., & Wang, Y. (2019). A quantitative method for the assessment of annual state of climate (气候年景定量化评价方法). Acta Meteorologica Sinica (in Chinese), 77(6), 1124–1133. https://doi.org/10.11676/qxxb2019.067

  • GB/T 20481-2017. Classification of meteorological drought (气象干旱等级, in Chinese). https://std.samr.gov.cn/gb/search/gbDetailed?id=71F772D81C2DD3A7E05397BE0A0AB82A

  • Yang, S.-E., & Wu, B.-F. (2010). Calculation of monthly precipitation anomaly percentage using web-serviced remote sensing data. 2010 2nd International Conference on Advanced Computer Control, Shenyang, China, 621–625. https://doi.org/10.1109/ICACC.2010.5486796

  • Ma, Y., Zhao, L., Wang, J.-S., & Yu, T. (2021). Increasing difference of China summer precipitation statistics between percentage anomaly and probability distribution methods due to tropical warming. Earth and Space Science, 8, e2021EA001777. https://doi.org/10.1029/2021EA001777

  • Wang, Y., Wang, S., Luo, F., & Wang, H. (2022). Strengthened impacts of Indian Ocean Dipole on the Yangtze precipitation contribute to the extreme rainfall of 2020 Meiyu season. Journal of Geophysical Research: Atmospheres, 127, e2022JD037028. https://doi.org/10.1029/2022JD037028