easyclimate.core.normalized

Normalized Data

Functions

`timeseries_normalize_zscore`(da[, dim, time_range, ddof])	Perform Z-Score standardization on an xarray time series.
`timeseries_normalize_minmax`(da[, dim, time_range, feature_range])	Perform Min-Max standardization on an xarray time series.
`timeseries_normalize_robust`(da[, dim, time_range, q_low, q_high])	Perform Robust standardization on an xarray time series.
`timeseries_normalize_mean`(da[, dim, time_range])	Perform Mean normalization on an xarray time series.
`calc_precip_anomaly_percentage`(precip_data[, freq, time_range, time_dim])	Calculate Precipitation Anomaly Percentage (PAP, 降水距平百分率)

Module Contents

easyclimate.core.normalized.timeseries_normalize_zscore(da: xarray.DataArray, dim: str = 'time', time_range: slice = slice(None, None), ddof: int = 1) → xarray.DataArray

Perform Z-Score standardization on an xarray time series.

This function standardizes the input data by transforming it to have a mean of 0 and a standard deviation of 1, using the formula:

\[z = \frac{x - \mu}{\sigma}\]

where \(\mu\) is the mean and \(\sigma\) is the standard deviation.
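A minimal sketch of the formula above in plain xarray, assuming the mean and standard deviation are taken over the full series along `dim`. `zscore_sketch` is a hypothetical helper, not the easyclimate source, and it does not reproduce the `time_range` handling.

```python
import numpy as np
import pandas as pd
import xarray as xr


def zscore_sketch(da: xr.DataArray, dim: str = "time", ddof: int = 1) -> xr.DataArray:
    """Hypothetical helper: z = (x - mean) / std along `dim` (std divisor N - ddof)."""
    mu = da.mean(dim)
    sigma = da.std(dim, ddof=ddof)
    # mu and sigma have `dim` removed, so they broadcast back over every time step
    return (da - mu) / sigma


# Synthetic (time, lat) field: 24 months at 3 latitudes, illustrative values only
time = pd.date_range("2000-01-01", periods=24, freq="MS")
rng = np.random.default_rng(0)
da = xr.DataArray(
    rng.normal(10.0, 3.0, size=(24, 3)),
    coords={"time": time, "lat": [10.0, 20.0, 30.0]},
    dims=("time", "lat"),
)

z = zscore_sketch(da)
# Each latitude is standardized independently along "time"
print(z.mean("time").values, z.std("time", ddof=1).values)
```

Because the check uses the same `ddof` as the scaling, the standard deviation comes out as 1 up to rounding. A series that is constant along `dim` has \(\sigma = 0\), so every value becomes NaN.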

Parameters

da: xarray.DataArray.: The input time series data to be standardized.
dim: str, default: time.: The dimension along which to compute the mean and standard deviation. By default, standardization is applied over the time dimension.
time_range: slice, default: slice(None, None).: The time range of da to be normalized. The default value is the entire time range.
ddof: int, default: 1.: Delta degrees of freedom for standard deviation calculation. The divisor used in calculations is \(N - \mathrm{ddof}\), where \(N\) is the number of elements.

Returns

xarray.DataArray.: The standardized data with mean 0 and standard deviation 1 along the specified dimension.

Note

Applicable Scenarios: Suitable for data that is approximately normally distributed or when comparing variables with different units in machine learning or statistical analysis.
Advantages: Preserves the shape of the data distribution and is widely used in algorithms that require standardized inputs.
Disadvantages: Sensitive to outliers, which can skew the mean and standard deviation.

Related example: Regression and Correlation Analyses

easyclimate.core.normalized.timeseries_normalize_minmax(da: xarray.DataArray, dim: str = 'time', time_range: slice = slice(None, None), feature_range: tuple[float, float] = (0, 1)) → xarray.DataArray

Perform Min-Max standardization on an xarray time series.

This function linearly scales the data to a specified range (default \([0, 1]\)) using the formula:

\[x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \cdot (b - a) + a\]

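where \(x_{\min}\) and \(x_{\max}\) are the minimum and maximum of the data along dim, and \((a, b)\) are the lower and upper bounds given by feature_range.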
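A minimal sketch of the Min-Max formula in plain xarray, assuming the extremes are taken over the full series along `dim`. `minmax_sketch` is a hypothetical helper, not the easyclimate implementation, and the short series is illustrative data.

```python
import numpy as np
import xarray as xr


def minmax_sketch(da: xr.DataArray, dim: str = "time", feature_range=(0.0, 1.0)) -> xr.DataArray:
    """Hypothetical helper: linear rescaling of `da` to [a, b] along `dim`."""
    a, b = feature_range
    x_min = da.min(dim)
    x_max = da.max(dim)
    # Map x_min -> a and x_max -> b, everything else linearly in between
    return (da - x_min) / (x_max - x_min) * (b - a) + a


da = xr.DataArray(np.array([2.0, 4.0, 6.0, 10.0]), dims="time")
scaled = minmax_sketch(da, feature_range=(-1.0, 1.0))
print(scaled.values.tolist())  # [-1.0, -0.5, 0.0, 1.0]
```

If the series is constant along `dim`, \(x_{\max} - x_{\min} = 0\) and the result is NaN.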

Parameters

da: xarray.DataArray.: The input time series data to be standardized.
dim: str, default: time.: The dimension along which to compute the minimum and maximum values. By default, standardization is applied over the time dimension.
time_range: slice, default: slice(None, None).: The time range of da to be normalized. The default value is the entire time range.
feature_range: tuple[float, float], default: (0, 1).: The target range for scaling the data, specified as (min, max).

Returns

xarray.DataArray.: The standardized data scaled to the specified range.

Note

Applicable Scenarios: Ideal for neural network inputs or when data needs to be constrained to a fixed range.
Advantages: Simple and intuitive, preserves relative relationships in the data.
Disadvantages: Sensitive to outliers, as the range depends on the minimum and maximum values.

easyclimate.core.normalized.timeseries_normalize_robust(da: xarray.DataArray, dim: str = 'time', time_range: slice = slice(None, None), q_low: float = 0.25, q_high: float = 0.75) → xarray.DataArray

Perform Robust standardization on an xarray time series.

This function standardizes the data using the median and interquartile range (IQR), with the formula:

\[x' = \frac{x - \text{median}}{\text{IQR}},\]

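where the median is taken along dim and \(\mathrm{IQR} = Q_3 - Q_1\) is the interquartile range, with \(Q_1\) and \(Q_3\) the q_low and q_high quantiles.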
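A minimal sketch of the robust formula in plain xarray, assuming xarray's default linear quantile interpolation and statistics over the full series along `dim`. `robust_sketch` is a hypothetical helper, not the easyclimate source, and the data (five ordinary values plus one outlier) are illustrative.

```python
import numpy as np
import xarray as xr


def robust_sketch(da, dim="time", q_low=0.25, q_high=0.75):
    """Hypothetical helper: (x - median) / (Q3 - Q1) along `dim`."""
    median = da.median(dim)
    # quantile() attaches a scalar "quantile" coordinate; drop it before subtracting
    q1 = da.quantile(q_low, dim=dim).drop_vars("quantile")
    q3 = da.quantile(q_high, dim=dim).drop_vars("quantile")
    return (da - median) / (q3 - q1)


da = xr.DataArray(np.array([1.0, 2.0, 3.0, 4.0, 5.0, 100.0]), dims="time")
r = robust_sketch(da)
print(np.round(r.values, 6).tolist())  # [-1.0, -0.6, -0.2, 0.2, 0.6, 38.6]
```

Here the median is 3.5 and the IQR is 4.75 - 2.25 = 2.5. The outlier maps to a large value (38.6) but leaves the scaling of the other five values untouched, whereas in a Z-score it would inflate the standard deviation and compress them.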

Parameters

da: xarray.DataArray.: The input time series data to be standardized.
dim: str, default: time.: The dimension along which to compute the median and IQR. By default, standardization is applied over the time dimension.
time_range: slice, default: slice(None, None).: The time range of da to be normalized. The default value is the entire time range.
q_low: float, default: 0.25.: The lower quantile for IQR calculation (\(Q_1\)).
q_high: float, default: 0.75.: The upper quantile for IQR calculation (\(Q_3\)).

Returns

xarray.DataArray.: The standardized data based on median and IQR.

Note

Applicable Scenarios: Suitable for data with many outliers or non-normal distributions.
Advantages: Robust to outliers, providing a more stable standardization for skewed data.
Disadvantages: May lose some distribution information compared to Z-Score standardization.

easyclimate.core.normalized.timeseries_normalize_mean(da: xarray.DataArray, dim: str = 'time', time_range: slice = slice(None, None)) → xarray.DataArray

Perform Mean normalization on an xarray time series.

This function centers the data around zero and scales it by the range, using the formula:

\[x' = \frac{x - \mu}{x_{\max} - x_{\min}},\]

where \(\mu\), \(x_{\max}\), and \(x_{\min}\) are the mean, maximum, and minimum along dim.
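A minimal sketch of mean normalization in plain xarray, assuming the statistics are taken over the full series along `dim`. `mean_norm_sketch` is a hypothetical helper, not the easyclimate implementation.

```python
import numpy as np
import xarray as xr


def mean_norm_sketch(da, dim="time"):
    """Hypothetical helper: (x - mean) / (max - min) along `dim`."""
    return (da - da.mean(dim)) / (da.max(dim) - da.min(dim))


da = xr.DataArray(np.array([2.0, 4.0, 6.0, 10.0]), dims="time")
m = mean_norm_sketch(da)
print(m.values.tolist())  # [-0.4375, -0.1875, 0.0625, 0.5625]
```

The result has zero mean along `dim`, and because \(|x - \mu| \le x_{\max} - x_{\min}\), every value lies in \([-1, 1]\). Unlike Min-Max scaling, the endpoints are not necessarily reached (here the maximum is 0.5625).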

Parameters

da: xarray.DataArray.: The input time series data to be standardized.
dim: str, default: time.: The dimension along which to compute the mean and range. By default, standardization is applied over the time dimension.
time_range: slice, default: slice(None, None).: The time range of da to be normalized. The default value is the entire time range.

Returns

xarray.DataArray.: The normalized data centered around zero.

Note

Applicable Scenarios: Useful when centering data is needed without enforcing a standard deviation of 1.
Advantages: Simple, partially preserves data distribution characteristics.
Disadvantages: Sensitive to outliers, and the scaling range is not fixed.

easyclimate.core.normalized.calc_precip_anomaly_percentage(precip_data: xarray.DataArray, freq: Literal['monthly', 'seasonly', 'yearly'] = 'monthly', time_range: slice = slice(None, None), time_dim: str = 'time') → xarray.DataArray

Calculate Precipitation Anomaly Percentage (PAP, 降水距平百分率)

\[P_{a} = \frac{P - \bar{P}}{\bar{P}} \times 100\%\]

where \(P_a\) is the PAP, \(P\) is the precipitation of a given period, and \(\bar{P} = \frac{1}{n}\sum_{i=1}^{n} P_i\) is the long-term (climatological) mean precipitation of that period, averaged over \(n\) years, with \(P_i\) the precipitation of the period in year \(i\), \(i = 1, 2, \cdots, n\).
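A minimal sketch of the monthly case, assuming monthly input with a datetime `time` coordinate and a baseline given as a `slice`. `pap_monthly_sketch` is hypothetical and not the easyclimate implementation. It uses the identity \((P - \bar{P})/\bar{P} = P/\bar{P} - 1\), so a single xarray groupby division performs the month-by-month broadcasting.

```python
import numpy as np
import pandas as pd
import xarray as xr


def pap_monthly_sketch(precip, baseline=slice(None, None), time_dim="time"):
    """Hypothetical helper: percentage anomaly against a monthly climatology."""
    # P-bar: mean of each calendar month over the baseline years only
    clim = precip.sel({time_dim: baseline}).groupby(f"{time_dim}.month").mean(time_dim)
    # Divide each value by the climatology of its own calendar month, then apply
    # (P - P_bar) / P_bar * 100 = (P / P_bar - 1) * 100
    ratio = precip.groupby(f"{time_dim}.month") / clim
    return (ratio - 1.0) * 100.0


# Three years of synthetic monthly totals (mm/month); the third year is 20 % wetter
time = pd.date_range("2000-01-01", periods=36, freq="MS")
cycle = np.linspace(20.0, 130.0, 12)
precip = xr.DataArray(np.concatenate([cycle, cycle, 1.2 * cycle]),
                      coords={"time": time}, dims="time")

pap = pap_monthly_sketch(precip, baseline=slice("2000-01", "2001-12"))
print(pap.sel(time="2002").values.round(6))  # all values are 20.0
```

As a quick check of the formula, a month with \(P = 120\) mm against \(\bar{P} = 100\) mm gives \(P_a = 20\%\). Months whose baseline climatology is zero (e.g., a completely dry season) produce a division by zero, so such grid points typically need masking.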

Parameters

precip_data: xarray.DataArray.: Precipitation data; recommended units: mm/month, or mm/day (converted to monthly accumulation). Dimensions must include time, e.g., (time, lat, lon).

Caution

precip_data should contain monthly mean precipitation.

time_range: slice, default: slice(None, None).

The baseline (climatology) period, e.g., slice('1991-01', '2020-12'). The default is the entire time range.

time_dim: str, default: "time".

Name of the time dimension.

freq: {“monthly”, “seasonly”, “yearly”} or custom seasons, default: “monthly”.

Time grouping method, options:

“monthly”: Calculate climatology by month
“seasonly”: Calculate by meteorological seasons (DJF, MAM, JJA, SON)
“yearly”: Calculate climatology by year
custom seasons: Calculate climatology by custom seasons, e.g., JJAS, OND.

Note

Seasonal resampling is based on xarray.groupers.SeasonResampler; custom seasons can be defined by following the examples.
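For the standard meteorological seasons, a minimal sketch can swap `xarray.groupers.SeasonResampler` for the long-standing `resample` frequency string `"QS-DEC"`, whose bins start in December, March, June, and September (DJF, MAM, JJA, SON). It assumes monthly input that starts at a season boundary, so every bin is complete, and it does not support custom seasons. `pap_seasonal_sketch` is hypothetical, not the easyclimate implementation.

```python
import numpy as np
import pandas as pd
import xarray as xr


def pap_seasonal_sketch(precip, baseline=slice(None, None), time_dim="time"):
    """Hypothetical helper: PAP of seasonal means against a DJF/MAM/JJA/SON climatology."""
    # Seasonal means; each bin is labelled by its first month (Dec, Mar, Jun, Sep)
    seasonal = precip.resample({time_dim: "QS-DEC"}).mean()
    # Climatology of each season over the baseline, keyed by "DJF", "MAM", ...
    clim = seasonal.sel({time_dim: baseline}).groupby(f"{time_dim}.season").mean(time_dim)
    ratio = seasonal.groupby(f"{time_dim}.season") / clim
    return (ratio - 1.0) * 100.0


# 36 months from March 2000: 24 months at 100 mm/month, then 12 months at 150 mm/month
time = pd.date_range("2000-03-01", periods=36, freq="MS")
values = np.concatenate([np.full(24, 100.0), np.full(12, 150.0)])
precip = xr.DataArray(values, coords={"time": time}, dims="time")

pap = pap_seasonal_sketch(precip, baseline=slice("2000-03", "2001-12"))
print(pap.values.tolist())  # eight 0.0 values, then four 50.0 values
```

If the record starts mid-season (e.g., in January), the first `"QS-DEC"` bin holds only January and February; such partial seasons would usually be dropped before computing the climatology.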

Returns

pap: xarray.DataArray (%).: Precipitation Anomaly Percentage (PAP).

Reference

Zhai, P., Zhang, X., Wan, H., & Pan, X. (2005). Trends in total precipitation and frequency of daily precipitation extremes over China. Journal of Climate, 18(7), 1096–1108. https://doi.org/10.1175/JCLI-3318.1
Zou, Y., Wu, H., Lin, X., & Wang, Y. (2019). A quantitative method for the assessment of annual state of climate (气候年景定量化评价方法). Acta Meteorologica Sinica (in Chinese), 77(6), 1124–1133. https://doi.org/10.11676/qxxb2019.067
GB/T 20481-2017. Classification of meteorological drought (气象干旱等级, in Chinese). https://std.samr.gov.cn/gb/search/gbDetailed?id=71F772D81C2DD3A7E05397BE0A0AB82A
Yang, S.-E., & Wu, B.-F. (2010). Calculation of monthly precipitation anomaly percentage using web-serviced remote sensing data. In 2010 2nd International Conference on Advanced Computer Control (pp. 621–625), Shenyang, China. https://doi.org/10.1109/ICACC.2010.5486796
Ma, Y., Zhao, L., Wang, J.-S., & Yu, T. (2021). Increasing difference of China summer precipitation statistics between percentage anomaly and probability distribution methods due to tropical warming. Earth and Space Science, 8, e2021EA001777. https://doi.org/10.1029/2021EA001777
Wang, Y., Wang, S., Luo, F., & Wang, H. (2022). Strengthened impacts of Indian Ocean Dipole on the Yangtze precipitation contribute to the extreme rainfall of 2020 Meiyu season. Journal of Geophysical Research: Atmospheres, 127, e2022JD037028. https://doi.org/10.1029/2022JD037028