easyclimate.core.normalized
Normalized Data
Functions
- timeseries_normalize_zscore: Perform Z-Score standardization on an xarray time series.
- timeseries_normalize_minmax: Perform Min-Max standardization on an xarray time series.
- timeseries_normalize_robust: Perform Robust standardization on an xarray time series.
- timeseries_normalize_mean: Perform Mean normalization on an xarray time series.
- calc_precip_anomaly_percentage: Calculate Precipitation Anomaly Percentage (PAP, 降水距平百分率).
Module Contents
- easyclimate.core.normalized.timeseries_normalize_zscore(da: xarray.DataArray, dim: str = 'time', time_range: slice = slice(None, None), ddof: int = 1) -> xarray.DataArray
Perform Z-Score standardization on an xarray time series.
This function standardizes the input data by transforming it to have a mean of 0 and a standard deviation of 1, using the formula:
\[z = \frac{x - \mu}{\sigma}\]
where \(\mu\) is the mean and \(\sigma\) is the standard deviation.
Parameters
- da:
xarray.DataArray. The input time series data to be standardized.
- dim:
str, default: time. The dimension along which to compute the mean and standard deviation. By default, standardization is applied over the time dimension.
- time_range:
slice, default: slice(None, None). The time range of da to be normalized. The default value is the entire time range.
- ddof:
int, default: 1. Delta degrees of freedom for standard deviation calculation. The divisor used in calculations is \(N - \mathrm{ddof}\), where \(N\) is the number of elements.
Returns
xarray.DataArray. The standardized data with mean 0 and standard deviation 1 along the specified dimension.
Note
Applicable Scenarios: Suitable for data that is approximately normally distributed or when comparing variables with different units in machine learning or statistical analysis.
Advantages: Retains the relative distribution characteristics of the data, widely used in algorithms requiring standardized inputs.
Disadvantages: Sensitive to outliers, which can skew the mean and standard deviation.
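The Z-Score formula above can be sketched with plain NumPy. This is an illustration of the arithmetic only, not easyclimate's implementation: the helper `zscore` and the sample values are hypothetical, and only the meaning of `ddof` (divisor \(N - \mathrm{ddof}\)) is taken from the parameter description.

```python
import numpy as np

# Hypothetical sample: six time steps of a single variable.
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 9.0])


def zscore(values, ddof=1):
    """Return (x - mean) / std, where std uses the divisor N - ddof."""
    mu = values.mean()
    sigma = values.std(ddof=ddof)
    return (values - mu) / sigma


z = zscore(x, ddof=1)

# After standardization the mean is ~0 and the standard deviation
# (computed with the same ddof) is ~1, up to floating-point error.
print(z.mean(), z.std(ddof=1))
```

With ddof=0 the divisor becomes \(N\) instead of \(N - 1\), so the standardized values come out slightly larger in magnitude for short series.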
- easyclimate.core.normalized.timeseries_normalize_minmax(da: xarray.DataArray, dim: str = 'time', time_range: slice = slice(None, None), feature_range: tuple[float, float] = (0, 1)) -> xarray.DataArray
Perform Min-Max standardization on an xarray time series.
This function linearly scales the data to a specified range (default \([0, 1]\)) using the formula:
\[x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \cdot (b - a) + a\]
where \((a, b)\) are the target range bounds.
Parameters
- da:
xarray.DataArray. The input time series data to be standardized.
- dim:
str, default: time. The dimension along which to compute the minimum and maximum values. By default, standardization is applied over the time dimension.
- time_range:
slice, default: slice(None, None). The time range of da to be normalized. The default value is the entire time range.
- feature_range:
tuple[float, float], default: (0, 1). The target range for scaling the data, specified as (min, max).
Returns
xarray.DataArray. The standardized data scaled to the specified range.
Note
Applicable Scenarios: Ideal for neural network inputs or when data needs to be constrained to a fixed range.
Advantages: Simple and intuitive, preserves relative relationships in the data.
Disadvantages: Sensitive to outliers, as the range depends on the minimum and maximum values.
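As a worked example of the Min-Max formula, the following NumPy sketch scales a short series to two different target ranges. The helper `minmax_scale` and the data are hypothetical and only mirror the formula; they are not the library's code.

```python
import numpy as np

# Hypothetical sample values along the time dimension.
x = np.array([10.0, 15.0, 20.0, 30.0])


def minmax_scale(values, feature_range=(0.0, 1.0)):
    """Linearly map values so that min -> a and max -> b."""
    a, b = feature_range
    x_min, x_max = values.min(), values.max()
    return (values - x_min) / (x_max - x_min) * (b - a) + a


scaled_unit = minmax_scale(x)               # [0.0, 0.25, 0.5, 1.0]
scaled_sym = minmax_scale(x, (-1.0, 1.0))   # [-1.0, -0.5, 0.0, 1.0]
print(scaled_unit, scaled_sym)
```

A constant series makes \(x_{\max} - x_{\min}\) zero, so the formula is undefined in that case.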
- easyclimate.core.normalized.timeseries_normalize_robust(da: xarray.DataArray, dim: str = 'time', time_range: slice = slice(None, None), q_low: float = 0.25, q_high: float = 0.75) -> xarray.DataArray
Perform Robust standardization on an xarray time series.
This function standardizes the data using the median and interquartile range (IQR), with the formula:
\[x' = \frac{x - \text{median}}{\text{IQR}},\]
where \(\mathrm{IQR = Q_3 - Q_1}\).
Parameters
- da:
xarray.DataArray. The input time series data to be standardized.
- dim:
str, default: time. The dimension along which to compute the median and IQR. By default, standardization is applied over the time dimension.
- time_range:
slice, default: slice(None, None). The time range of da to be normalized. The default value is the entire time range.
- q_low:
float, default: 0.25. The lower quantile for IQR calculation (\(Q_1\)).
- q_high:
float, default: 0.75. The upper quantile for IQR calculation (\(Q_3\)).
Returns
xarray.DataArray. The standardized data based on median and IQR.
Note
Applicable Scenarios: Suitable for data with many outliers or non-normal distributions.
Advantages: Robust to outliers, providing a more stable standardization for skewed data.
Disadvantages: May lose some distribution information compared to Z-Score standardization.
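The robust formula can be illustrated with NumPy on a series containing one outlier. The helper `robust_scale` and the data are hypothetical, and the sketch assumes NumPy's default (linear) quantile interpolation, which may differ from what the library uses.

```python
import numpy as np

# Hypothetical sample with a single large outlier (100.0).
x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])


def robust_scale(values, q_low=0.25, q_high=0.75):
    """Return (x - median) / (Q_high - Q_low)."""
    median = np.median(values)
    q1, q3 = np.quantile(values, [q_low, q_high])
    return (values - median) / (q3 - q1)


scaled = robust_scale(x)
# Here median = 3, Q1 = 2, Q3 = 4, so IQR = 2.
print(scaled)  # [-1.0, -0.5, 0.0, 0.5, 48.5]
```

The outlier has no effect on the median or the IQR in this sample, so the four ordinary values keep a sensible scale while the outlier remains clearly separated.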
- easyclimate.core.normalized.timeseries_normalize_mean(da: xarray.DataArray, dim: str = 'time', time_range: slice = slice(None, None)) -> xarray.DataArray
Perform Mean normalization on an xarray time series.
This function centers the data around zero and scales it by the range, using the formula:
\[x' = \frac{x - \mu}{x_{\max} - x_{\min}},\]
where \(\mu\) is the mean.
Parameters
- da:
xarray.DataArray. The input time series data to be standardized.
- dim:
str, default: time. The dimension along which to compute the mean and range. By default, standardization is applied over the time dimension.
- time_range:
slice, default: slice(None, None). The time range of da to be normalized. The default value is the entire time range.
Returns
xarray.DataArray. The normalized data centered around zero.
Note
Applicable Scenarios: Useful when centering data is needed without enforcing a standard deviation of 1.
Advantages: Simple, partially preserves data distribution characteristics.
Disadvantages: Sensitive to outliers, and the scaling range is not fixed.
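A small NumPy sketch of the mean normalization formula is given here for orientation. The helper `mean_normalize` and the sample values are hypothetical and are not taken from the library.

```python
import numpy as np

# Hypothetical sample values along the time dimension.
x = np.array([10.0, 15.0, 20.0, 35.0])


def mean_normalize(values):
    """Return (x - mean) / (max - min)."""
    mu = values.mean()
    value_range = values.max() - values.min()
    return (values - mu) / value_range


normed = mean_normalize(x)
# Here mean = 20 and range = 25.
print(normed)  # [-0.4, -0.2, 0.0, 0.6]
```

The result has zero mean and a total spread (max minus min) of exactly 1, but its lower and upper bounds depend on where the mean sits within the range.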
- easyclimate.core.normalized.calc_precip_anomaly_percentage(precip_data: xarray.DataArray, freq: Literal['monthly', 'seasonly', 'yearly'] = 'monthly', time_range: slice = slice(None, None), time_dim: str = 'time') -> xarray.DataArray
Calculate Precipitation Anomaly Percentage (PAP, 降水距平百分率)
\[P_{a} = \frac{P - \bar{P}}{\bar{P}} \times 100\%\]
where \(P_a\) is the PAP, \(P\) is the precipitation of a given period, and \(\bar{P} = \frac{1}{n}\sum_{i=1}^{n} P_{i}\) is the long-term mean precipitation of that period over \(n\) years, with \(i = 1, 2, \cdots, n\).
Parameters
- precip_data:
xarray.DataArray. Precipitation data; recommended units are mm/month or mm/day (converted to monthly cumulative). Dimensions must include time, e.g., (time, lat, lon).
Caution
precip_data should be monthly mean precipitation.
- time_range:
slice, default: slice(None, None). The time range of the baseline climatology period, e.g., slice('1991-01', '2020-12'). The default value is the entire time range.
- time_dim:
str, default: "time". The name of the time dimension.
- freq:
{"monthly", "seasonly", "yearly", or custom seasons}. Time grouping method. Options:
"monthly": Calculate climatology by month.
"seasonly": Calculate climatology by meteorological season (DJF, MAM, JJA, SON).
"yearly": Calculate climatology by year.
custom seasons: Calculate climatology by custom seasons, e.g., JJAS, OND.
Note
This method is based on xarray.groupers.SeasonResampler for resampling, and you can customize the seasons by referring to the examples.
Returns
- pap:
xarray.DataArray (%). Precipitation Anomaly Percentage (PAP).
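The PAP formula can be worked through on a tiny array. This is a sketch under stated assumptions, not the library's implementation: the helper `anomaly_percentage`, its `baseline` argument (standing in for `time_range`), and the precipitation values are all hypothetical. Rows are years, and columns are two calendar months, so the climatology is taken per month.

```python
import numpy as np

# Hypothetical monthly totals in mm: 3 years (rows) x 2 calendar months (columns).
precip = np.array([
    [80.0, 150.0],
    [100.0, 200.0],
    [120.0, 250.0],
])


def anomaly_percentage(values, baseline=slice(None)):
    """Return (P - P_bar) / P_bar * 100, with P_bar from the baseline years."""
    clim = values[baseline].mean(axis=0)  # long-term mean per calendar month
    return (values - clim) / clim * 100.0


# Climatology from all years: P_bar = [100, 200].
pap_all = anomaly_percentage(precip)
print(pap_all)  # [[-20, -25], [0, 0], [20, 25]]

# Climatology from the first two years only: P_bar = [90, 175].
pap_base = anomaly_percentage(precip, baseline=slice(0, 2))
print(pap_base[2])  # third year relative to the shorter baseline
```

Restricting the baseline changes \(\bar{P}\) and therefore every percentage, which is why the baseline period matters when comparing PAP values between studies.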
Reference
Zhai, P., Zhang, X., Wan, H., & Pan, X. (2005). Trends in total precipitation and frequency of daily precipitation extremes over China. Journal of Climate, 18(7), 1096–1108. https://doi.org/10.1175/JCLI-3318.1
Zou, Y., Wu, H., Lin, X., & Wang, Y. (2019). A quantitative method for the assessment of annual state of climate (气候年景定量化评价方法). Acta Meteorologica Sinica (in Chinese), 77(6), 1124–1133. https://doi.org/10.11676/qxxb2019.067
GB/T 20481-2017, Classification of meteorological drought (气象干旱等级, in Chinese) https://std.samr.gov.cn/gb/search/gbDetailed?id=71F772D81C2DD3A7E05397BE0A0AB82A.
Yang, S.-E., & Wu, B.-F. (2010). Calculation of monthly precipitation anomaly percentage using web-serviced remote sensing data. 2010 2nd International Conference on Advanced Computer Control, Shenyang, China, 621–625. https://doi.org/10.1109/ICACC.2010.5486796
Ma, Y., Zhao, L., Wang, J.-S., & Yu, T. (2021). Increasing difference of China summer precipitation statistics between percentage anomaly and probability distribution methods due to tropical warming. Earth and Space Science, 8, e2021EA001777. https://doi.org/10.1029/2021EA001777
Wang, Y., Wang, S., Luo, F., & Wang, H. (2022). Strengthened impacts of Indian Ocean Dipole on the Yangtze precipitation contribute to the extreme rainfall of 2020 Meiyu season. Journal of Geophysical Research: Atmospheres, 127, e2022JD037028. https://doi.org/10.1029/2022JD037028