robpy.univariate
Base
- class robpy.univariate.base.RobustScale(*, can_handle_nan: bool = False)[source]
Bases: ABC
Base class for robust univariate scale estimators
- Parameters:
can_handle_nan (bool, optional) – Whether the robust scale estimator can handle NaN values. Defaults to False.
- fit(X: ndarray, ignore_nan: bool = False) RobustScale[source]
- property location
- property scale
Minimum Covariance Determinant
- class robpy.univariate.mcd.UnivariateMCD(alpha: float | int | None = None, consistency_correction: bool = True)[source]
Bases: RobustScale
Implementation of the univariate MCD estimator (Hubert & Debruyne, 2010)
- Parameters:
alpha (float or int, optional) – Size of the h subset. If an integer between n/2 and n is passed, it is interpreted as an absolute value. If a float between 0.5 and 1 is passed, it is interpreted as a proportion of n (the training set size). If None, it is set to floor(n/2) + 1. Defaults to None.
consistency_correction (boolean, optional) – whether the estimates should be consistent at the normal model. Defaults to True.
References
- Hubert, M., & Debruyne, M. (2010). Minimum covariance determinant.
Wiley interdisciplinary reviews: Computational statistics, 2(1), 36-43.
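The idea behind the univariate MCD can be sketched in a few lines of plain Python. This is an illustration, not robpy's implementation: the helper names `resolve_h` and `raw_univariate_mcd` are hypothetical, truncating `alpha * n` to an integer is an assumption about how a proportion is rounded, the variance is divided by h, and the consistency correction is left out. The sketch relies on the fact that, in one dimension, the h-subset with the smallest variance is always a contiguous block of the sorted data.

```python
import math
from statistics import fmean


def resolve_h(alpha, n):
    """Translate alpha into an h-subset size, following the parameter description."""
    if alpha is None:
        return n // 2 + 1
    if isinstance(alpha, int):
        return alpha  # integer: absolute subset size
    return int(alpha * n)  # assumption: a proportion is truncated to an integer


def raw_univariate_mcd(x, alpha=None):
    """Raw (uncorrected) mean and standard deviation of the tightest h-subset."""
    xs = sorted(x)
    n = len(xs)
    h = resolve_h(alpha, n)
    best_var, best_mean = math.inf, None
    # Slide a window of h consecutive order statistics over the sorted data.
    for start in range(n - h + 1):
        window = xs[start:start + h]
        mean = fmean(window)
        var = sum((v - mean) ** 2 for v in window) / h
        if var < best_var:
            best_var, best_mean = var, mean
    return best_mean, math.sqrt(best_var)


loc, scale = raw_univariate_mcd([1, 2, 3, 4, 5, 100])
```

With six observations and the default `alpha=None`, h is 4, so the outlying value 100 never enters the selected subset.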
Onestep M-estimator
- class robpy.univariate.onestep_m.CellwiseOneStepM[source]
Bases: OneStepM
Implementation of the single-step M-estimator (robLoc and robScale) proposed by Rousseeuw & Van den Bossche (2018). In that paper, the location rho function is set to TukeyBiWeight(c=3) and the scale rho function to Huber(b=2.5).
References
Rousseeuw, P. J., & Van den Bossche, W. (2018). Detecting deviating data cells. Technometrics, 60(2), 135-145.
- class robpy.univariate.onestep_m.HuberOneStepM[source]
Bases: OneStepM
Implementation of the one-step Huber M-estimator of location and scale
[analogous to estLocScale {cellWise} with type = "hubhub": https://github.com/cran/cellWise/blob/master/src/LocScaleEstimators.cpp]
- class robpy.univariate.onestep_m.OneStepM(loc_rho: BaseRho, scale_rho: BaseRho, delta: float, min_abs_scale: float = 1e-12)[source]
Bases: RobustScale
Implementation of the single-step M-estimator for location and scale
- Parameters:
loc_rho – rho function for location estimation (e.g. Huber(b=1.5))
scale_rho – rho function for scale estimation (e.g. Huber(b=2.5))
delta (float, optional) – Consistency factor at normal model depending on b: quad(np.minimum(np.abs(x**2), b**2) * norm.pdf(x), -np.inf, np.inf, args=(b))
min_abs_scale (float, optional) – The M-estimator is only calculated if the MAD is larger than min_abs_scale. Defaults to 1e-12.
References
Rousseeuw, P. J., & Van den Bossche, W. (2018). Detecting deviating data cells. Technometrics, 60(2), 135-145. This paper uses loc_rho = TukeyBiWeight(c=3) and scale_rho = Huber(b=2.5).
See also the R code: https://rdrr.io/cran/cellWise/man/estLocScale.html
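The integral given for delta is the expectation of min(Z², b²) for a standard normal Z, and it has a closed form. The sketch below evaluates that closed form with the standard library and cross-checks it against a simple midpoint rule. The function names are hypothetical, and the sketch only evaluates the integral as written in the parameter description; whether robpy applies any further scaling to delta is not stated there.

```python
import math


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def huber_delta(b):
    """E[min(Z**2, b**2)] for Z ~ N(0, 1), in closed form."""
    # Contribution of |Z| <= b: integral of x**2 * pdf(x) over [-b, b].
    inside = (2.0 * std_normal_cdf(b) - 1.0) - 2.0 * b * std_normal_pdf(b)
    # Contribution of |Z| > b: b**2 times the two-sided tail probability.
    outside = 2.0 * b * b * (1.0 - std_normal_cdf(b))
    return inside + outside


def huber_delta_numeric(b, half_width=10.0, steps=200_000):
    """Midpoint-rule check of the same integral over [-half_width, half_width]."""
    dx = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        x = -half_width + (i + 0.5) * dx
        total += min(x * x, b * b) * std_normal_pdf(x)
    return total * dx


delta = huber_delta(2.5)  # roughly 0.9776
```

As b grows the truncation stops mattering and the value approaches E[Z²] = 1.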
- class robpy.univariate.onestep_m.OneStepWrapping[source]
Bases: RobustScale
[analogous to estLocScale {cellWise} with type = "wrap": https://github.com/cran/cellWise/blob/master/src/LocScaleEstimators.cpp]
Qn Estimator
- class robpy.univariate.qn.Qn(location_func: ~robpy.univariate.base.LocationOrScaleEstimator = <function median>, consistency_correction: bool = True, finite_correction: bool = True)[source]
Bases: RobustScale
Implementation of the Qn estimator
[Croux, C., & Rousseeuw, P. J. (1992). Time-efficient algorithms for two highly robust estimators of scale.]
[Johnson, D. B., & Mizoguchi, T. (1978). Selecting the k^th element in X+Y and X1+…+Xm.]
- Parameters:
location_func (LocationOrScaleEstimator, optional) – As the Qn estimator does not estimate location, a location function should be explicitly passed. Defaults to the median.
consistency_correction (bool, optional) – Whether to apply the consistency correction for the normal model. Defaults to True.
finite_correction (bool, optional) – Whether to apply the finite-sample correction. Defaults to True.
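For intuition, Qn can be computed directly from its definition: take all pairwise absolute differences, pick the k-th smallest with k = C(h, 2) and h = floor(n/2) + 1, and multiply by the normal consistency factor of about 2.2219. The sketch below is a naive O(n²) illustration, not the time-efficient algorithm from the cited papers. The name `qn_naive` is hypothetical, and the finite-sample correction is omitted.

```python
from itertools import combinations
from math import comb

# Asymptotic consistency factor of Qn at the normal model.
QN_NORMAL_CONSTANT = 2.2219


def qn_naive(x):
    """Qn scale from its definition; needs at least two observations."""
    n = len(x)
    h = n // 2 + 1
    k = comb(h, 2)
    # All pairwise absolute differences |x_i - x_j| for i < j, sorted.
    diffs = sorted(abs(a - b) for a, b in combinations(x, 2))
    return QN_NORMAL_CONSTANT * diffs[k - 1]


clean = qn_naive([1, 2, 3, 4, 5])
contaminated = qn_naive([1, 2, 3, 4, 500])
```

In this small example, replacing one observation by a gross outlier leaves the estimate unchanged, because the k-th smallest difference only involves the closely spaced points.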
Tau Estimator
- class robpy.univariate.tau.Tau(c1: float = 4.5, c2: float = 3.0, consistency_correction: bool = True)[source]
Bases: RobustScale
Implementation of the tau estimator of scale
[Maronna, R. A., & Zamar, R. H. (2002). Robust Estimates of Location and Dispersion for High-Dimensional Datasets.]
- Parameters:
c1 (float, optional) – constant for the weight function, defaults to 4.5
c2 (float, optional) – constant for the rho function, defaults to 3.0
consistency_correction (bool, optional) – Whether to apply the consistency correction for the normal model. Defaults to True.
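The roles of c1 and c2 can be illustrated with a plain-Python sketch of the tau estimate as described by Maronna and Zamar (2002): c1 controls a biweight-type weight function for the location step, and c2 truncates the squared residuals in the scale step. This is a sketch under stated assumptions, not robpy's code: the name `tau_location_scale` is hypothetical, the initial scale is the unnormalized MAD, and no consistency correction is applied.

```python
import math
from statistics import median


def tau_location_scale(x, c1=4.5, c2=3.0):
    """Uncorrected tau location and scale of a sequence of floats."""
    n = len(x)
    med = median(x)
    sigma0 = median([abs(v - med) for v in x])  # assumption: raw MAD
    if sigma0 == 0:
        raise ValueError("MAD is zero; this sketch does not handle that case")
    # Weight function W(u) = (1 - (u / c1)**2)**2 for |u| <= c1, else 0.
    weights = []
    for v in x:
        u = (v - med) / sigma0
        weights.append((1.0 - (u / c1) ** 2) ** 2 if abs(u) <= c1 else 0.0)
    loc = sum(w * v for w, v in zip(weights, x)) / sum(weights)
    # Rho function rho(u) = min(u**2, c2**2) applied to scaled residuals.
    rho_sum = sum(min(((v - loc) / sigma0) ** 2, c2 ** 2) for v in x)
    scale = sigma0 * math.sqrt(rho_sum / n)
    return loc, scale


clean_loc, clean_scale = tau_location_scale([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
out_loc, out_scale = tau_location_scale([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 1000.0])
```

The outlier receives weight zero in the location step and contributes at most c2² to the scale step, so it moves the estimates only slightly.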
Adjusted Boxplot
- class robpy.univariate.adjusted_boxplot.Boxplot(median: float, q1: float, q3: float, upper_whisker: float, lower_whisker: float)[source]
Bases: object
Container for boxplot statistics
- lower_whisker: float
- median: float
- q1: float
- q3: float
- upper_whisker: float
- robpy.univariate.adjusted_boxplot.adjusted_boxplot(X: ndarray | Series | DataFrame, plot: bool = True, ax: Axes | None = None, figsize: tuple[int, int] = (6, 6), **bxp_kwargs) list[Boxplot][source]
Calculate and visualize an adjusted boxplot as described in Hubert and Vandervieren (2008)
- Parameters:
X (np.ndarray or pd.Series or pd.DataFrame) – An array of float values
plot (bool, optional) – Whether to plot the boxplot. Defaults to True.
ax (Axes, optional) – The matplotlib axes to plot the boxplot. If None, a new figure and axes will be created. Defaults to None.
figsize (tuple[int, int], optional) – Size of the plot. Defaults to (6,6).
bxp_kwargs (optional) – Additional keyword arguments to pass to matplotlib.axes.Axes.bxp.
- Returns:
A list of containers with boxplot statistics for each variable in X
- Return type:
list[Boxplot]
References
Hubert, M., & Vandervieren, E. (2008). An adjusted boxplot for skewed distributions. Computational statistics & data analysis, 52(12), 5186-5201.
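The adjustment in Hubert and Vandervieren (2008) replaces the classical 1.5 × IQR fences by fences that depend on the medcouple (MC), a robust skewness measure. The sketch below shows only that fence rule and how whiskers follow from it. The function names are hypothetical, the medcouple and quartiles are taken as given inputs rather than computed, and nothing here is taken from robpy's internals.

```python
import math


def adjusted_fences(q1, q3, mc):
    """Lower and upper fences of the adjusted boxplot for medcouple mc."""
    iqr = q3 - q1
    if mc >= 0:
        # Right-skewed data: shrink the lower fence, stretch the upper one.
        lower = q1 - 1.5 * math.exp(-4.0 * mc) * iqr
        upper = q3 + 1.5 * math.exp(3.0 * mc) * iqr
    else:
        # Left-skewed data: the exponents swap roles.
        lower = q1 - 1.5 * math.exp(-3.0 * mc) * iqr
        upper = q3 + 1.5 * math.exp(4.0 * mc) * iqr
    return lower, upper


def whiskers(data, lower, upper):
    """Most extreme observations that still lie inside the fences."""
    inside = [v for v in data if lower <= v <= upper]
    return min(inside), max(inside)


symmetric = adjusted_fences(1.0, 3.0, 0.0)
skewed = adjusted_fences(1.0, 3.0, 0.5)
ends = whiskers([0, 1, 2, 3, 50], *symmetric)
```

With MC = 0 the rule reduces to the classical Tukey fences, which is a quick sanity check for any implementation.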