robpy.univariate

Base

class robpy.univariate.base.LocationOrScaleEstimator(*args, **kwargs)[source]

Bases: Protocol

class robpy.univariate.base.RobustScale(*, can_handle_nan: bool = False)[source]

Bases: ABC

Base class for robust univariate scale estimators

Parameters: can_handle_nan (bool, optional) – Whether the robust scale estimator can handle NaN values. Defaults to False.

fit(X: ndarray, ignore_nan: bool = False) → RobustScale[source]

property location

property scale
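The fit-then-read pattern of this base class can be illustrated with a small stand-alone sketch. It does not use robpy: the names `SketchScale`, `SketchMAD` and `_estimate`, and the NaN policy shown (drop NaNs when `ignore_nan=True`, otherwise raise unless the estimator declares `can_handle_nan`), are assumptions made for illustration only.

```python
import math
import statistics
from abc import ABC, abstractmethod


class SketchScale(ABC):
    """Hypothetical stand-in for a robust scale base class (not robpy's code)."""

    def __init__(self, *, can_handle_nan: bool = False):
        self.can_handle_nan = can_handle_nan
        self._location = None
        self._scale = None

    def fit(self, X, ignore_nan: bool = False) -> "SketchScale":
        values = [float(v) for v in X]
        has_nan = any(math.isnan(v) for v in values)
        if ignore_nan:
            # Assumed policy: silently drop missing values.
            values = [v for v in values if not math.isnan(v)]
        elif has_nan and not self.can_handle_nan:
            # Assumed policy: refuse NaNs unless the estimator supports them.
            raise ValueError("X contains NaN; pass ignore_nan=True")
        self._location, self._scale = self._estimate(values)
        return self  # returning self allows chaining: Est().fit(x).scale

    @abstractmethod
    def _estimate(self, values):
        """Return a (location, scale) pair for a list of floats."""

    @property
    def location(self):
        return self._location

    @property
    def scale(self):
        return self._scale


class SketchMAD(SketchScale):
    """Median for location, normalized median absolute deviation for scale."""

    def _estimate(self, values):
        med = statistics.median(values)
        mad = statistics.median([abs(v - med) for v in values])
        return med, 1.4826 * mad  # 1.4826 makes the MAD consistent at the normal


est = SketchMAD().fit([1.0, 2.0, 3.0, 4.0, 100.0])
print(est.location, est.scale)
```

The outlier at 100 leaves both estimates essentially unaffected, which is the behaviour expected of any subclass of a robust scale base class.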

Minimum Covariance Determinant

class robpy.univariate.mcd.UnivariateMCD(alpha: float | int | None = None, consistency_correction: bool = True)[source]

Bases: RobustScale

Implementation of univariate MCD (Hubert & Debruyne, 2010)

Parameters:

alpha (float or int, optional) – size of the h subset. If an integer between n/2 and n is passed, it is interpreted as an absolute count. If a float between 0.5 and 1 is passed, it is interpreted as a proportion of n (the training set size). If None, it is set to floor(n/2) + 1. Defaults to None.
consistency_correction (boolean, optional) – whether the estimates should be consistent at the normal model. Defaults to True.

References

Hubert, M., & Debruyne, M. (2010). Minimum covariance determinant. Wiley Interdisciplinary Reviews: Computational Statistics, 2(1), 36-43.
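As a rough illustration of the idea behind the univariate MCD, the sketch below searches for the h-subset with the smallest variance. In one dimension the optimal subset is always a run of h consecutive order statistics, so scanning windows of the sorted data is enough. This is not robpy's implementation: the helper names, the rounding of a fractional alpha (floor), the variance denominator h, and the omission of the consistency correction are all assumptions.

```python
import math


def h_subset_size(n, alpha=None):
    """Translate alpha into a subset size h, following the parameter description."""
    if alpha is None:
        h = n // 2 + 1
    elif isinstance(alpha, int):
        h = alpha  # integer: absolute subset size
    else:
        h = int(alpha * n)  # float: proportion of n (assumption: round down)
    if not n / 2 <= h <= n:
        raise ValueError("h must lie between n/2 and n")
    return h


def raw_univariate_mcd(x, alpha=None):
    """Return (location, scale) of the contiguous h-window with minimal variance.

    No consistency correction is applied, so the scale is biased downwards
    at the normal model.
    """
    xs = sorted(x)
    n = len(xs)
    h = h_subset_size(n, alpha)
    best_var, best_mean = None, None
    for start in range(n - h + 1):
        window = xs[start:start + h]
        mean = sum(window) / h
        var = sum((v - mean) ** 2 for v in window) / h
        if best_var is None or var < best_var:
            best_var, best_mean = var, mean
    return best_mean, math.sqrt(best_var)


loc, scale = raw_univariate_mcd([1.0, 2.0, 2.5, 4.0, 100.0])
print(loc, scale)
```

For this sample h = 3 and the selected window is (1.0, 2.0, 2.5), so the outlier at 100 has no influence on either estimate.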

One-step M-estimator

class robpy.univariate.onestep_m.CellwiseOneStepM[source]

Bases: OneStepM

Implementation of the one-step M-estimator (robLoc and robScale) proposed in Rousseeuw & Van den Bossche (2018). In that paper, the location rho function is set to TukeyBiWeight(c=3) and the scale rho function to Huber(b=2.5).

References

Rousseeuw, P. J., & Van den Bossche, W. (2018). Detecting deviating data cells. Technometrics, 60(2), 135-145.

class robpy.univariate.onestep_m.HuberOneStepM[source]

Bases: OneStepM

Implementation of the one-step Huber M-estimator for location and scale

[analogous to estLocScale {cellWise} with type = "hubhub": https://github.com/cran/cellWise/blob/master/src/LocScaleEstimators.cpp]

class robpy.univariate.onestep_m.OneStepM(loc_rho: BaseRho, scale_rho: BaseRho, delta: float, min_abs_scale: float = 1e-12)[source]

Bases: RobustScale

Implementation of the single-step M-estimator for location and scale

Parameters:

loc_rho – rho function for location estimation (e.g. Huber(b=1.5))
scale_rho – rho function for scale estimation (e.g. Huber(b=2.5))
delta (float) – Consistency factor at the normal model, depending on b: the expected value of min(x**2, b**2) under the standard normal density, e.g. quad(lambda x: np.minimum(x**2, b**2) * norm.pdf(x), -np.inf, np.inf)[0]
min_abs_scale (float, optional) – The M-estimator is only calculated if the MAD is larger than min_abs_scale. Defaults to 1e-12.

References

Rousseeuw, P. J., & Van den Bossche, W. (2018). Detecting deviating data cells. Technometrics, 60(2), 135-145. That paper uses loc_rho = TukeyBiWeight(c=3) and scale_rho = Huber(b=2.5).
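The role of delta can be made concrete with a short sketch. For a Huber-type rho, min(x**2, b**2), the expectation under the standard normal has a closed form, so no numerical integration is needed. The one-step scale update shown afterwards (start from the median and normalized MAD, then rescale once) is a common textbook form and an assumption here, as are the function names and the choice to return the initial scale when it does not exceed `min_abs_scale`; none of this is taken from robpy's source.

```python
import math
import statistics


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def huber_delta(b):
    """E[min(Z**2, b**2)] for a standard normal Z, in closed form."""
    inside = (2.0 * normal_cdf(b) - 1.0) - 2.0 * b * normal_pdf(b)  # |Z| <= b
    outside = 2.0 * b * b * (1.0 - normal_cdf(b))                   # |Z| > b
    return inside + outside


def one_step_scale(x, b=2.5, min_abs_scale=1e-12):
    """One rescaling step starting from the normalized MAD (illustrative only)."""
    med = statistics.median(x)
    s0 = 1.4826 * statistics.median([abs(v - med) for v in x])
    if s0 <= min_abs_scale:
        return s0  # assumption: skip the M-step when the initial scale is ~0
    rho_values = [min(((v - med) / s0) ** 2, b * b) for v in x]
    return s0 * math.sqrt(sum(rho_values) / (len(x) * huber_delta(b)))


print(huber_delta(2.5))
print(one_step_scale([1.0, 2.0, 3.0, 4.0, 1000.0]))
```

Dividing by delta is what makes the scale estimate target the standard deviation when the data are normal; as b grows, delta tends to 1 and the clipping disappears.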

class robpy.univariate.onestep_m.OneStepWrapping[source]

Bases: RobustScale

[analogous to estLocScale {cellWise} with type = "wrap": https://github.com/cran/cellWise/blob/master/src/LocScaleEstimators.cpp]

Qn Estimator

class robpy.univariate.qn.Qn(location_func: LocationOrScaleEstimator = <function median>, consistency_correction: bool = True, finite_correction: bool = True)[source]

Bases: RobustScale

Implementation of Qn estimator

[Time-efficient algorithms for two highly robust estimators of scale, Christophe Croux and Peter J. Rousseeuw (1992)]

[Selecting the k^th element in X+Y and X1+…+Xm, Donald B. Johnson and Tetsuo Mizoguchi (1978)]

Parameters:

location_func (LocationOrScaleEstimator, optional) – function used to estimate location, since the Qn estimator itself only estimates scale. Defaults to median.
consistency_correction (bool, optional) – whether the consistency correction for the normal model should be applied. Defaults to True.
finite_correction (bool, optional) – whether the finite-sample correction should be applied. Defaults to True.
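For intuition, the definition of Qn can be written down directly: take all pairwise absolute differences, pick the k-th smallest with k = h(h-1)/2 and h = floor(n/2) + 1, and multiply by a constant that makes the estimate consistent at the normal model. The sketch below is a naive O(n^2 log n) version, not the time-efficient algorithm cited above and not robpy's code; the function name is hypothetical, the constant 2.2219 is the value commonly quoted for Qn, and the finite-sample correction is omitted.

```python
import itertools

# Consistency factor at the normal model, as commonly quoted for Qn.
QN_NORMAL_CONSTANT = 2.2219


def naive_qn(x):
    """Qn scale from its definition; quadratic in the sample size."""
    n = len(x)
    if n < 2:
        raise ValueError("Qn needs at least two observations")
    h = n // 2 + 1
    k = h * (h - 1) // 2  # rank of the order statistic to select
    diffs = sorted(abs(a - b) for a, b in itertools.combinations(x, 2))
    return QN_NORMAL_CONSTANT * diffs[k - 1]


clean = naive_qn([1.0, 2.0, 3.0, 4.0, 5.0])
contaminated = naive_qn([1.0, 2.0, 3.0, 4.0, 1000.0])
print(clean, contaminated)
```

Both calls select the same pairwise difference, so replacing one observation by a gross outlier does not change the estimate in this example.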

Tau Estimator

class robpy.univariate.tau.Tau(c1: float = 4.5, c2: float = 3.0, consistency_correction: bool = True)[source]

Bases: RobustScale

Implementation of tau estimator of scale

[Robust Estimates of Location and Dispersion for High-Dimensional Datasets, Ricardo A. Maronna and Ruben H. Zamar (2002)]

Parameters:

c1 (float, optional) – constant for the weight function, defaults to 4.5
c2 (float, optional) – constant for the rho function, defaults to 3.0
consistency_correction (bool, optional) – whether the consistency correction for the normal model should be applied. Defaults to True.
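The roles of c1 and c2 can be sketched as follows, using the usual formulation of the tau estimator: a weighted mean with weights (1 - (u/c1)^2)^2 for |u| <= c1 gives the location, and a truncated sum of squares with rho(u) = min(u^2, c2^2) gives the scale. The function name is hypothetical, the unnormalized MAD is used as the initial scale, and no consistency correction is applied; these are assumptions of the sketch, not statements about robpy's implementation.

```python
import math
import statistics


def sketch_tau(x, c1=4.5, c2=3.0):
    """Return (location, scale) of an uncorrected tau estimate."""
    med = statistics.median(x)
    sigma0 = statistics.median([abs(v - med) for v in x])  # initial scale (MAD)
    if sigma0 == 0:
        raise ValueError("MAD is zero; the tau estimate is undefined here")

    # Weight function: smooth down-weighting, zero beyond c1 initial scales.
    weights = []
    for v in x:
        u = (v - med) / sigma0
        weights.append((1.0 - (u / c1) ** 2) ** 2 if abs(u) <= c1 else 0.0)
    mu = sum(w * v for w, v in zip(weights, x)) / sum(weights)

    # Rho function: squared residuals, capped at c2**2.
    rho_sum = sum(min(((v - mu) / sigma0) ** 2, c2 ** 2) for v in x)
    return mu, sigma0 * math.sqrt(rho_sum / len(x))


print(sketch_tau([1.0, 2.0, 3.0, 4.0, 5.0]))
print(sketch_tau([1.0, 2.0, 3.0, 4.0, 1000.0]))
```

A smaller c1 discards more observations from the location step, while a smaller c2 caps the contribution of large residuals to the scale earlier.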

Adjusted Boxplot

class robpy.univariate.adjusted_boxplot.Boxplot(median: float, q1: float, q3: float, upper_whisker: float, lower_whisker: float)[source]

Bases: object

Container for boxplot statistics

lower_whisker: float

median: float

q1: float

q3: float

upper_whisker: float

robpy.univariate.adjusted_boxplot.adjusted_boxplot(X: ndarray | Series | DataFrame, plot: bool = True, ax: Axes | None = None, figsize: tuple[int, int] = (6, 6), **bxp_kwargs) → list[Boxplot][source]

Calculate and visualize an adjusted boxplot as described in Hubert and Vandervieren (2008).

Parameters:

X (np.ndarray or pd.Series or pd.DataFrame) – An array of float values
plot (bool, optional) – Whether to plot the boxplot. Defaults to True.
ax (Axes, optional) – The matplotlib axes to plot the boxplot. If None, a new figure and axes will be created. Defaults to None.
figsize (tuple[int, int], optional) – Size of the plot. Defaults to (6,6).
bxp_kwargs (optional) – Additional keyword arguments to pass to matplotlib.axes.Axes.bxp.

Returns:

A list of containers with boxplot statistics for each variable in X

Return type:

list[Boxplot]

References

Hubert, M., & Vandervieren, E. (2008). An adjusted boxplot for skewed distributions. Computational statistics & data analysis, 52(12), 5186-5201.
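The adjustment in the cited paper replaces the classical 1.5 × IQR fences with fences that are stretched or shrunk by an exponential function of the medcouple (MC), a robust skewness measure. The sketch below only covers that fence rule and the resulting whiskers; the quartiles and the medcouple are taken as inputs, and the function names are hypothetical. How robpy computes these quantities internally is not shown in this reference, so nothing here should be read as its implementation.

```python
import math


def adjusted_fences(q1, q3, mc):
    """Fences of the adjusted boxplot for quartiles q1, q3 and medcouple mc."""
    iqr = q3 - q1
    if mc >= 0:
        # Right-skewed data: shorter lower fence, longer upper fence.
        lower = q1 - 1.5 * math.exp(-4.0 * mc) * iqr
        upper = q3 + 1.5 * math.exp(3.0 * mc) * iqr
    else:
        # Left-skewed data: the exponents swap roles.
        lower = q1 - 1.5 * math.exp(-3.0 * mc) * iqr
        upper = q3 + 1.5 * math.exp(4.0 * mc) * iqr
    return lower, upper


def whiskers(x, lower, upper):
    """Most extreme observations that still fall inside the fences."""
    inside = [v for v in x if lower <= v <= upper]
    return min(inside), max(inside)


lower, upper = adjusted_fences(1.0, 3.0, 0.5)
print(lower, upper)
print(whiskers([0.0, 1.0, 2.0, 3.0, 10.0, 40.0], lower, upper))
```

With mc = 0 the rule reduces to the classical Tukey fences, so symmetric data are treated exactly as in a standard boxplot.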