robpy.univariate

Base

class robpy.univariate.base.LocationOrScaleEstimator(*args, **kwargs)[source]

Bases: Protocol

class robpy.univariate.base.RobustScale(*, can_handle_nan: bool = False)[source]

Bases: ABC

Base class for robust univariate scale estimators.

Parameters:

can_handle_nan (bool, optional) – Whether the robust scale estimator can handle NaNs. Defaults to False.

fit(X: ndarray, ignore_nan: bool = False) RobustScale[source]
property location
property scale

Minimum Covariance Determinant

class robpy.univariate.mcd.UnivariateMCD(alpha: float | int | None = None, consistency_correction: bool = True)[source]

Bases: RobustScale

Implementation of the \(O(n \log n)\) algorithm for the univariate MCD on pages 171-172 of Rousseeuw, P.J., & Leroy, A. (1987).

Parameters:
  • alpha (float | int | None, optional) – Size of the h subset. An integer between n/2 and n is interpreted as an absolute subset size. A float between 0.5 and 1 is interpreted as a proportion of n (the training set size). If None, or if the resulting size is below [n/2] + 1, it is set to [n/2] + 1. Defaults to None.

  • consistency_correction (boolean, optional) – Whether the estimates should be consistent at the normal model. Defaults to True.

References

  • Rousseeuw, P.J., & Leroy, A. (1987). Robust Regression and Outlier Detection. John Wiley & Sons, New York.
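
The sliding-window idea behind the univariate MCD can be sketched in plain Python. This is an illustrative sketch, not robpy's implementation: the function name univariate_mcd_sketch is hypothetical, the raw scale is normalised by h (an assumption made here for simplicity), and no consistency correction is applied. After sorting, the optimal h-subset is contiguous, so prefix sums give each window's sum of squared deviations in constant time and the sort dominates the cost.

```python
import math


def univariate_mcd_sketch(x, h=None):
    """Return (location, raw_scale) of the contiguous h-subset with least variance.

    Hypothetical helper for illustration; no consistency correction is applied.
    """
    xs = sorted(x)  # O(n log n); everything after this is O(n)
    n = len(xs)
    if h is None:
        h = n // 2 + 1  # default subset size [n/2] + 1

    # Prefix sums of values and squared values.
    s1, s2 = [0.0], [0.0]
    for v in xs:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    best_ssd, best_mean = None, None
    for i in range(n - h + 1):
        total = s1[i + h] - s1[i]
        squares = s2[i + h] - s2[i]
        ssd = squares - total * total / h  # sum of squared deviations in window
        if best_ssd is None or ssd < best_ssd:
            best_ssd, best_mean = ssd, total / h

    # Guard against tiny negative values caused by rounding.
    raw_scale = math.sqrt(max(best_ssd, 0.0) / h)
    return best_mean, raw_scale


loc, scale = univariate_mcd_sketch([1, 2, 3, 4, 5, 100, 200])
```

With the two gross outliers in this sample, the selected window lies entirely within the values 1 to 5, so the outliers do not affect either estimate.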

Onestep M-estimator

class robpy.univariate.onestep_m.CellwiseOneStepM[source]

Bases: OneStepM

Implementation of the single-step M-estimator (robLoc and robScale) proposed in Rousseeuw, P. J., & Van Den Bossche, W. (2018). Following that paper, the location rho function is set to TukeyBiWeight(c=3) and the scale rho function to Huber(b=2.5).

References

  • Rousseeuw, P. J., & Van Den Bossche, W. (2018). Detecting deviating data cells. Technometrics, 60(2), 135-145.
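
As a rough illustration of what a one-step M-estimate of location does, the sketch below starts from the median and the MAD and takes a single weighted mean with Tukey biweight weights (c=3). It is a sketch under stated assumptions rather than robpy's code: the function name one_step_location is hypothetical, the MAD is scaled by 1.4826 (an assumption), and the scale step is omitted.

```python
import statistics


def one_step_location(x, c=3.0):
    """One weighted-mean step from the median, using Tukey biweight weights.

    Hypothetical helper for illustration only.
    """
    m0 = statistics.median(x)
    # MAD scaled by 1.4826 for consistency at the normal model (assumed here).
    s0 = 1.4826 * statistics.median(abs(v - m0) for v in x)
    if s0 == 0:
        return m0  # degenerate scale: fall back to the median

    weights = []
    for v in x:
        t = (v - m0) / s0
        # Biweight: (1 - (t/c)^2)^2 inside [-c, c], zero outside.
        weights.append((1 - (t / c) ** 2) ** 2 if abs(t) <= c else 0.0)

    return sum(w * v for w, v in zip(weights, x)) / sum(weights)


estimate = one_step_location([1, 2, 3, 4, 5, 100])
```

The observation at 100 lies far more than c scaled MADs from the median, so it receives weight zero and the estimate stays near the centre of the remaining points.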

class robpy.univariate.onestep_m.HuberOneStepM[source]

Bases: OneStepM

Implementation of the one-step Huber M-estimator of location and scale. Analogous to the R function estLocScale in the package cellWise using type = hubhub. (cf. https://github.com/cran/cellWise/blob/master/src/LocScaleEstimators.cpp)

class robpy.univariate.onestep_m.OneStepM(loc_rho: BaseRho, scale_rho: BaseRho, delta: float, min_abs_scale: float = 1e-12)[source]

Bases: RobustScale

Implementation of the single-step M-estimator for location and scale.

Parameters:
  • loc_rho (BaseRho) – Rho function for location estimation (e.g. Huber(b=1.5)).

  • scale_rho (BaseRho) – Rho function for scale estimation (e.g. Huber(b=2.5)).

  • delta (float) – Consistency factor at the normal model, depending on b: the integral of np.minimum(x**2, b**2) * norm.pdf(x) over (-np.inf, np.inf), e.g. evaluated with quad.

  • min_abs_scale (float, optional) – The M-estimator is only computed if the MAD is larger than min_abs_scale. Defaults to 1e-12.

References

  • Rousseeuw, P. J., & Van Den Bossche, W. (2018). Detecting deviating data cells. Technometrics, 60(2), 135-145.
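
The integral named in the description of delta, the expectation of min(x², b²) under the standard normal, has a closed form, so it can be evaluated without numerical quadrature. The sketch below uses only the standard library; the function name huber_delta is hypothetical, and whether robpy applies any further scaling to this value is not stated in this reference.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def huber_delta(b):
    """E[min(Z^2, b^2)] for Z ~ N(0, 1).

    Hypothetical helper: the inner part, E[Z^2; |Z| <= b], equals
    2*Phi(b) - 1 - 2*b*phi(b); the tails contribute b^2 * P(|Z| > b).
    """
    inner = 2.0 * normal_cdf(b) - 1.0 - 2.0 * b * normal_pdf(b)
    tails = 2.0 * b * b * (1.0 - normal_cdf(b))
    return inner + tails


delta_25 = huber_delta(2.5)
```

As b grows, the truncation has no effect and the value approaches 1, the variance of the standard normal.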

class robpy.univariate.onestep_m.OneStepWrapping[source]

Bases: RobustScale

Analogous to the R function estLocScale in the package cellWise using type = wrap. (cf. https://github.com/cran/cellWise/blob/master/src/LocScaleEstimators.cpp)

Qn Estimator

class robpy.univariate.qn.Qn(location_func: ~robpy.univariate.base.LocationOrScaleEstimator = <function median>, consistency_correction: bool = True, finite_correction: bool = True)[source]

Bases: RobustScale

The Qn estimator of Rousseeuw, P.J. & Croux, C. (1993) as implemented in the \(O(n \log n)\) algorithm of Croux, C. & Rousseeuw, P.J. (1992).

Parameters:
  • location_func (LocationOrScaleEstimator, optional) – As the Qn estimator does not estimate location, a location function should be explicitly passed. Defaults to np.median.

  • consistency_correction (boolean, optional) – Whether the consistency correction for the normal model should be applied. Defaults to True.

  • finite_correction (boolean, optional) – Whether the finite-sample correction should be applied. Defaults to True.

References

  • Croux, C., & Rousseeuw, P. J. (1992). Time-efficient algorithms for two highly robust estimators of scale. In Computational Statistics: Volume 1: Proceedings of the 10th Symposium on Computational Statistics (pp. 411-428). Heidelberg: Physica-Verlag HD.

  • Rousseeuw, P.J., & Croux, C. (1993). Alternatives to the Median Absolute Deviation. Journal of the American Statistical Association, 88(424), 1273–1283.
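
For intuition, the Qn statistic can be written directly from its definition: a constant times the k-th smallest pairwise absolute difference, with h = [n/2] + 1 and k = h(h-1)/2. The sketch below is a naive O(n²) version, not the O(n log n) algorithm used by the class above. The name qn_naive is hypothetical, the constant 2.2219 is the consistency factor reported in Rousseeuw & Croux (1993), and no finite-sample correction is applied.

```python
def qn_naive(x, constant=2.2219):
    """Naive Qn: constant * k-th order statistic of |x_i - x_j| for i < j.

    Hypothetical helper for illustration; quadratic in time and memory.
    """
    n = len(x)
    if n < 2:
        raise ValueError("Qn needs at least two observations")
    h = n // 2 + 1
    k = h * (h - 1) // 2
    diffs = sorted(abs(x[i] - x[j]) for i in range(n) for j in range(i + 1, n))
    return constant * diffs[k - 1]  # k is 1-based


clean = qn_naive([1, 2, 3, 4, 5])
contaminated = qn_naive([1, 2, 3, 4, 1000])
```

Replacing one observation with a gross outlier leaves the k-th smallest difference unchanged in this sample, which illustrates why Qn is robust, and no location estimate is needed to compute it.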

Tau Estimator

class robpy.univariate.tau.Tau(c1: float = 4.5, c2: float = 3.0, consistency_correction: bool = True)[source]

Bases: RobustScale

Implementation of the tau estimator of scale of Yohai, V.J. & Zamar, R.H. (1988).

Parameters:
  • c1 (float, optional) – Constant for the weight function, defaults to 4.5.

  • c2 (float, optional) – Constant for the rho function, defaults to 3.0.

  • consistency_correction (bool, optional) – Whether the consistency correction for the normal model should be applied. Defaults to True.

References

  • Yohai, V.J. & Zamar, R.H. (1988). High breakdown estimates of regression by means of the minimization of an efficient scale. Journal of the American Statistical Association, 83(402), 406-413.

Adjusted Boxplot

class robpy.univariate.adjusted_boxplot.Boxplot(median: float, q1: float, q3: float, upper_whisker: float, lower_whisker: float)[source]

Bases: object

Container for boxplot statistics.

lower_whisker: float
median: float
q1: float
q3: float
upper_whisker: float

robpy.univariate.adjusted_boxplot.adjusted_boxplot(X: ndarray | Series | DataFrame, plot: bool = True, ax: Axes | None = None, figsize: tuple[int, int] = (6, 6), **bxp_kwargs) list[Boxplot][source]

Calculate and visualize an adjusted boxplot as described in Hubert, M., & Vandervieren, E. (2008).

Parameters:
  • X (np.ndarray or pd.Series or pd.DataFrame) – An array of float values.

  • plot (bool, optional) – Whether to plot the boxplot. Defaults to True.

  • ax (Axes, optional) – The matplotlib axes to plot the boxplot. If None, a new figure and axes will be created. Defaults to None.

  • figsize (tuple[int, int], optional) – Size of the plot. Defaults to (6, 6).

  • bxp_kwargs (optional) – Additional keyword arguments to pass to matplotlib.axes.Axes.bxp.

Returns:

A list of containers with boxplot statistics for each variable in X.

Return type:

  • list[Boxplot]

References

  • Hubert, M., & Vandervieren, E. (2008). An adjusted boxplot for skewed distributions. Computational statistics & data analysis, 52(12), 5186-5201.
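
The adjustment in Hubert & Vandervieren (2008) replaces the usual 1.5 × IQR fences with fences that are stretched or shrunk according to the medcouple (MC), a robust skewness measure. The sketch below computes only these fences from given quartiles and a given medcouple. The function name adjusted_fences is hypothetical, and computing the medcouple itself is left out.

```python
import math


def adjusted_fences(q1, q3, mc):
    """Lower and upper fences of the adjusted boxplot for a given medcouple.

    Hypothetical helper for illustration; mc is assumed to lie in [-1, 1].
    """
    iqr = q3 - q1
    if mc >= 0:
        # Right-skewed data: shorten the lower fence, lengthen the upper one.
        lower = q1 - 1.5 * math.exp(-4.0 * mc) * iqr
        upper = q3 + 1.5 * math.exp(3.0 * mc) * iqr
    else:
        # Left-skewed data: the exponents swap roles.
        lower = q1 - 1.5 * math.exp(-3.0 * mc) * iqr
        upper = q3 + 1.5 * math.exp(4.0 * mc) * iqr
    return lower, upper


symmetric = adjusted_fences(1.0, 3.0, 0.0)
right_skewed = adjusted_fences(1.0, 3.0, 0.4)
```

With MC = 0 the fences reduce to the standard boxplot fences. The whiskers are then drawn to the most extreme observations that still fall inside the fences.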