robpy.regression

Base

class robpy.regression.base.RobustRegression[source]

Bases: RegressorMixin, BaseEstimator

fit(X, y) RobustRegression[source]
outlier_map(X, y, robust_scaling: bool = True, robust_distance: bool = True, vertical_outlier_threshold: float = 2.5, leverage_threshold_percentile: float = 0.975, figsize: tuple[int, int] = (4, 4), return_data: bool = False) None | tuple[ndarray, ndarray, ndarray, float, float][source]

Create a diagnostic plot in which robust residuals are plotted against the robust Mahalanobis distances of the training data.

Parameters:
  • X (array like of shape (n_samples, n_features)) – training features

  • y (array like of shape (n_samples, )) – training targets

  • robust_scaling (bool) – whether to scale residuals using MAD instead of std

  • robust_distance (bool) – whether to use MCD as loc/scale estimator instead of mean/cov for calculating the Mahalanobis distances

  • vertical_outlier_threshold (float) – where to draw the upper (and lower) limit for the standardized residuals beyond which points are flagged as vertical outliers

  • leverage_threshold_percentile (float) – which percentile of the chi-square distribution to use as the threshold for leverage points

  • figsize (tuple[int, int], optional) – Size of the plot. Defaults to (4, 4).

  • return_data (bool, optional) – Whether to return the residuals, the standardized residuals and the distances. Defaults to False.
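The two thresholds above split the plot into four regions. The following is a minimal sketch of how such an outlier map is usually read, not robpy's own implementation. The helper names `mad_scale` and `classify` are hypothetical, the residuals and distances are made-up numbers, and the leverage cutoff assumes the common convention of taking the square root of the chi-square quantile (written in closed form here, which is only valid for 2 features).

```python
import math
from statistics import median


def mad_scale(residuals):
    """Median absolute deviation, times 1.4826 for consistency at the normal."""
    center = median(residuals)
    return 1.4826 * median([abs(r - center) for r in residuals])


def classify(residuals, distances, leverage_cutoff, vertical_outlier_threshold=2.5):
    """Label each point by its standardized residual and its distance."""
    scale = mad_scale(residuals)
    labels = []
    for r, d in zip(residuals, distances):
        vertical = abs(r / scale) > vertical_outlier_threshold
        leverage = d > leverage_cutoff
        if vertical and leverage:
            labels.append("bad leverage")
        elif vertical:
            labels.append("vertical outlier")
        elif leverage:
            labels.append("good leverage")
        else:
            labels.append("regular")
    return labels


# Assumption: 2 features, so the chi-square quantile has the closed form
# -2 * ln(1 - q); the cutoff on the distance scale is its square root.
leverage_cutoff = math.sqrt(-2.0 * math.log(1.0 - 0.975))

# Made-up residuals and (robust) distances for eight observations.
residuals = [0.1, -0.2, 0.15, -0.1, 0.05, 3.0, 0.0, 4.0]
distances = [1.0, 0.8, 1.2, 0.5, 5.0, 0.9, 1.1, 6.0]

labels = classify(residuals, distances, leverage_cutoff)
for label, r, d in zip(labels, residuals, distances):
    print(f"residual={r:5.2f} distance={d:4.1f} -> {label}")
```

For any other number of features the closed form does not apply, and the quantile would have to come from a chi-square implementation such as `scipy.stats.chi2.ppf`.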

predict(X)[source]
property scale: float

Least Trimmed Squares

class robpy.regression.lts.FastLTSRegression(alpha: float = 0.5, n_initial_subsets: int = 500, n_initial_c_steps: int = 2, n_best_models: int = 10, reweighting: bool = True, tolerance: float = 1e-15, random_state: int = 42)[source]

Bases: RobustRegression

Implementation of the FAST-LTS model, based on the R implementation of the ltsReg method in the robustbase R package (cf. https://www.rdocumentation.org/packages/robustbase/versions/0.93-8/topics/ltsReg) and the Python implementation Reweighted-FastLTS (cf. https://github.com/GiuseppeCannata/Reweighted-FastLTS/blob/master/Reweighted_FastLTS.py)

Initialize a FAST LTS regression

Parameters:
  • alpha (float) – fraction of the data to use as the subset for calculating the trimmed squared error. Must be between 0.5 and 1, with 1 being equivalent to ordinary least squares regression. Defaults to 0.5.

  • n_initial_subsets (int) – number of initial subsets to apply C-steps on (cf. m in the original R implementation). Defaults to 500.

  • n_initial_c_steps (int) – number of C-steps to apply on each of the n_initial_subsets before the final C-steps until convergence. Defaults to 2.

  • n_best_models (int) – number of best models after initial c-steps to consider until convergence. Defaults to 10.

  • reweighting (bool) – Whether to apply reweighting to the raw estimates. Defaults to True.

  • tolerance (float) – Acceptable delta in loss value between C-steps. If current loss - previous loss <= tolerance, model is converged. Defaults to 1e-15.
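The C-steps mentioned in these parameters can be illustrated with a small sketch for a single predictor. This is an illustration under simplifying assumptions, not the code used by FastLTSRegression: the helpers `fit_line` and `c_steps` are hypothetical, the subset size is taken as ceil(alpha * n), only one starting subset is used, and no reweighting or scale correction is applied.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def c_steps(xs, ys, subset, alpha=0.5, tolerance=1e-15, max_steps=100):
    """Repeat C-steps from one starting subset until the loss stops improving."""
    n = len(xs)
    h = math.ceil(alpha * n)  # assumption: simplified subset size
    previous_loss = math.inf
    for _ in range(max_steps):
        # Fit on the current subset, then score every observation.
        intercept, slope = fit_line([xs[i] for i in subset], [ys[i] for i in subset])
        squared = [(ys[i] - intercept - slope * xs[i]) ** 2 for i in range(n)]
        # Keep the h observations with the smallest squared residuals.
        subset = sorted(range(n), key=lambda i: squared[i])[:h]
        loss = sum(squared[i] for i in subset)
        if previous_loss - loss <= tolerance:
            break
        previous_loss = loss
    intercept, slope = fit_line([xs[i] for i in subset], [ys[i] for i in subset])
    return intercept, slope, sorted(subset)


# Made-up data: roughly y = 1 + 2x, with two gross outliers at the end.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1.1, 2.9, 5.05, 6.95, 9.1, 10.9, 13.0, 15.05, -20.0, -25.0]

intercept, slope, subset = c_steps(xs, ys, subset=[0, 1])
print(f"intercept={intercept:.3f} slope={slope:.3f} subset={subset}")
```

Each C-step can only lower (or keep) the trimmed sum of squares, which is why running a few cheap steps on many starting subsets and then iterating only the best candidates to convergence is a sensible strategy.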

fit(X: ndarray | DataFrame, y: ndarray | Series, initial_weights: ndarray | None = None, verbosity: int = 20) FastLTSRegression[source]

Fit the model to the data

Parameters:
  • X (np.ndarray) – Training features

  • y (np.ndarray) – Training labels

  • initial_weights (Optional[np.ndarray], optional) – Optionally pass fixed initial weights. If n_initial_subsets > 1, all models then start from the same initial weights, so there is no benefit from setting n_initial_subsets > 1. Defaults to None.

  • verbosity (int, optional) – Logging level. Defaults to logging.INFO.

Returns:

The fitted FastLTS object

predict(X: ndarray | DataFrame) ndarray[source]
robpy.regression.lts.get_correction_factor(p: int, n: int, alpha: float) float[source]

Calculate the small sample correction factor for the scale resulting from LTS regression.

References

Pison, G., Van Aelst, S. & Willems, G. Small sample corrections for LTS and MCD. Metrika 55, 111–123 (2002). https://doi.org/10.1007/s001840200191

https://github.com/cran/robustbase/blob/c4b9d21cfc4beb64653bb2ffba9e549e2dbb98ed/R/ltsReg.R

robpy.regression.lts.get_correction_factor_reweighting(p: int, n: int, alpha: float) float[source]

Calculate the small sample correction factor for the scale resulting from reweighted LTS regression.

References

Pison, G., Van Aelst, S. & Willems, G. Small sample corrections for LTS and MCD. Metrika 55, 111–123 (2002). https://doi.org/10.1007/s001840200191

https://github.com/cran/robustbase/blob/c4b9d21cfc4beb64653bb2ffba9e549e2dbb98ed/R/ltsReg.R

S Regression

class robpy.regression.s.SRegression(rho: ~robpy.utils.rho.BaseRho = <robpy.utils.rho.TukeyBisquare object>, n_initial_subsets: int = 500, n_initial_i_steps: int = 2, n_best_subsets: int = 5, max_scale_iterations: int = 2, b: float = 0.5, fit_intercept: bool = True, relative_tolerance: float = 1e-07, scale_tolerance: float = 1e-10, random_state: int = 101)[source]

Bases: RobustRegression

Fast S algorithm similar to lmrob.S in robustbase (https://search.r-project.org/CRAN/refmans/robustbase/html/lmrob.S.html). S-estimation was initially described in

Rousseeuw, P. J., and Yohai, V. J. (1984), “Robust Regression by Means of S-Estimators,” in Robust and Nonlinear Time Series Analysis, eds. J. Franke, W. Härdle, and D. Martin, Lecture Notes in Statistics, 26, Berlin: Springer-Verlag, pp. 256-272.

This code is an implementation of the Fast S algorithm described in

Salibian-Barrera, M., & Yohai, V. J. (2006). A Fast Algorithm for S-Regression Estimates. Journal of Computational and Graphical Statistics, 15(2), 414–427. http://www.jstor.org/stable/27594186

Fast S estimator.

Parameters:
  • rho (BaseRho, optional) – score function to use on the residuals. Defaults to bisquare.

  • n_initial_subsets (int, optional) – Number of initial subsets to sample (N in the original paper). Defaults to 500.

  • n_initial_i_steps (int, optional) – Number of i-steps to take on the initial subsets (k in the original paper). Defaults to 2.

  • n_best_subsets (int, optional) – Number of subsets with the best M-scales (residuals transformed by the score function) to keep (t in the original paper). Defaults to 5.

  • max_scale_iterations (int, optional) – number of iterative steps to derive M-scale estimates (r in the original paper). Defaults to 2.

  • b (float, optional) – constant on the right-hand side of the M-scale equation. Defaults to 0.5.

  • fit_intercept (bool, optional) – Whether an intercept should be included in the linear regression

  • relative_tolerance (float, optional) – Determines the stopping criterion for the i-steps until convergence (iterations continue while the difference in beta norm is higher than relative_tolerance * max(relative_tolerance, beta_norm))

  • scale_tolerance (float, optional) – If the difference between 2 subsequent scale estimates is below this threshold, the iterations are stopped and it is assumed the scale estimate converged.
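The M-scale equation referred to by b and scale_tolerance can be sketched as follows. This is a minimal illustration and not robpy's code: `tukey_rho` and `m_scale` are hypothetical helpers, the rho function is assumed to be a Tukey bisquare normalized to a maximum of 1 with tuning constant c = 1.547 (the usual pairing with b = 0.5), and the residuals are made-up numbers. Unlike the max_scale_iterations default of 2, the sketch iterates until the scale stops changing.

```python
import math
from statistics import median


def tukey_rho(u, c=1.547):
    """Tukey bisquare rho, normalized so that rho(u) = 1 for |u| >= c."""
    if abs(u) >= c:
        return 1.0
    t = (u / c) ** 2
    return 1.0 - (1.0 - t) ** 3


def m_scale(residuals, b=0.5, max_iterations=500, scale_tolerance=1e-10):
    """Solve mean(rho(r_i / s)) = b for s by fixed-point iteration."""
    # Start from a normalized median of the absolute residuals.
    scale = median([abs(r) for r in residuals]) / 0.6745
    for _ in range(max_iterations):
        mean_rho = sum(tukey_rho(r / scale) for r in residuals) / len(residuals)
        new_scale = scale * math.sqrt(mean_rho / b)
        converged = abs(new_scale - scale) <= scale_tolerance
        scale = new_scale
        if converged:
            break
    return scale


# Made-up residuals: eight moderate values and two gross outliers.
residuals = [0.5, -1.2, 0.3, 0.8, -0.4, 1.1, -0.9, 0.2, 15.0, -20.0]

scale = m_scale(residuals)
check = sum(tukey_rho(r / scale) for r in residuals) / len(residuals)
print(f"M-scale: {scale:.4f}, mean rho at solution: {check:.6f}")
```

Because rho is bounded, the two gross outliers contribute at most 1 each to the mean, so they cannot inflate the scale the way they would inflate a standard deviation.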

fit(X, y, verbosity=30)[source]
predict(X) ndarray[source]

MM Regression

class robpy.regression.mm.MMRegression(initial_estimator: ~robpy.regression.base.RobustRegression = SRegression(), rho: ~robpy.utils.rho.BaseRho = <robpy.utils.rho.TukeyBisquare object>, max_iterations: int = 500, epsilon: float = 1e-07)[source]

Bases: RobustRegression

Implementation of MM-regression estimator

References

https://www.jstor.org/stable/2241331

fit(X: ndarray | DataFrame, y: ndarray | Series) MMRegression[source]
predict(X: ndarray | DataFrame) ndarray[source]
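
The M-step that follows the initial estimator is commonly computed by iteratively reweighted least squares with the scale held fixed. The sketch below illustrates that idea for a single predictor and is not taken from robpy: `bisquare_weight`, `weighted_line` and `mm_irls` are hypothetical helpers, the tuning constant 4.685 is the conventional choice for 95% efficiency with the bisquare, the starting coefficients and scale stand in for the output of an initial robust fit, and treating epsilon as a bound on the coefficient change is an assumption.

```python
def bisquare_weight(u, c=4.685):
    """Tukey bisquare weight: smooth down-weighting, zero beyond c."""
    if abs(u) >= c:
        return 0.0
    return (1.0 - (u / c) ** 2) ** 2


def weighted_line(xs, ys, ws):
    """Weighted least squares for y = intercept + slope * x."""
    total = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mm_irls(xs, ys, initial_coefficients, scale, max_iterations=500, epsilon=1e-7):
    """Refine an initial robust fit by IRLS, keeping the scale fixed."""
    intercept, slope = initial_coefficients
    for _ in range(max_iterations):
        weights = [
            bisquare_weight((y - intercept - slope * x) / scale)
            for x, y in zip(xs, ys)
        ]
        new_intercept, new_slope = weighted_line(xs, ys, weights)
        change = max(abs(new_intercept - intercept), abs(new_slope - slope))
        intercept, slope = new_intercept, new_slope
        if change <= epsilon:
            break
    return intercept, slope


# Made-up data: roughly y = 1 + 2x, with two gross outliers at the end.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1.1, 2.9, 5.05, 6.95, 9.1, 10.9, 13.0, 15.05, -20.0, -25.0]

# Assumption: these values stand in for an initial S-estimate and its scale.
intercept, slope = mm_irls(xs, ys, initial_coefficients=(1.0, 2.0), scale=0.1)
print(f"intercept={intercept:.3f} slope={slope:.3f}")
```

Holding the scale fixed at the value from the high-breakdown initial fit is what lets the refinement gain efficiency on the clean points while the gross outliers keep a weight of zero.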