robpy.outliers
Module containing all algorithms related to outlier detection.
Detect Deviating Cells
- class robpy.outliers.ddc.DDCEstimator(chi2_quantile: float = 0.99, min_correlation: float = 0.5, scale_estimator: RobustScaleEstimator = CellwiseOneStepMEstimator())
Bases: OutlierMixin
Implementation of the Detecting Deviating Cells (DDC) algorithm.
- Parameters:
chi2_quantile (float, optional) – Quantile of the chi-squared distribution to use as threshold for univariate outlier detection in step 2. Default is 0.99.
min_correlation (float, optional) – Minimum correlation between variables to consider them related. Default is 0.5.
scale_estimator (RobustScaleEstimator, optional) – Robust scale estimator used to scale the initial data. Defaults to CellwiseOneStepMEstimator().
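To make the role of chi2_quantile concrete, the sketch below converts a chi-squared quantile into a cutoff on absolute standardized values. It assumes, as in the DDC paper, that the threshold is the square root of the chi-squared quantile with one degree of freedom; it does not call robpy, and the helper names cell_cutoff and flag_cells are hypothetical, not part of the library.

```python
from statistics import NormalDist


def cell_cutoff(chi2_quantile: float = 0.99) -> float:
    """Cutoff on |z| equivalent to a chi-squared (1 df) quantile on z**2.

    Hypothetical helper for illustration, not part of robpy.
    """
    # If Z ~ N(0, 1) then Z**2 is chi-squared with 1 degree of freedom, so
    # P(Z**2 <= c**2) = q is the same as P(|Z| <= c) = q,
    # which gives c = Phi^{-1}((1 + q) / 2).
    return NormalDist().inv_cdf((1.0 + chi2_quantile) / 2.0)


def flag_cells(z_values, chi2_quantile: float = 0.99):
    """Flag standardized values whose absolute value exceeds the cutoff."""
    cutoff = cell_cutoff(chi2_quantile)
    return [abs(z) > cutoff for z in z_values]


cutoff = cell_cutoff(0.99)                   # roughly 2.576
flags = flag_cells([0.3, -1.9, 2.7, -4.2])   # only the last two exceed the cutoff
print(cutoff, flags)
```

Raising chi2_quantile raises the cutoff, so fewer cells are flagged; the exact way robpy applies the threshold internally may differ from this sketch.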
References
Rousseeuw, P. J., & Van den Bossche, W. (2018). Detecting Deviating Data Cells. Technometrics, 60(2), 135–145. https://doi.org/10.1080/00401706.2017.1340909
R Implementation: https://www.rdocumentation.org/packages/cellWise/versions/2.5.3/topics/DDC
- cellmap(X: DataFrame, annotate: bool = False, fmt: str = '.1f', figsize: tuple[int, int] = (7, 10), row_zoom: tuple[int, int] | Index | None = None, col_zoom: tuple[int, int] | Index | None = None, vmax_clip: float = 3.290526731491895, cmap: str | Colormap = 'custom') -> Axes
Visualize the standardized residuals of the DDC model as a heatmap.
- Parameters:
X (pd.DataFrame) – The original data used to fit the model.
annotate (bool, optional) – Whether to annotate the heatmap cells with the original values. Defaults to False.
fmt (str, optional) – Format to use for annotations. Defaults to “.1f”.
figsize (tuple[int, int], optional) – Figure size. Defaults to (7, 10).
row_zoom (tuple[int, int] | pd.Index | None, optional) – If not None, a subset of the rows is selected for visualization. A tuple is interpreted as a slice, a pd.Index as a selection. Defaults to None.
col_zoom (tuple[int, int] | pd.Index | None, optional) – Similar to row_zoom but for columns. Defaults to None.
vmax_clip (float, optional) – Absolute standardized residuals larger than vmax_clip are clipped and drawn in the darkest color. Defaults to 3.290526731491895.
cmap (str | matplotlib.colors.Colormap, optional) – Matplotlib colormap, or the name of one, that maps the data to the color space. Defaults to 'custom'.
- Returns:
The matplotlib axes containing the heatmap.
- Return type:
Axes
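The effect of vmax_clip can be illustrated without robpy. The sketch below clips a small grid of standardized residuals to the range [-vmax_clip, vmax_clip] before they would be mapped to colors. It assumes the sign of a residual is kept so that a diverging colormap can distinguish high from low cells; the function clip_for_colormap is hypothetical and is not the library's implementation. The last lines check numerically that the default value matches the square root of the 0.999 chi-squared quantile with one degree of freedom, which is an observation about the number and not a statement about how robpy derives it.

```python
from statistics import NormalDist

DEFAULT_VMAX_CLIP = 3.290526731491895  # default taken from the cellmap signature


def clip_for_colormap(residuals, vmax_clip: float = DEFAULT_VMAX_CLIP):
    """Clip each residual to [-vmax_clip, vmax_clip], keeping its sign.

    Hypothetical helper for illustration, not part of robpy.
    """
    return [
        [max(-vmax_clip, min(vmax_clip, r)) for r in row]
        for row in residuals
    ]


residuals = [[0.4, -5.0], [7.2, 1.1]]
clipped = clip_for_colormap(residuals)
# Values inside the range are unchanged; -5.0 and 7.2 are pulled back to the
# bounds, so both would be drawn in the darkest shade of their color.

# sqrt of the chi-squared (1 df) 0.999 quantile, via the normal quantile
reference = NormalDist().inv_cdf((1.0 + 0.999) / 2.0)
print(clipped, reference)
```

A smaller vmax_clip saturates the colors sooner, which makes moderate deviations easier to see at the cost of hiding differences among extreme ones.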