robpy.pca

Base

class robpy.pca.base.RobustPCA(*, n_components: int | None = None)[source]

Bases: _BasePCA

Base class for robust PCA estimators

Parameters: n_components (int | None, optional) – Number of components to select. If None, it is set during fit to min(X.shape).

abstract fit(X: ndarray)[source]

Fit the robust PCA model to the data

Parameters: X (np.ndarray) – Data to fit the model to

plot_outlier_map(X: ndarray | DataFrame, figsize: tuple[int, int] = (4, 4), return_distances: bool = False) → None | tuple[ndarray, ndarray, float, float][source]

Plot orthogonal distances against score distances to identify different types of outliers

Parameters:

X (np.ndarray | pd.DataFrame) – Data matrix (n x p)
figsize (tuple[int, int], optional) – Size of the plot. Defaults to (4, 4).
return_distances (bool, optional) – Whether to return the distances and cutoff values. Defaults to False.
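
As a rough illustration of the two quantities shown on the outlier map, the sketch below computes score distances and orthogonal distances with NumPy. It uses classical (non-robust) PCA via SVD as a stand-in and is not robpy's implementation; the function name outlier_map_distances and the toy data are hypothetical.

```python
import numpy as np

def outlier_map_distances(X, n_components):
    """Score and orthogonal distances for a classical PCA fit (illustration only)."""
    X = np.asarray(X, dtype=float)
    center = X.mean(axis=0)
    Xc = X - center
    # Rows of Vt are the principal directions; s holds the singular values.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components].T                       # shape (p, k)
    eigenvalues = s[:n_components] ** 2 / (X.shape[0] - 1)
    scores = Xc @ loadings                               # shape (n, k)
    # Score distance: Mahalanobis-type distance measured inside the PCA subspace.
    score_dist = np.sqrt(np.sum(scores**2 / eigenvalues, axis=1))
    # Orthogonal distance: Euclidean distance from each point to the PCA subspace.
    orth_dist = np.linalg.norm(Xc - scores @ loadings.T, axis=1)
    return score_dist, orth_dist

# Toy data: most variance lies along the first two axes.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3)) * np.array([5.0, 2.0, 0.1])
X[0] = [0.0, 0.0, 5.0]    # planted point far from the plane of the first two axes
sd, od = outlier_map_distances(X, n_components=2)
```

A point with a large orthogonal distance but a small score distance, like the planted one here, lies far from the fitted subspace while its projection looks unremarkable. A robust estimator replaces the mean and the SVD above with robust counterparts so that such points do not distort the fit.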

project(X: ndarray) → ndarray[source]

Project the data onto the subspace spanned by the principal components.

Parameters: X (np.ndarray) – Data to project
Returns: Projected data
Return type: np.ndarray

transform(X: ndarray) → ndarray[source]

Apply dimensionality reduction to X.

X is projected on the first principal components previously extracted from a training set.

Parameters: X ({array-like, sparse matrix} of shape (n_samples, n_features)) – New data, where n_samples is the number of samples and n_features is the number of features.
Returns: X_new – Projection of X onto the first principal components, where n_samples is the number of samples and n_components is the number of components.
Return type: array-like of shape (n_samples, n_components)

robpy.pca.base.get_od_cutoff(orthogonal_distances: ndarray) → float[source]

ROBPCA

class robpy.pca.robpca.ROBPCA(*, n_components: int | None = None, k_min_var_explained: float = 0.8, alpha: float = 0.75, final_MCD_step: bool = True)[source]

Bases: RobustPCA

Implementation of the ROBPCA algorithm as described in Hubert, Rousseeuw & Vanden Branden (2005) and Hubert, Rousseeuw & Verdonck (2009)

Parameters:

n_components (int | None, optional) – Number of components to select. If None, it is set during fit to min(X.shape).
k_min_var_explained (float, optional) – Minimum variance explained by the n_components. Only used if n_components is None.
alpha (float, optional) – Coverage parameter that determines the trade-off between robustness and efficiency of the estimator. Smaller alpha gives more robust but less efficient estimates.
final_MCD_step (bool, optional) – Whether to apply the final MCD step to get maximally robust estimates. If False, the eigenvectors after projection onto V1 (the subspace determined by points with OD < cutoff) are used as the final estimates. Defaults to True.
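
To make the role of k_min_var_explained concrete, here is a minimal sketch of choosing the number of components from a cumulative share of variance. It assumes the selection keeps the smallest number of leading eigenvalues whose share reaches the threshold; the helper select_n_components is hypothetical, and robpy's internal rule may differ.

```python
import numpy as np

def select_n_components(eigenvalues, k_min_var_explained=0.8):
    """Smallest k whose leading eigenvalues reach the requested share of variance."""
    eigenvalues = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    explained = np.cumsum(eigenvalues) / eigenvalues.sum()
    # Index of the first cumulative share >= threshold, converted to a count.
    k = int(np.searchsorted(explained, k_min_var_explained)) + 1
    # Guard against rounding when the threshold is (numerically) 1.0.
    return min(k, eigenvalues.size)

# Cumulative shares for these eigenvalues are 0.5, 0.8, 0.95 and 1.0.
k = select_n_components([5.0, 3.0, 1.5, 0.5], k_min_var_explained=0.9)
```

With a threshold of 0.9 the first two eigenvalues are not enough (share 0.8), so three components are kept.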

References

Hubert, Rousseeuw & Vanden Branden (2005), ROBPCA: A new approach to robust principal component analysis
Hubert, Rousseeuw & Verdonck (2009), Robust PCA for skewed data and its outlier map

fit(X: ndarray) → ROBPCA[source]

Fit the robust PCA model to the data

Parameters: X (np.ndarray) – Data to fit the model to

Spherical

class robpy.pca.spca.PCALocantore(*, n_components: int | None = None, k_min_var_explained: float = 0.8)[source]

Bases: RobustPCA

Spherical PCA

Parameters:

n_components (int | None, optional) – Number of components to select. If None, it is set during fit to explain the minimum variance.
k_min_var_explained (float, optional) – Minimum variance explained by the n_components. Only used if n_components is None.
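
The general idea behind spherical PCA can be sketched as follows: center the data at a robust location, project every point onto the unit sphere around that center so that distant points lose their leverage, and take the eigenvectors of the projected data. The code below is an illustration under stated assumptions, not robpy's implementation: it assumes the spatial median (computed by Weiszfeld iterations) as the center, and the helper names spatial_median and spherical_pca_directions are hypothetical.

```python
import numpy as np

def spatial_median(X, n_iter=100, tol=1e-8):
    """Weiszfeld iterations for the spatial (L1) median."""
    center = np.median(X, axis=0)
    for _ in range(n_iter):
        dist = np.maximum(np.linalg.norm(X - center, axis=1), 1e-12)  # avoid /0
        weights = 1.0 / dist
        new_center = (weights[:, None] * X).sum(axis=0) / weights.sum()
        if np.linalg.norm(new_center - center) < tol:
            return new_center
        center = new_center
    return center

def spherical_pca_directions(X, n_components):
    """Principal directions of data projected onto the unit sphere (illustration only)."""
    X = np.asarray(X, dtype=float)
    center = spatial_median(X)
    Xc = X - center
    norms = np.linalg.norm(Xc, axis=1)
    keep = norms > 1e-12                      # points at the center have no direction
    U = Xc[keep] / norms[keep, None]          # every remaining row has unit length
    # Eigen-decomposition of the second-moment matrix of the projected points.
    eigvals, eigvecs = np.linalg.eigh(U.T @ U / U.shape[0])
    order = np.argsort(eigvals)[::-1][:n_components]
    return center, eigvecs[:, order]

# Toy data dominated by the first axis, plus a few gross outliers on the third axis.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3)) * np.array([4.0, 1.0, 0.2])
X[:5] = [0.0, 0.0, 1000.0]
center, directions = spherical_pca_directions(X, n_components=1)
```

Because each point contributes only its direction, the five extreme points count no more than any other observation, and the leading direction stays close to the first axis. Classical PCA on the same data would be pulled toward the third axis.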

fit(X: ndarray) → PCALocantore[source]

Fit the robust PCA model to the data

Parameters: X (np.ndarray) – Data to fit the model to