robpy.pca

Base

class robpy.pca.base.RobustPCA(*, n_components: int | None = None)

Bases: _BasePCA

Base class for robust PCA estimators.

Parameters: n_components (int | None, optional) – Number of components to select. If None, it is set during fit.

abstract fit(X: ndarray)

Fit the robust PCA model to the data.

Parameters: X (np.ndarray) – Data to fit the model to.

plot_outlier_map(X: ndarray | DataFrame, figsize: tuple[int, int] = (4, 4), return_distances: bool = False) → None | tuple[ndarray, ndarray, float, float]

Plot orthogonal distances against score distances to identify different types of outliers.

Parameters:

X (np.ndarray | DataFrame) – Data matrix (n x p).
figsize (tuple[int, int], optional) – Size of the plot. Defaults to (4, 4).
return_distances (bool, optional) – Whether to return the distances and cutoff values. Defaults to False.
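The two quantities on the outlier map can be illustrated with a short NumPy sketch. It uses a classical (non-robust) PCA fit purely for illustration; the helper name outlier_map_distances and the toy data are assumptions, and this is not robpy's implementation, which relies on robust estimates and also computes cutoff values.

```python
import numpy as np

def outlier_map_distances(X, n_components):
    """Score and orthogonal distances from a classical PCA fit (illustration only)."""
    mu = X.mean(axis=0)
    Xc = X - mu
    # Principal axes are the right singular vectors of the centered data.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T                           # loadings, shape (p, k)
    eigvals = s[:n_components] ** 2 / (len(X) - 1)    # variance along each axis
    scores = Xc @ V                                   # shape (n, k)
    # Score distance: Mahalanobis-type distance measured inside the PCA subspace.
    sd = np.sqrt(np.sum(scores**2 / eigvals, axis=1))
    # Orthogonal distance: Euclidean distance from each point to the subspace.
    od = np.linalg.norm(Xc - scores @ V.T, axis=1)
    return sd, od

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([5.0, 3.0, 0.3, 0.3, 0.3])
X[0] = [0.0, 0.0, 8.0, 8.0, 8.0]    # lies off the main plane: large orthogonal distance
X[1] = [40.0, 0.0, 0.0, 0.0, 0.0]   # lies far along the first axis: large score distance
sd, od = outlier_map_distances(X, n_components=2)
```

In this toy data, row 0 stands out on the orthogonal-distance axis and row 1 on the score-distance axis, which is the distinction the outlier map is meant to make visible.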

project(X: ndarray) → ndarray

Project the data onto the subspace spanned by the principal components.

Parameters: X (np.ndarray) – Data to project.
Returns: Projected data.
Return type: np.ndarray

transform(X: ndarray) → ndarray

Apply dimensionality reduction to X.

X is projected on the first principal components previously extracted from a training set.

Parameters: X ({array-like, sparse matrix} of shape (n_samples, n_features)) – New data, where n_samples is the number of samples and n_features is the number of features.
Returns: X_new – Projection of X on the first principal components, where n_samples is the number of samples and n_components is the number of components.
Return type: np.ndarray of shape (n_samples, n_components)
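As a rough picture of the shapes involved, the sketch below computes scores of shape (n_samples, n_components) by centering new data and multiplying by a loading matrix, and then maps the scores back into the original feature space. The names location and components are hypothetical stand-ins for fitted quantities, obtained here from a classical PCA; a robust estimator would supply robust versions, and robpy's internals may differ.

```python
import numpy as np

rng = np.random.default_rng(1)
X_train = rng.normal(size=(100, 4))
X_new = rng.normal(size=(10, 4))

# Hypothetical fitted quantities: a location vector and an orthonormal loading matrix.
location = X_train.mean(axis=0)
_, _, Vt = np.linalg.svd(X_train - location, full_matrices=False)
components = Vt[:2].T                                # shape (n_features, n_components)

# Scores: coordinates of the new data in the principal-component basis.
scores = (X_new - location) @ components             # shape (n_samples, n_components)

# The same points expressed back in the original feature space.
reconstruction = location + scores @ components.T    # shape (n_samples, n_features)
```

The residual X_new - reconstruction is orthogonal to every column of components, so its row norms are the orthogonal distances used in the outlier map.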

robpy.pca.base.get_od_cutoff(orthogonal_distances: ndarray) → float

ROBPCA

class robpy.pca.robpca.ROBPCA(*, n_components: int | None = None, k_min_var_explained: float = 0.8, alpha: float = 0.75, final_MCD_step: bool = True, random_seed: int | None = None, verbosity: int = 30)

Bases: RobustPCA

Implementation of the ROBPCA algorithm as described in Hubert, Rousseeuw, & Vanden Branden (2005).

Parameters:

n_components (int | None, optional) – Number of components to select. If None, it is set during fit. Defaults to None.
k_min_var_explained (float, optional) – Minimum variance explained by the components. Only used if n_components is None. Defaults to 0.8.
alpha (float, optional) – Coverage parameter that determines the trade-off between robustness and efficiency of the estimator. A smaller alpha gives more robust but less accurate estimates. Must be between 0.5 and 1. Defaults to 0.75.
final_MCD_step (bool, optional) – Whether to apply the final MCD step to get maximally robust estimates. If False, the eigenvectors after projection onto V1 (subspace determined by points with OD < cutoff) are used as the final estimates. Defaults to True.
random_seed (int | None, optional) – Can be used to provide a random seed. Defaults to None.
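One plausible reading of k_min_var_explained is sketched below: keep the smallest number of components whose cumulative share of the total variance reaches the threshold. The helper n_components_for_variance and the example eigenvalues are assumptions for illustration; the exact selection rule used inside ROBPCA is not stated here and may differ.

```python
import numpy as np

def n_components_for_variance(eigenvalues, k_min_var_explained=0.8):
    """Smallest k whose leading eigenvalues explain at least the requested share."""
    eigenvalues = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()
    # First index at which the cumulative share reaches the threshold, as a count.
    return int(np.searchsorted(cumulative, k_min_var_explained) + 1)

# Cumulative shares for these eigenvalues are 0.6, 0.9 and 1.0.
k = n_components_for_variance([6.0, 3.0, 1.0], k_min_var_explained=0.8)
```

With the default threshold of 0.8, the first component alone (60% of the variance) is not enough, so two components are kept.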

References

Hubert, M., Rousseeuw, P. J., & Vanden Branden, K. (2005). ROBPCA: a new approach to robust principal component analysis. Technometrics, 47(1), 64-79.

fit(X: ndarray) → ROBPCA

Fit the robust PCA model to the data.

Parameters: X (np.ndarray) – Data to fit the model to.

Spherical

class robpy.pca.spca.PCALocantore(*, n_components: int | None = None, k_min_var_explained: float = 0.8)

Bases: RobustPCA

Spherical PCA, as introduced in Locantore et al. (1999).

Parameters:

n_components (int | None, optional) – Number of components to select. If None, it is set during fit to explain the minimum variance.
k_min_var_explained (float, optional) – Minimum variance explained by the components. Only used if n_components is None. Defaults to 0.8.
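The idea behind spherical PCA can be sketched in a few lines of NumPy: center the data at a robust location, project every observation onto the unit sphere so that distant points cannot dominate, and take the principal directions of the projected data. The helpers spatial_median and spherical_pca_directions, the Weiszfeld iteration settings, and the toy data are assumptions for illustration and are not taken from PCALocantore itself.

```python
import numpy as np

def spatial_median(X, n_iter=100, tol=1e-8):
    """Weiszfeld iterations for the spatial (L1) median."""
    m = np.median(X, axis=0)
    for _ in range(n_iter):
        d = np.maximum(np.linalg.norm(X - m, axis=1), 1e-12)  # avoid division by zero
        w = 1.0 / d
        m_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(m_new - m) < tol:
            return m_new
        m = m_new
    return m

def spherical_pca_directions(X, n_components):
    """Principal directions of the data after projection onto the unit sphere."""
    center = spatial_median(X)
    Xc = X - center
    norms = np.maximum(np.linalg.norm(Xc, axis=1, keepdims=True), 1e-12)
    S = Xc / norms                      # every observation now has unit length
    _, _, Vt = np.linalg.svd(S, full_matrices=False)
    return center, Vt[:n_components].T  # loadings, shape (p, k)

rng = np.random.default_rng(2)
clean = rng.normal(size=(300, 3)) * np.array([5.0, 1.0, 1.0])
outliers = rng.normal(size=(15, 3)) + np.array([0.0, 100.0, 0.0])
X = np.vstack([clean, outliers])

center, loadings = spherical_pca_directions(X, n_components=1)

# Classical PCA on the same data, for comparison.
_, _, Vt_classical = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
classical_first = Vt_classical[0]
```

In this toy data the classical first direction is pulled toward the outliers along the second axis, while the spherical direction stays close to the first axis, where most of the clean variation lies. The sketch only recovers directions; scale estimates along each direction would need a separate robust step.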

References

Locantore, N., Marron, J. S., Simpson, D. G., Tripoli, N., Zhang, J. T., & Cohen, K. L. (1999). Robust principal component analysis for functional data. Test, 8(1), 1-73.

fit(X: ndarray) → PCALocantore

Fit the robust PCA model to the data.

Parameters: X (np.ndarray) – Data to fit the model to.