robpy.pca

Base

class robpy.pca.base.RobustPCA(*, n_components: int | None = None)

Bases: _BasePCA

Base class for robust PCA estimators

Parameters:

n_components (int | None, optional) – Number of components to select. If None, it is set during fit to min(X.shape).

abstract fit(X: ndarray)

Fit the robust PCA model to the data

Parameters:

X (np.ndarray) – Data to fit the model to

plot_outlier_map(X: ndarray | DataFrame, figsize: tuple[int, int] = (4, 4), return_distances: bool = False) → None | tuple[ndarray, ndarray, float, float]

Plot orthogonal distances against score distances to identify different types of outliers.

Parameters:
  • X (np.ndarray | pd.DataFrame) – Data matrix (n x p)

  • figsize (tuple[int, int], optional) – Size of the plot. Defaults to (4, 4).

  • return_distances (bool, optional) – Whether to return the distances and cutoff values. Defaults to False.
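The two quantities on the outlier map can be illustrated with plain NumPy. The sketch below is not robpy's implementation: it assumes a classical (non-robust) PCA fit, and the helper name `outlier_map_distances` is hypothetical. The score distance measures how far a point lies from the centre within the PCA subspace, and the orthogonal distance measures how far it lies from the subspace itself.

```python
import numpy as np


def outlier_map_distances(X, n_components):
    """Score and orthogonal distances for a classical PCA fit (illustration only)."""
    X = np.asarray(X, dtype=float)
    center = X.mean(axis=0)
    Xc = X - center
    # Eigen-decomposition of the sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, loadings = eigvals[order], eigvecs[:, order]
    scores = Xc @ loadings
    # Score distance: Mahalanobis-type distance inside the PCA subspace
    score_dist = np.sqrt(np.sum(scores**2 / eigvals, axis=1))
    # Orthogonal distance: Euclidean distance from each point to the subspace
    orth_dist = np.linalg.norm(Xc - scores @ loadings.T, axis=1)
    return score_dist, orth_dist


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) * np.array([5.0, 2.0, 0.1])
X[0] = [0.0, 0.0, 10.0]  # lies far from the plane spanned by the first two axes
sd, od = outlier_map_distances(X, n_components=2)
```

In this toy data set the first observation has by far the largest orthogonal distance, which is the kind of point the outlier map is meant to expose.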

project(X: ndarray) → ndarray

Project the data onto the subspace spanned by the principal components.

Parameters:

X (np.ndarray) – Data to project

Returns:

Projected data

Return type:

np.ndarray

transform(X: ndarray) → ndarray

Apply dimensionality reduction to X.

X is projected on the first principal components previously extracted from a training set.

Parameters:

X ({array-like, sparse matrix} of shape (n_samples, n_features)) – New data, where n_samples is the number of samples and n_features is the number of features.

Returns:

X_new – Projection of X in the first principal components, where n_samples is the number of samples and n_components is the number of the components.

Return type:

array-like of shape (n_samples, n_components)
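The difference between low-dimensional scores and a projection onto the principal subspace can be sketched as follows. This is a NumPy illustration with classical PCA, not robpy code. It assumes that transform returns scores of shape (n_samples, n_components), as documented above, and that project returns points expressed in the original feature space; the source does not state the shape of the projected data, so the second part is an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))

center = X.mean(axis=0)
# Right singular vectors of the centred data are the principal directions
_, _, Vt = np.linalg.svd(X - center, full_matrices=False)
loadings = Vt[:2].T  # shape (n_features, n_components)

# Scores: coordinates of each sample in the 2-D principal subspace
scores = (X - center) @ loadings  # shape (50, 2)

# Projection: the same points mapped back into the 4-D feature space
X_projected = center + scores @ loadings.T  # shape (50, 4)
```

Projecting `X_projected` a second time leaves it unchanged, because it already lies in the subspace.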

robpy.pca.base.get_od_cutoff(orthogonal_distances: ndarray) → float
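The source does not describe how this cutoff is computed. As a point of reference, Hubert, Rousseeuw & Vanden Branden (2005) derive a cutoff from the approximation that the orthogonal distances raised to the power 2/3 are roughly normally distributed. The sketch below follows that idea but is not robpy's code: the function name `od_cutoff_sketch` is hypothetical, and the median and MAD are swapped in as simple robust location and scale estimates where the paper uses a univariate MCD.

```python
from statistics import NormalDist

import numpy as np


def od_cutoff_sketch(orthogonal_distances, quantile=0.975):
    """Cutoff for orthogonal distances via the OD^(2/3) normal approximation."""
    od = np.asarray(orthogonal_distances, dtype=float)
    z = od ** (2.0 / 3.0)
    # Median and MAD stand in for the univariate MCD used in the paper
    loc = np.median(z)
    scale = 1.4826 * np.median(np.abs(z - loc))
    # Back-transform the normal quantile to the original distance scale
    return float((loc + scale * NormalDist().inv_cdf(quantile)) ** 1.5)


rng = np.random.default_rng(2)
od = np.abs(rng.normal(size=200))
cutoff = od_cutoff_sketch(od)
flagged = od > cutoff  # boolean mask of observations beyond the cutoff
```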

ROBPCA

class robpy.pca.robpca.ROBPCA(*, n_components: int | None = None, k_min_var_explained: float = 0.8, alpha: float = 0.75, final_MCD_step: bool = True, random_seed: int | None = None)

Bases: RobustPCA

Implementation of the ROBPCA algorithm as described in Hubert, Rousseeuw & Vanden Branden (2005) and Hubert, Rousseeuw & Verdonck (2009).

Parameters:
  • n_components (int | None, optional) – Number of components to select. If None, it is set during fit to min(X.shape).

  • k_min_var_explained (float, optional) – Minimum proportion of variance that the selected components must explain. Only used if n_components is None.

  • alpha (float, optional) – Coverage parameter that determines the trade-off between the robustness and the efficiency of the estimator. A smaller alpha gives more robust but less accurate estimates.

  • final_MCD_step (bool, optional) – Whether to apply the final MCD step to get maximally robust estimates. If False, the eigenvectors after projection onto V1 (the subspace determined by the points with OD < cutoff) are used as the final estimates. Defaults to True.

  • random_seed (int | None, optional) – Can be used to provide a random seed.
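One common way to turn a variance threshold such as k_min_var_explained into a number of components is to take the smallest k whose leading eigenvalues reach that share of the total. The sketch below shows that rule in NumPy. It is an assumption about the selection rule rather than robpy's code, and the helper name `n_components_for_variance` is hypothetical.

```python
import numpy as np


def n_components_for_variance(eigenvalues, k_min_var_explained=0.8):
    """Smallest k whose leading eigenvalues explain at least the given share."""
    eigenvalues = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()
    # First index at which the cumulative share reaches the threshold
    k = int(np.searchsorted(cumulative, k_min_var_explained)) + 1
    # Guard against rounding pushing k past the number of eigenvalues
    return min(k, len(eigenvalues))


eigenvalues = [6.0, 3.0, 0.5, 0.5]  # cumulative shares: 0.6, 0.9, 0.95, 1.0
k = n_components_for_variance(eigenvalues, k_min_var_explained=0.8)
```

With these eigenvalues, one component explains 60% of the variance and two explain 90%, so a threshold of 0.8 selects two components.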

References

  • Hubert, Rousseeuw & Vanden Branden (2005), ROBPCA: A new approach to robust principal component analysis

  • Hubert, Rousseeuw & Verdonck (2009), Robust PCA for skewed data and its outlier map

fit(X: ndarray) → ROBPCA

Fit the robust PCA model to the data

Parameters:

X (np.ndarray) – Data to fit the model to

Spherical

class robpy.pca.spca.PCALocantore(*, n_components: int | None = None, k_min_var_explained: float = 0.8)

Bases: RobustPCA

Spherical PCA

Parameters:
  • n_components (int | None, optional) – Number of components to select. If None, it is set during fit to explain the minimum variance.

  • k_min_var_explained (float, optional) – Minimum proportion of variance that the selected components must explain. Only used if n_components is None.
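The idea behind spherical PCA can be sketched in a few lines of NumPy: centre the data at a robust location estimate, project every observation onto the unit sphere so that distant points cannot dominate, and take the eigenvectors of the covariance of the projected points. The code below is an illustration under those assumptions, not robpy's implementation. The function names are hypothetical, and the spatial median computed with Weiszfeld iterations is the centre assumed here.

```python
import numpy as np


def spatial_median(X, n_iter=100, tol=1e-8):
    """Spatial (L1) median via Weiszfeld iterations."""
    center = np.median(X, axis=0)
    for _ in range(n_iter):
        # Clip distances to avoid dividing by zero at a data point
        dist = np.maximum(np.linalg.norm(X - center, axis=1), 1e-12)
        weights = 1.0 / dist
        new_center = (weights[:, None] * X).sum(axis=0) / weights.sum()
        shift = np.linalg.norm(new_center - center)
        center = new_center
        if shift < tol:
            break
    return center


def spherical_pca_directions(X, n_components):
    """Principal directions of the data projected onto the unit sphere."""
    X = np.asarray(X, dtype=float)
    center = spatial_median(X)
    Xc = X - center
    norms = np.maximum(np.linalg.norm(Xc, axis=1, keepdims=True), 1e-12)
    sphered = Xc / norms  # every observation now has unit length
    eigvals, eigvecs = np.linalg.eigh(sphered.T @ sphered / len(X))
    order = np.argsort(eigvals)[::-1][:n_components]
    return center, eigvecs[:, order]


rng = np.random.default_rng(3)
X = rng.normal(size=(200, 2)) * np.array([5.0, 1.0])  # elongated along the first axis
X[:10] = np.array([0.0, 100.0]) + rng.normal(size=(10, 2))  # gross outliers
center, directions = spherical_pca_directions(X, n_components=1)
```

In this example the leading direction stays close to the first coordinate axis even though ten gross outliers lie far along the second axis.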

fit(X: ndarray) → PCALocantore

Fit the robust PCA model to the data

Parameters:

X (np.ndarray) – Data to fit the model to