robpy.pca
Base
- class robpy.pca.base.RobustPCA(*, n_components: int | None = None)
Bases:
_BasePCA
Base class for robust PCA estimators.
- Parameters:
n_components (int | None, optional) – Number of components to select. If None, it is set during fit to min(X.shape).
- abstract fit(X: ndarray)
Fit the robust PCA model to the data
- Parameters:
X (np.ndarray) – Data to fit the model to
- plot_outlier_map(X: ndarray | DataFrame, figsize: tuple[int, int] = (4, 4), return_distances: bool = False) -> None | tuple[ndarray, ndarray, float, float]
Plot orthogonal distances against score distances to identify different types of outliers.
- Parameters:
X (np.ndarray | pd.DataFrame) – Data matrix (n x p)
figsize (tuple[int, int], optional) – Size of the plot. Defaults to (4, 4).
return_distances (bool, optional) – Whether to return the distances and cutoff values. Defaults to False.
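The two quantities on the outlier map can be illustrated with plain NumPy. The following is a minimal sketch, not robpy's implementation: it assumes the usual definitions from Hubert, Rousseeuw & Vanden Branden (2005), where the score distance is a Mahalanobis-type distance inside the PCA subspace and the orthogonal distance is the Euclidean distance from an observation to that subspace. The helper name `outlier_map_distances` and its arguments are hypothetical, and a classical PCA fit stands in for a robust one.

```python
import numpy as np


def outlier_map_distances(X, location, components, eigenvalues):
    """Score and orthogonal distances for the rows of X (hypothetical helper).

    components: (p, k) matrix with orthonormal columns (the loadings).
    eigenvalues: (k,) variances along each component.
    """
    Xc = X - location                      # center with the (robust) location
    scores = Xc @ components               # (n, k) coordinates in the subspace
    # Score distance: scaled distance to the center, measured inside the subspace
    sd = np.sqrt(np.sum(scores**2 / eigenvalues, axis=1))
    # Orthogonal distance: Euclidean distance from each point to the subspace
    od = np.linalg.norm(Xc - scores @ components.T, axis=1)
    return sd, od


# Toy data; a classical PCA fit is used here purely as a stand-in
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) * np.array([3.0, 1.0, 0.1])
mu = X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(X - mu, rowvar=False))
order = np.argsort(eigvals)[::-1][:2]      # keep the two largest components
sd, od = outlier_map_distances(X, mu, eigvecs[:, order], eigvals[order])
```

Observations with a large score distance but a small orthogonal distance lie far from the center yet close to the subspace; a large orthogonal distance marks points that the subspace does not describe well.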
- project(X: ndarray) -> ndarray
Project the data onto the subspace spanned by the principal components.
- Parameters:
X (np.ndarray) – Data to project
- Returns:
Projected data
- Return type:
np.ndarray
- transform(X: ndarray) -> ndarray
Apply dimensionality reduction to X.
X is projected on the first principal components previously extracted from a training set.
- Parameters:
X ({array-like, sparse matrix} of shape (n_samples, n_features)) – New data, where n_samples is the number of samples and n_features is the number of features.
- Returns:
X_new – Projection of X in the first principal components, where n_samples is the number of samples and n_components is the number of the components.
- Return type:
array-like of shape (n_samples, n_components)
ROBPCA
- class robpy.pca.robpca.ROBPCA(*, n_components: int | None = None, k_min_var_explained: float = 0.8, alpha: float = 0.75, final_MCD_step: bool = True)
Bases:
RobustPCA
Implementation of the ROBPCA algorithm as described in Hubert, Rousseeuw & Vanden Branden (2005) and Hubert, Rousseeuw & Verdonck (2009).
- Parameters:
n_components (int | None, optional) – Number of components to select. If None, it is set during fit to min(X.shape).
k_min_var_explained (float, optional) – Minimum fraction of variance that the selected components must explain. Only used if n_components is None. Defaults to 0.8.
alpha (float, optional) – Coverage parameter that determines the trade-off between robustness and efficiency of the estimator. Smaller alpha gives more robust but less accurate estimates. Defaults to 0.75.
final_MCD_step (bool, optional) – whether to apply the final MCD step to get maximally robust estimates. If False, the eigenvectors after projection onto V1 (subspace determined by points with OD < cutoff) are used as the final estimates. Defaults to True.
References
Hubert, Rousseeuw & Vanden Branden (2005). ROBPCA: A new approach to robust principal component analysis.
Hubert, Rousseeuw & Verdonck (2009). Robust PCA for skewed data and its outlier map.
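As an illustration of how a threshold such as k_min_var_explained can translate into a number of components, the sketch below picks the smallest k whose leading eigenvalues reach the requested share of total variance. This is an assumption about the general technique, not a statement of how robpy computes it internally; the function name `n_components_for_variance` is hypothetical.

```python
import numpy as np


def n_components_for_variance(eigenvalues, k_min_var_explained=0.8):
    """Smallest k whose leading eigenvalues explain the requested variance share."""
    eigenvalues = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    ratio = np.cumsum(eigenvalues) / eigenvalues.sum()
    # Index of the first cumulative ratio >= threshold, converted to a count
    k = int(np.searchsorted(ratio, k_min_var_explained)) + 1
    # Guard against rounding pushing the last ratio just below the threshold
    return min(k, eigenvalues.size)


# Cumulative ratios for [6, 3, 1] are 0.6, 0.9, 1.0
k_default = n_components_for_variance([6.0, 3.0, 1.0])        # threshold 0.8
k_strict = n_components_for_variance([6.0, 3.0, 1.0], 0.95)   # threshold 0.95
```

With the default threshold two components suffice in this toy case, while the stricter threshold requires all three.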
Spherical
- class robpy.pca.spca.PCALocantore(*, n_components: int | None = None, k_min_var_explained: float = 0.8)
Bases:
RobustPCA
Spherical PCA
- Parameters:
n_components (int | None, optional) – Number of components to select. If None, it is set during fit to explain the minimum variance.
k_min_var_explained (float, optional) – Minimum fraction of variance that the selected components must explain. Only used if n_components is None.
- fit(X: ndarray) -> PCALocantore
Fit the robust PCA model to the data
- Parameters:
X (np.ndarray) – Data to fit the model to
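The idea behind spherical PCA can be sketched in a few lines of NumPy: center the data at a robust location, project every observation onto the unit sphere so that distant points lose their leverage, and take the eigenvectors of the covariance of the projected data. The sketch below is illustrative only and is not robpy's code. Centering at the spatial median (computed with Weiszfeld iterations) is an assumption here, and the names `spatial_median` and `spherical_pca` are hypothetical.

```python
import numpy as np


def spatial_median(X, n_iter=100, eps=1e-12):
    """Spatial (L1) median via Weiszfeld iterations (hypothetical helper)."""
    m = np.median(X, axis=0)                       # coordinatewise median as start
    for _ in range(n_iter):
        d = np.maximum(np.linalg.norm(X - m, axis=1), eps)  # avoid division by zero
        w = 1.0 / d
        m = (w[:, None] * X).sum(axis=0) / w.sum()
    return m


def spherical_pca(X, n_components):
    """Directions from the covariance of data projected onto the unit sphere."""
    center = spatial_median(X)
    Xc = X - center
    norms = np.maximum(np.linalg.norm(Xc, axis=1, keepdims=True), 1e-12)
    U = Xc / norms                                 # every row now has unit length
    eigvals, eigvecs = np.linalg.eigh(U.T @ U / len(U))
    order = np.argsort(eigvals)[::-1][:n_components]
    return center, eigvecs[:, order]


# Toy data: main axis along the first coordinate, plus 10 gross outliers
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) * np.array([5.0, 0.5])
X[:10, 1] += 1000.0

center, components = spherical_pca(X, n_components=1)

# Classical PCA on the same data, for comparison
cl_vals, cl_vecs = np.linalg.eigh(np.cov(X, rowvar=False))
classical_first = cl_vecs[:, np.argmax(cl_vals)]
```

In this toy setup the spherical direction stays close to the first coordinate axis, whereas the classical first component is pulled toward the outliers along the second axis.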