robpy.pca
Base
- class robpy.pca.base.RobustPCA(*, n_components: int | None = None)
Bases: _BasePCA
Base class for robust PCA estimators.
- Parameters:
n_components (int | None, optional) – Number of components to select. If None, it is set during fit.
- abstract fit(X: ndarray)
Fit the robust PCA model to the data.
- Parameters:
X (np.ndarray) – Data to fit the model to.
- plot_outlier_map(X: ndarray | DataFrame, figsize: tuple[int, int] = (4, 4), return_distances: bool = False) → None | tuple[ndarray, ndarray, float, float]
Plot orthogonal distances against score distances to identify different types of outliers.
- Parameters:
X (np.ndarray | pd.DataFrame) – Data matrix (n x p).
figsize (tuple[int, int], optional) – Size of the plot. Defaults to (4, 4).
return_distances (bool, optional) – Whether to return the distances and cutoff values. If True, the method returns a tuple of two distance arrays and two cutoff values instead of None. Defaults to False.
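The quantities behind the outlier map can be sketched as follows, assuming a fitted center, a loading matrix P (p x k) with orthonormal columns, and robust eigenvalues l_j. The function and argument names are hypothetical, not robpy's API. Score distances are SD_i = sqrt(sum_j t_ij^2 / l_j), orthogonal distances are OD_i = ||x_i - center - P t_i||, and the cutoffs follow Hubert et al. (2005): the square root of the chi-square quantile with k degrees of freedom for SD, and a normal-theory cutoff on OD^(2/3). Two substitutions are assumptions: the chi-square quantile uses the Wilson-Hilferty approximation (to stay within NumPy and the standard library), and median/MAD stand in for the univariate MCD estimates used in the paper.

```python
import numpy as np
from statistics import NormalDist


def outlier_map_distances(X, center, loadings, eigenvalues, quantile=0.975):
    """Hypothetical helper: score distances, orthogonal distances and cutoffs.

    X: (n, p) data; center: (p,); loadings: (p, k) with orthonormal columns;
    eigenvalues: (k,) robust variances of the scores.
    """
    X = np.asarray(X, dtype=float)
    centered = X - center
    scores = centered @ loadings                  # (n, k) coordinates in the subspace
    residuals = centered - scores @ loadings.T    # part of each point outside the subspace
    sd = np.sqrt(np.sum(scores**2 / eigenvalues, axis=1))
    od = np.linalg.norm(residuals, axis=1)

    k = loadings.shape[1]
    z = NormalDist().inv_cdf(quantile)
    # Wilson-Hilferty approximation of the chi-square quantile (assumption;
    # scipy.stats.chi2.ppf(quantile, k) would give the exact value).
    chi2_q = k * (1 - 2 / (9 * k) + z * np.sqrt(2 / (9 * k))) ** 3
    sd_cutoff = np.sqrt(chi2_q)

    # OD^(2/3) is roughly normal; median and scaled MAD replace the
    # univariate MCD estimates of the paper (assumption).
    od23 = od ** (2 / 3)
    med = np.median(od23)
    mad = 1.4826 * np.median(np.abs(od23 - med))
    od_cutoff = (med + mad * z) ** 1.5
    return sd, od, sd_cutoff, od_cutoff


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) * np.array([3.0, 1.0, 0.1])
X[0] = [0.0, 0.0, 5.0]  # far from the plane spanned by the first two axes
sd, od, sd_cut, od_cut = outlier_map_distances(
    X, center=np.zeros(3), loadings=np.eye(3)[:, :2], eigenvalues=np.array([9.0, 1.0])
)
```

In the terminology of the paper, point 0 is an orthogonal outlier (small SD, large OD); points beyond both cutoffs would be bad leverage points, and points beyond only the SD cutoff good leverage points.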
- project(X: ndarray) → ndarray
Project the data onto the subspace spanned by the principal components.
- Parameters:
X (np.ndarray) – Data to project.
- Returns:
Projected data.
- Return type:
np.ndarray
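Geometrically, projecting onto the subspace spanned by the principal components means replacing each point by its closest point in the affine subspace center + span(P). The sketch below shows that operation in the original p coordinates; the helper name and the assumption that P has orthonormal columns are illustrative, and whether project returns this reconstruction or the k-dimensional scores is not stated above.

```python
import numpy as np


def project_onto_subspace(X, center, loadings):
    """Hypothetical helper: orthogonal projection of the rows of X onto
    center + span(loadings), returned in the original coordinates.

    loadings: (p, k) matrix with orthonormal columns (assumed).
    """
    centered = np.asarray(X, dtype=float) - center
    # P P^T is the orthogonal projector onto span(loadings)
    return center + centered @ loadings @ loadings.T


center = np.array([1.0, 2.0, 3.0])
loadings = np.array([[1.0], [1.0], [0.0]]) / np.sqrt(2.0)  # one component
X = np.array([[2.0, 0.0, 3.0]])
X_proj = project_onto_subspace(X, center, loadings)  # approximately [[0.5, 1.5, 3.0]]
```

Projecting twice gives the same result, and the residual X - X_proj is orthogonal to every loading vector.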
- transform(X: ndarray) → ndarray
Apply dimensionality reduction to X.
X is projected on the first principal components previously extracted from a training set.
- Parameters:
X ({array-like, sparse matrix} of shape (n_samples, n_features)) – New data, where n_samples is the number of samples and n_features is the number of features.
- Returns:
X_new – Projection of X on the first principal components, where n_samples is the number of samples and n_components is the number of components.
- Return type:
np.ndarray of shape (n_samples, n_components)
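The transform step amounts to centering X and taking inner products with the component vectors. A minimal sketch, assuming the scikit-learn layout that this docstring mirrors (components stored as rows of a (n_components, n_features) array) and no whitening; the function and argument names are hypothetical.

```python
import numpy as np


def transform(X, mean, components):
    """Hypothetical helper: scores of X on the first principal components.

    mean: (n_features,); components: (n_components, n_features) with
    orthonormal rows (scikit-learn-style layout, assumed).
    """
    X = np.asarray(X, dtype=float)
    return (X - mean) @ components.T  # shape (n_samples, n_components)


X = np.array([[3.0, 5.0], [1.0, 0.0]])
scores = transform(X, mean=np.array([1.0, 1.0]), components=np.array([[0.0, 1.0]]))
# scores -> array([[ 4.], [-1.]])
```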
ROBPCA
- class robpy.pca.robpca.ROBPCA(*, n_components: int | None = None, k_min_var_explained: float = 0.8, alpha: float = 0.75, final_MCD_step: bool = True, random_seed: int | None = None, verbosity: int = 30)
Bases: RobustPCA
Implementation of the ROBPCA algorithm as described in Hubert, M., Rousseeuw, P. J., & Vanden Branden, K. (2005).
- Parameters:
n_components (int | None, optional) – Number of components to select. If None, it is set during fit. Defaults to None.
k_min_var_explained (float, optional) – Minimum fraction of the total variance that the selected components must explain. Only used if n_components is None. Defaults to 0.8.
alpha (float, optional) – Coverage parameter that controls the trade-off between robustness and efficiency. A smaller alpha gives more robust but less efficient estimates. Must be between 0.5 and 1. Defaults to 0.75.
final_MCD_step (bool, optional) – Whether to apply the final MCD step to obtain maximally robust estimates. If False, the eigenvectors obtained after projecting onto V1 (the subspace determined by the points whose orthogonal distance (OD) is below the cutoff) are used as the final estimates. Defaults to True.
random_seed (int | None, optional) – Random seed for reproducible results. Defaults to None.
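Two of these parameters map onto simple formulas. As a sketch with hypothetical function names: when n_components is None, the number of components can be chosen as the smallest k whose leading eigenvalues explain at least k_min_var_explained of the total variance; and in Hubert et al. (2005), alpha sets the subset size h = max(floor(alpha * n), floor((n + k_max + 1) / 2)), where k_max is an upper bound on the number of components. Whether robpy computes either quantity exactly this way is an assumption; in particular, k_max does not appear in the signature above.

```python
import numpy as np


def select_n_components(eigenvalues, k_min_var_explained=0.8):
    """Hypothetical helper: smallest k whose leading eigenvalues explain at
    least the requested fraction of the total variance."""
    ev = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]   # descending order
    ratio = np.cumsum(ev) / ev.sum()                           # cumulative fraction
    k = int(np.searchsorted(ratio, k_min_var_explained)) + 1   # first index reaching target
    return min(k, ev.size)                                     # guard against rounding at 1.0


def subset_size(n_samples, alpha, k_max):
    """Hypothetical helper: h = max(floor(alpha * n), floor((n + k_max + 1) / 2))."""
    if not 0.5 <= alpha <= 1:
        raise ValueError("alpha must be between 0.5 and 1")
    return max(int(np.floor(alpha * n_samples)), (n_samples + k_max + 1) // 2)


print(select_n_components([1.0, 3.0, 6.0]))    # 2: cumulative fractions are 0.6, 0.9, 1.0
print(subset_size(100, alpha=0.75, k_max=10))  # 75
```

A larger h uses more observations (more efficient), while the lower bound (n + k_max + 1) / 2 keeps at least half of the data in the subset.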
References
Hubert, M., Rousseeuw, P. J., & Vanden Branden, K. (2005). ROBPCA: a new approach to robust principal component analysis. Technometrics, 47(1), 64-79.
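The first stage of ROBPCA in the reference above ranks observations by a Stahel-Donoho-type outlyingness, obtained by projecting the data onto directions through pairs of data points, and keeps the h least outlying points for an initial subspace estimate. The sketch below illustrates that outlyingness measure only. It substitutes median and MAD for the univariate MCD location and scale used in the paper, and the function name and number of directions are illustrative assumptions, not robpy internals.

```python
import numpy as np


def outlyingness(X, n_directions=250, seed=None):
    """Hypothetical helper: max over random pair directions of
    |projection - median| / MAD."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    out = np.zeros(n)
    for _ in range(n_directions):
        i, j = rng.choice(n, size=2, replace=False)  # direction through two data points
        v = X[i] - X[j]
        norm = np.linalg.norm(v)
        if norm == 0:
            continue                                 # duplicate points give no direction
        proj = X @ (v / norm)
        med = np.median(proj)
        mad = 1.4826 * np.median(np.abs(proj - med))
        if mad == 0:
            continue                                 # degenerate projection, skip it
        out = np.maximum(out, np.abs(proj - med) / mad)
    return out


rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
X[0] = [10.0, 10.0, 10.0]  # a clear outlier
scores = outlyingness(X, seed=0)
```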
Spherical
- class robpy.pca.spca.PCALocantore(*, n_components: int | None = None, k_min_var_explained: float = 0.8)
Bases: RobustPCA
Spherical PCA, as introduced in Locantore et al. (1999).
- Parameters:
n_components (int | None, optional) – Number of components to select. If None, it is set during fit so that the selected components explain at least k_min_var_explained of the variance. Defaults to None.
k_min_var_explained (float, optional) – Minimum fraction of the total variance that the selected components must explain. Only used if n_components is None. Defaults to 0.8.
References
Locantore, N., Marron, J. S., Simpson, D. G., Tripoli, N., Zhang, J. T., & Cohen, K. L. (1999). Robust principal component analysis for functional data. Test, 8(1), 1-73.
- fit(X: ndarray) → PCALocantore
Fit the robust PCA model to the data.
- Parameters:
X (np.ndarray) – Data to fit the model to.
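A sketch of the spherical PCA idea from Locantore et al. (1999), under stated assumptions: the data are centered at the spatial (L1) median, computed here with Weiszfeld iterations; each centered point is scaled to unit length; and the principal directions are the right singular vectors of the resulting matrix Z, i.e. the eigenvectors of Z^T Z. Because spherizing discards the original scale, the variance along each direction is re-estimated here as the squared scaled MAD of the scores. That choice, and all names below, are assumptions and may differ from what PCALocantore.fit does.

```python
import numpy as np


def spatial_median(X, n_iter=200, tol=1e-9):
    """Weiszfeld iterations for the spatial (L1) median."""
    m = np.median(X, axis=0)                  # coordinatewise median as a start
    for _ in range(n_iter):
        d = np.maximum(np.linalg.norm(X - m, axis=1), 1e-12)  # avoid division by zero
        w = 1.0 / d
        m_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(m_new - m) < tol:
            return m_new
        m = m_new
    return m


def spherical_pca(X, n_components):
    """Hypothetical helper returning (center, components, variances)."""
    X = np.asarray(X, dtype=float)
    center = spatial_median(X)
    centered = X - center
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                   # points at the center stay at zero
    Z = centered / norms                      # points projected onto the unit sphere
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    components = Vt[:n_components]            # (k, p), orthonormal rows
    scores = centered @ components.T
    # robust spread of the scores along each direction (squared scaled MAD; assumption)
    med = np.median(scores, axis=0)
    variances = (1.4826 * np.median(np.abs(scores - med), axis=0)) ** 2
    return center, components, variances


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) * np.array([5.0, 1.0])
X[:10] = [0.0, 50.0]                          # 5% gross outliers along the second axis
center, components, variances = spherical_pca(X, n_components=2)
```

Each outlier contributes only a unit vector to Z, so the 10 gross outliers cannot pull the leading direction away from the first axis.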