robpy.utils
Distance
- robpy.utils.distance.mahalanobis_distance(data: ndarray | DataFrame, location: ndarray, covariance: ndarray)[source]
Calculate the Mahalanobis distance for multiple data vectors.
- Parameters:
data (np.ndarray or pd.DataFrame) – An array-like object where each row is a data vector.
location (np.ndarray) – The center of the data.
covariance (np.ndarray) – The scatter estimator of the data.
- Returns:
An array containing the Mahalanobis distance of each data vector.
- Return type:
np.ndarray
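Examples
The computation described above can be sketched in plain NumPy. This is an illustrative sketch, not robpy's implementation: the helper name mahalanobis_sketch is hypothetical, and it assumes the unsquared distance sqrt((x - location)^T covariance^{-1} (x - location)) is wanted.

```python
import numpy as np


def mahalanobis_sketch(data, location, covariance):
    """Hypothetical helper (not robpy's code): Mahalanobis distance of each row."""
    X = np.asarray(data, dtype=float)  # np.asarray also accepts a pandas DataFrame
    centered = X - np.asarray(location, dtype=float)
    # Solve covariance @ Z = centered.T rather than forming the explicit inverse.
    solved = np.linalg.solve(np.asarray(covariance, dtype=float), centered.T).T
    # Row-wise quadratic form (x - location)^T covariance^{-1} (x - location).
    squared = np.einsum("ij,ij->i", centered, solved)
    return np.sqrt(np.maximum(squared, 0.0))


X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 3.0]])
distances = mahalanobis_sketch(X, location=np.zeros(2), covariance=np.diag([4.0, 9.0]))
```

With a diagonal scatter matrix, each coordinate is simply scaled by its standard deviation, so the second and third rows both lie at distance 1 from the center.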
Rho functions
Other
- robpy.utils.general.inverse_submatrix(A: ndarray, A_inv: ndarray, indices: array) → ndarray[source]
Given a matrix A and its inverse A_inv, this function calculates the inverse of the submatrix of A consisting of the rows and columns in indices.
- Parameters:
A (np.ndarray) – The matrix of interest.
A_inv (np.ndarray) – The inverse of the matrix of interest.
indices (np.array) – The indices corresponding to the submatrix of interest.
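Examples
One way to obtain the inverse of a submatrix from the full inverse, without inverting the submatrix directly, is the Schur complement identity. The sketch below assumes this approach; the source does not state which method robpy uses, and the name inverse_submatrix_sketch is hypothetical. Writing B = A_inv and splitting the indices into kept (k) and dropped (d) sets, the identity is inv(A[k, k]) = B[k, k] - B[k, d] inv(B[d, d]) B[d, k].

```python
import numpy as np


def inverse_submatrix_sketch(A, A_inv, indices):
    """Hypothetical helper: inverse of A[indices][:, indices] computed from A_inv."""
    keep = np.asarray(indices)
    # A is only used for its size here; the arithmetic works on A_inv.
    drop = np.setdiff1d(np.arange(A.shape[0]), keep)
    B_kk = A_inv[np.ix_(keep, keep)]
    if drop.size == 0:
        return B_kk  # nothing was removed, so the (reordered) inverse is unchanged
    B_kd = A_inv[np.ix_(keep, drop)]
    B_dk = A_inv[np.ix_(drop, keep)]
    B_dd = A_inv[np.ix_(drop, drop)]
    # Schur complement of the dropped block inside A_inv.
    return B_kk - B_kd @ np.linalg.solve(B_dd, B_dk)


rng = np.random.default_rng(0)
M = rng.normal(size=(5, 5))
A = M @ M.T + 5.0 * np.eye(5)  # symmetric positive definite, hence invertible
idx = np.array([0, 2, 3])
sub_inv = inverse_submatrix_sketch(A, np.linalg.inv(A), idx)
```

Only a system of the size of the dropped block has to be solved, which pays off when few rows and columns are removed from a large matrix.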
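- robpy.utils.median.l1median(X: ndarray) → float[source]
Compute the L1-median (also known as the spatial or geometric median), the point that minimizes the sum of Euclidean distances to the observations.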
- Parameters:
X (np.ndarray) – Data to compute the L1-median on.
References
Fritz, H., Filzmoser, P., & Croux, C. (2012). A comparison of algorithms for the multivariate L1-median. Computational Statistics, 27, 393-410.
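Examples
The L1-median has no closed form and is computed iteratively. The sketch below uses the classical Weiszfeld iteration, chosen here for brevity; it is not necessarily the algorithm robpy implements, and the function name and its max_iter and tol parameters are hypothetical.

```python
import numpy as np


def l1median_weiszfeld(X, max_iter=500, tol=1e-10):
    """Hypothetical helper: L1-median of the rows of X via Weiszfeld iterations."""
    X = np.asarray(X, dtype=float)
    center = X.mean(axis=0)  # start from the coordinate-wise mean
    for _ in range(max_iter):
        dist = np.linalg.norm(X - center, axis=1)
        dist = np.maximum(dist, 1e-12)  # guard against division by zero
        w = 1.0 / dist
        # Each step is a weighted mean with weights inversely proportional to distance.
        new_center = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(new_center - center) < tol:
            return new_center
        center = new_center
    return center


corners = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
center = l1median_weiszfeld(corners)
```

For the four corners of a square, the minimizer is the center of the square by symmetry. The sketch returns one coordinate per feature.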
- robpy.utils.median.weighted_median(X: ndarray, weights: ndarray) → float[source]
Computes a weighted median.
- Parameters:
X (np.ndarray) – Data to compute the weighted median on.
weights (np.ndarray) – The weights used.
References
Croux, C., & Rousseeuw, P. J. (1992). Time-efficient algorithms for two highly robust estimators of scale. In Computational Statistics: Volume 1: Proceedings of the 10th Symposium on Computational Statistics (pp. 411-428). Heidelberg: Physica-Verlag HD.
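Examples
A weighted median can be sketched by sorting the data and accumulating the weights. The version below assumes the "lower" convention (the smallest value whose cumulative weight reaches half of the total) and non-negative weights; robpy may resolve ties differently, and the name weighted_median_sketch is hypothetical.

```python
import numpy as np


def weighted_median_sketch(X, weights):
    """Hypothetical helper: lower weighted median of a 1-D array."""
    X = np.asarray(X, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(X)
    cumulative = np.cumsum(weights[order])
    # First sorted position whose cumulative weight reaches half of the total weight.
    idx = np.searchsorted(cumulative, 0.5 * cumulative[-1])
    return float(X[order][idx])


equal = weighted_median_sketch([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
heavy = weighted_median_sketch([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 10.0])
```

With equal weights the result is the ordinary median (2.0 here); a single dominant weight pulls the result to that observation (4.0 here).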
- robpy.utils.outlyingness.stahel_donoho(X: ndarray, n_points: int = 2, n_dir: int = 250, random_seed: int | None = None) → ndarray[source]
Calculate a degree of outlyingness for multivariate points, based on the principle proposed by Stahel (1981) and Donoho (1982).
- Parameters:
X (np.ndarray) – Data matrix of shape (n_obs, n_features).
n_points (int, optional) – Number of data points used to determine each projection direction. Defaults to 2. For n_points = 2, each projection is onto a line passing through 2 data points, as in Hubert et al. (2005). Otherwise, each projection is onto the direction orthogonal to a hyperplane passing through n_points data points.
n_dir (int, optional) – Number of random directions to consider. Defaults to 250.
random_seed (int | None, optional) – Can be used to provide a random seed. Defaults to None.
- Returns:
Single column of outlyingness values.
- Return type:
np.ndarray
References
Donoho, D. L. (1982). Breakdown properties of multivariate location estimators. Technical report, Harvard University, Boston.
Hubert, M., Rousseeuw, P. J., & Vanden Branden, K. (2005). ROBPCA: a new approach to robust principal component analysis. Technometrics, 47(1), 64-79.
Stahel, W. A. (1981). Robuste Schätzungen: Infinitesimale Optimalität und Schätzungen von Kovarianzmatrizen (Doctoral dissertation, ETH Zurich).
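Examples
The projection principle for the n_points = 2 case can be sketched as follows. This is a simplified illustration rather than robpy's code: it assumes the median and the MAD (scaled by 1.4826) as the univariate location and scale, which the source does not specify, it returns a 1-D array, and the name stahel_donoho_sketch is hypothetical.

```python
import numpy as np


def stahel_donoho_sketch(X, n_dir=250, random_seed=None):
    """Hypothetical helper: maximum robust z-score over random projection directions."""
    X = np.asarray(X, dtype=float)
    n_obs = X.shape[0]
    rng = np.random.default_rng(random_seed)
    outlyingness = np.zeros(n_obs)
    for _ in range(n_dir):
        # Direction of the line through two distinct, randomly chosen data points.
        i, j = rng.choice(n_obs, size=2, replace=False)
        direction = X[i] - X[j]
        norm = np.linalg.norm(direction)
        if norm == 0:
            continue  # identical points define no direction
        projected = X @ (direction / norm)
        med = np.median(projected)
        mad = 1.4826 * np.median(np.abs(projected - med))
        if mad == 0:
            continue  # degenerate projection, skipped in this sketch
        # Keep, for every point, its largest standardized deviation so far.
        outlyingness = np.maximum(outlyingness, np.abs(projected - med) / mad)
    return outlyingness


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(50, 2)), [[20.0, 20.0]]])
scores = stahel_donoho_sketch(X, random_seed=1)
```

In this toy data set the appended point (20, 20) lies far from the Gaussian cloud, so it receives the largest outlyingness value.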