robpy.utils
Distance
- robpy.utils.distance.mahalanobis_distance(data: ndarray | DataFrame, location: ndarray, covariance: ndarray)
Calculate the Mahalanobis distance for multiple data vectors.
- Parameters:
data (np.ndarray or pd.DataFrame) – An array-like object where each row is a data vector.
location (np.ndarray) – the center (location estimate) of the data
covariance (np.ndarray) – the scatter (covariance) estimate of the data
- Returns:
an array containing the Mahalanobis distance of each data vector.
- Return type:
np.ndarray
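For orientation, the quantity described above can be sketched in plain NumPy. This is not robpy's implementation: the function name mahalanobis_sketch is hypothetical, and the choice to return the non-squared distance sqrt((x - location)^T covariance^-1 (x - location)) is an assumption, since this page does not state which convention the library uses.

```python
import numpy as np

def mahalanobis_sketch(data, location, covariance):
    """Mahalanobis distance of each row of `data` from `location` (hypothetical helper)."""
    X = np.asarray(data, dtype=float)          # also accepts a pandas DataFrame
    centered = X - np.asarray(location, dtype=float)
    # Solve covariance @ Z = centered.T rather than forming the explicit inverse.
    solved = np.linalg.solve(np.asarray(covariance, dtype=float), centered.T).T
    # Row-wise quadratic form, then square root (assumed non-squared convention).
    return np.sqrt(np.einsum("ij,ij->i", centered, solved))

X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 3.0]])
d = mahalanobis_sketch(X, location=np.zeros(2), covariance=np.diag([4.0, 9.0]))
print(d)
```

With a diagonal covariance, each distance reduces to a coordinate-wise standardised Euclidean distance, which makes the result easy to check by hand.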
Rho functions
Other
- robpy.utils.general.inverse_submatrix(A: ndarray, A_inv: ndarray, indices: array) → ndarray
Given a matrix A and its inverse A_inv, this function calculates the inverse of the submatrix of A consisting of the rows and columns in indices.
- Parameters:
A (np.ndarray) – the matrix of interest
A_inv (np.ndarray) – the inverse of the matrix of interest
indices (np.array) – the indices corresponding to the submatrix of interest
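One standard way to obtain such an inverse without refactoring the submatrix is the Schur complement identity: if B = A_inv is partitioned into kept (k) and dropped (d) indices, then inv(A[k, k]) = B[k, k] - B[k, d] inv(B[d, d]) B[d, k]. The sketch below illustrates that identity only; whether robpy uses it is an assumption, and the helper name inverse_submatrix_sketch is hypothetical. Unlike the documented function, it needs only A_inv.

```python
import numpy as np

def inverse_submatrix_sketch(A_inv, indices):
    """Inverse of A[indices][:, indices] computed from A_inv alone (hypothetical helper)."""
    B = np.asarray(A_inv, dtype=float)
    keep = np.asarray(indices)
    drop = np.setdiff1d(np.arange(B.shape[0]), keep)
    B_kk = B[np.ix_(keep, keep)]
    if drop.size == 0:
        return B_kk  # nothing dropped: only a reordering of rows and columns
    B_kd = B[np.ix_(keep, drop)]
    B_dk = B[np.ix_(drop, keep)]
    B_dd = B[np.ix_(drop, drop)]
    # Schur complement of the dropped block in A_inv.
    return B_kk - B_kd @ np.linalg.solve(B_dd, B_dk)

rng = np.random.default_rng(0)
M = rng.normal(size=(5, 5))
A = M @ M.T + 5 * np.eye(5)        # symmetric positive definite, hence invertible
idx = np.array([0, 2, 3])
fast = inverse_submatrix_sketch(np.linalg.inv(A), idx)
direct = np.linalg.inv(A[np.ix_(idx, idx)])
print(np.allclose(fast, direct))
```

The approach pays off when few indices are dropped, because only the small dropped block has to be solved.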
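- robpy.utils.median.l1median(X: ndarray) → float
Compute the L1-median (also known as the spatial or geometric median), the point that minimizes the sum of Euclidean distances to the observations.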
- Parameters:
X (np.ndarray) – Data to compute the L1-median on.
References
Fritz, H. and Filzmoser, P. and Croux, C. (2012) A comparison of algorithms for the multivariate L1-median. Computational Statistics 27, 393–410
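As an illustration of what the L1-median is, here is a minimal sketch using Weiszfeld's iteration, a classical fixed-point scheme for this problem. It is not necessarily the algorithm robpy implements; the function name l1median_weiszfeld, the starting point, the tolerance, and the zero-distance guard are all assumptions made for the example.

```python
import numpy as np

def l1median_weiszfeld(X, max_iter=500, tol=1e-10):
    """Approximate the point minimising the sum of Euclidean distances to the rows of X."""
    X = np.asarray(X, dtype=float)
    m = np.median(X, axis=0)                    # coordinate-wise median as starting point
    for _ in range(max_iter):
        dist = np.linalg.norm(X - m, axis=1)
        dist = np.maximum(dist, 1e-12)          # crude guard against division by zero
        w = 1.0 / dist
        m_new = (w[:, None] * X).sum(axis=0) / w.sum()   # distance-weighted mean
        if np.linalg.norm(m_new - m) < tol:
            return m_new
        m = m_new
    return m

X = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0], [0.0, 0.0]])
center = l1median_weiszfeld(X)
print(center)
```

For multivariate input this sketch returns one coordinate per feature.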
- robpy.utils.median.weighted_median(X: ndarray, weights: ndarray) → float
Computes a weighted median.
References
Croux, C. and Rousseeuw, P. J. (1992) Time-efficient algorithms for two highly robust estimators of scale.
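A weighted median can be sketched with a sort and a cumulative sum. The version below returns the smallest value at which the cumulative weight reaches half of the total weight; that tie-breaking convention, the assumption of positive weights, and the name weighted_median_sketch are choices made for this example and may differ from robpy's behaviour.

```python
import numpy as np

def weighted_median_sketch(x, weights):
    """Smallest x whose cumulative weight reaches half the total (assumed convention)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(weights, dtype=float)
    order = np.argsort(x)
    x_sorted, w_sorted = x[order], w[order]
    cum = np.cumsum(w_sorted)
    # First position where the cumulative weight is >= half of the total weight.
    k = np.searchsorted(cum, 0.5 * cum[-1])
    return float(x_sorted[k])

wm = weighted_median_sketch([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 5.0])
print(wm)  # 4.0, because the last observation carries more than half the weight
```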
- robpy.utils.outlyingness.stahel_donoho(X: ndarray, n_points: int = 2, n_dir: int = 250, random_seed: int | None = None) → ndarray
Calculate the degree of outlyingness for multivariate points. Based on the algorithm proposed by Stahel (1981) and Donoho (1982).
- Parameters:
X (np.ndarray) – data matrix of shape (n_obs, n_features)
n_points (int, optional) – number of points to determine the hyperplane. Defaults to 2.
n_dir (int, optional) – number of random directions to consider. Defaults to 250.
random_seed (int | None, optional) – seed for the random number generator, for reproducible results. Defaults to None.
- Returns:
single column of outlyingness values
- Return type:
np.ndarray
References
Stahel W.A. (1981). Robuste Schätzungen: infinitesimale Optimalität und Schätzungen von Kovarianzmatrizen. PhD Thesis, ETH Zürich.
Donoho D.L. (1982). Breakdown properties of multivariate location estimators. Ph.D. Qualifying paper, Dept. Statistics, Harvard University, Boston.
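The idea behind Stahel-Donoho outlyingness can be sketched as follows: project the data onto many directions and, for each point, keep the largest robustly standardised deviation |x·v - median| / MAD. The sketch below is an approximation under stated assumptions and not robpy's implementation: each direction is taken through two randomly chosen observations, the MAD is used without a consistency factor, and the name stahel_donoho_sketch is hypothetical.

```python
import numpy as np

def stahel_donoho_sketch(X, n_dir=250, random_seed=None):
    """Max over random directions of |projection - median| / MAD, per observation."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    rng = np.random.default_rng(random_seed)
    out = np.zeros(n)
    for _ in range(n_dir):
        # Assumed construction: direction through two distinct random observations.
        i, j = rng.choice(n, size=2, replace=False)
        v = X[i] - X[j]
        norm = np.linalg.norm(v)
        if norm == 0:
            continue                            # identical points give no direction
        proj = X @ (v / norm)
        med = np.median(proj)
        mad = np.median(np.abs(proj - med))     # unscaled MAD (assumption)
        if mad == 0:
            continue                            # degenerate projection, skip it
        out = np.maximum(out, np.abs(proj - med) / mad)
    return out

rng = np.random.default_rng(0)
data = np.vstack([rng.normal(size=(50, 2)), [[20.0, 20.0]]])   # last row is a planted outlier
scores = stahel_donoho_sketch(data, n_dir=250, random_seed=0)
print(int(np.argmax(scores)))
```

Because the directions are random, fixing random_seed is what makes repeated runs comparable.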