robpy.utils

Distance

robpy.utils.distance.mahalanobis_distance(data: ndarray | DataFrame, location: ndarray, covariance: ndarray)

Calculate the Mahalanobis distance of each data vector (row of data) from location, with respect to covariance.

Parameters:

data (np.ndarray or pd.DataFrame) – An array-like object where each row is a data vector.
location (np.ndarray) – The center of the data.
covariance (np.ndarray) – The scatter (covariance) matrix of the data.

Returns:

An array of Mahalanobis distances for each data vector.

Return type:

np.ndarray
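The distance of row x is sqrt((x - location)^T covariance^{-1} (x - location)). The sketch below computes this for all rows at once with NumPy. It is a minimal illustration, not robpy's implementation: the name mahalanobis_sketch is hypothetical, and it returns the non-squared distance, whereas this page does not state whether robpy returns squared or non-squared distances.

```python
import numpy as np

def mahalanobis_sketch(data, location, covariance):
    """Hypothetical helper: non-squared Mahalanobis distance of every row of data."""
    X = np.asarray(data, dtype=float)  # also accepts a pandas DataFrame
    diff = X - np.asarray(location, dtype=float)  # shape (n, p)
    # Solve covariance @ z = diff.T rather than forming the inverse explicitly
    z = np.linalg.solve(np.asarray(covariance, dtype=float), diff.T)  # shape (p, n)
    # Row-wise quadratic form diff[i] @ covariance^{-1} @ diff[i]
    return np.sqrt(np.einsum("ij,ji->i", diff, z))

# With an identity covariance this reduces to the Euclidean distance
print(mahalanobis_sketch(np.array([[3.0, 4.0], [0.0, 0.0]]), np.zeros(2), np.eye(2)))
```

Using np.linalg.solve avoids computing covariance^{-1} explicitly, which is numerically preferable when the scatter matrix is ill-conditioned.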

Rho functions

class robpy.utils.rho.BaseRho

Bases: object

Base class for robust loss functions. Subclasses provide rho, the loss itself, and psi, its derivative.

psi(X: ndarray) → ndarray

rho(X: ndarray) → ndarray

class robpy.utils.rho.Huber(b: float = 1.5)

Bases: BaseRho

Huber’s loss function.

Parameters: b (float, optional) – Threshold between the quadratic and linear regions of the loss. Defaults to 1.5.

psi(X: ndarray) → ndarray

rho(X: ndarray) → ndarray
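In its textbook form, Huber's loss is rho(x) = x^2 / 2 for |x| <= b and b|x| - b^2 / 2 otherwise, and psi is its derivative, the clipped identity. The following minimal sketch evaluates both elementwise. HuberSketch is a hypothetical name, and robpy may scale or normalize rho differently, so treat this as an illustration of the shape of the functions rather than of robpy's exact output.

```python
import numpy as np

class HuberSketch:
    """Hypothetical textbook Huber loss; robpy's scaling may differ."""

    def __init__(self, b: float = 1.5):
        self.b = b

    def rho(self, X):
        X = np.asarray(X, dtype=float)
        absx = np.abs(X)
        # Quadratic near zero, linear beyond the threshold b
        return np.where(absx <= self.b, 0.5 * X**2, self.b * absx - 0.5 * self.b**2)

    def psi(self, X):
        # Derivative of rho: the identity clipped to [-b, b]
        return np.clip(np.asarray(X, dtype=float), -self.b, self.b)

h = HuberSketch(b=1.5)
print(h.rho([0.0, 1.0, 3.0]), h.psi([-3.0, 0.5, 3.0]))
```

The constant b^2 / 2 in the linear branch makes rho continuous at |x| = b, so that psi is continuous as well.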

class robpy.utils.rho.TukeyBisquare(c: float = 1.56)

Bases: BaseRho

Tukey’s bisquare loss function.

Parameters: c (float, optional) – Tuning constant controlling the cutoff beyond which the loss is constant. Defaults to 1.56.

psi(X: ndarray) → ndarray

rho(X: ndarray) → ndarray
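A common textbook parametrization of Tukey's bisquare is rho(x) = (c^2 / 6) (1 - (1 - (x/c)^2)^3) for |x| <= c and c^2 / 6 otherwise, with psi(x) = x (1 - (x/c)^2)^2 inside the cutoff and 0 outside. The sketch below implements that form. TukeyBisquareSketch is a hypothetical name, and robpy may normalize rho (for example, so that its maximum is 1), so only the shape should be compared.

```python
import numpy as np

class TukeyBisquareSketch:
    """Hypothetical textbook bisquare loss; robpy's normalization may differ."""

    def __init__(self, c: float = 1.56):
        self.c = c

    def rho(self, X):
        X = np.asarray(X, dtype=float)
        # Clipping x/c to [-1, 1] makes rho constant (c^2 / 6) beyond the cutoff
        u = np.clip(X / self.c, -1.0, 1.0)
        return (self.c**2 / 6.0) * (1.0 - (1.0 - u**2) ** 3)

    def psi(self, X):
        X = np.asarray(X, dtype=float)
        u = X / self.c
        # Derivative of rho: redescends to exactly zero for |x| > c
        return np.where(np.abs(u) <= 1.0, X * (1.0 - u**2) ** 2, 0.0)

t = TukeyBisquareSketch()
print(t.rho([0.0, 10.0]), t.psi([0.5, 10.0]))
```

Because psi is zero beyond c, observations with large residuals have no influence at all. This is the key difference from Huber's loss, whose psi stays at plus or minus b.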

Other

robpy.utils.general.inverse_submatrix(A: ndarray, A_inv: ndarray, indices: array) → ndarray

Given a matrix A and its inverse A_inv, this function calculates the inverse of the submatrix of A consisting of the rows and columns in indices.

Parameters:

A (np.ndarray) – The matrix of interest.
A_inv (np.ndarray) – The inverse of the matrix of interest.
indices (np.ndarray) – The indices of the rows and columns that form the submatrix of interest.
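One standard way to obtain this inverse without inverting the submatrix directly uses block-matrix (Schur complement) identities. Let B = A_inv, I = indices, and J = the remaining indices. Then the inverse of A[I, I] equals B[I, I] - B[I, J] B[J, J]^{-1} B[J, I], so only a |J| x |J| system has to be solved. The sketch below is a hypothetical illustration of that identity and is not necessarily how robpy computes it. In particular, robpy's function also receives A, which this sketch does not use.

```python
import numpy as np

def inverse_submatrix_sketch(A_inv, indices):
    """Hypothetical helper: inverse of A[I, I] computed from A_inv alone."""
    A_inv = np.asarray(A_inv, dtype=float)
    idx = np.asarray(indices)
    rest = np.setdiff1d(np.arange(A_inv.shape[0]), idx)  # complement J
    B_II = A_inv[np.ix_(idx, idx)]
    if rest.size == 0:
        # The submatrix is a permutation of A itself
        return B_II
    B_IJ = A_inv[np.ix_(idx, rest)]
    B_JI = A_inv[np.ix_(rest, idx)]
    B_JJ = A_inv[np.ix_(rest, rest)]
    # Schur complement identity: solve with B_JJ instead of inverting A[I, I]
    return B_II - B_IJ @ np.linalg.solve(B_JJ, B_JI)
```

This route is cheap when only a few rows and columns are dropped (J small). When I is small, inverting A[I, I] directly is usually cheaper.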

robpy.utils.median.l1median(X: ndarray) → float

Compute the L1-median (spatial median) of the data, i.e. the point minimizing the sum of Euclidean distances to all observations.

Parameters: X (np.ndarray) – Data to compute the L1-median on.

References

Fritz, H., Filzmoser, P., & Croux, C. (2012). A comparison of algorithms for the multivariate L1-median. Computational Statistics, 27, 393-410.
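The Weiszfeld iteration is one classic algorithm for the L1-median, and it is among those compared in Fritz et al. (2012). It repeatedly replaces the estimate with a weighted mean of the observations, with weights inversely proportional to their distance from the current estimate. The sketch below assumes this algorithm; robpy may use a different one. The name l1median_weiszfeld and its max_iter and tol parameters are hypothetical. The sketch returns a point of length p. Its crude epsilon guard against zero distances can stall at a data point that is not the optimum, which modified Weiszfeld variants are designed to avoid.

```python
import numpy as np

def l1median_weiszfeld(X, max_iter=200, tol=1e-8):
    """Hypothetical plain Weiszfeld iteration for the spatial (L1) median."""
    X = np.asarray(X, dtype=float)
    mu = np.median(X, axis=0)  # start at the coordinatewise median
    for _ in range(max_iter):
        dist = np.linalg.norm(X - mu, axis=1)
        # Crude guard against division by zero when mu hits a data point
        w = 1.0 / np.maximum(dist, 1e-12)
        new_mu = (w[:, None] * X).sum(axis=0) / w.sum()
        # Stop once the update step is negligible relative to the estimate
        if np.linalg.norm(new_mu - mu) < tol * max(1.0, np.linalg.norm(mu)):
            return new_mu
        mu = new_mu
    return mu

# For one-dimensional data the L1-median is the ordinary median
print(l1median_weiszfeld(np.array([[0.0], [1.0], [2.0], [3.0], [100.0]])))
```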

robpy.utils.median.weighted_median(X: ndarray, weights: ndarray) → float

Computes a weighted median.

Parameters:

X (np.ndarray) – Data to compute the weighted median on.
weights (np.ndarray) – The weights used.

References

Croux, C., & Rousseeuw, P. J. (1992). Time-efficient algorithms for two highly robust estimators of scale. In Computational Statistics: Volume 1: Proceedings of the 10th Symposium on Computational Statistics (pp. 411-428). Heidelberg: Physica-Verlag HD.
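A weighted median is a value at which the cumulative weight of the sorted observations first reaches half of the total weight. The following sketch uses sorting, which is O(n log n). The algorithms in Croux and Rousseeuw (1992) instead rely on linear-time selection. The name weighted_median_sketch is hypothetical. It returns the lower weighted median when the cumulative weight hits exactly one half, and robpy's tie convention may differ.

```python
import numpy as np

def weighted_median_sketch(X, weights):
    """Hypothetical sort-based (lower) weighted median."""
    X = np.asarray(X, dtype=float).ravel()
    w = np.asarray(weights, dtype=float).ravel()
    order = np.argsort(X)
    x_sorted, w_sorted = X[order], w[order]
    cum = np.cumsum(w_sorted)
    # First position where the cumulative weight reaches half of the total
    idx = np.searchsorted(cum, 0.5 * cum[-1])
    return float(x_sorted[idx])

print(weighted_median_sketch([1, 2, 3], [1, 1, 10]))
```

With equal weights this reduces to an ordinary (lower) median. A single heavy weight pulls the result toward its observation.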

robpy.utils.outlyingness.stahel_donoho(X: ndarray, n_points: int = 2, n_dir: int = 250, random_seed: int | None = None) → ndarray

Calculate the Stahel-Donoho outlyingness of multivariate points, based on the principle proposed by Stahel (1981) and Donoho (1982).

Parameters:

X (np.ndarray) – Data matrix of shape (n_obs, n_features).
n_points (int, optional) – Number of data points used to determine each projection direction. Defaults to 2. For n_points = 2, each projection is onto the line passing through 2 data points, as in Hubert et al. (2005). Otherwise, each projection is onto the direction orthogonal to a hyperplane passing through n_points data points.
n_dir (int, optional) – Number of random directions to consider. Defaults to 250.
random_seed (int | None, optional) – Random seed used to draw the directions, for reproducibility. Defaults to None.

Returns:

Single column of outlyingness values.

Return type:

np.ndarray

References

Donoho, D. L. (1982). Breakdown properties of multivariate location estimators. Technical report, Harvard University, Boston.
Hubert, M., Rousseeuw, P. J., & Vanden Branden, K. (2005). ROBPCA: a new approach to robust principal component analysis. Technometrics, 47(1), 64-79.
Stahel, W. A. (1981). Robuste Schätzungen: infinitesimale Optimalität und Schätzungen von Kovarianzmatrizen (Doctoral dissertation, ETH Zurich).
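The n_points = 2 case described above projects all observations onto the direction through two randomly chosen data points. Each projection is standardized robustly, and the largest absolute standardized value over all directions is kept as the outlyingness. The sketch below assumes median/MAD standardization with the 1.4826 consistency factor. robpy's exact choices of location and scale, and its handling of n_points > 2, are not shown. The name sd_outlyingness_sketch is hypothetical.

```python
import numpy as np

def sd_outlyingness_sketch(X, n_dir=250, random_seed=None):
    """Hypothetical Stahel-Donoho outlyingness using directions through 2 points."""
    X = np.asarray(X, dtype=float)
    n, _ = X.shape
    rng = np.random.default_rng(random_seed)
    out = np.zeros(n)
    for _ in range(n_dir):
        i, j = rng.choice(n, size=2, replace=False)
        d = X[i] - X[j]
        norm = np.linalg.norm(d)
        if norm == 0:  # duplicate points give no direction
            continue
        proj = X @ (d / norm)
        med = np.median(proj)
        mad = 1.4826 * np.median(np.abs(proj - med))  # assumed scale estimator
        if mad == 0:
            continue
        # Keep, per observation, the largest standardized deviation seen so far
        out = np.maximum(out, np.abs(proj - med) / mad)
    return out.reshape(-1, 1)  # single column, one value per observation
```

Because each direction is defined by observed points, the candidate directions adapt to the shape of the data. Increasing n_dir trades computation time for a better approximation of the maximum over all directions.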