robpy.utils

Distance

robpy.utils.distance.mahalanobis_distance(data: ndarray | DataFrame, location: ndarray, covariance: ndarray)

Calculate the Mahalanobis distance for multiple data vectors.

Parameters:
  • data (np.ndarray or pd.DataFrame) – An array-like object where each row is a data vector.

  • location (np.ndarray) – The location (center) estimate of the data.

  • covariance (np.ndarray) – The scatter (covariance) matrix estimate of the data.

Returns:

An array containing the Mahalanobis distance of each data vector.

Return type:

np.ndarray
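As an illustration of the quantity described above, the following is a minimal NumPy sketch rather than robpy's own code. The name mahalanobis_sketch is hypothetical, and returning the non-squared distance is an assumption, since the description does not say whether robpy returns squared values.

```python
import numpy as np

def mahalanobis_sketch(data, location, covariance):
    """Row-wise Mahalanobis distances (illustrative, not robpy's code)."""
    X = np.asarray(data, dtype=float)  # np.asarray also accepts a DataFrame
    centered = X - np.asarray(location, dtype=float)
    # Solve covariance @ Z = centered.T instead of forming the inverse explicitly.
    solved = np.linalg.solve(np.asarray(covariance, dtype=float), centered.T).T
    squared = np.einsum("ij,ij->i", centered, solved)
    # Assumption: the non-squared distance is returned.
    return np.sqrt(squared)

X = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]])
d = mahalanobis_sketch(X, location=np.zeros(2), covariance=np.eye(2))
print(d)
```

With an identity covariance and a zero location, the result reduces to the Euclidean norm of each row, which makes the sketch easy to check by hand.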

Rho functions

class robpy.utils.rho.BaseRho

Bases: object

psi(X: ndarray) → ndarray
rho(X: ndarray) → ndarray

class robpy.utils.rho.Huber(b: float = 1.5)

Bases: BaseRho

psi(X: ndarray) → ndarray
rho(X: ndarray) → ndarray

class robpy.utils.rho.TukeyBisquare(c: float = 1.56)

Bases: BaseRho

psi(X: ndarray) → ndarray
rho(X: ndarray) → ndarray
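The classes above carry no descriptions, so the sketch below shows the textbook forms of the Huber and Tukey bisquare functions, with psi as the derivative of rho. The function names are hypothetical, and the scaling is an assumption: robpy may normalize rho differently (for example so that its maximum is 1). Only the default tuning constants b = 1.5 and c = 1.56 are taken from the signatures.

```python
import numpy as np

def huber_rho(x, b=1.5):
    """Textbook Huber rho: quadratic near zero, linear in the tails."""
    x = np.asarray(x, dtype=float)
    absx = np.abs(x)
    return np.where(absx <= b, 0.5 * x**2, b * absx - 0.5 * b**2)

def huber_psi(x, b=1.5):
    """Derivative of huber_rho: the identity clipped to [-b, b]."""
    return np.clip(np.asarray(x, dtype=float), -b, b)

def tukey_rho(x, c=1.56):
    """Textbook Tukey bisquare rho: bounded, constant beyond |x| = c."""
    x = np.asarray(x, dtype=float)
    u = np.clip(np.abs(x) / c, 0.0, 1.0)
    return (c**2 / 6.0) * (1.0 - (1.0 - u**2) ** 3)

def tukey_psi(x, c=1.56):
    """Derivative of tukey_rho: redescends to zero beyond |x| = c."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= c, x * (1.0 - (x / c) ** 2) ** 2, 0.0)

grid = np.linspace(-3.0, 3.0, 13)
print(huber_psi(grid))
print(tukey_psi(grid))
```

The practical difference is visible in psi: the Huber version keeps a constant influence for large residuals, while the bisquare version gives them zero influence.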

Other

robpy.utils.general.inverse_submatrix(A: ndarray, A_inv: ndarray, indices: array) → ndarray

Given a matrix A and its inverse A_inv, this function calculates the inverse of the submatrix of A consisting of the rows and columns in indices.

Parameters:
  • A (np.ndarray) – the matrix of interest

  • A_inv (np.ndarray) – the inverse of the matrix of interest

  • indices (np.array) – the indices corresponding to the submatrix of interest
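One standard way to obtain such an inverse without inverting the submatrix directly is the Schur complement identity on the blocks of A_inv. The sketch below assumes that approach, and robpy's implementation may differ. The name inverse_submatrix_sketch is hypothetical, and unlike the documented signature it only needs A_inv.

```python
import numpy as np

def inverse_submatrix_sketch(A_inv, indices):
    """Inverse of A[indices][:, indices] computed from the blocks of A_inv.

    With B = A_inv, k = kept indices and d = dropped indices:
        inv(A[k, k]) = B[k, k] - B[k, d] @ inv(B[d, d]) @ B[d, k]
    """
    n = A_inv.shape[0]
    keep = np.asarray(indices)
    drop = np.setdiff1d(np.arange(n), keep)
    B_kk = A_inv[np.ix_(keep, keep)]
    if drop.size == 0:
        return B_kk  # nothing removed: only rows/columns are reordered
    B_kd = A_inv[np.ix_(keep, drop)]
    B_dk = A_inv[np.ix_(drop, keep)]
    B_dd = A_inv[np.ix_(drop, drop)]
    # Only a (len(drop) x len(drop)) system is solved here.
    return B_kk - B_kd @ np.linalg.solve(B_dd, B_dk)

rng = np.random.default_rng(0)
M = rng.normal(size=(5, 5))
A = M @ M.T + 5.0 * np.eye(5)  # symmetric positive definite test matrix
idx = np.array([0, 2, 3])
fast = inverse_submatrix_sketch(np.linalg.inv(A), idx)
direct = np.linalg.inv(A[np.ix_(idx, idx)])
print(np.allclose(fast, direct))
```

This pays off when only a few rows and columns are removed from a large matrix, because the system that has to be solved has the size of the removed block.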

robpy.utils.median.l1median(X: ndarray) → float

Computes the L1-median (also known as the spatial median) of the data.

Parameters:

X (np.ndarray) – Data to compute the L1-median on.

References

Fritz, H., Filzmoser, P. and Croux, C. (2012). A comparison of algorithms for the multivariate L1-median. Computational Statistics 27, 393–410.
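The L1-median is the point that minimizes the sum of Euclidean distances to the observations. The documentation does not say which algorithm robpy uses, so the sketch below uses the classical Weiszfeld iteration purely as an illustration. The function name, the tolerance, the iteration cap and the starting point are all assumptions, and the sketch returns one coordinate per feature.

```python
import numpy as np

def l1_median_weiszfeld(X, tol=1e-8, max_iter=500):
    """Weiszfeld iteration for the L1-median (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    m = np.median(X, axis=0)  # coordinate-wise median as a starting point
    for _ in range(max_iter):
        dist = np.linalg.norm(X - m, axis=1)
        dist = np.maximum(dist, 1e-12)  # guard against division by zero
        w = 1.0 / dist
        # Weighted mean with weights inversely proportional to the distances.
        m_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(m_new - m) <= tol:
            return m_new
        m = m_new
    return m

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(50, 2)), [[100.0, 100.0]]])  # one gross outlier
center = l1_median_weiszfeld(X)
print(center, X.mean(axis=0))
```

Comparing the printed result with the sample mean shows how much less the L1-median is pulled toward the single outlier.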

robpy.utils.median.weighted_median(X: ndarray, weights: ndarray) → float

Computes a weighted median.

References

Croux, C. and Rousseeuw, P. J. (1992). Time-efficient algorithms for two highly robust estimators of scale.
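Several conventions exist for the weighted median. The sketch below assumes one common choice: the smallest value at which the cumulative weight reaches half of the total weight. It sorts the data, which is simpler than the algorithm in the cited reference, and the name weighted_median_sketch is hypothetical.

```python
import numpy as np

def weighted_median_sketch(x, weights):
    """Lower weighted median via sorting (illustrative sketch).

    Assumes non-negative weights with a positive total.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(weights, dtype=float)
    order = np.argsort(x)
    x_sorted, w_sorted = x[order], w[order]
    cum = np.cumsum(w_sorted)
    # First position where the cumulative weight reaches half the total.
    idx = np.searchsorted(cum, 0.5 * cum[-1])
    return float(x_sorted[idx])

print(weighted_median_sketch([3.0, 1.0, 2.0], [10.0, 1.0, 1.0]))
print(weighted_median_sketch([5.0, 1.0, 3.0], [1.0, 1.0, 1.0]))
```

With equal weights and an odd number of values this reduces to the ordinary median. A heavy weight on one observation pulls the result to that observation.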

robpy.utils.outlyingness.stahel_donoho(X: ndarray, n_points: int = 2, n_dir: int = 250) → ndarray

Calculate the degree of outlyingness for multivariate points. Based on the algorithm proposed by Stahel (1981) and Donoho (1982).

Parameters:
  • X (np.ndarray) – data matrix of shape (n_obs, n_features)

  • n_points (int, optional) – number of points to determine the hyperplane. Defaults to 2.

  • n_dir (int, optional) – number of random directions to consider. Defaults to 250.

Returns:

single column of outlyingness values

Return type:

np.ndarray

References

Stahel W.A. (1981). Robuste Schätzungen: infinitesimale Optimalität und Schätzungen von Kovarianzmatrizen. PhD Thesis, ETH Zürich.

Donoho D.L. (1982). Breakdown properties of multivariate location estimators. Ph.D. Qualifying paper, Dept. Statistics, Harvard University, Boston.
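The Stahel-Donoho outlyingness of a point is its largest robustly standardized deviation over a set of projection directions. The sketch below approximates it with random directions and is not robpy's implementation. It assumes that each direction passes through two randomly chosen observations (one reading of n_points = 2), that projections are standardized with the median and the unscaled MAD, and that a flat array is returned. The function name and the seed parameter are hypothetical.

```python
import numpy as np

def stahel_donoho_sketch(X, n_dir=250, seed=0):
    """Approximate Stahel-Donoho outlyingness with random directions."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    outlyingness = np.zeros(n)
    for _ in range(n_dir):
        # Assumption: the direction is the line through two random observations.
        i, j = rng.choice(n, size=2, replace=False)
        v = X[i] - X[j]
        norm = np.linalg.norm(v)
        if norm == 0.0:
            continue  # identical points give no direction
        proj = X @ (v / norm)
        med = np.median(proj)
        mad = np.median(np.abs(proj - med))
        if mad == 0.0:
            continue  # degenerate projection, skip it
        # Keep the largest standardized deviation seen so far for each point.
        outlyingness = np.maximum(outlyingness, np.abs(proj - med) / mad)
    return outlyingness

rng = np.random.default_rng(42)
data = np.vstack([rng.normal(size=(100, 2)), [[20.0, 20.0]]])
scores = stahel_donoho_sketch(data)
print(int(np.argmax(scores)))
```

Because the directions are random, the result is an approximation from below of the exact supremum over all directions, and more directions tighten it at a higher cost.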