robpy.datasets

Module that contains convenience functions for loading commonly used datasets.

robpy.datasets.base.load_animals(*, as_frame=False)[source]

Load and return the Animals dataset from MASS (R) (covariance / regression).

The animals dataset is a bivariate dataset used for demonstrating robust covariance estimators.

Samples	28
Dimensionality	2
Features	real, positive

Parameters:

as_frame (bool, default=False) – If True, the data is a pandas DataFrame including columns with appropriate dtypes (numeric).

Returns:

data – Dictionary-like object, with the following attributes:

data: {ndarray, dataframe} of shape (28, 2)
The data matrix. If as_frame=True, data will be a pandas DataFrame.
feature_names: list
The names of the dataset columns.
DESCR: str
The full description of the dataset.
filename: str
The path to the location of the data.

Return type:

Bunch

Examples

Fitting a robust covariance estimator:

>>> from robpy.datasets import load_animals
>>> from robpy.covariance import FastMCD
>>> data = load_animals()
>>> mcd = FastMCD().fit(data.data)
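
The return value above is described as a dictionary-like object with attributes. As a rough illustration of that access pattern, the sketch below defines a minimal stand-in class so that it runs without robpy installed. The name `SimpleBunch` is hypothetical and not part of robpy, the field values are placeholders rather than the real Animals data, and the sketch assumes that the returned Bunch exposes its keys as attributes in the way scikit-learn's `Bunch` does.

```python
class SimpleBunch(dict):
    """Hypothetical stand-in for a Bunch: a dict whose keys are also attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so dict methods
        # such as keys() keep working as usual.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        # Store attribute assignments as dictionary entries.
        self[name] = value


# Placeholder values only; a real loader would fill these from the packaged file.
animals = SimpleBunch(
    data=[[1.0, 2.0], [3.0, 4.0]],  # stands in for the (28, 2) data matrix
    feature_names=["feature_0", "feature_1"],  # hypothetical column names
    DESCR="placeholder description",
    filename="placeholder.csv",
)

# Attribute access and key access return the same object.
same_object = animals.data is animals["data"]
available_fields = sorted(animals.keys())
```

With the real loaders, the same pattern means `data.data` and `data["data"]` should be interchangeable, under the assumption stated above.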

robpy.datasets.base.load_glass(*, as_frame=False)[source]

Load and return the glass dataset from cellWise (R) (outlier detection).

The glass dataset is a high-dimensional dataset used for demonstrating outlier detection.

Samples	180
Dimensionality	750
Features	real, positive

Parameters:

as_frame (bool, default=False) – If True, the data is a pandas DataFrame including columns with appropriate dtypes (numeric).

Returns:

data – Dictionary-like object, with the following attributes:

data: {ndarray, dataframe} of shape (180, 750)
The data matrix. If as_frame=True, data will be a pandas DataFrame.
feature_names: list
The names of the dataset columns.
DESCR: str
The full description of the dataset.
filename: str
The path to the location of the data.

Return type:

Bunch

robpy.datasets.base.load_stars(*, as_frame=False)[source]

Load and return the Hertzsprung-Russell Diagram Data of Star Cluster CYG OB1 (covariance/regression).

The stars dataset is a well-known bivariate dataset used for demonstrating robust covariance and regression estimators.

Samples	47
Dimensionality	2
Features	real, positive

Parameters:

as_frame (bool, default=False) – If True, the data is a pandas DataFrame including columns with appropriate dtypes (numeric).

Returns:

data – Dictionary-like object, with the following attributes:

data: {ndarray, dataframe} of shape (47, 2)
The data matrix. If as_frame=True, data will be a pandas DataFrame.
feature_names: list
The names of the dataset columns.
DESCR: str
The full description of the dataset.
filename: str
The path to the location of the data.

Return type:

Bunch

Examples

Fitting a robust covariance estimator:

>>> from robpy.datasets import load_stars
>>> from robpy.covariance import FastMCD
>>> data = load_stars()
>>> mcd = FastMCD().fit(data.data)

robpy.datasets.base.load_telephone(*, as_frame=False)[source]

Load and return the telephone dataset (regression with outliers).

The telephone dataset is a well-known univariate regression problem with outliers.

Samples	24
Dimensionality	2
Features	real, positive

Parameters:

as_frame (bool, default=False) – If True, the data is a pandas DataFrame including columns with appropriate dtypes (numeric). The target is a pandas DataFrame or Series depending on the number of target columns.

Returns:

data – Dictionary-like object, with the following attributes:

data: {ndarray, dataframe} of shape (24, 2)
The data matrix. If as_frame=True, data will be a pandas DataFrame.
feature_names: list
The names of the dataset columns.
DESCR: str
The full description of the dataset.
filename: str
The path to the location of the data.

Return type:

Bunch

Examples

Fitting a robust regression:

>>> from robpy.datasets import load_telephone
>>> from robpy.regression import MMRegression
>>> data = load_telephone()
>>> mm = MMRegression().fit(data.data[:, 0], data.data[:, 1])

robpy.datasets.base.load_topgear(*, as_frame=False)[source]

Load and return the TopGear dataset from robustHD (R) (regression).

The TopGear dataset is a mixed-type dataset (numeric and categorical variables) used for demonstrating robust regression estimators.

Samples	297
Dimensionality	32
Features	mixed

Parameters:

as_frame (bool, default=False) – If True, the data is a pandas DataFrame including columns with appropriate dtypes (numeric).

Returns:

data – Dictionary-like object, with the following attributes:

data: {ndarray, dataframe} of shape (297, 32)
The data matrix. If as_frame=True, data will be a pandas DataFrame.
feature_names: list
The names of the dataset columns.
categorical_features: list
The names of the categorical features.
DESCR: str
The full description of the dataset.
filename: str
The path to the location of the data.

Return type:

Bunch

Examples

Fitting a robust regression estimator:

>>> from robpy.datasets import load_topgear
>>> from robpy.regression import FastLTSRegression
>>> data = load_topgear(as_frame=True)
>>> data.data = data.data.dropna(subset=["Cylinders", "Torque", "TopSpeed", "Price"])
>>> lts = FastLTSRegression().fit(
...     data.data[["Cylinders", "Torque", "TopSpeed"]], data.data["Price"]
... )
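
Because the TopGear Bunch lists both `feature_names` and `categorical_features`, the two lists can be combined to pick out the numeric columns before fitting an estimator. The sketch below shows one way to do that with plain Python lists. The helper name `split_feature_names` is hypothetical, the column names are an illustrative subset, and treating "Fuel" and "Type" as categorical is an assumption made for this example rather than something taken from the library.

```python
def split_feature_names(feature_names, categorical_features):
    """Partition column names into (numeric, categorical), preserving order."""
    categorical = set(categorical_features)
    numeric_names = [name for name in feature_names if name not in categorical]
    categorical_names = [name for name in feature_names if name in categorical]
    return numeric_names, categorical_names


# Illustrative subset of column names; which columns are categorical
# is an assumption made for this sketch.
feature_names = ["Fuel", "Price", "Cylinders", "Torque", "TopSpeed", "Type"]
categorical_features = ["Fuel", "Type"]

numeric_names, categorical_names = split_feature_names(
    feature_names, categorical_features
)
```

With the real dataset loaded using `as_frame=True`, one would presumably pass `data.feature_names` and `data.categorical_features` to the helper and then select `data.data[numeric_names]`.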