robpy.datasets

Module that contains convenience functions for loading commonly used datasets.

robpy.datasets.base.load_animals(*, as_frame=False)

Load and return the Animals dataset from MASS (R) (covariance / regression).

The animals dataset is a bivariate dataset used for demonstrating robust covariance estimators.

Samples: 28
Dimensionality: 2
Features: real, positive

Parameters:

as_frame (bool, default=False) – If True, the data is a pandas DataFrame including columns with appropriate dtypes (numeric).

Returns:

data – Dictionary-like object, with the following attributes:

  • data: {ndarray, DataFrame} of shape (28, 2)

    The data matrix. If as_frame=True, data will be a pandas DataFrame.

  • feature_names: list

    The names of the dataset columns.

  • DESCR: str

    The full description of the dataset.

  • filename: str

    The path to the location of the data.

Return type:

Bunch

Examples

Fitting a robust covariance estimator:

>>> from robpy.datasets import load_animals
>>> from robpy.covariance import FastMCD
>>> data = load_animals()
>>> mcd = FastMCD().fit(data.data)

robpy.datasets.base.load_glass(*, as_frame=False)

Load and return the glass dataset from cellWise (R) (outlier detection).

The glass dataset is a high-dimensional dataset (spectra measured at 750 wavelengths on 180 archaeological glass samples) used for demonstrating outlier detection.

Samples: 180
Dimensionality: 750
Features: real, positive

Parameters:

as_frame (bool, default=False) – If True, the data is a pandas DataFrame including columns with appropriate dtypes (numeric).

Returns:

data – Dictionary-like object, with the following attributes:

  • data: {ndarray, DataFrame} of shape (180, 750)

    The data matrix. If as_frame=True, data will be a pandas DataFrame.

  • feature_names: list

    The names of the dataset columns.

  • DESCR: str

    The full description of the dataset.

  • filename: str

    The path to the location of the data.

Return type:

Bunch

robpy.datasets.base.load_stars(*, as_frame=False)

Load and return the Hertzsprung-Russell Diagram Data of Star Cluster CYG OB1 (covariance/regression).

The stars dataset is a well-known bivariate dataset used for demonstrating robust covariance and regression estimators.

Samples: 47
Dimensionality: 2
Features: real, positive

Parameters:

as_frame (bool, default=False) – If True, the data is a pandas DataFrame including columns with appropriate dtypes (numeric).

Returns:

data – Dictionary-like object, with the following attributes:

  • data: {ndarray, DataFrame} of shape (47, 2)

    The data matrix. If as_frame=True, data will be a pandas DataFrame.

  • feature_names: list

    The names of the dataset columns.

  • DESCR: str

    The full description of the dataset.

  • filename: str

    The path to the location of the data.

Return type:

Bunch

Examples

Fitting a robust covariance estimator:

>>> from robpy.datasets import load_stars
>>> from robpy.covariance import FastMCD
>>> data = load_stars()
>>> mcd = FastMCD().fit(data.data)

robpy.datasets.base.load_telephone(*, as_frame=False)

Load and return the telephone dataset (regression with outliers).

The telephone dataset is a well-known simple regression problem (one predictor and one response) with outliers.

Samples: 24
Dimensionality: 2
Features: real, positive

Parameters:

as_frame (bool, default=False) – If True, the data is a pandas DataFrame including columns with appropriate dtypes (numeric).

Returns:

data – Dictionary-like object, with the following attributes:

  • data: {ndarray, DataFrame} of shape (24, 2)

    The data matrix. If as_frame=True, data will be a pandas DataFrame.

  • feature_names: list

    The names of the dataset columns.

  • DESCR: str

    The full description of the dataset.

  • filename: str

    The path to the location of the data.

Return type:

Bunch

Examples

Fitting a robust regression:

>>> from robpy.datasets import load_telephone
>>> from robpy.regression import MMRegression
>>> data = load_telephone()
>>> mm = MMRegression().fit(data.data[:, [0]], data.data[:, 1])

robpy.datasets.base.load_topgear(*, as_frame=False)

Load and return the TopGear dataset from robustHD (R) (regression).

The TopGear dataset contains a mix of numeric and categorical variables and is used for demonstrating robust regression estimators.

Samples: 297
Dimensionality: 32
Features: mixed

Parameters:

as_frame (bool, default=False) – If True, the data is a pandas DataFrame including columns with appropriate dtypes (numeric).

Returns:

data – Dictionary-like object, with the following attributes:

  • data: {ndarray, DataFrame} of shape (297, 32)

    The data matrix. If as_frame=True, data will be a pandas DataFrame.

  • feature_names: list

    The names of the dataset columns.

  • categorical_features: list

    The names of the categorical features.

  • DESCR: str

    The full description of the dataset.

  • filename: str

    The path to the location of the data.

Return type:

Bunch

Examples

Fitting a robust regression estimator:

>>> from robpy.datasets import load_topgear
>>> from robpy.regression import FastLTSRegression
>>> data = load_topgear(as_frame=True)
>>> data.data = data.data.dropna(subset=["Cylinders", "Torque", "TopSpeed", "Price"])
>>> lts = FastLTSRegression().fit(
...     data.data[["Cylinders", "Torque", "TopSpeed"]], data.data["Price"]
... )