robpy.datasets
Module that contains convenience functions for loading commonly used datasets.
- robpy.datasets.base.load_animals(*, as_frame=False)
Load and return the Animals dataset from MASS (R) (covariance / regression).
The animals dataset is a bivariate dataset used for demonstrating robust covariance estimators.
Samples: 28
Dimensionality: 2
Features: real, positive
- Parameters:
as_frame (bool, default=False) – If True, the data is a pandas DataFrame including columns with appropriate dtypes (numeric).
- Returns:
data – Dictionary-like object, with the following attributes:
- data: {ndarray, dataframe} of shape (28, 2)
The data matrix. If as_frame=True, data will be a pandas DataFrame.
- feature_names: list
The names of the dataset columns.
- DESCR: str
The full description of the dataset.
- filename: str
The path to the location of the data.
- Return type:
Bunch
Examples
Fitting a robust covariance estimator:
>>> from robpy.datasets import load_animals
>>> from robpy.covariance import FastMCD
>>> data = load_animals()
>>> mcd = FastMCD().fit(data.data)
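The return value is described as a dictionary-like object with attributes. The sketch below illustrates that access pattern with a minimal stand-in class. It assumes that robpy's Bunch behaves like the scikit-learn container of the same name (keys readable as attributes), which this reference does not state explicitly. The class, values, and column names are placeholders for illustration, not the real Animals data.

```python
class Bunch(dict):
    """Minimal stand-in: a dict whose keys can also be read as attributes."""

    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key) from None


# Placeholder contents; a real loader would fill these in.
result = Bunch(
    data=[[1.0, 2.0], [3.0, 4.0]],
    feature_names=["col_a", "col_b"],
)

# Attribute access and key access return the same object.
same = result.data is result["data"]
fields = sorted(result.keys())
print(same, fields)
```

If the assumption holds, `data.data` and `data["data"]` can be used interchangeably on the object returned by any loader in this module.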
- robpy.datasets.base.load_glass(*, as_frame=False)
Load and return the glass dataset from cellWise (R) (outlier detection).
The glass dataset is a high-dimensional dataset used for demonstrating outlier detection.
Samples: 180
Dimensionality: 750
Features: real, positive
- Parameters:
as_frame (bool, default=False) – If True, the data is a pandas DataFrame including columns with appropriate dtypes (numeric).
- Returns:
data – Dictionary-like object, with the following attributes:
- data: {ndarray, dataframe} of shape (180, 750)
The data matrix. If as_frame=True, data will be a pandas DataFrame.
- feature_names: list
The names of the dataset columns.
- DESCR: str
The full description of the dataset.
- filename: str
The path to the location of the data.
- Return type:
Bunch
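Examples
This entry has no example in the reference. The sketch below shows one way to inspect a loader result using only the attributes listed above (data and feature_names). `summarize` is a hypothetical helper, not part of robpy, and the stand-in object exists only so the snippet runs without robpy installed; in practice you would pass the result of `load_glass()` instead.

```python
from types import SimpleNamespace


def summarize(bunch):
    """Report the size of a loader result and its first few column names."""
    data = bunch.data
    # ndarrays and DataFrames both expose .shape
    shape = getattr(data, "shape", None)
    if shape is None:
        # Fallback for plain nested lists (used by the stand-in below)
        shape = (len(data), len(data[0]) if data else 0)
    return {
        "n_samples": shape[0],
        "n_features": shape[1],
        "first_features": list(bunch.feature_names)[:3],
    }


# Stand-in with the documented attributes; replace with load_glass().
demo = SimpleNamespace(
    data=[[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]],
    feature_names=["f0", "f1", "f2", "f3"],
)
summary = summarize(demo)
print(summary)
```

For the glass data, the documented shape implies that `n_samples` would be 180 and `n_features` would be 750.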
- robpy.datasets.base.load_stars(*, as_frame=False)
Load and return the Hertzsprung-Russell Diagram Data of Star Cluster CYG OB1 (covariance/regression).
The stars dataset is a well-known bivariate dataset used for demonstrating robust covariance and regression estimators.
Samples: 47
Dimensionality: 2
Features: real, positive
- Parameters:
as_frame (bool, default=False) – If True, the data is a pandas DataFrame including columns with appropriate dtypes (numeric).
- Returns:
data – Dictionary-like object, with the following attributes:
- data: {ndarray, dataframe} of shape (47, 2)
The data matrix. If as_frame=True, data will be a pandas DataFrame.
- feature_names: list
The names of the dataset columns.
- DESCR: str
The full description of the dataset.
- filename: str
The path to the location of the data.
- Return type:
Bunch
Examples
Fitting a robust covariance estimator:
>>> from robpy.datasets import load_stars
>>> from robpy.covariance import FastMCD
>>> data = load_stars()
>>> mcd = FastMCD().fit(data.data)
- robpy.datasets.base.load_telephone(*, as_frame=False)
Load and return the telephone dataset (regression with outliers).
The telephone dataset is a well-known univariate regression problem with outliers.
Samples: 24
Dimensionality: 2
Features: real, positive
- Parameters:
as_frame (bool, default=False) – If True, the data is a pandas DataFrame including columns with appropriate dtypes (numeric). The target is a pandas DataFrame or Series depending on the number of target columns.
- Returns:
data – Dictionary-like object, with the following attributes:
- data: {ndarray, dataframe} of shape (24, 2)
The data matrix. If as_frame=True, data will be a pandas DataFrame.
- feature_names: list
The names of the dataset columns.
- DESCR: str
The full description of the dataset.
- filename: str
The path to the location of the data.
- Return type:
Bunch
Examples
Fitting a robust regression:
>>> from robpy.datasets import load_telephone
>>> from robpy.regression import MMRegression
>>> data = load_telephone()
>>> mm = MMRegression().fit(data.data[:, 0], data.data[:, 1])
- robpy.datasets.base.load_topgear(*, as_frame=False)
Load and return the TopGear dataset from robustHD (R) (regression).
The TopGear dataset is a mixed-variable dataset (numeric and categorical features) used for demonstrating robust regression estimators.
Samples: 297
Dimensionality: 32
Features: mixed
- Parameters:
as_frame (bool, default=False) – If True, the data is a pandas DataFrame including columns with appropriate dtypes (numeric).
- Returns:
data – Dictionary-like object, with the following attributes:
- data: {ndarray, dataframe} of shape (297, 32)
The data matrix. If as_frame=True, data will be a pandas DataFrame.
- feature_names: list
The names of the dataset columns.
- categorical_features: list
The names of the categorical features.
- DESCR: str
The full description of the dataset.
- filename: str
The path to the location of the data.
- Return type:
Bunch
Examples
Fitting a robust regression estimator:
>>> from robpy.datasets import load_topgear
>>> from robpy.regression import FastLTSRegression
>>> data = load_topgear(as_frame=True)
>>> data.data = data.data.dropna(subset=["Cylinders", "Torque", "TopSpeed", "Price"])
>>> lts = FastLTSRegression().fit(
...     data.data[["Cylinders", "Torque", "TopSpeed"]], data.data["Price"]
... )