load_monster_dataset

load_monster_dataset(dataset_name: str, fold: int = 0) -> tuple[ndarray, ndarray, ndarray, ndarray]

Load a MONSTER dataset from Hugging Face Hub.

MONSTER, the MONash Scalable Time Series Evaluation Repository introduced in [1], is a collection of large datasets for time series classification. The collection is hosted on Hugging Face Hub.

Parameters:
dataset_name : str

The name of the dataset to load (e.g., “CornellWhaleChallenge”, “AudioMNIST”).

fold : int, default=0

The specific cross-validation fold index to load. This determines which samples are used for the test set. Defaults to fold 0.

Returns:
X_train : np.ndarray

The training data, shape (n_train_cases, n_channels, n_timepoints). n_channels is 1 for univariate datasets.

y_train : np.ndarray

The training class labels, shape (n_train_cases,).

X_test : np.ndarray

The testing data, shape (n_test_cases, n_channels, n_timepoints).

y_test : np.ndarray

The testing class labels, shape (n_test_cases,).
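The documented return shapes can be checked with a small helper. The following is a minimal sketch, not part of the library: `check_split_shapes` is a hypothetical name, the arrays are synthetic stand-ins so that nothing is downloaded, and the check that the train and test sets share n_channels and n_timepoints is an assumption drawn from the shared names in the shapes above.

```python
import numpy as np


def check_split_shapes(X_train, y_train, X_test, y_test):
    """Validate a train/test split against the documented shapes.

    Hypothetical helper for illustration only. Returns the pair
    (n_channels, n_timepoints) when every check passes.
    """
    for name, X, y in (("train", X_train, y_train), ("test", X_test, y_test)):
        # Data must be 3D: (n_cases, n_channels, n_timepoints).
        if X.ndim != 3:
            raise ValueError(f"X_{name} must be 3D, got shape {X.shape}")
        # Labels must be 1D with one entry per case.
        if y.shape != (X.shape[0],):
            raise ValueError(
                f"y_{name} has shape {y.shape}, expected ({X.shape[0]},)"
            )
    # Assumption: both splits share n_channels and n_timepoints.
    if X_train.shape[1:] != X_test.shape[1:]:
        raise ValueError("train and test differ in n_channels or n_timepoints")
    return X_train.shape[1], X_train.shape[2]


# Synthetic stand-in for a univariate dataset (no download involved).
# With the real loader, the four arrays would instead come from:
#   X_train, y_train, X_test, y_test = load_monster_dataset("AudioMNIST", fold=0)
rng = np.random.default_rng(0)
X_train = rng.normal(size=(8, 1, 50))
y_train = rng.integers(0, 2, size=8)
X_test = rng.normal(size=(4, 1, 50))
y_test = rng.integers(0, 2, size=4)

n_channels, n_timepoints = check_split_shapes(X_train, y_train, X_test, y_test)
print(n_channels, n_timepoints)
```

Running a check like this once after loading makes shape mismatches surface before the arrays reach a classifier.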

Raises:
ModuleNotFoundError

If the required optional dependency ‘huggingface-hub’ is not installed.

ValueError

If the dataset_name is not recognized or the fold number is invalid.

OSError

If the download fails due to network issues.
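One way to handle the three documented exceptions is sketched below. This is an illustration under stated assumptions rather than the library's recommended pattern: `load_or_none` and `fake_loader` are hypothetical names, and the loader is passed in as an argument so the example runs without network access or the optional dependency. In real use, `load_monster_dataset` would be passed as `loader`.

```python
def load_or_none(loader, dataset_name, fold=0):
    """Call loader(dataset_name, fold=fold) and return its result.

    Hypothetical wrapper: returns None and prints a short message
    when one of the documented exceptions is raised.
    """
    try:
        return loader(dataset_name, fold=fold)
    except ModuleNotFoundError:
        # Raised when the optional dependency is missing.
        print("Install huggingface-hub to load MONSTER datasets.")
    except ValueError as err:
        # Raised for an unrecognized dataset name or an invalid fold.
        print(f"Invalid dataset name or fold: {err}")
    except OSError as err:
        # Raised when the download fails due to network issues.
        print(f"Download failed: {err}")
    return None


def fake_loader(dataset_name, fold=0):
    """Stand-in that mimics the documented ValueError (illustration only)."""
    raise ValueError(f"unknown dataset {dataset_name!r}")


result = load_or_none(fake_loader, "NotADataset")
```

Returning None keeps the calling code simple. A pipeline that should stop on failure can re-raise the exception instead of printing.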

Notes

The data files are cached locally by the huggingface-hub library, avoiding repeated downloads. This function requires the optional dependency huggingface-hub.

References

[1]

Dempster, A., Mohammadi Foumani, N., Tan, C. W., Miller, L., Mishra, A., Salehi, M., Pelletier, C., Schmidt, D. F., & Webb, G. I. (2025). MONSTER: Monash Scalable Time Series Evaluation Repository. arXiv preprint arXiv:2502.15122.