Downloading and loading benchmarking datasets

It is common to use standard collections of data to compare different estimators for classification, clustering, regression and forecasting. Some of these datasets are shipped with aeon in the datasets/data directory. However, the full collections are far too big to include them all, so aeon provides tools to download these data for use in benchmarking experiments. Classification and regression data are stored in .ts format. Forecasting data are stored in the equivalent .tsf format. See the data loading notebook for more info.

Classification and regression data are loaded into 3D numpy arrays of shape (n_cases, n_channels, n_timepoints) if all series have equal length, or into a list of n_cases 2D numpy arrays, each of shape (n_channels, n_timepoints), if the number of time points differs between cases. Forecasting data are loaded into a pd.DataFrame. For more information on aeon data types, see the data structures notebook.
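To make the two layouts concrete, here is a small numpy-only illustration. It builds dummy data rather than calling aeon, and describe_collection is a hypothetical helper written for this example, not part of the aeon API.

```python
import numpy as np

# Equal length: 3 cases, 2 channels, 10 time points -> a single 3D array
X_equal = np.zeros((3, 2, 10))

# Unequal length: a list of n_cases 2D arrays, each (n_channels, n_timepoints_i)
lengths = [8, 10, 12]
X_unequal = [np.zeros((2, m)) for m in lengths]


def describe_collection(X):
    """Hypothetical helper: return (n_cases, n_channels, per-case lengths)."""
    if isinstance(X, np.ndarray):
        # 3D array: every case shares the same number of time points
        n_cases, n_channels, n_timepoints = X.shape
        return n_cases, n_channels, [n_timepoints] * n_cases
    # List of 2D arrays: read the length of each case separately
    return len(X), X[0].shape[0], [x.shape[1] for x in X]


print(describe_collection(X_equal))    # (3, 2, [10, 10, 10])
print(describe_collection(X_unequal))  # (3, 2, [8, 10, 12])
```

Code that accepts both layouts has to branch like this, because a list of differently sized arrays cannot be stacked into one 3D array.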

Note that this notebook depends on external websites, so it will not work if you are offline or the associated website is down. We use the following three functions:

from aeon.datasets import load_classification, load_forecasting, load_regression

Time Series Classification Archive

The UCR/TSML Time Series Classification Archive hosts the UCR univariate TSC archive [1] (also available from UCR) and the multivariate archive [2] (previously called the UEA archive, soon to change). We provide seven of these problems in the datasets/data directory: ACSF1, ArrowHead, BasicMotions, GunPoint, ItalyPowerDemand, JapaneseVowels and PLAID. The archive is much bigger: the last batch release contained 128 univariate [1] and 33 multivariate [2] problems. If you just want to download them all, please go to the archive website.

from aeon.datasets.tsc_data_lists import multivariate, univariate

# This file also contains sub lists by type, e.g. unequal length
print("Univariate length = ", len(univariate))
print("Multivariate length = ", len(multivariate))
Univariate length =  128
Multivariate length =  30

A default train/test split is provided for each problem. The file structure for a problem such as Chinatown is:


You can download these problems directly and load them into memory. Pass return_metadata=True to also return the associated metadata. When called without a split argument, load_classification combines the train and test splits and loads them into a single X and a single y array.
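Conceptually, combining the splits amounts to stacking the cases along the first axis. The numpy sketch below uses made-up dummy arrays and assumes a train-then-test ordering; it illustrates the idea and is not a description of how aeon implements it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dummy univariate splits (made-up sizes): 5 train and 7 test cases of length 24
X_train = rng.normal(size=(5, 1, 24))
X_test = rng.normal(size=(7, 1, 24))
y_train = np.array(["1", "2", "1", "2", "1"])
y_test = np.array(["2"] * 7)

# Stack cases along axis 0; labels are concatenated in the same order
X = np.concatenate([X_train, X_test], axis=0)
y = np.concatenate([y_train, y_test])
print(X.shape, y.shape)  # (12, 1, 24) (12,)
```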

X, y, meta = load_classification("Chinatown", return_metadata=True)
print("Shape of X = ", X.shape)
print("First case = ", X[0][0], " has label = ", y[0])
print("\nMeta data = ", meta)
Shape of X =  (363, 1, 24)
First case =  [ 573.  375.  301.  212.   55.   34.   25.   33.  113.  143.  303.  615.
 1226. 1281. 1221. 1081.  866. 1096. 1039.  975.  746.  581.  409.  182.]  has label =  1

Meta data =  {'problemname': 'chinatown', 'timestamps': False, 'missing': False, 'univariate': True, 'equallength': True, 'classlabel': True, 'targetlabel': False, 'class_values': ['1', '2']}
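The keys of the metadata dictionary appear to mirror the @-prefixed header lines of a .ts file (@problemName, @timeStamps, @missing, @univariate, @equalLength, @classLabel). The sketch below parses such a header from a made-up toy file; parse_ts_header is a hypothetical helper for illustration only, and real files should be read with load_from_tsfile.

```python
def parse_ts_header(text):
    """Hypothetical helper: parse the @-prefixed header of a .ts file into a dict."""
    meta = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if not line.startswith("@"):
            break  # reached data rows without seeing @data
        tokens = line[1:].split()
        if not tokens:
            continue  # a bare "@" carries no information
        key, values = tokens[0].lower(), tokens[1:]
        if key == "data":
            break  # the header ends at @data
        if key == "classlabel":
            # "@classLabel true 1 2": a flag followed by the class values
            meta[key] = values[0].lower() == "true"
            meta["class_values"] = values[1:]
        elif values and values[0].lower() in ("true", "false"):
            meta[key] = values[0].lower() == "true"
        else:
            meta[key] = " ".join(values)
    return meta


toy = """# a made-up two-case problem
@problemName toy
@timeStamps false
@missing false
@univariate true
@equalLength true
@classLabel true 1 2
@data
1.0,2.0,3.0:1
3.0,2.0,1.0:2
"""
print(parse_ts_header(toy))
```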

If you look in aeon/datasets, you should see a directory called local_data containing the Chinatown data. Every downloaded zip contains .ts files, and some also contain .arff and .txt files. If you load the same problem again, it is not downloaded again when the files are already there.

To store data somewhere else, specify a path with extract_path. You can also load the train and test splits separately with the split argument, which is not case sensitive. The code below downloads BeetleFly to ./Temp/ once and loads the train and test splits separately. Once the data are downloaded, load_classification is equivalent to calling load_from_tsfile on the corresponding file.

X_train, y_train = load_classification(
    "BeetleFly", extract_path="./Temp/", split="TRAIN"
)
X_test, y_test = load_classification("BeetleFly", extract_path="./Temp/", split="test")
print("Train shape = ", X_train.shape)
print("Test shape = ", X_test.shape)
from aeon.datasets import load_from_tsfile

# Load the same training file directly; assumes the zip was extracted
# to ./Temp/BeetleFly/ by the calls above
X_train, y_train = load_from_tsfile(
    full_file_path_and_name="./Temp/BeetleFly/BeetleFly_TRAIN.ts"
)
print("Loaded directly shape = ", X_train.shape)

Train shape =  (20, 1, 512)
Test shape =  (20, 1, 512)
Loaded directly shape =  (20, 1, 512)
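The "download once" behaviour and the case-insensitive split argument can be pictured with the sketch below. Both ensure_local_file and split_file_name are hypothetical helpers, and fake_download stands in for a real network download; this shows the caching idea, not aeon's internal code. The Name_TRAIN.ts / Name_TEST.ts naming follows the archive convention.

```python
import os
import tempfile


def split_file_name(name, split):
    """Hypothetical helper: build the file name for a split, ignoring case."""
    return f"{name}_{split.upper()}.ts"


def ensure_local_file(path, download):
    """Hypothetical helper: call download(path) only if path does not exist yet."""
    if os.path.exists(path):
        return False  # already cached, nothing to do
    os.makedirs(os.path.dirname(path), exist_ok=True)
    download(path)
    return True  # freshly downloaded


calls = []


def fake_download(path):
    # Stand-in for a network download: record the call and write a placeholder
    calls.append(path)
    with open(path, "w") as f:
        f.write("@problemName toy\n@data\n")


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "Toy", split_file_name("Toy", "train"))
    first = ensure_local_file(target, fake_download)   # downloads
    second = ensure_local_file(target, fake_download)  # reuses the file
    print(first, second, len(calls))  # True False 1
```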

Time Series (Extrinsic) Regression

The Monash Time Series Extrinsic Regression Archive [3] (called extrinsic to differentiate it from sliding-window-based regression) currently contains 19 regression problems in .ts format. One of these, Covid3Month, is in datasets/data. The usage of load_regression is identical to that of load_classification.

from aeon.datasets.dataset_collections import list_available_tser_datasets

X, y, meta = load_regression("FloodModeling1", return_metadata=True)
print("Shape of X = ", X.shape)
Shape of X =  (673, 1, 266)
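To illustrate the distinction behind the name "extrinsic", the numpy sketch below contrasts the two set-ups on dummy data. In sliding-window regression, one series is cut into windows that each predict the next value; in extrinsic regression, each case is a whole series paired with its own external target, as in the (673, 1, 266) array above. sliding_window_regression is a hypothetical helper, and the targets are made up.

```python
import numpy as np


def sliding_window_regression(series, window):
    """Hypothetical helper: turn one series into (X, y) next-value pairs."""
    n = len(series) - window
    X = np.stack([series[i : i + window] for i in range(n)])
    y = series[window:]  # the value immediately after each window
    return X, y


# Sliding window: a single series produces many (window, next value) pairs
series = np.arange(10.0)
X_sw, y_sw = sliding_window_regression(series, window=3)
print(X_sw.shape, y_sw.shape)  # (7, 3) (7,)

# Extrinsic: each case is a whole series with one external target
X_ext = np.random.default_rng(0).normal(size=(4, 1, 10))  # 4 cases, 1 channel
y_ext = np.array([0.5, 1.2, 0.9, 2.0])  # made-up targets, one per series
print(X_ext.shape, y_ext.shape)  # (4, 1, 10) (4,)
```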

Time Series Forecasting

The Monash time series forecasting repo [4] contains a large number of forecasting datasets, including competition data such as M1, M3 and M4. Usage is the same as for the other problem types, although no train/test splits are provided.

from aeon.datasets.dataset_collections import list_available_tsf_datasets

X, metadata = load_forecasting("m4_yearly_dataset", return_metadata=True)
print(X.shape)
print(metadata)
data = X.head()
print(data)
(23000, 3)
{'frequency': 'yearly', 'forecast_horizon': 6, 'contain_missing_values': False, 'contain_equal_length': False}
  series_name     start_timestamp  \
0          T1 1979-01-01 12:00:00
1          T2 1979-01-01 12:00:00
2          T3 1979-01-01 12:00:00
3          T4 1979-01-01 12:00:00
4          T5 1979-01-01 12:00:00

0  [5172.1, 5133.5, 5186.9, 5084.6, 5182.0, 5414....
1  [2070.0, 2104.0, 2394.0, 1651.0, 1492.0, 1348....
2  [2760.0, 2980.0, 3200.0, 3450.0, 3670.0, 3850....
3  [3380.0, 3670.0, 3960.0, 4190.0, 4440.0, 4700....
4  [1980.0, 2030.0, 2220.0, 2530.0, 2610.0, 2720....
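Since no train/test split is provided, one common choice (our suggestion, not something aeon does here) is to hold out the last forecast_horizon values of each series, using the horizon from the metadata (6 for m4_yearly_dataset). The sketch uses a small toy frame shaped like the output above; the value column name "series_value" is an assumption, and holdout_last is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# Toy frame shaped like the loaded data; "series_value" is an assumed column name
df = pd.DataFrame(
    {
        "series_name": ["T1", "T2"],
        "series_value": [np.arange(10.0), np.arange(8.0)],
    }
)
horizon = 6  # e.g. metadata["forecast_horizon"]


def holdout_last(values, horizon):
    """Hypothetical helper: keep the last `horizon` points of a series for testing."""
    values = np.asarray(values)
    if horizon < 1 or horizon >= len(values):
        raise ValueError("horizon must be between 1 and len(values) - 1")
    return values[:-horizon], values[-horizon:]


# One (train, test) pair per series, keyed by series name
splits = {
    name: holdout_last(values, horizon)
    for name, values in zip(df["series_name"], df["series_value"])
}
train, test = splits["T1"]
print(len(train), len(test))  # 4 6
```

Because series in m4_yearly_dataset have unequal lengths, the split is done per series rather than on a single array.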


[1] Dau et al., The UCR time series archive, IEEE/CAA Journal of Automatica Sinica, 2019.
[2] Ruiz et al., The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances, Data Mining and Knowledge Discovery 35(2), 2021.
[3] Tan et al., Time Series Extrinsic Regression, Data Mining and Knowledge Discovery, 2021.
[4] Godahewa et al., Monash Time Series Forecasting Archive, Neural Information Processing Systems Track on Datasets and Benchmarks, 2021.
