
[1]:
import warnings

warnings.filterwarnings("ignore")

from aeon.benchmarking.forecasting import ForecastingBenchmark
from aeon.datasets import load_airline
from aeon.forecasting.model_selection import ExpandingWindowSplitter
from aeon.forecasting.naive import NaiveForecaster
from aeon.performance_metrics.forecasting import mean_squared_percentage_error

Instantiate a benchmark class

In this example we are comparing forecasting estimators.

[2]:
benchmark = ForecastingBenchmark()

Add competing estimators

We add competing estimators to the benchmark instance. Every added estimator will automatically be run through each added benchmarking task, and the results compiled.

[3]:
benchmark.add_estimator(
    estimator=NaiveForecaster(strategy="mean", sp=12),
    estimator_id="NaiveForecaster-mean-v1",
)
benchmark.add_estimator(
    estimator=NaiveForecaster(strategy="last", sp=12),
    estimator_id="NaiveForecaster-last-v1",
)

Add benchmarking tasks

These are the prediction/validation tasks over which every estimator will be tested and its results compiled.

The exact arguments for a benchmarking task depend on whether the objective is forecasting, classification, etc., but they are generally similar. The following are the required arguments for defining a forecasting benchmark task.

Specify cross-validation split regime(s)

Define cross-validation split regimes, using standard aeon objects.

[4]:
cv_splitter = ExpandingWindowSplitter(
    initial_window=24,
    step_length=12,
    fh=12,
)
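As an optional sanity check (not part of the original walkthrough), the splitter's standard get_n_splits method can be used to count the folds this regime produces on the airline series; the count corresponds to the ten folds that appear in the results below:

y = load_airline()
# Number of expanding-window train/test splits produced by this regime
print(cv_splitter.get_n_splits(y))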

Specify performance metric(s)

Define performance metrics on which to compare estimators, using standard aeon functions.

[5]:
scorers = [mean_squared_percentage_error]

Specify dataset loaders

Define dataset loaders: callables (functions) that return a dataset, typically a dataframe containing the entire dataset. You can use the datasets shipped with aeon or define your own. Something as simple as the following example will suffice:

import pandas as pd


def my_dataset_loader():
    return pd.read_csv("path/to/data.csv")

The datasets are loaded when the benchmarking tasks are run, passed through the cross-validation regime(s), and the estimators are then evaluated on the resulting dataset splits.
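A loader can also build its data in memory rather than reading a file. The function below is a purely illustrative sketch (its name and synthetic values are not part of aeon):

import numpy as np
import pandas as pd


def my_synthetic_loader():
    # Illustrative: construct a short monthly series in memory;
    # any callable returning the full dataset can serve as a loader.
    index = pd.period_range("2000-01", periods=48, freq="M")
    values = 100 + np.arange(48) + 10 * np.sin(2 * np.pi * np.arange(48) / 12)
    return pd.Series(values, index=index, name="synthetic_monthly")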

[6]:
dataset_loaders = [load_airline]

Add tasks to the benchmark instance

Use the previously defined objects to add tasks to the benchmark instance. Optionally, use loops etc. to set up multiple benchmark tasks that reuse arguments.

[7]:
for dataset_loader in dataset_loaders:
    benchmark.add_task(
        dataset_loader,
        cv_splitter,
        scorers,
    )

Run all task-estimator combinations and store results

Note that run will not rerun task-estimator combinations for which it already has stored results, so adding a new estimator and calling run again will only execute the tasks for that new estimator.

[8]:
results_df = benchmark.run("./forecasting_results.csv")
results_df.T
[8]:
                                                               0                                                   1
validation_id                           [dataset=load_airline]_[cv_splitter=ExpandingW...  [dataset=load_airline]_[cv_splitter=ExpandingW...
model_id                                NaiveForecaster-last-v1                             NaiveForecaster-mean-v1
runtime_secs                            0.092438                                            0.108086
MeanSquaredPercentageError_fold_0_test  0.024532                                            0.049681
MeanSquaredPercentageError_fold_1_test  0.020831                                            0.0737
MeanSquaredPercentageError_fold_2_test  0.001213                                            0.05352
MeanSquaredPercentageError_fold_3_test  0.01495                                             0.081063
MeanSquaredPercentageError_fold_4_test  0.031067                                            0.138163
MeanSquaredPercentageError_fold_5_test  0.008373                                            0.145125
MeanSquaredPercentageError_fold_6_test  0.007972                                            0.154337
MeanSquaredPercentageError_fold_7_test  0.000009                                            0.123298
MeanSquaredPercentageError_fold_8_test  0.028191                                            0.185644
MeanSquaredPercentageError_fold_9_test  0.003906                                            0.184654
MeanSquaredPercentageError_mean         0.014104                                            0.118918
MeanSquaredPercentageError_std          0.011451                                            0.051265
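Because results are persisted to the CSV path passed to run, extending the comparison later is cheap. The following sketch assumes the "drift" strategy is available for NaiveForecaster and uses a hypothetical estimator id:

benchmark.add_estimator(
    estimator=NaiveForecaster(strategy="drift", sp=12),
    estimator_id="NaiveForecaster-drift-v1",
)
# Only the tasks for the newly added estimator are executed;
# previously stored results are reused from the CSV file.
results_df = benchmark.run("./forecasting_results.csv")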
