
[1]:
import warnings

warnings.filterwarnings("ignore")

from aeon.benchmarking.forecasting import ForecastingBenchmark
from aeon.datasets import load_airline
from aeon.forecasting.model_selection import ExpandingWindowSplitter
from aeon.forecasting.naive import NaiveForecaster
from aeon.performance_metrics.forecasting import mean_squared_percentage_error

Instantiate a benchmark class

In this example we are comparing forecasting estimators.

[2]:
benchmark = ForecastingBenchmark()

Add competing estimators

We add competing estimators to the benchmark instance. Every added estimator will automatically be run through each added benchmarking task, and the results compiled.

[3]:
benchmark.add_estimator(
    estimator=NaiveForecaster(strategy="mean", sp=12),
    estimator_id="NaiveForecaster-mean-v1",
)
benchmark.add_estimator(
    estimator=NaiveForecaster(strategy="last", sp=12),
    estimator_id="NaiveForecaster-last-v1",
)

Add benchmarking tasks

These are the prediction/validation tasks over which every estimator will be tested and its results compiled.

The exact arguments for a benchmarking task depend on whether the objective is forecasting, classification, etc., but they are generally similar. The following are the required arguments for defining a forecasting benchmark task.

Specify cross-validation split regime(s)

Define cross-validation split regimes, using standard aeon objects.

[4]:
cv_splitter = ExpandingWindowSplitter(
    initial_window=24,
    step_length=12,
    fh=12,
)
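As an optional sanity check (not part of the original walkthrough), the splitter's standard get_n_splits method can be used to count the folds this regime produces on the airline series; the count corresponds to the ten folds that appear in the results below:

y = load_airline()
# Number of expanding-window train/test splits produced by this regime
print(cv_splitter.get_n_splits(y))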

Specify performance metric(s)

Define performance metrics on which to compare estimators, using standard aeon functions.

[5]:
scorers = [mean_squared_percentage_error]

Specify dataset loaders

Define dataset loaders: callables (functions) that return a dataset, typically a dataframe containing the entire dataset. You can use the datasets shipped with aeon or define your own. Something as simple as the following example will suffice:

import pandas as pd


def my_dataset_loader():
    return pd.read_csv("path/to/data.csv")

The datasets are loaded when the benchmarking tasks are run, passed through the cross-validation regime(s), and the estimators are then evaluated on the resulting dataset splits.
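A loader can also build its data in memory rather than reading a file. The function below is a purely illustrative sketch (its name and synthetic values are not part of aeon):

import numpy as np
import pandas as pd


def my_synthetic_loader():
    # Illustrative: construct a short monthly series in memory;
    # any callable returning the full dataset can serve as a loader.
    index = pd.period_range("2000-01", periods=48, freq="M")
    values = 100 + np.arange(48) + 10 * np.sin(2 * np.pi * np.arange(48) / 12)
    return pd.Series(values, index=index, name="synthetic_monthly")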

[6]:
dataset_loaders = [load_airline]

Add tasks to the benchmark instance

Use the previously defined objects to add tasks to the benchmark instance. Optionally, use loops etc. to set up multiple benchmark tasks that reuse arguments.

[7]:
for dataset_loader in dataset_loaders:
    benchmark.add_task(
        dataset_loader,
        cv_splitter,
        scorers,
    )

Run all task-estimator combinations and store results

Note that run will not rerun task-estimator combinations for which it already has stored results, so adding a new estimator and calling run again will only execute the tasks for that new estimator.

[8]:
results_df = benchmark.run("./forecasting_results.csv")
results_df.T
[8]:
                                                               0                                                   1
validation_id                           [dataset=load_airline]_[cv_splitter=ExpandingW...  [dataset=load_airline]_[cv_splitter=ExpandingW...
model_id                                NaiveForecaster-last-v1                             NaiveForecaster-mean-v1
runtime_secs                            0.092438                                            0.108086
MeanSquaredPercentageError_fold_0_test  0.024532                                            0.049681
MeanSquaredPercentageError_fold_1_test  0.020831                                            0.0737
MeanSquaredPercentageError_fold_2_test  0.001213                                            0.05352
MeanSquaredPercentageError_fold_3_test  0.01495                                             0.081063
MeanSquaredPercentageError_fold_4_test  0.031067                                            0.138163
MeanSquaredPercentageError_fold_5_test  0.008373                                            0.145125
MeanSquaredPercentageError_fold_6_test  0.007972                                            0.154337
MeanSquaredPercentageError_fold_7_test  0.000009                                            0.123298
MeanSquaredPercentageError_fold_8_test  0.028191                                            0.185644
MeanSquaredPercentageError_fold_9_test  0.003906                                            0.184654
MeanSquaredPercentageError_mean         0.014104                                            0.118918
MeanSquaredPercentageError_std          0.011451                                            0.051265
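Because results are persisted to the CSV path passed to run, extending the comparison later is cheap. The following sketch assumes the "drift" strategy is available for NaiveForecaster and uses a hypothetical estimator id:

benchmark.add_estimator(
    estimator=NaiveForecaster(strategy="drift", sp=12),
    estimator_id="NaiveForecaster-drift-v1",
)
# Only the tasks for the newly added estimator are executed;
# previously stored results are reused from the CSV file.
results_df = benchmark.run("./forecasting_results.csv")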
