[1]:
import warnings
warnings.filterwarnings("ignore")
from aeon.benchmarking.forecasting import ForecastingBenchmark
from aeon.datasets import load_airline
from aeon.forecasting.model_selection import ExpandingWindowSplitter
from aeon.forecasting.naive import NaiveForecaster
from aeon.performance_metrics.forecasting import mean_squared_percentage_error
Instantiate a benchmark class¶
In this example we are comparing forecasting estimators.
[2]:
benchmark = ForecastingBenchmark()
Add competing estimators¶
We add different competing estimators to the benchmark instance. All added estimators will automatically be run through each added benchmark task, and their results compiled.
[3]:
benchmark.add_estimator(
    estimator=NaiveForecaster(strategy="mean", sp=12),
    estimator_id="NaiveForecaster-mean-v1",
)
benchmark.add_estimator(
    estimator=NaiveForecaster(strategy="last", sp=12),
    estimator_id="NaiveForecaster-last-v1",
)
Add benchmarking tasks¶
These are the prediction/validation tasks over which every estimator will be tested and their results compiled.
The exact arguments for a benchmarking task depend on whether the objective is forecasting, classification, etc., but they are generally similar. The following are the required arguments for defining a forecasting benchmark task.
Specify cross-validation split regime(s)¶
Define cross-validation split regimes, using standard aeon objects.
[4]:
cv_splitter = ExpandingWindowSplitter(
    initial_window=24,
    step_length=12,
    fh=12,
)
Specify performance metric(s)¶
Define performance metrics on which to compare estimators, using standard aeon functions.
[5]:
scorers = [mean_squared_percentage_error]
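Several metrics can be compared in a single benchmark run by listing more than one scorer. For example, assuming mean_absolute_percentage_error is available in the same module (as in standard aeon/sktime versions):

from aeon.performance_metrics.forecasting import (
    mean_absolute_percentage_error,
    mean_squared_percentage_error,
)

# Each listed scorer is evaluated on every fold of every task.
scorers = [mean_squared_percentage_error, mean_absolute_percentage_error]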
Specify dataset loaders¶
Define dataset loaders: callables (functions) that return a dataset, typically a dataframe containing the entire dataset. One can use the datasets provided with aeon, or define their own. Something as simple as the following example will suffice:
import pandas as pd

def my_dataset_loader():
    return pd.read_csv("path/to/data.csv")
The datasets will be loaded when the benchmarking tasks are run, passed through the cross-validation regime(s), and the estimators will then be evaluated on the resulting splits.
[6]:
dataset_loaders = [load_airline]
Add tasks to the benchmark instance¶
Use the previously defined objects to add tasks to the benchmark instance. Optionally, use loops to easily set up multiple benchmark tasks that reuse the same arguments.
[7]:
for dataset_loader in dataset_loaders:
    benchmark.add_task(
        dataset_loader,
        cv_splitter,
        scorers,
    )
Run all task-estimator combinations and store results¶
Note that run won’t rerun tasks it already has results for, so adding a new estimator and calling run again will only execute the tasks for that new estimator.
[8]:
results_df = benchmark.run("./forecasting_results.csv")
results_df.T
[8]:
| | 0 | 1 |
| --- | --- | --- |
| validation_id | [dataset=load_airline]_[cv_splitter=ExpandingW... | [dataset=load_airline]_[cv_splitter=ExpandingW... |
| model_id | NaiveForecaster-last-v1 | NaiveForecaster-mean-v1 |
| runtime_secs | 0.092438 | 0.108086 |
| MeanSquaredPercentageError_fold_0_test | 0.024532 | 0.049681 |
| MeanSquaredPercentageError_fold_1_test | 0.020831 | 0.0737 |
| MeanSquaredPercentageError_fold_2_test | 0.001213 | 0.05352 |
| MeanSquaredPercentageError_fold_3_test | 0.01495 | 0.081063 |
| MeanSquaredPercentageError_fold_4_test | 0.031067 | 0.138163 |
| MeanSquaredPercentageError_fold_5_test | 0.008373 | 0.145125 |
| MeanSquaredPercentageError_fold_6_test | 0.007972 | 0.154337 |
| MeanSquaredPercentageError_fold_7_test | 0.000009 | 0.123298 |
| MeanSquaredPercentageError_fold_8_test | 0.028191 | 0.185644 |
| MeanSquaredPercentageError_fold_9_test | 0.003906 | 0.184654 |
| MeanSquaredPercentageError_mean | 0.014104 | 0.118918 |
| MeanSquaredPercentageError_std | 0.011451 | 0.051265 |
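To compare the estimators directly, the aggregated columns of results_df can be inspected with ordinary pandas operations; for example (column names taken from the output above):

# Rank estimators by their mean score across folds; lower is better.
summary = results_df[
    ["model_id", "runtime_secs", "MeanSquaredPercentageError_mean", "MeanSquaredPercentageError_std"]
].sort_values("MeanSquaredPercentageError_mean")
print(summary)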