
Benchmarking time series regression models

Time series extrinsic regression (TSER), first properly defined in [1] and recently extended in [2], involves predicting a continuous target variable from a time series. It differs from time series forecasting in that the target is not formed from a sliding window over the series itself, but is some external variable.
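To make the distinction concrete, here is a minimal NumPy-only sketch on synthetic data. The arrays and the choice of target (each series' mean level plus noise, standing in for an external measurement) are illustrative assumptions, not one of the archive problems.

```python
import numpy as np

rng = np.random.default_rng(0)

# Forecasting: targets are taken from the series itself via a sliding window.
series = rng.normal(size=100)
window = 10
windows = np.lib.stride_tricks.sliding_window_view(series[:-1], window)
next_values = series[window:]  # the value immediately after each window

# Extrinsic regression: each case is a whole series with its own target.
n_cases, n_channels, n_timepoints = 50, 1, 24
X = rng.normal(size=(n_cases, n_channels, n_timepoints))
# Hypothetical external variable, e.g. a measurement recorded alongside each series.
y = X.mean(axis=(1, 2)) + rng.normal(scale=0.1, size=n_cases)

print(windows.shape, next_values.shape)  # (90, 10) (90,)
print(X.shape, y.shape)  # (50, 1, 24) (50,)
```

In the forecasting case the number of training examples depends on the series length and window size; in TSER there is exactly one target per series.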

This notebook shows you how to use aeon to get benchmarking datasets and how to compare results on these datasets with those published in [2].

Loading/Downloading data

aeon comes with two regression problems in the datasets module. You can load these with single-problem loaders or the more general load_regression function.

[20]:
from aeon.datasets import load_cardano_sentiment, load_covid_3month, load_regression

trainX, trainy = load_covid_3month(split="train")
testX, testy = load_regression(split="test", name="Covid3Month")
X, y = load_cardano_sentiment()  # Combines train and test splits
print(trainX.shape, testX.shape, X.shape)
(140, 1, 84) (61, 1, 84) (107, 2, 24)
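The loaders return collections as 3D NumPy arrays of shape (n_cases, n_channels, n_timepoints): Covid3Month has 140 training and 61 test cases, each a single channel of length 84, while CardanoSentiment has two channels of length 24. The sketch below uses zero-filled placeholder arrays with the Covid3Month shapes (not the real data) to show that stacking the splits along the case axis gives the 201 cases reported when the problem is loaded without a split.

```python
import numpy as np

# Placeholder arrays with the Covid3Month train/test shapes printed above.
trainX = np.zeros((140, 1, 84))
testX = np.zeros((61, 1, 84))

n_cases, n_channels, n_timepoints = trainX.shape
print(n_cases, n_channels, n_timepoints)  # 140 1 84

# Stacking train and test along the first (case) axis gives the combined collection.
X = np.concatenate([trainX, testX], axis=0)
print(X.shape)  # (201, 1, 84)
```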

There are currently 63 problems in the TSER archive hosted on timeseriesclassification.com. They are listed in tser_soton in the module aeon.datasets.tser_datasets.

[21]:
from aeon.datasets.tser_datasets import tser_soton

print(sorted(list(tser_soton)))
['AcousticContaminationMadrid', 'AluminiumConcentration', 'AppliancesEnergy', 'AustraliaRainfall', 'BIDMC32HR', 'BIDMC32RR', 'BIDMC32SpO2', 'BarCrawl6min', 'BeijingIntAirportPM25Quality', 'BeijingPM10Quality', 'BeijingPM25Quality', 'BenzeneConcentration', 'BinanceCoinSentiment', 'BitcoinSentiment', 'BoronConcentration', 'CalciumConcentration', 'CardanoSentiment', 'ChilledWaterPredictor', 'CopperConcentration', 'Covid19Andalusia', 'Covid3Month', 'DailyOilGasPrices', 'DailyTemperatureLatitude', 'DhakaHourlyAirQuality', 'ElectricMotorTemperature', 'ElectricityPredictor', 'EthereumSentiment', 'FloodModeling1', 'FloodModeling2', 'FloodModeling3', 'GasSensorArrayAcetone', 'GasSensorArrayEthanol', 'HotwaterPredictor', 'HouseholdPowerConsumption1', 'HouseholdPowerConsumption2', 'IEEEPPG', 'IronConcentration', 'LPGasMonitoringHomeActivity', 'LiveFuelMoistureContent', 'MadridPM10Quality', 'MagnesiumConcentration', 'ManganeseConcentration', 'MethaneMonitoringHomeActivity', 'MetroInterstateTrafficVolume', 'NaturalGasPricesSentiment', 'NewsHeadlineSentiment', 'NewsTitleSentiment', 'OccupancyDetectionLight', 'PPGDalia', 'ParkingBirmingham', 'PhosphorusConcentration', 'PotassiumConcentration', 'PrecipitationAndalusia', 'SierraNevadaMountainsSnow', 'SodiumConcentration', 'SolarRadiationAndalusia', 'SteamPredictor', 'SulphurConcentration', 'TetuanEnergyConsumption', 'VentilatorPressure', 'WaveDataTension', 'WindTurbinePower', 'ZincConcentration']

You can download these datasets directly with the aeon load_regression function. By default it stores the data in a directory called “local_data” in the datasets module. Set extract_path to specify a different location.

[22]:
small_problems = [
    "CardanoSentiment",
    "NaturalGasPricesSentiment",
    "Covid3Month",
    "BinanceCoinSentiment",
    "Covid19Andalusia",
]

for problem in small_problems:
    X, y = load_regression(name=problem)
    print(problem, X.shape, y.shape)
CardanoSentiment (107, 2, 24) (107,)
NaturalGasPricesSentiment (93, 1, 20) (93,)
Covid3Month (201, 1, 84) (201,)
BinanceCoinSentiment (263, 2, 24) (263,)
Covid19Andalusia (204, 1, 91) (204,)

This stores the data on disk, so if you call the function again it loads the local copy rather than downloading again. You can also load just the train or test split.

[23]:
for problem in small_problems:
    # Load only the training split of each problem
    trainX, trainy = load_regression(name=problem, split="train")
    print(problem, trainX.shape, trainy.shape)
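The load-from-disk-if-present behaviour described above is a common caching pattern. Here is a minimal sketch of that pattern; load_cached and fake_fetch are hypothetical helpers for illustration only, not aeon's implementation, and fake_fetch stands in for a real download.

```python
import tempfile
from pathlib import Path


def load_cached(name, extract_path, fetch):
    """Hypothetical helper: return the directory for ``name``, calling
    ``fetch`` only when that directory is not already on disk."""
    target = Path(extract_path) / name
    if not target.exists():
        target.mkdir(parents=True)
        # A real implementation would also guard against partial downloads.
        fetch(name, target)
    return target


calls = []


def fake_fetch(name, target):
    # Stand-in for a download: record the call and write a placeholder file.
    calls.append(name)
    (target / "placeholder.txt").write_text("data")


with tempfile.TemporaryDirectory() as tmp:
    load_cached("Covid3Month", tmp, fake_fetch)
    load_cached("Covid3Month", tmp, fake_fetch)  # second call finds it on disk
    print(calls)  # ['Covid3Month']
```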

Evaluating a regressor on benchmark data

With the data loaded, it is easy to assess an algorithm's performance. We will use the DummyRegressor with its default settings as a baseline, and score its predictions with mean squared error (MSE).

[24]:
from sklearn.metrics import mean_squared_error

from aeon.regression import DummyRegressor

dummy = DummyRegressor()
performance = []
for problem in small_problems:
    trainX, trainy = load_regression(name=problem, split="train")
    dummy.fit(trainX, trainy)
    testX, testy = load_regression(name=problem, split="test")
    predictions = dummy.predict(testX)
    mse = mean_squared_error(testy, predictions)
    performance.append(mse)
    print(problem, " Dummy score = ", mse)
CardanoSentiment  Dummy score =  0.09015657223327135
NaturalGasPricesSentiment  Dummy score =  0.008141822846139452
Covid3Month  Dummy score =  0.0019998715745554777
BinanceCoinSentiment  Dummy score =  0.1317760422312482
Covid19Andalusia  Dummy score =  0.0009514194090128098
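Assuming its default mean strategy, the DummyRegressor ignores the series entirely and predicts the mean of the training targets for every test case. The sketch below reproduces that baseline and the MSE calculation in plain NumPy; mean_baseline_mse is a hypothetical helper and the toy targets are made up for illustration.

```python
import numpy as np


def mean_baseline_mse(train_y, test_y):
    """MSE of predicting the training-target mean for every test case."""
    test_y = np.asarray(test_y, dtype=float)
    prediction = np.full(len(test_y), np.mean(train_y))
    return np.mean((test_y - prediction) ** 2)


train_y = np.array([1.0, 2.0, 3.0])
test_y = np.array([2.0, 4.0])
print(mean_baseline_mse(train_y, test_y))  # 2.0
```

Any regressor that learns something useful from the series should beat this baseline, which is why it is a useful lower bound.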

Comparing to published results

How does the dummy compare with the results published in [2]? We can use the function get_estimator_results to obtain published results.

[25]:
from aeon.benchmarking import get_available_estimators, get_estimator_results

print(get_available_estimators(task="regression"))
results = get_estimator_results(
    estimators=["DrCIF", "FreshPRINCE"],
    task="regression",
    datasets=small_problems,
    measure="mse",
)
print(results)
                1NN-DTW
0                1NN-ED
1               5NN-DTW
2                5NN-ED
3                   CNN
4                 DrCIF
5                   FCN
6                  FPCR
7         FPCR-b-spline
8           FreshPRINCE
9               GridSVR
10        InceptionTime
11                RandF
12               ResNet
13                Ridge
14               ROCKET
15                 RotF
16  SingleInceptionTime
17              XGBoost
{'DrCIF': {'CardanoSentiment': 0.0982120290102569, 'NaturalGasPricesSentiment': 0.0028579077510607, 'Covid3Month': 0.0018498023495186, 'BinanceCoinSentiment': 0.1147393141970096, 'Covid19Andalusia': 0.0002131578438176}, 'FreshPRINCE': {'CardanoSentiment': 0.0837979724566994, 'NaturalGasPricesSentiment': 0.0030199503412975, 'Covid3Month': 0.0016153407842645, 'BinanceCoinSentiment': 0.1153756755654242, 'Covid19Andalusia': 0.0001865776186658}}

This is organised as a dictionary of dictionaries because we cannot be sure that results are present for all datasets. To get the results as a NumPy array instead, use get_estimator_results_as_array.

[26]:
from aeon.benchmarking import get_estimator_results_as_array

results, names = get_estimator_results_as_array(
    estimators=["DrCIF", "FreshPRINCE"],
    task="regression",
    datasets=small_problems,
    measure="mse",
)
print(results)
print(names)
[[0.09821203 0.08379797]
 [0.00285791 0.00301995]
 [0.0018498  0.00161534]
 [0.11473931 0.11537568]
 [0.00021316 0.00018658]]
['CardanoSentiment', 'NaturalGasPricesSentiment', 'Covid3Month', 'BinanceCoinSentiment', 'Covid19Andalusia']
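When some results are missing, the dictionary of dictionaries makes the gaps explicit. A minimal sketch of turning one into a datasets-by-estimators array, filling gaps with NaN; to_array is a hypothetical helper and is not necessarily how get_estimator_results_as_array treats missing values. The numbers are copied from the output above, with DrCIF's CardanoSentiment entry deliberately removed.

```python
import numpy as np


def to_array(results, datasets):
    """Hypothetical helper: rows are datasets, columns are estimators,
    and NaN marks a missing result."""
    estimators = list(results)
    table = np.full((len(datasets), len(estimators)), np.nan)
    for j, est in enumerate(estimators):
        for i, data in enumerate(datasets):
            table[i, j] = results[est].get(data, np.nan)
    return table, estimators


results = {
    "DrCIF": {"Covid3Month": 0.0018498023495186},
    "FreshPRINCE": {
        "CardanoSentiment": 0.0837979724566994,
        "Covid3Month": 0.0016153407842645,
    },
}
table, estimators = to_array(results, ["CardanoSentiment", "Covid3Month"])
print(estimators)  # ['DrCIF', 'FreshPRINCE']
print(table)
```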

To make sure the rows of the published results line up with those of our dummy regressor, we sort them by dataset name.

[27]:
import numpy as np

paired_sorted = sorted(zip(names, results))
names, _ = zip(*paired_sorted)
sorted_rows = [row for _, row in paired_sorted]
sorted_results = np.array(sorted_rows)
print(names)
print(sorted_results)
('BinanceCoinSentiment', 'CardanoSentiment', 'Covid19Andalusia', 'Covid3Month', 'NaturalGasPricesSentiment')
[[0.11473931 0.11537568]
 [0.09821203 0.08379797]
 [0.00021316 0.00018658]
 [0.0018498  0.00161534]
 [0.00285791 0.00301995]]

Do the same for the dummy regressor results, then combine everything into one array with one column per regressor.

[28]:
paired = sorted(zip(small_problems, performance))
small_problems, performance = zip(*paired)
print(small_problems)
print(performance)
all_results = np.column_stack((sorted_results, performance))
print(all_results)
regressors = ["DrCIF", "FreshPRINCE", "Dummy"]
('BinanceCoinSentiment', 'CardanoSentiment', 'Covid19Andalusia', 'Covid3Month', 'NaturalGasPricesSentiment')
(0.1317760422312482, 0.09015657223327135, 0.0009514194090128098, 0.0019998715745554777, 0.008141822846139452)
[[0.11473931 0.11537568 0.13177604]
 [0.09821203 0.08379797 0.09015657]
 [0.00021316 0.00018658 0.00095142]
 [0.0018498  0.00161534 0.00199987]
 [0.00285791 0.00301995 0.00814182]]
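Sorting both sides works because the dataset names are unique. An alternative that does not rely on two separate sorts staying in step is to key every result by dataset name and build the rows from a single ordering. A sketch using the values printed above:

```python
import numpy as np

# Published MSEs (DrCIF, FreshPRINCE) and dummy MSEs, copied from the outputs above.
published = {
    "CardanoSentiment": (0.0982120290102569, 0.0837979724566994),
    "NaturalGasPricesSentiment": (0.0028579077510607, 0.0030199503412975),
    "Covid3Month": (0.0018498023495186, 0.0016153407842645),
    "BinanceCoinSentiment": (0.1147393141970096, 0.1153756755654242),
    "Covid19Andalusia": (0.0002131578438176, 0.0001865776186658),
}
dummy = {
    "CardanoSentiment": 0.09015657223327135,
    "NaturalGasPricesSentiment": 0.008141822846139452,
    "Covid3Month": 0.0019998715745554777,
    "BinanceCoinSentiment": 0.1317760422312482,
    "Covid19Andalusia": 0.0009514194090128098,
}

# Look each dataset up by name so the row order cannot drift between sources.
datasets = sorted(published)
all_results = np.array([[*published[d], dummy[d]] for d in datasets])
print(datasets)
print(all_results)
```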

Comparing Regressors

aeon provides visualisation tools to compare regressors.

Comparing two regressors

We can plot the results of two regressors against each other. The plot also shows the wins and losses and some summary statistics.

[31]:
from aeon.visualisation import plot_pairwise_scatter

fig, ax = plot_pairwise_scatter(
    all_results[:, 1],
    all_results[:, 2],
    "FreshPRINCE",
    "Dummy",
    metric="mse",
    lower_better=True,
)
[Figure: pairwise scatter plot of FreshPRINCE vs. Dummy MSE]
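The win/loss counts behind such a comparison are simple to compute by hand. A sketch of the idea for FreshPRINCE against Dummy, using the MSEs printed above in the sorted dataset order; this illustrates the counting only and is not the plotting function's internal code.

```python
import numpy as np

# MSEs per dataset, rows in the sorted order used above.
freshprince = np.array([0.1153756755654242, 0.0837979724566994, 0.0001865776186658,
                        0.0016153407842645, 0.0030199503412975])
dummy = np.array([0.1317760422312482, 0.09015657223327135, 0.0009514194090128098,
                  0.0019998715745554777, 0.008141822846139452])

# Lower MSE is better, so a win for FreshPRINCE is a strictly smaller error.
wins = int(np.sum(freshprince < dummy))
losses = int(np.sum(freshprince > dummy))
ties = int(np.sum(freshprince == dummy))
print(wins, losses, ties)  # 5 0 0
```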

Comparing multiple regressors

We can plot the results of multiple regressors on a critical difference diagram, which shows the average rank of each estimator and connects groups of estimators that are not significantly different from each other.

[33]:
from aeon.visualisation import plot_critical_difference

res = plot_critical_difference(
    all_results,
    regressors,
    lower_better=True,
)
[Figure: critical difference diagram for DrCIF, FreshPRINCE and Dummy]
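The average ranks on the diagram can be computed directly: rank the regressors on each dataset (1 is best, here the lowest MSE), then average over datasets. A minimal sketch with the combined results from above; average_ranks is a hypothetical helper, and plot_critical_difference additionally runs statistical tests to decide which regressors to connect, which this sketch does not attempt.

```python
import numpy as np


def average_ranks(results, lower_better=True):
    """Mean rank of each column (regressor) across rows (datasets); 1 is best.
    Tied values share the average of the ranks they span."""
    scores = results if lower_better else -results
    ranks = np.empty_like(scores, dtype=float)
    for i, row in enumerate(scores):
        order = np.argsort(row, kind="stable")
        row_ranks = np.empty(len(row))
        row_ranks[order] = np.arange(1, len(row) + 1)
        for value in np.unique(row):
            tied = row == value
            row_ranks[tied] = row_ranks[tied].mean()
        ranks[i] = row_ranks
    return ranks.mean(axis=0)


# Columns: DrCIF, FreshPRINCE, Dummy (values copied from the outputs above).
all_results = np.array([
    [0.1147393141970096, 0.1153756755654242, 0.1317760422312482],
    [0.0982120290102569, 0.0837979724566994, 0.09015657223327135],
    [0.0002131578438176, 0.0001865776186658, 0.0009514194090128098],
    [0.0018498023495186, 0.0016153407842645, 0.0019998715745554777],
    [0.0028579077510607, 0.0030199503412975, 0.008141822846139452],
])
print(average_ranks(all_results))  # [1.8 1.4 2.8]
```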
[36]:
from aeon.visualisation import plot_boxplot_median

res = plot_boxplot_median(
    all_results,
    regressors,
)
[Figure: median boxplot of DrCIF, FreshPRINCE and Dummy results]
