binder

Hierarchical, Global, and Panel Forecasting with aeon

Overview of this notebook

  • introduction - hierarchical time series

  • hierarchical and panel data format in aeon

  • automated vectorization of forecasters and transformers

  • hierarchical reconciliation in aeon

  • hierarchical/global forecasting using summarization/reduction models, “M5 winner”

  • extender guide

Hierarchical time series

Time series are often present in “hierarchical” form, example:

arrow heads

Examples include: * Product sales in different categories (e.g. M5 time series competition) * Tourism demand in different regions * Balance sheet structures across cost centers / accounts

Many hierarchical time series datasets can be found here: https://forecastingdata.org/ (access loader in development)

For literature see also: https://otexts.com/fpp2/hierarchical.html

A time series can also exhibit multiple independent hierarchies:

arrow heads


Representation of hierarchical and panel datatypes

aeon distinguishes as abstract scientific data types for time series (=”scitypes”):

  • Series type = one time series (uni or multivariate)

  • Panel type = collection of multiple time series, flat hierarchy

  • Hierarchical type = hierarchical collection of time series, as above

Each scitype has multiple mtype representations.

For simplicitly, we focus only on pandas.DataFrame based representations below.

[1]:
# import to retrieve examples
import warnings

warnings.filterwarnings("ignore")
from aeon.testing.utils.data_gen import get_examples

Individual time series - Series scitype, "pd.DataFrame" mtype

In the "pd.DataFrame" mtype, time series are represented by an in-memory container obj: pandas.DataFrame as follows.

  • structure convention: obj.index must be monotonous, and one of Int64Index, RangeIndex, DatetimeIndex, PeriodIndex.

  • variables: columns of obj correspond to different variables

  • variable names: column names obj.columns

  • time points: rows of obj correspond to different, distinct time points

  • time index: obj.index is interpreted as a time index.

  • capabilities: can represent multivariate series; can represent unequally spaced series

Example of a univariate series in "pd.DataFrame" representation.
This series has two variables, named "a" and "b". Both are observed at the same four timepoints 0, 1, 2, 3.
[2]:
get_examples("pd.DataFrame")[0]
[2]:
a
0 1.0
1 4.0
2 0.5
3 -3.0
Example of a bivariate series in "pd.DataFrame" representation.
This series has two variables, named "a" and "b". Both are observed at the same four timepoints 0, 1, 2, 3.
[3]:
get_examples("pd.DataFrame")[1]
[3]:
a b
0 1.0 3.000000
1 4.0 7.000000
2 0.5 2.000000
3 -3.0 -0.428571

Panels (= flat collections) of time series - Panel scitype, "pd-multiindex" mtype

In the "pd-multiindex" mtype, time series panels are represented by an in-memory container obj: pandas.DataFrame as follows.

  • structure convention: obj.index must be a pair multi-index of type (RangeIndex, t), where t is one of Int64Index, RangeIndex, DatetimeIndex, PeriodIndex and monotonous.

  • instances: rows with the same "instances" index correspond to the same instance; rows with different "instances" index correspond to different instances.

  • instance index: the first element of pairs in obj.index is interpreted as an instance index.

  • variables: columns of obj correspond to different variables

  • variable names: column names obj.columns

  • time points: rows of obj with the same "timepoints" index correspond to the same time point; rows of obj with different "timepoints" index correspond correspond to the different time points.

  • time index: the second element of pairs in obj.index is interpreted as a time index.

  • capabilities: can represent panels of multivariate series; can represent unequally spaced series; can represent panels of unequally supported series; cannot represent panels of series with different sets of variables.

Example of a panel of multivariate series in "pd-multiindex" mtype representation. The panel contains three multivariate series, with instance indices 0, 1, 2. All series have two variables with names "var_0", "var_1". All series are observed at three time points 0, 1, 2.

[4]:
get_examples("pd-multiindex")[0]
[4]:
var_0 var_1
instances timepoints
0 0 1 4
1 2 5
2 3 6
1 0 1 4
1 2 55
2 3 6
2 0 1 42
1 2 5
2 3 6

Hierarchical time series - Hierarchical scitype, "pd_multiindex_hier" mtype

  • structure convention: obj.index must be a 3 or more level multi-index of type (RangeIndex, ..., RangeIndex, t), where t is one of Int64Index, RangeIndex, DatetimeIndex, PeriodIndex and monotonous. We call the last index the “time-like” index

  • hierarchy level: rows with the same non-time-like indices correspond to the same hierarchy unit; rows with different non-time-like index combination correspond to different hierarchy unit.

  • hierarchy: the non-time-like indices in obj.index are interpreted as a hierarchy identifying index.

  • variables: columns of obj correspond to different variables

  • variable names: column names obj.columns

  • time points: rows of obj with the same "timepoints" index correspond to the same time point; rows of obj with different "timepoints" index correspond correspond to the different time points.

  • time index: the last element of tuples in obj.index is interpreted as a time index.

  • capabilities: can represent hierarchical series; can represent unequally spaced series; can represent unequally supported hierarchical series; cannot represent hierarchical series with different sets of variables.

[5]:
X = get_examples("pd_multiindex_hier")[0]
X
[5]:
var_0 var_1
foo bar timepoints
a 0 0 1 4
1 2 5
2 3 6
1 0 1 4
1 2 55
2 3 6
2 0 1 42
1 2 5
2 3 6
b 0 0 1 4
1 2 5
2 3 6
1 0 1 4
1 2 55
2 3 6
2 0 1 42
1 2 5
2 3 6
[6]:
X.index.get_level_values(level=-1)
[6]:
Index([0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2], dtype='int64', name='timepoints')

Automated vectorization (casting) of forecasters and transformers

Many transformers and forecasters implemented for single series

aeon automatically vectorizes/”up-casts” them to hierarchical data

“apply by individual series/panel in the hierarchical structure”

constructing a hierarchical time series:

[7]:
from aeon.testing.utils.data_gen import _make_hierarchical

y = _make_hierarchical()
y
[7]:
c0
h0 h1 time
h0_0 h1_0 2000-01-01 4.243268
2000-01-02 2.799669
2000-01-03 2.332597
2000-01-04 3.796075
2000-01-05 4.111372
... ... ... ...
h0_1 h1_3 2000-01-08 3.336429
2000-01-09 1.760840
2000-01-10 2.270776
2000-01-11 1.739318
2000-01-12 3.023744

96 rows × 1 columns

all forecasters and non-collection transformers are applicable to this! if forecaster implements logic for single series only, “apply to all hierarchy units”

[8]:
from aeon.forecasting.arima import ARIMA

forecaster = ARIMA()

y_pred = forecaster.fit(y, fh=[1, 2]).predict()
y_pred
[8]:
c0
h0 h1 time
h0_0 h1_0 2000-01-13 3.383312
2000-01-14 3.390127
h1_1 2000-01-13 2.934019
2000-01-14 2.956622
h1_2 2000-01-13 3.014593
2000-01-14 2.853883
h1_3 2000-01-13 2.155107
2000-01-14 3.517850
h0_1 h1_0 2000-01-13 3.141559
2000-01-14 2.919211
h1_1 2000-01-13 3.529239
2000-01-14 3.613187
h1_2 2000-01-13 3.357147
2000-01-14 3.372126
h1_3 2000-01-13 2.971088
2000-01-14 2.971939

individual forecasters fitted per hierarchy level are stored in the forecasters_ attribute this contains a pandas.DataFrame in the same format as the hierarchy levels, and fitted forecasters inside (for transformers, the attribute is called transformers_)

[9]:
forecaster.forecasters_
[9]:
forecasters
h0 h1
h0_0 h1_0 ARIMA()
h1_1 ARIMA()
h1_2 ARIMA()
h1_3 ARIMA()
h0_1 h1_0 ARIMA()
h1_1 ARIMA()
h1_2 ARIMA()
h1_3 ARIMA()

to access or inspect individual forecasters, access elements of the forecasters_ data frame:

[10]:
forecaster.forecasters_.iloc[0, 0].summary()
[10]:
SARIMAX Results
Dep. Variable: y No. Observations: 12
Model: SARIMAX(1, 0, 0) Log Likelihood -10.861
Date: Sun, 28 Jan 2024 AIC 27.723
Time: 17:37:07 BIC 29.177
Sample: 01-01-2000 HQIC 27.184
- 01-12-2000
Covariance Type: opg
coef std err z P>|z| [0.025 0.975]
intercept 3.1868 1.268 2.514 0.012 0.702 5.671
ar.L1 0.0601 0.374 0.161 0.872 -0.673 0.793
sigma2 0.3578 0.318 1.124 0.261 -0.266 0.982
Ljung-Box (L1) (Q): 0.01 Jarque-Bera (JB): 1.02
Prob(Q): 0.91 Prob(JB): 0.60
Heteroskedasticity (H): 0.24 Skew: -0.17
Prob(H) (two-sided): 0.20 Kurtosis: 1.62


Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).

auto-vectorization is also performed for probabilistic forecasts (see notebook 2):

[11]:
forecaster.predict_interval()
[11]:
Coverage
0.9
lower upper
h0 h1 time
h0_0 h1_0 2000-01-13 2.399463 4.367160
2000-01-14 2.404504 4.375749
h1_1 2000-01-13 1.593624 4.274414
2000-01-14 1.616028 4.297217
h1_2 2000-01-13 1.755286 4.273899
2000-01-14 1.557800 4.149966
h1_3 2000-01-13 1.054305 3.255910
2000-01-14 2.233164 4.802536
h0_1 h1_0 2000-01-13 1.580965 4.702153
2000-01-14 1.275854 4.562567
h1_1 2000-01-13 2.287797 4.770680
2000-01-14 2.365405 4.860969
h1_2 2000-01-13 2.340769 4.373525
2000-01-14 2.134796 4.609455
h1_3 2000-01-13 1.272842 4.669334
2000-01-14 1.273471 4.670406

which level a forecaster/natively implements and when vectorization occurs is determined by the “inner type” attributes.

For forecasters, inspect y_inner_type, this is a list of internally supported data types:

[12]:
forecaster.get_tag("y_inner_type")
[12]:
'pd.Series'
ARIMA supports only one Series level mtype. For a register of all mtype codes and their corresponding scitype (series, panel or hierarchical),
see aeon.datatypes.TYPE_REGISTER (convert to pandas.DataFrame for pretty-printing)
[13]:
import pandas as pd

from aeon.datatypes import TYPE_REGISTER

pd.DataFrame(TYPE_REGISTER)
[13]:
0 1 2
0 pd.Series Series pd.Series representation of a univariate series
1 pd.DataFrame Series pd.DataFrame representation of a uni- or multi...
2 np.ndarray Series 2D numpy.ndarray with rows=samples, cols=varia...
3 xr.DataArray Series xr.DataArray representation of a uni- or multi...
4 dask_series Series xdas representation of a uni- or multivariate ...
5 nested_univ Panel pd.DataFrame with one column per channel, pd.S...
6 numpy3D Panel 3D np.ndarray of format (n_cases, n_channels, ...
7 numpy2D Panel 2D np.ndarray of format (n_cases, n_timepoints)
8 pd-multiindex Panel pd.DataFrame with multi-index (instances, time...
9 pd-wide Panel pd.DataFrame in wide format, cols = (instance*...
10 pd-long Panel pd.DataFrame in long format, cols = (index, ti...
11 df-list Panel list of pd.DataFrame
12 dask_panel Panel dask frame with one instance and one time inde...
13 np-list Panel list of length [n_cases], each case a 2D np.nd...
14 pd_multiindex_hier Hierarchical pd.DataFrame with MultiIndex
15 dask_hierarchical Hierarchical dask frame with multiple hierarchical indices,...
16 pd_DataFrame_Table Table pd.DataFrame representation of a data table
17 numpy1D Table 1D np.narray representation of a univariate table
18 numpy_Table Table 2D np.narray representation of a univariate table
19 pd_Series_Table Table pd.Series representation of a data table
20 list_of_dict Table list of dictionaries with primitive entries
21 pred_interval Proba predictive intervals
22 pred_quantiles Proba quantile predictions
23 pred_var Proba variance predictions

roadmap items:

  • interfacing and implementing common methods with native hierarchical support

    • ARIMA using hierarchy factors, GEE, mixed effects

  • wrappers for “using hierarchy levels” as covariates or exogeneous features

  • full vectorization over variables to render univariate forecasters multivariate

contributions are appreciated!


Hierarchical reconciliation

Why do we need hierarchical reconciliation?

forecast reconciliation = ensuring that linear hierarchy dependencies are met, e.g., “sum of individual shop sales in Berlin must equal sum of total sales in Berlin” requires hierarchical (or panel) data, usually involves totals

Bottom up reconciliation works by producing only forecasts at the lowest level and then adding up to totals across all higher levels.

* Arguably the most simple of all algorithms to reconcile across hierarchies.
* Advantages: easy to implement
* Disadvantages: lower levels of hierarchy are prone to excess volatility. This excess volatility is aggregated up, often producing much less accurate top level forecasts.

Top down reconciliation works by producing top level forecasts and breaking them down to the lowest level based e.g. on relative proportions of those lower levels.

* Advantages: still easy to implement, top level is stable
* Disadvantages: peculiarities of lower levels of hierarchy ignored

Optimal forecast reconciliation works by producing forecasts at all levels of the hierarchy and adjusting all of them in a way that seeks to minimize the forecast errors

* Advantages: often found to be most accurate implementation
* Disadvantages: more difficult to implement

aeon provides functionality for reconciliation:

  • data container convention for node-wise aggregates

  • functionality to compute node-wise aggregates - Aggregator

    • can be used for bottom-up reconciliation

  • transformer implementing reconiliation logic - Reconciler

    • implements top-down reconciliation

    • implements transformer-like optimal forecast reconciliation

The node-wise aggregate data format

aeon uses a special case of the pd_multiindex_hier format to store node-wise aggregates:

  • a __total index element in an instance (non-time-like) level indicates summation over all instances below that level

  • the __total index element is reserved and cannot be used for anything else

  • entries below a __total index element are sums of entries over all other instances in the same levels where a __total element is found

The aggregation transformer

The node-wise aggregated format can be obtained by applying the Aggregator transformer.

In a pipeline with non-aggregate dinput, this allows making forecasts by totals.

[14]:
from aeon.testing.utils.data_gen import get_examples

y_hier = get_examples("pd_multiindex_hier")[1]
y_hier
[14]:
var_0
foo bar timepoints
a 0 0 1
1 2
2 3
1 0 1
1 2
2 3
2 0 1
1 2
2 3
b 0 0 1
1 2
2 3
1 0 1
1 2
2 3
2 0 1
1 2
2 3
[15]:
from aeon.transformations.hierarchical.aggregate import Aggregator

Aggregator().fit_transform(y_hier)
[15]:
var_0
foo bar timepoints
__total __total 0 6
1 12
2 18
a 0 0 1
1 2
2 3
1 0 1
1 2
2 3
2 0 1
1 2
2 3
__total 0 3
1 6
2 9
b 0 0 1
1 2
2 3
1 0 1
1 2
2 3
2 0 1
1 2
2 3
__total 0 3
1 6
2 9

If used at the start of a pipeline, forecasts are made for node __total-s as well as individual instances.

Note: in general, this does not result in a reconciled forecast, i.e., forecast totals will not add up.

[16]:
from aeon.forecasting.naive import NaiveForecaster

pipeline_to_forecast_totals = Aggregator() * NaiveForecaster()

pipeline_to_forecast_totals.fit(y_hier, fh=[1, 2])
pipeline_to_forecast_totals.predict()
[16]:
var_0
foo bar timepoints
__total __total 3 18
4 18
a 0 3 3
4 3
1 3 3
4 3
2 3 3
4 3
__total 3 9
4 9
b 0 3 3
4 3
1 3 3
4 3
2 3 3
4 3
__total 3 9
4 9

If used at the end of a pipeline, forecasts are reconciled bottom-up.

That will result in a reconciled forecast, although bottom-up may not be the method of choice.

[17]:
from aeon.forecasting.naive import NaiveForecaster

pipeline_to_forecast_totals = NaiveForecaster() * Aggregator()

pipeline_to_forecast_totals.fit(y_hier, fh=[1, 2])
pipeline_to_forecast_totals.predict()
[17]:
var_0
foo bar timepoints
__total __total 3 18
4 18
a 0 3 3
4 3
1 3 3
4 3
2 3 3
4 3
__total 3 9
4 9
b 0 3 3
4 3
1 3 3
4 3
2 3 3
4 3
__total 3 9
4 9

Advanced reconciliation

For transformer-like reconciliation, use the Reconciler. It supports advanced techniques such as OLS and WLS:

[18]:
from aeon.transformations.hierarchical.reconcile import Reconciler

pipeline_with_reconciliation = (
    Aggregator() * NaiveForecaster() * Reconciler(method="ols")
)
[19]:
pipeline_to_forecast_totals.fit(y_hier, fh=[1, 2])
pipeline_to_forecast_totals.predict()
[19]:
var_0
foo bar timepoints
__total __total 3 18
4 18
a 0 3 3
4 3
1 3 3
4 3
2 3 3
4 3
__total 3 9
4 9
b 0 3 3
4 3
1 3 3
4 3
2 3 3
4 3
__total 3 9
4 9

Roadmap items:

  • reconciliation of wrapper type

  • reconciliation & global forecasting

  • probabilistic reconciliation


Global/panel forecasting - introduction

Global forecasting = training across sets of time series, i.e., on time series panels. Typically better than “fit one forecaster per time series instance”.

Also called “panel forecasting” for homogeneous/contemporaneous index sets.

Why does global forecasting matter? * In practice, we often have time series of limited range * Estimation is difficult, and we cannot model complex dependencies * Assumption of global forecasting: We can observe the identical data generating process (DGP) multiple times * Non-identical DGPs can be fine too, as long as the degree of dissimilarity is captured by exogeneous information * Now we have much more information and can estimate more reliably and more complex models (caveat: unless complexity is purely driven by time dynamics)

As a result of these advantages, global forecasting models have been very successful in competition, e.g. * Rossmann Store Sales * Walmart Sales in Stormy Weather * M5 competition

Many business problems in practice are essentially global forecasting problem - often also reflecting hierarchical information (see above) * Product sales in different categories (e.g. M5 time series competition) * Balance sheet structures across cost centers / accounts * Dynamics of pandemics observed at different points in time

Distinction to multivariate forecasting * Multivariate forecasting focuses on modeling interdependence between time series * Global can model interdependence, but focus lies on enhancing observation space

Implementation in aeon * Multivariate forecasting models are supported in aeon via ? VAR… Global forecasting

For the following example we will use the "pd-multiindex" representation of the "Panel" scitype (see above)

Row multiindex level 0 is the unique identifier for the individual time series, level 1 contains the time index.

[20]:
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline

from aeon.forecasting.compose import make_reduction
from aeon.forecasting.model_selection import temporal_train_test_split
from aeon.transformations.date import DateTimeFeatures

pd.options.mode.chained_assignment = None
pd.set_option("display.max_columns", None)

# %%
# Load M5 Data and prepare
y = pd.read_pickle("global_fc/y.pkl")
X = pd.read_pickle("global_fc/X.pkl")

y/X are based on the M5 competition. The data features sales of products in different stores, different states and different product categories.

For a detailed analysis of the competition, please take a look at the paper “M5 accuracy competition: Results, findings, and conclusions”.

https://doi.org/10.1016/j.ijforecast.2021.11.013

You can see a glimpse of the data here:

[21]:
print(y.head())
print(X.head())
                            y
instances timepoints
1         2016-03-15   756.67
          2016-03-16   679.13
          2016-03-17   633.40
          2016-03-18  1158.04
          2016-03-19   914.24
                      dept_id  cat_id  store_id  state_id  event_name_1  \
instances timepoints
1         2016-03-15        1       1        10         3             1
          2016-03-16        1       1        10         3             1
          2016-03-17        1       1        10         3             7
          2016-03-18        1       1        10         3             1
          2016-03-19        1       1        10         3             1

                      event_type_1  event_name_2  event_type_2  snap  \
instances timepoints
1         2016-03-15             1             1             1     3
          2016-03-16             1             1             1     0
          2016-03-17             3             1             1     0
          2016-03-18             1             1             1     0
          2016-03-19             1             1             1     0

                      no_stock_days
instances timepoints
1         2016-03-15              0
          2016-03-16              0
          2016-03-17              0
          2016-03-18              0
          2016-03-19              0
  • time series grouped via the instances argument in the first column = Panel

  • focus on modeling individual products

  • hierarchical information is provided as exgoneous information.

For the M5 competition, winning solution used exogeneous features about the hierarchies like "dept_id", "store_id" etc. to capture similarities and dissimilarities of the products. Other features include holiday events and snap days (specific assisstance program of US social security paid on certain days).

now split into test and train sets using temporal_train_test_split. We can cut every instance of the time series individually:

[22]:
y_train, y_test, X_train, X_test = temporal_train_test_split(y, X)
print(y_train.head(5))
print(y_test.head(5))
                            y
instances timepoints
1         2016-03-15   756.67
          2016-03-16   679.13
          2016-03-17   633.40
          2016-03-18  1158.04
          2016-03-19   914.24
                            y
instances timepoints
1         2016-04-14   874.57
          2016-04-15   895.29
          2016-04-16  1112.63
          2016-04-17  1014.86
          2016-04-18   691.91

both y and X are split in the same way, and hierarchy structures are preserved.

Rationale for tree-based models

Tree ensembles exploit complex non-linear relationships / dependencies between time series and covariates.

In univariate time series forecasting, tree-based models often do not perform well due to lack of data.

Due to large effective sample sizes in global forecasting , tree ensembles can become a good choice (e.g., 42,840 time series in the M5 competition).

aeon can interface any sklearn compatible model via reduction, e.g., RandomForestRegressor.

[23]:
regressor = make_pipeline(
    RandomForestRegressor(random_state=1),
)

Caveat: reduction applies a supervised regressor to a time series, i.e., to a task for which they were not originally designed.

This issue can be addressed by generating features that capture the dynamics of the time series.
"WindowSummarizer" can be used to generate features useful for time series forecasting,
based on a provided dictionary of functions, window shifts and window lengths:
[24]:
import pandas as pd

from aeon.forecasting.base import ForecastingHorizon
from aeon.forecasting.compose import ForecastingPipeline
from aeon.transformations.summarize import WindowSummarizer

kwargs = {
    "lag_feature": {
        "lag": [1],
        "mean": [[1, 3], [3, 6]],
        "std": [[1, 4]],
    }
}

transformer = WindowSummarizer(**kwargs)
y_transformed = transformer.fit_transform(y_train)

y_transformed.head(10)
[24]:
y_lag_1 y_mean_1_3 y_mean_3_6 y_std_1_4
instances timepoints
1 2016-03-15 NaN NaN NaN NaN
2016-03-16 756.67 NaN NaN NaN
2016-03-17 679.13 NaN NaN NaN
2016-03-18 633.40 689.733333 NaN NaN
2016-03-19 1158.04 823.523333 NaN 239.617572
2016-03-20 914.24 901.893333 NaN 241.571143
2016-03-21 965.27 1012.516667 NaN 216.690775
2016-03-22 630.77 836.760000 NaN 217.842052
2016-03-23 702.79 766.276667 851.125000 161.669232
2016-03-24 728.15 687.236667 830.141667 145.007117

The notation "mean": [[1, 3]] (captured in the column "y_mean_1_3") is read as:

summarization function "mean": [[1, 3]] is applied to:

  • window of length 3

  • start (inclusive) is lagged by one period

Visualization:

For z = target time stamp:

window = [1, 3], means a lag of 1 and window_length of 3, selecting the three last days (exclusive z).

Summarization is done across windows like this:


By default, "WindowSummarizer" uses pandas rolling window functions to allow for a speedy generation of features. * “sum”, * “mean”, * “median”, * “std”, * “var”, * “kurt”, * “min”, * “max”, * “corr”, * “cov”, * “skew”, * “sem”

typically very fast since optimized for rolling, grouped operations

In the M5 competition, arguably the most relevant features were:

  • mean calculations to capture level shifts, e.g. last week sales, sales of the week prior to the last month etc.

  • standard deviation to capture increases / decreases in volatility in sales, and how it impacts future sales

  • rolling skewness / kurtosis calculations, to capture changes in store sales tendencies.

  • various different calculations to capture periods of zero sales (e.g. out of stock scenarios)

First three are available via native pandas functions.
WindowSummarizer can also be provided with arbitrary summarizer functions.

Example: function count_gt130 to count how many observations lie above the threshold of 130 within a window of length 3, lagged by 2 periods.

[25]:
import numpy as np


def count_gt130(x):
    """Count how many observations lie above threshold 130."""
    return np.sum((x > 700)[::-1])


kwargs = {
    "lag_feature": {
        "lag": [1],
        count_gt130: [[2, 3]],
        "std": [[1, 4]],
    }
}

transformer = WindowSummarizer(**kwargs)
y_transformed = transformer.fit_transform(y_train)

y_transformed.head(10)
[25]:
y_lag_1 y_count_gt130_2_3 y_std_1_4
instances timepoints
1 2016-03-15 NaN NaN NaN
2016-03-16 756.67 NaN NaN
2016-03-17 679.13 NaN NaN
2016-03-18 633.40 NaN NaN
2016-03-19 1158.04 1.0 239.617572
2016-03-20 914.24 1.0 241.571143
2016-03-21 965.27 2.0 216.690775
2016-03-22 630.77 3.0 217.842052
2016-03-23 702.79 2.0 161.669232
2016-03-24 728.15 2.0 145.007117

Above applies "WindowSummarizer" to the forecasting targey y.

To apply "WindowSummarizer" to columns in X, use "WindowSummarizer" within a "ForecastingPipeline" and specify "target_cols".

In the M5 competition, lagging of exogeneous features was especially useful for lags around holiday dummies (often sales are affected for a few days before and after major holidays) as well as changes in item prices (discounts as well as persistent price changes)

[26]:
from aeon.datasets import load_longley
from aeon.forecasting.naive import NaiveForecaster

y_ll, X_ll = load_longley()
y_train_ll, y_test_ll, X_train_ll, X_test_ll = temporal_train_test_split(y_ll, X_ll)
fh = ForecastingHorizon(X_test_ll.index, is_relative=False)
# Example transforming only X
pipe = ForecastingPipeline(
    steps=[
        ("a", WindowSummarizer(n_jobs=1, target_cols=["POP", "GNPDEFL"])),
        ("b", WindowSummarizer(n_jobs=1, target_cols=["GNP"], **kwargs)),
        ("forecaster", NaiveForecaster(strategy="drift")),
    ]
)
pipe_return = pipe.fit(y_train_ll, X_train_ll)
y_pred1 = pipe_return.predict(fh=fh, X=X_test_ll)

y_pred1
[26]:
Period
1959    67075.727273
1960    67638.454545
1961    68201.181818
1962    68763.909091
Freq: A-DEC, dtype: float64
For efficiency, relevant features are computed in a parallel way.
For maximum parallelization, pass WindowSummarizer as a single transformer within make_reduction.

In this case, window_length is inferred from the WindowSummarizer and need not be passed to make_reduction.

[27]:
forecaster = make_reduction(
    regressor,
    transformers=[WindowSummarizer(**kwargs, n_jobs=1)],
    window_length=None,
    strategy="recursive",
    pooling="global",
)

Concepts relating to calendar seasonalities need to be provided by means of feature engineering to most models. Examples:

  • day of the week effects historically observed for stock prices (prices on Fridays used to differ from Monday prices).

  • used car prices being higher in spring than in summer

  • spendings at the beginning of the month differing from end of month due to salary effects.

Calendar seasonalities can be modeled by means of dummy variables or fourier terms. As a rule of thumb, use dummy variables for discontinous effects and fourier/period/seasonality terms when you believe there is a certain degree of smoothness in the seasonality.

aeon supports calendar dummy features via the DateTimeFeatures transformer. Manually specify the desired seasonality, or provide base frequency of the time series (daily, weekly etc.) and the desired complexity (few vs many features), DateTimeFeatures can infer sensible seasonality.

[28]:
transformer = DateTimeFeatures(ts_freq="D")
X_hat = transformer.fit_transform(X_train)

new_cols = [i for i in X_hat if i not in X_train.columns]
X_hat[new_cols]
[28]:
year month_of_year day_of_week
instances timepoints
1 2016-03-15 2016 3 1
2016-03-16 2016 3 2
2016-03-17 2016 3 3
2016-03-18 2016 3 4
2016-03-19 2016 3 5
2016-03-20 2016 3 6
2016-03-21 2016 3 0
2016-03-22 2016 3 1
2016-03-23 2016 3 2
2016-03-24 2016 3 3
2016-03-25 2016 3 4
2016-03-26 2016 3 5
2016-03-27 2016 3 6
2016-03-28 2016 3 0
2016-03-29 2016 3 1
2016-03-30 2016 3 2
2016-03-31 2016 3 3
2016-04-01 2016 4 4
2016-04-02 2016 4 5
2016-04-03 2016 4 6
2016-04-04 2016 4 0
2016-04-05 2016 4 1
2016-04-06 2016 4 2
2016-04-07 2016 4 3
2016-04-08 2016 4 4
2016-04-09 2016 4 5
2016-04-10 2016 4 6
2016-04-11 2016 4 0
2016-04-12 2016 4 1
2016-04-13 2016 4 2
2 2016-03-15 2016 3 1
2016-03-16 2016 3 2
2016-03-17 2016 3 3
2016-03-18 2016 3 4
2016-03-19 2016 3 5
2016-03-20 2016 3 6
2016-03-21 2016 3 0
2016-03-22 2016 3 1
2016-03-23 2016 3 2
2016-03-24 2016 3 3
2016-03-25 2016 3 4
2016-03-26 2016 3 5
2016-03-27 2016 3 6
2016-03-28 2016 3 0
2016-03-29 2016 3 1
2016-03-30 2016 3 2
2016-03-31 2016 3 3
2016-04-01 2016 4 4
2016-04-02 2016 4 5
2016-04-03 2016 4 6
2016-04-04 2016 4 0
2016-04-05 2016 4 1
2016-04-06 2016 4 2
2016-04-07 2016 4 3
2016-04-08 2016 4 4
2016-04-09 2016 4 5
2016-04-10 2016 4 6
2016-04-11 2016 4 0
2016-04-12 2016 4 1
2016-04-13 2016 4 2

DateTimeFeatures supports the following frequencies: * Y - year * Q - quarter * M - month * W - week * D - day * H - hour * T - minute * S - second * L - millisecond

You can specify the manual generation of dummy features with the notation e.g. “day_of_month”, “day_of_week”, “week_of_quarter”.

[29]:
transformer = DateTimeFeatures(manual_selection=["week_of_month", "day_of_quarter"])
X_hat = transformer.fit_transform(X_train)

new_cols = [i for i in X_hat if i not in X_train.columns]
X_hat[new_cols]
[29]:
day_of_quarter week_of_month
instances timepoints
1 2016-03-15 75 3
2016-03-16 76 3
2016-03-17 77 3
2016-03-18 78 3
2016-03-19 79 3
2016-03-20 80 3
2016-03-21 81 3
2016-03-22 82 4
2016-03-23 83 4
2016-03-24 84 4
2016-03-25 85 4
2016-03-26 86 4
2016-03-27 87 4
2016-03-28 88 4
2016-03-29 89 5
2016-03-30 90 5
2016-03-31 91 5
2016-04-01 1 1
2016-04-02 2 1
2016-04-03 3 1
2016-04-04 4 1
2016-04-05 5 1
2016-04-06 6 1
2016-04-07 7 1
2016-04-08 8 2
2016-04-09 9 2
2016-04-10 10 2
2016-04-11 11 2
2016-04-12 12 2
2016-04-13 13 2
2 2016-03-15 75 3
2016-03-16 76 3
2016-03-17 77 3
2016-03-18 78 3
2016-03-19 79 3
2016-03-20 80 3
2016-03-21 81 3
2016-03-22 82 4
2016-03-23 83 4
2016-03-24 84 4
2016-03-25 85 4
2016-03-26 86 4
2016-03-27 87 4
2016-03-28 88 4
2016-03-29 89 5
2016-03-30 90 5
2016-03-31 91 5
2016-04-01 1 1
2016-04-02 2 1
2016-04-03 3 1
2016-04-04 4 1
2016-04-05 5 1
2016-04-06 6 1
2016-04-07 7 1
2016-04-08 8 2
2016-04-09 9 2
2016-04-10 10 2
2016-04-11 11 2
2016-04-12 12 2
2016-04-13 13 2

Putting it all together

Using the "WindowSummarizer", "DateTimeFeatures" and the "make_reduction" function we can now set up a working example of a an end to end global forecasting pipeline based on a sample of the M5 competition data:

[30]:
pipe = ForecastingPipeline(
    steps=[
        (
            "event_dynamics",
            WindowSummarizer(
                n_jobs=-1, **kwargs, target_cols=["event_type_1", "event_type_2"]
            ),
        ),
        ("snap_dynamics", WindowSummarizer(n_jobs=-1, target_cols=["snap"])),
        ("daily_season", DateTimeFeatures(ts_freq="D")),
        ("forecaster", forecaster),
    ]
)

pipe_return = pipe.fit(y_train, X_train)
y_pred1 = pipe_return.predict(fh=1, X=X_test)

y_pred1
[30]:
y
instances timepoints
1 2016-04-14 873.6690
2 2016-04-14 1979.3916

Building your own hierarchical forecaster

Getting started:

Extension template = python “fill-in” template with to-do blocks that allow you to implement your own, aeon-compatible forecasting algorithm.

Check estimators using check_estimator

For hierarchical forecasting:

  • ensure to pick supported mtypes for panel and hierarchical data

  • recommended: y_inner_type = ["pd.DataFrame", "pd-multiindex", "pd_multiindex_hier"], same for X_inner_type

    • this ensures the inputs y, X seen in _fit, _predict are pd.DataFrame, with 1, 2, 3 or more row levels

  • you can implement vectorization over rows if efficient implementation is available

    • but: automated vectorization already loops over row index sets, don’t implement that if that’s what “hierarchical” is

    • to ensure automated vectorization, do not include Hierarchical or Panel types in y_inner_type, X_inner_type

  • think carefully whether your estimator is a forecaster, or can be decomposed in a transformer

    • “do X and then apply forecaster already in aeon” is a strong hint that you actually want to implement a transformer


Generated using nbsphinx. The Jupyter notebook can be found here.