BaseSeriesEstimator

The BaseSeriesEstimator class is a base class for estimators that take a single series as input rather than a collection of time series (see BaseCollectionEstimator). This notebook describes the major design issues to bear in mind when using any class that inherits from BaseSeriesEstimator. To use any such estimator, all you need to understand is the meaning of axis and the capability tags.

BaseSeriesEstimator handles the preprocessing required for a single series before it is used in a method such as fit. Base classes apply this preprocessing through the protected method _preprocess_series. The key steps to note are:

1. The input data should be a np.ndarray, a pd.Series or a pd.DataFrame.
2. The input data will be converted into the type required by the estimator, as determined by the tag X_inner_type.
3. If the estimator can only work with univariate time series (capability:multivariate set to False), the input data will be converted to a 1D numpy array or a pandas Series.
4. If the estimator can handle multivariate time series, as determined by the tag capability:multivariate, the input data will be stored in either a 2D numpy array or a pandas DataFrame.
5. If the data is multivariate, the axis variable of the estimator controls how it is interpreted. If axis == 0, each column is a time series and each row is a time point, i.e. the shape of the data is (n_timepoints, n_channels). If axis == 1, the time series are in rows, i.e. the shape of the data is (n_channels, n_timepoints).
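The axis convention in step 5 amounts to a transpose. The two layouts hold the same values with rows and columns swapped. The sketch below illustrates this with plain NumPy. It assumes 2D numpy input, and to_estimator_axis is a hypothetical helper written for this illustration, not part of the aeon API.

```python
import numpy as np


def to_estimator_axis(X, input_axis, estimator_axis):
    """Hypothetical helper (not aeon's API): reorient 2D series data.

    axis == 0: shape (n_timepoints, n_channels), each column is a channel.
    axis == 1: shape (n_channels, n_timepoints), each row is a channel.
    """
    X = np.asarray(X)
    if X.ndim != 2:
        raise ValueError("expected a 2D array")
    if input_axis not in (0, 1) or estimator_axis not in (0, 1):
        raise ValueError("axis must be 0 or 1")
    # The two layouts are transposes of each other, so a transpose is all
    # that is needed when the input axis differs from the estimator's axis.
    return X.T if input_axis != estimator_axis else X


# Three channels, each with five time points, stored one channel per row.
data = np.arange(15).reshape(3, 5)
print(to_estimator_axis(data, input_axis=1, estimator_axis=0).shape)  # (5, 3)
print(to_estimator_axis(data, input_axis=1, estimator_axis=1).shape)  # (3, 5)
```

After conversion to the axis=0 layout, the first channel (the first row of data) becomes the first column.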

We demonstrate this below with calls to protected methods. This is purely to aid understanding. These methods should not be called directly in practice.

# Univariate examples
import numpy as np
import pandas as pd
import pytest

from aeon.base import BaseSeriesEstimator

bs = BaseSeriesEstimator()
# By default, "capability:multivariate" is False, axis is 0 and
# X_inner_type is np.ndarray
d1 = np.random.random(size=(100))
# With this configuration, the output is always a 1D np.ndarray,
# here of shape (100,)
d2 = bs._preprocess_series(d1, axis=0)
print(
    "1. Input shape = ",
    d1.shape,
    " output type = ",
    type(d2),
    " output shape = ",
    d2.shape,
)
# 2D numpy arrays of shape (m, 1) or (1, m) are converted to a 1D numpy
# array if capability:multivariate is False
d1 = np.random.random(size=(1, 100))
d2 = bs._preprocess_series(d1, axis=0)
print(
    "2. Input shape = ",
    d1.shape,
    " output type = ",
    type(d2),
    " output shape = ",
    d2.shape,
)
d1 = pd.Series(np.random.random(size=(100)))
d2 = bs._preprocess_series(d1, axis=0)
print(
    "3. Input shape = ",
    d1.shape,
    " output type = ",
    type(d2),
    " output shape = ",
    d2.shape,
)
# Axis is irrelevant for univariate data
d2 = bs._preprocess_series(d1, axis=1)
print(
    "4. Input shape = ",
    d1.shape,
    " output type = ",
    type(d2),
    " output shape = ",
    d2.shape,
)
d1 = pd.DataFrame(np.random.random(size=(100, 1)))
d2 = bs._preprocess_series(d1, axis=0)
print(
    "5. Input shape = ",
    d1.shape,
    " output type = ",
    type(d2),
    " output shape = ",
    d2.shape,
)

# Passing a multivariate array will raise an error
with pytest.raises(ValueError, match=r"Multivariate data not supported"):
    bs._preprocess_series(np.random.random(size=(4, 100)), axis=0)

1. Input shape =  (100,)  output type =  <class 'numpy.ndarray'>  output shape =  (100,)
2. Input shape =  (1, 100)  output type =  <class 'numpy.ndarray'>  output shape =  (100,)
3. Input shape =  (100,)  output type =  <class 'numpy.ndarray'>  output shape =  (100,)
4. Input shape =  (100,)  output type =  <class 'numpy.ndarray'>  output shape =  (100,)
5. Input shape =  (100, 1)  output type =  <class 'numpy.ndarray'>  output shape =  (100,)
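The univariate outputs above can be mimicked with a short NumPy sketch. The helper as_univariate is hypothetical and is not aeon's implementation. It assumes that any 2D array with a single row or a single column holds one channel and flattens it. It raises a ValueError for anything with more than one channel, matching the error message checked above.

```python
import numpy as np


def as_univariate(X):
    """Hypothetical helper (not aeon's implementation).

    Returns a 1D array for input of shape (m,), (1, m) or (m, 1) and
    raises ValueError for data with more than one channel.
    """
    X = np.asarray(X)
    if X.ndim == 1:
        return X
    if X.ndim == 2 and 1 in X.shape:
        # A single row or a single column holds the only channel.
        return X.reshape(-1)
    raise ValueError("Multivariate data not supported")


rng = np.random.default_rng(0)
for shape in [(100,), (1, 100), (100, 1)]:
    print(shape, "->", as_univariate(rng.random(shape)).shape)
```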

# Multivariate examples
# Set tags
bs.set_tags(**{"capability:multivariate": True})
d1 = np.random.random(size=(4, 100))
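# axis=0 means each column is a time series: shape (n_timepoints, n_channels).
# This matches the estimator's axis of 0, so the shape is unchanged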
d2 = bs._preprocess_series(d1, axis=0)
print(
    "1. Input shape = ",
    d1.shape,
    " output type = ",
    type(d2),
    " output shape = ",
    d2.shape,
)
# axis=1 means each row is a time series: shape (n_channels, n_timepoints).
# The data is transposed to the estimator's axis=0 layout
d2 = bs._preprocess_series(d1, axis=1)
print(
    "2. Input shape = ",
    d1.shape,
    " output type = ",
    type(d2),
    " output shape = ",
    d2.shape,
)
d1 = pd.DataFrame(d1)
d2 = bs._preprocess_series(d1, axis=1)
print(
    "3. Input type =",
    type(d1),
    "Input shape = ",
    d1.shape,
    " output type = ",
    type(d2),
    "output shape = ",
    d2.shape,
)

1. Input shape =  (4, 100)  output type =  <class 'numpy.ndarray'>  output shape =  (4, 100)
2. Input shape =  (4, 100)  output type =  <class 'numpy.ndarray'>  output shape =  (100, 4)
3. Input type = <class 'pandas.core.frame.DataFrame'> Input shape =  (4, 100)  output type =  <class 'numpy.ndarray'> output shape =  (100, 4)

If you are implementing a new estimator that extends BaseSeriesEstimator, choose the axis you want to work with and pass it to the BaseSeriesEstimator constructor. If your estimator can handle multivariate series, set the capability:multivariate tag to True. The data will then always be passed to your estimator with shape (n_channels, n_timepoints) if axis is 1, or (n_timepoints, n_channels) if axis is 0. It arrives as either a numpy array or a pandas DataFrame, depending on the X_inner_type tag. A univariate series will be passed with shape (1, n_timepoints) if axis is 1, or (n_timepoints, 1) if axis is 0.
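Fixing the axis lets estimator code hard-code which dimension is time. The sketch below assumes the data has already been converted to the estimator's layout. The helpers channel_means and univariate_as_2d are hypothetical and serve only as an illustration. They are not methods of BaseSeriesEstimator.

```python
import numpy as np


def channel_means(X, axis):
    """Hypothetical estimator-side computation: one mean per channel.

    X is assumed to already be in the estimator's layout: shape
    (n_channels, n_timepoints) when axis == 1, or (n_timepoints, n_channels)
    when axis == 0. Time runs along dimension `axis`, so the mean over time
    is taken along that same dimension.
    """
    return np.asarray(X).mean(axis=axis)


def univariate_as_2d(x, axis):
    """Hypothetical: reshape a 1D univariate series into the 2D layout a
    multivariate-capable estimator expects."""
    x = np.asarray(x)
    return x.reshape(1, -1) if axis == 1 else x.reshape(-1, 1)


series = np.arange(6.0)  # a single channel with 6 time points
print(univariate_as_2d(series, axis=1).shape)  # (1, 6)
print(univariate_as_2d(series, axis=0).shape)  # (6, 1)

X = np.array([[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]])  # 2 channels, axis=1 layout
print(channel_means(X, axis=1))  # means 2.0 and 20.0
print(channel_means(X.T, axis=0))  # same means from the axis=0 layout
```

In both layouts the estimator's axis value equals the NumPy dimension along which time runs. This is why a single reduction call works once the axis is known.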

Generated using nbsphinx. The Jupyter notebook can be found here.