BaseSeriesEstimator

The BaseSeriesEstimator class is a base class for estimators that take a single series (either univariate or multivariate) as input, rather than a collection of time series (see BaseCollectionEstimator). This notebook describes the major design issues to keep in mind when using any class that inherits from BaseSeriesEstimator. These are:

- BaseSeriesTransformer for single series transformations
- BaseSegmenter for segmentation
- BaseAnomalyDetector for anomaly detection

To use any algorithm extending the base estimator, all you need to understand is the meaning of the axis parameter and the capability tags. BaseSeriesEstimator handles the preprocessing of input data before it is used in methods such as fit and predict. Inheriting base classes apply this preprocessing through the protected method _preprocess_series. The key steps to note are:

1. The input data type should be a np.ndarray, a pd.Series or a pd.DataFrame.
2. Unless the X_inner_type of the estimator is pd.Series, the axis variable of the estimator controls how the input data is interpreted in methods such as fit, predict and transform. If axis==0, each column is a time series and each row is a time point, i.e. the shape of the input data is (n_timepoints, n_channels). If axis==1, each row is a time series and each column is a time point, i.e. the shape of the input data is (n_channels, n_timepoints). It is important to set this correctly or to check the default used, otherwise your data may be processed incorrectly.
3. The input data will be transformed into the type required by the estimator, as determined by the tag X_inner_type. This will also reshape the array to use the correct time point axis and expand the input to 2D if it is a 1D np.ndarray.
4. If the estimator can only work with univariate time series (capability:multivariate set to False), then the input data must be 1D, or 2D with the selected channel axis being of size 1.
5. If the estimator can only work with multivariate time series (capability:univariate set to False), then the input data must be 2D, with the selected channel axis having a size greater than 1. pd.Series is not supported in this case.
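The axis handling and shape checks described above can be approximated in plain NumPy. The following is only a minimal sketch to build intuition: the helper name to_estimator_layout and its parameters are hypothetical, it ignores pandas inputs and the X_inner_type conversion, and it is not aeon's actual implementation of _preprocess_series.

```python
import numpy as np


def to_estimator_layout(X, input_axis, estimator_axis=0, multivariate=False):
    """Hypothetical helper mirroring the axis and shape steps; not aeon code."""
    X = np.asarray(X)
    if X.ndim == 1:
        # A 1D input is a single channel, so expand it along the channel axis.
        X = X.reshape(-1, 1) if input_axis == 0 else X.reshape(1, -1)
    elif X.ndim != 2:
        raise ValueError("Input must be 1D or 2D")
    # With axis 0 the time points are rows, so the channels are columns.
    channel_axis = 1 - input_axis
    if not multivariate and X.shape[channel_axis] > 1:
        raise ValueError("Multivariate data not supported")
    # Transpose when the caller's layout differs from the estimator's layout.
    return X if input_axis == estimator_axis else X.T


series = np.arange(6.0)
print(to_estimator_layout(series, input_axis=0).shape)
print(to_estimator_layout(series.reshape(1, 6), input_axis=1).shape)
```

Under these assumptions both calls produce an array of shape (6, 1), because the assumed estimator layout is (n_timepoints, n_channels) regardless of how the input was oriented.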

We demonstrate this with calls to private methods. This is purely to aid understanding; private methods should not be called directly in practice.

[1]:
import numpy as np
import pandas as pd

from aeon.base import BaseSeriesEstimator

# We use the abstract base class for example purposes, regular classes will not
# have a class axis parameter.
bs = BaseSeriesEstimator(axis=0)

Univariate examples

[2]:
# By default, "capability:multivariate" is False, axis is 0 and X_inner_type is
# np.ndarray
# With this config, the output should always be an np.ndarray shape (100, 1)
d1 = np.random.random(size=(100))
d2 = bs._preprocess_series(d1, axis=0, store_metadata=True)
print(
    f"1. Input type = {type(d1)}, input shape = {d1.shape}, "
    f"output type = {type(d2)}, output shape = {d2.shape}"
)
1. Input type = <class 'numpy.ndarray'>, input shape = (100,), output type = <class 'numpy.ndarray'>, output shape = (100, 1)
[3]:
# The axis parameter will not change the output shape of 1D inputs such as pd.Series
# or univariate np.ndarray
d1 = np.random.random(size=(100))
d2 = bs._preprocess_series(d1, axis=1, store_metadata=True)
print(
    f"2. Input type = {type(d1)}, input shape = {d1.shape}, "
    f"output type = {type(d2)}, output shape = {d2.shape}"
)
2. Input type = <class 'numpy.ndarray'>, input shape = (100,), output type = <class 'numpy.ndarray'>, output shape = (100, 1)
[4]:
# A 2D array with the channel axis of size 1 will produce the same result
d1 = np.random.random(size=(100, 1))
d2 = bs._preprocess_series(d1, axis=0, store_metadata=True)
print(
    f"3. Input type = {type(d1)}, input shape = {d1.shape}, "
    f"output type = {type(d2)}, output shape = {d2.shape}"
)
3. Input type = <class 'numpy.ndarray'>, input shape = (100, 1), output type = <class 'numpy.ndarray'>, output shape = (100, 1)
[5]:
# The shape used can be swapped, but the axis parameter must be set correctly
d1 = np.random.random(size=(1, 100))
d2 = bs._preprocess_series(d1, axis=1, store_metadata=True)
print(
    f"4. Input type = {type(d1)}, input shape = {d1.shape}, "
    f"output type = {type(d2)}, output shape = {d2.shape}"
)
4. Input type = <class 'numpy.ndarray'>, input shape = (1, 100), output type = <class 'numpy.ndarray'>, output shape = (100, 1)
[6]:
# Other types will be converted to X_inner_type
d1 = pd.Series(np.random.random(size=(100)))
d2 = bs._preprocess_series(d1, axis=0, store_metadata=True)
print(
    f"5. Input type = {type(d1)}, input shape = {d1.shape}, "
    f"output type = {type(d2)}, output shape = {d2.shape}"
)
5. Input type = <class 'pandas.core.series.Series'>, input shape = (100,), output type = <class 'numpy.ndarray'>, output shape = (100, 1)
[7]:
d1 = pd.DataFrame(np.random.random(size=(100, 1)))
d2 = bs._preprocess_series(d1, axis=0, store_metadata=True)
print(
    f"6. Input type = {type(d1)}, input shape = {d1.shape}, "
    f"output type = {type(d2)}, output shape = {d2.shape}"
)
6. Input type = <class 'pandas.core.frame.DataFrame'>, input shape = (100, 1), output type = <class 'numpy.ndarray'>, output shape = (100, 1)
[8]:
bs = bs.set_tags(**{"X_inner_type": "pd.Series"})
d1 = np.random.random(size=(100))
d2 = bs._preprocess_series(d1, axis=0, store_metadata=True)
print(
    f"7. Input type = {type(d1)}, input shape = {d1.shape}, "
    f"output type = {type(d2)}, output shape = {d2.shape}"
)
bs = bs.set_tags(**{"X_inner_type": "np.ndarray"})
7. Input type = <class 'numpy.ndarray'>, input shape = (100,), output type = <class 'pandas.core.series.Series'>, output shape = (100,)
[9]:
# Passing a multivariate array will raise an error if capability:multivariate is False
d1 = np.random.random(size=(100, 5))
try:
    bs._preprocess_series(d1, axis=0, store_metadata=True)
except ValueError as e:
    print(f"8. {e}")
8. Multivariate data not supported by BaseSeriesEstimator

Multivariate examples

[10]:
# The capability:multivariate tag must be set to True to work with multivariate series
# If the estimator does not have this tag, then the implementation cannot handle the
# input
bs = bs.set_tags(**{"capability:multivariate": True})
# Both of these can be True at the same time, but for example's sake we disable
# univariate
bs = bs.set_tags(**{"capability:univariate": False})
[11]:
# axis 0 means each column is a time series and each row is a time point
d1 = np.random.random(size=(100, 5))
d2 = bs._preprocess_series(d1, axis=0, store_metadata=True)
print(
    f"1. Input type = {type(d1)}, input shape = {d1.shape}, "
    f"output type = {type(d2)}, output shape = {d2.shape}"
)
n_channels = bs.metadata_["n_channels"]
print(f"n_channels: {n_channels}")
1. Input type = <class 'numpy.ndarray'>, input shape = (100, 5), output type = <class 'numpy.ndarray'>, output shape = (100, 5)
n_channels: 5
[12]:
# axis 1 means each row is a time series. Here the (100, 5) input is read as 100
# channels of 5 time points, so an incorrectly set axis gives the wrong output shape
d1 = np.random.random(size=(100, 5))
d2 = bs._preprocess_series(d1, axis=1, store_metadata=True)
print(
    f"2. Input type = {type(d1)}, input shape = {d1.shape}, "
    f"output type = {type(d2)}, output shape = {d2.shape}"
)
n_channels = bs.metadata_["n_channels"]
print(f"n_channels: {n_channels}")
2. Input type = <class 'numpy.ndarray'>, input shape = (100, 5), output type = <class 'numpy.ndarray'>, output shape = (5, 100)
n_channels: 100
[13]:
# Conversions work similarly to univariate series, but it is more important to set
# the axis parameter correctly
d1 = pd.DataFrame(np.random.random(size=(100, 5)))
d2 = bs._preprocess_series(d1, axis=0, store_metadata=True)
print(
    f"3. Input type = {type(d1)}, input shape = {d1.shape}, "
    f"output type = {type(d2)}, output shape = {d2.shape}"
)
n_channels = bs.metadata_["n_channels"]
print(f"n_channels: {n_channels}")
3. Input type = <class 'pandas.core.frame.DataFrame'>, input shape = (100, 5), output type = <class 'numpy.ndarray'>, output shape = (100, 5)
n_channels: 5
[14]:
# As pd.Series is univariate only, it is not allowed as an inner type for
# multivariate estimators. This should only matter for development, not for usage
bs = bs.set_tags(**{"X_inner_type": "pd.Series"})
d1 = np.random.random(size=(100, 5))
try:
    d2 = bs._preprocess_series(d1, axis=1, store_metadata=True)
except ValueError as e:
    print(f"4. {e}")
bs = bs.set_tags(**{"X_inner_type": "np.ndarray"})
4. Cannot convert to pd.Series for multivariate capable estimators
[15]:
# Passing a univariate array will raise an error if capability:univariate is False
d1 = pd.Series(np.random.random(size=(100,)))
try:
    d2 = bs._preprocess_series(d1, axis=0, store_metadata=True)
except ValueError as e:
    print(f"5. {e}")
5. Univariate data not supported by BaseSeriesEstimator

If implementing a new estimator that extends BaseSeriesEstimator, set the axis to the layout you want to work with by passing it to the BaseSeriesEstimator constructor. If your estimator can handle multivariate series, set the capability:multivariate tag to True. Set the X_inner_type tag if you wish to use a data type other than np.ndarray.
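The choice of axis decides which layout the inner logic of an estimator receives, so any per-channel computation has to be written for that layout. The sketch below illustrates this with plain NumPy only. The two function names are hypothetical stand-ins for an estimator's internal method and are not part of aeon.

```python
import numpy as np


def channel_means_axis1(X):
    """Hypothetical inner logic for an estimator constructed with axis=1."""
    # Assumed layout: (n_channels, n_timepoints), so time runs along axis 1.
    return X.mean(axis=1)


def channel_means_axis0(X):
    """Hypothetical inner logic for an estimator constructed with axis=0."""
    # Assumed layout: (n_timepoints, n_channels), so time runs along axis 0.
    return X.mean(axis=0)


# Two channels with three time points each, stored as (n_channels, n_timepoints).
X = np.array([[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]])
print(channel_means_axis1(X))
print(channel_means_axis0(X.T))
```

Both calls return the same per-channel means (2.0 and 20.0). Only the expected orientation of the array differs, which is why the axis should be fixed once in the constructor and then relied on throughout the implementation.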

