sbd_distance

sbd_distance(x: ndarray, y: ndarray, standardize: bool = True) → float

Compute the shape-based distance (SBD) between two time series.

Shape-based distance (SBD) [1] is a normalized version of cross-correlation (CC) that is invariant to shifting and, if standardization is used, to scaling.

For two series, possibly of unequal length, \(\mathbf{x}=\{x_1,x_2,\ldots, x_n\}\) and \(\mathbf{y}=\{y_1,y_2, \ldots,y_m\}\), SBD works by (optionally) first standardizing both time series using the z-score (\(x' = \frac{x - \mu}{\sigma}\)), then computing the cross-correlation between \(\mathbf{x}\) and \(\mathbf{y}\) at every shift \(w\) (\(CC_w(\mathbf{x}, \mathbf{y})\)), then dividing it by the geometric mean of the autocorrelations of the individual sequences to normalize it to \([-1, 1]\) (coefficient normalization), and finally taking the maximum normalized cross-correlation over all shifts:

\[SBD(\mathbf{x}, \mathbf{y}) = 1 - \max_w\left( \frac{ CC_w(\mathbf{x}, \mathbf{y}) }{ \sqrt{ (\mathbf{x} \cdot \mathbf{x}) (\mathbf{y} \cdot \mathbf{y}) } }\right)\]

This distance measure takes values between 0 and 2, where 0 indicates perfect similarity.
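The steps above can be sketched directly with NumPy. This is a minimal illustration for univariate, non-constant series, not the aeon implementation; the helper names `zscore` and `sbd_naive` are hypothetical, and `np.correlate` is used here to evaluate the cross-correlation at every shift.

```python
import numpy as np


def zscore(a):
    """Standardize a 1D series to zero mean and unit variance."""
    # Assumes the series is not constant (non-zero standard deviation).
    return (a - a.mean()) / a.std()


def sbd_naive(x, y, standardize=True):
    """Illustrative SBD for two 1D series (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if standardize:
        x, y = zscore(x), zscore(y)
    # Cross-correlation CC_w(x, y) for every shift w, computed directly.
    cc = np.correlate(x, y, mode="full")
    # Coefficient normalization: geometric mean of the zero-lag autocorrelations.
    norm = np.sqrt(np.dot(x, x) * np.dot(y, y))
    # Keep the best-aligned shift and turn the similarity into a distance.
    return 1.0 - cc.max() / norm


# A pattern and a shifted copy of it: the best shift aligns them exactly.
a = [0, 0, 1, 2, 1, 0, 0, 0]
b = [0, 0, 0, 0, 1, 2, 1, 0]
print(sbd_naive(a, b, standardize=False))
```

For the shifted pair above the maximum normalized cross-correlation is 1, so the printed distance should be 0, which illustrates the shift invariance.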

Computing the cross-correlation \(CC_w(\mathbf{x}, \mathbf{y})\) directly for all values of \(w\) requires \(O(m^2)\) time, where \(m\) is the maximum time-series length. The convolution theorem, however, allows the fast Fourier transform (FFT) and its inverse to compute \(CC(\mathbf{x}, \mathbf{y})\) in \(O(m \log m)\) time:

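\[CC(\mathbf{x}, \mathbf{y}) = \mathcal{F}^{-1}\{ \mathcal{F}(\mathbf{x}) \cdot \overline{\mathcal{F}(\mathbf{y})} \}\]

Here the overline denotes the complex conjugate in the frequency domain.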
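As a rough sketch of the FFT route, the function below computes the cross-correlation at all shifts and can be compared against the direct `np.correlate` result. The name `cc_fft` is hypothetical, and the zero-padding to a power of two is an assumption made for this example (it avoids circular wrap-around); aeon's internals may differ.

```python
import numpy as np


def cc_fft(x, y):
    """Cross-correlation at all shifts via the FFT (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    size = n + m - 1  # number of distinct shifts
    # Zero-pad so the circular correlation does not wrap around;
    # rounding up to a power of two is a common choice for FFT speed.
    fft_size = 1 << (size - 1).bit_length()
    # Multiply the spectrum of x by the complex conjugate of the spectrum of y.
    spectrum = np.fft.rfft(x, fft_size) * np.conj(np.fft.rfft(y, fft_size))
    cc = np.fft.irfft(spectrum, fft_size)
    if m == 1:
        return cc[:n]
    # Negative shifts sit at the end of the circular result; move them to the
    # front so the order matches np.correlate(x, y, mode="full").
    return np.concatenate((cc[-(m - 1):], cc[:n]))


rng = np.random.default_rng(0)
x = rng.normal(size=7)
y = rng.normal(size=5)
print(np.allclose(cc_fft(x, y), np.correlate(x, y, mode="full")))
```

The two results agree up to floating-point error, while the FFT version scales as \(O(m \log m)\) rather than \(O(m^2)\).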

For multivariate time series, SBD is computed independently for each channel and then averaged. Both time series must have the same number of channels.
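The per-channel averaging can be illustrated as follows. This is a sketch under simplifying assumptions (no standardization, non-zero channels); `sbd_univariate` and `sbd_multivariate` are hypothetical helper names rather than aeon functions.

```python
import numpy as np


def sbd_univariate(x, y):
    """SBD for two 1D series without standardization (illustrative)."""
    cc = np.correlate(x, y, mode="full")
    return 1.0 - cc.max() / np.sqrt(np.dot(x, x) * np.dot(y, y))


def sbd_multivariate(x, y):
    """Average the per-channel SBD of two (n_channels, n_timepoints) arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 2 or y.ndim != 2:
        raise ValueError("x and y must be 2D arrays")
    if x.shape[0] != y.shape[0]:
        raise ValueError("x and y must have the same number of channels")
    # One distance per channel pair, then the mean over channels.
    return float(np.mean([sbd_univariate(xc, yc) for xc, yc in zip(x, y)]))


# Channel 0 is identical in both series; channel 1 has its sign flipped.
x = np.array([[0, 1, 2, 1], [1, 0, 0, 0]])
y = np.array([[0, 1, 2, 1], [-1, 0, 0, 0]])
print(sbd_multivariate(x, y))
```

In this example the first channel contributes a distance of 0 and the second a distance of 1, so the average should be 0.5.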

Parameters:
x : np.ndarray

First time series, either univariate, shape (n_timepoints,), or multivariate, shape (n_channels, n_timepoints).

y : np.ndarray

Second time series, either univariate, shape (n_timepoints,), or multivariate, shape (n_channels, n_timepoints).

standardize : bool, default=True

Whether to z-score both input time series before computing the distance. Standardization makes SBD scale-invariant.

Returns:
float

SBD distance between x and y.

Raises:
ValueError

If x and y are not 1D or 2D arrays.

See also

sbd_pairwise_distance

Compute the shape-based distance (SBD) between all pairs of time series.

References

[1]

Paparrizos, John, and Luis Gravano: Fast and Accurate Time-Series Clustering. ACM Transactions on Database Systems 42, no. 2 (2017): 8:1-8:49. https://doi.org/10.1145/3044711.

Examples

>>> import numpy as np
>>> from aeon.distances import sbd_distance
>>> x = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]])
>>> y = np.array([[11, 12, 13, 14, 15, 16, 17, 18, 19, 20]])
>>> dist = sbd_distance(x, y)