sbd_distance
- sbd_distance(x: ndarray, y: ndarray, standardize: bool = True) -> float
Compute the shape-based distance (SBD) between two time series.
Shape-based distance (SBD) [1] is a normalized version of cross-correlation (CC) that is invariant to shifting and, if standardization is used, to scaling.
For two series of possibly unequal length, \(\mathbf{x}=\{x_1,x_2,\ldots, x_n\}\) and \(\mathbf{y}=\{y_1,y_2, \ldots,y_m\}\), SBD (optionally) first standardizes both time series using the z-score (\(x' = \frac{x - \mu}{\sigma}\)), then computes the cross-correlation between x and y (\(CC(\mathbf{x}, \mathbf{y})\)), then divides it by the geometric mean of the autocorrelations of the two sequences to normalize it to \([-1, 1]\) (coefficient normalization), and finally takes the shift w with the maximum normalized cross-correlation:
\[SBD(\mathbf{x}, \mathbf{y}) = 1 - \max_w\left( \frac{ CC_w(\mathbf{x}, \mathbf{y}) }{ \sqrt{ (\mathbf{x} \cdot \mathbf{x}) \cdot (\mathbf{y} \cdot \mathbf{y}) } }\right)\]
This distance measure has values between 0 and 2; 0 is perfect similarity.
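The steps above can be sketched directly in NumPy for the univariate case. This is a minimal illustration, not aeon's implementation: the function name `sbd_naive` is hypothetical, it evaluates every shift with `np.correlate` (quadratic time), and it assumes neither series is constant, so the standard deviation and the normalizing denominator are non-zero.

```python
import numpy as np


def sbd_naive(x, y, standardize=True):
    """Hypothetical reference sketch of univariate SBD (not aeon's code)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if standardize:
        # z-score each series; assumes sigma > 0 (series is not constant)
        x = (x - x.mean()) / x.std()
        y = (y - y.mean()) / y.std()
    # Cross-correlation CC_w(x, y) for every shift w; works for unequal lengths
    cc = np.correlate(x, y, mode="full")
    # Coefficient normalization: geometric mean of the zero-lag autocorrelations
    denom = np.sqrt(np.dot(x, x) * np.dot(y, y))
    # Keep the best-aligned shift and turn the similarity into a distance
    return 1.0 - cc.max() / denom
```

For two series that differ only by a constant offset, the standardized series coincide, so this sketch returns a value numerically close to 0.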
Computing the cross-correlation \(CC(\mathbf{x}, \mathbf{y})\) for all values of w requires \(O(m^2)\) time, where m is the maximum time-series length. However, the convolution theorem lets us use the fast Fourier transform (FFT) and its inverse to compute \(CC(\mathbf{x}, \mathbf{y})\) in \(O(m \cdot \log(m))\):
\[CC(\mathbf{x}, \mathbf{y}) = \mathcal{F}^{-1}\{ \mathcal{F}(\mathbf{x}) \cdot \overline{\mathcal{F}(\mathbf{y})} \}\]
where the overline denotes the complex conjugate.
For multivariate time series, SBD is computed independently for each channel and then averaged. Both time series must have the same number of channels.
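The FFT route and the per-channel averaging can be sketched as follows. The helper names `cc_fft` and `sbd_fft` are hypothetical and this is not aeon's implementation. The sketch assumes both inputs are zero-padded to length n + m - 1 so that the circular correlation equals the linear one, it skips standardization for brevity, and it assumes no channel is all zeros.

```python
import numpy as np


def cc_fft(x, y):
    """Cross-correlation of two 1D series at all shifts via the FFT (sketch)."""
    n, m = len(x), len(y)
    size = n + m - 1  # zero-pad so circular correlation has no wrap-around
    fx = np.fft.rfft(x, size)
    fy = np.fft.rfft(y, size)
    # Multiply by the complex conjugate of F(y), then transform back
    cc = np.fft.irfft(fx * np.conj(fy), size)
    # Reorder to shifts -(m-1), ..., 0, ..., n-1 (same order as np.correlate "full")
    return np.roll(cc, m - 1)


def sbd_fft(x, y):
    """SBD without standardization, averaged over channels (sketch)."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    y = np.atleast_2d(np.asarray(y, dtype=float))
    if x.shape[0] != y.shape[0]:
        raise ValueError("x and y must have the same number of channels")
    dists = []
    for xc, yc in zip(x, y):
        # Assumes neither channel is all zeros, so the denominator is non-zero
        denom = np.sqrt(np.dot(xc, xc) * np.dot(yc, yc))
        dists.append(1.0 - cc_fft(xc, yc).max() / denom)
    return float(np.mean(dists))
```

`cc_fft` should agree with `np.correlate(x, y, mode="full")` up to floating-point error, which gives a simple way to check the FFT version against the quadratic one.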
- Parameters:
- x : np.ndarray
First time series, either univariate, shape (n_timepoints,), or multivariate, shape (n_channels, n_timepoints).
- y : np.ndarray
Second time series, either univariate, shape (n_timepoints,), or multivariate, shape (n_channels, n_timepoints).
- standardize : bool, default=True
Apply z-score to both input time series for standardization before computing the distance. This makes SBD scaling invariant. Default is True.
- Returns:
- float
SBD distance between x and y.
- Raises:
- ValueError
If x and y are not 1D or 2D arrays.
See also
sbd_pairwise_distance
Compute the shape-based distance (SBD) between all pairs of time series.
References
[1] Paparrizos, John, and Luis Gravano. Fast and Accurate Time-Series Clustering. ACM Transactions on Database Systems 42, no. 2 (2017): 8:1-8:49. https://doi.org/10.1145/3044711.
Examples
>>> import numpy as np
>>> from aeon.distances import sbd_distance
>>> x = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]])
>>> y = np.array([[11, 12, 13, 14, 15, 16, 17, 18, 19, 20]])
>>> dist = sbd_distance(x, y)