lcss_distance

lcss_distance(x: ndarray, y: ndarray, window: float | None = None, epsilon: float = 1.0, itakura_max_slope: float | None = None) → float

Return the LCSS distance between x and y.

The LCSS distance for time series is based on the solution to the longest common subsequence problem in pattern matching [1]. The typical problem is to find the longest subsequence that is common to two discrete series based on the edit distance. This approach can be extended to real-valued time series by using a distance threshold epsilon, which defines the maximum difference allowed between a pair of values for them to be considered a match. LCSS finds the optimal alignment between two series by finding the greatest number of matching pairs. The LCSS distance uses a matrix \(L\) that records the sequence of matches over valid warpings. For two series \(a = a_1, \ldots, a_n\) and \(b = b_1, \ldots, b_m\), \(L\) is found by iterating over all valid windows (i.e. where bounding_matrix is not infinity, which by default is the constant band \(|i-j| < w \cdot m\), where \(w\) is the window parameter value and \(m\) is the series length), then calculating

\[
L_{i,j} =
\begin{cases}
L_{i-1,j-1} + 1 & \text{if } |a_i - b_j| < \epsilon \\
\max(L_{i,j-1}, L_{i-1,j}) & \text{otherwise}
\end{cases}
\]

The distance is an inverse function of the longest common subsequence length, \(L_{n,m}\).

\[
d_{LCSS}(\mathbf{a}, \mathbf{b}) = 1 - \frac{L_{n,m}}{\min(n,m)}.
\]

Note that series a and b need not be equal length.
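The recurrence and the distance above can be sketched in plain NumPy. This is a minimal illustration for univariate series only, not aeon's implementation: the name `lcss_distance_sketch` is hypothetical, no bounding window is applied, the strict `<` comparison follows the recurrence as written here, and normalising by the length of the shorter series is an assumption.

```python
import numpy as np


def lcss_distance_sketch(a, b, epsilon=1.0):
    """Illustrative LCSS distance for two univariate series (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)

    # L[i, j] is the LCSS length for the first i points of a and the first j of b.
    # Row 0 and column 0 stay at zero (an empty prefix has no matches).
    L = np.zeros((n + 1, m + 1))

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(a[i - 1] - b[j - 1]) < epsilon:
                # The two values match: extend the subsequence along the diagonal.
                L[i, j] = L[i - 1, j - 1] + 1
            else:
                # No match: carry over the best length found so far.
                L[i, j] = max(L[i, j - 1], L[i - 1, j])

    # Assumption: normalise by the shorter series so the result lies in [0, 1].
    return 1.0 - L[n, m] / min(n, m)


# Series of different lengths are allowed.
d = lcss_distance_sketch([1, 2, 3, 4, 5], [2, 4], epsilon=0.5)
```

The full matrix is kept here for readability; only the previous row is needed to compute the final value.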

LCSS finds the longest common subsequence between two time series and returns a distance derived from the proportion of the series that this subsequence covers. Originally presented in [1], LCSS is computed by matching time points whose values differ by less than a defined threshold (epsilon).

The value returned will be between 0.0 and 1.0, where 0.0 means the two time series match completely under the threshold (identical series, for example, have distance 0.0) and 1.0 means that no pair of values matches.
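As a small worked illustration, take \(a = (1, 2, 3)\), \(b = (2, 3, 4)\) and \(\epsilon = 0.5\) with no bounding window. The values are chosen for this example only, and the normalisation by the length of the shorter series is an assumption. The only pairs closer than \(\epsilon\) are \((a_2, b_1)\) and \((a_3, b_2)\), so the longest common subsequence has length 2:

\[
L_{3,3} = 2, \qquad d_{LCSS} = 1 - \frac{L_{3,3}}{\min(3, 3)} = 1 - \frac{2}{3} = \frac{1}{3}.
\]

With \(b = (11, 12, 13)\) instead, no pair is within \(\epsilon\), so \(L_{3,3} = 0\) and the distance is 1.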

Parameters:

x (np.ndarray): First time series, either univariate, shape (n_timepoints,), or multivariate, shape (n_channels, n_timepoints).
y (np.ndarray): Second time series, either univariate, shape (n_timepoints,), or multivariate, shape (n_channels, n_timepoints).
window (float, default=None): The window to use for the bounding matrix. If None, no bounding matrix is used.
epsilon (float, default=1.0): Matching threshold that determines whether two values are close enough to be considered ‘common’.
itakura_max_slope (float, default=None): Maximum slope, as a proportion of the number of time points, used to create the Itakura parallelogram on the bounding matrix. Must be between 0.0 and 1.0.

Returns:

float: The LCSS distance between x and y.

Raises:

ValueError: If x and y are not 1D or 2D arrays.

References

[1] M. Vlachos, D. Gunopulos, and G. Kollios. 2002. “Discovering Similar Multidimensional Trajectories”, In Proceedings of the 18th International Conference on Data Engineering (ICDE ’02).

Examples

>>> import numpy as np
>>> from aeon.distances import lcss_distance
>>> x = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]])
>>> y = np.array([[11, 12, 13, 14, 15, 16, 17, 18, 19, 20]])
>>> dist = lcss_distance(x, y)