lcss_distance
- lcss_distance(x: ndarray, y: ndarray, window: float | None = None, epsilon: float = 1.0, itakura_max_slope: float | None = None) -> float
Return the LCSS distance between x and y.
The LCSS distance for time series is based on the solution to the longest common subsequence problem in pattern matching [1]. The typical problem is to find the longest subsequence that is common to two discrete series based on the edit distance. This approach can be extended to real-valued time series by using a distance threshold epsilon, which defines the maximum difference between a pair of values that is allowed for them to be considered a match. LCSS finds the optimal alignment between two series by finding the greatest number of matching pairs.
The LCSS distance uses a matrix \(L\) that records the sequence of matches over valid warpings. For two series \(a = a_1, \ldots, a_n\) and \(b = b_1, \ldots, b_m\), \(L\) is found by iterating over all valid windows (i.e. where bounding_matrix is not infinity, which by default is the constant band \(|i-j| < w \cdot m\), where \(w\) is the window parameter value and \(m\) is series length), then calculating
\[
L_{i,j} =
\begin{cases}
L_{i-1,j-1} + 1 & \text{if } |a_i - b_j| < \epsilon, \\
\max(L_{i,j-1}, L_{i-1,j}) & \text{otherwise.}
\end{cases}
\]
The distance is an inverse function of the longest common subsequence length, \(L_{n,m}\).
\[
d_{LCSS}(\mathbf{x}, \mathbf{y}) = 1 - \frac{L_{n,m}}{\min(n, m)}
\]
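The recursion and distance described above can be sketched in plain Python for univariate series. This is a minimal illustration, not aeon's implementation: the name `lcss_distance_sketch` is hypothetical, no bounding window is applied, and normalising by the shorter series length, `min(n, m)`, is an assumption made so that the result stays in the range 0.0 to 1.0.

```python
def lcss_distance_sketch(a, b, epsilon=1.0):
    """Illustrative univariate LCSS distance; not aeon's implementation."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both series must contain at least one value")
    # L[i][j] holds the match count for the first i values of a and the
    # first j values of b; row 0 and column 0 stay at zero.
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(a[i - 1] - b[j - 1]) < epsilon:
                # Values are within the threshold: extend the diagonal match.
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                # No match: carry over the best count seen so far.
                L[i][j] = max(L[i][j - 1], L[i - 1][j])
    # Assumption: normalise by the shorter length so the result lies in [0, 1].
    return 1.0 - L[n][m] / min(n, m)


print(lcss_distance_sketch([1, 2, 3, 4], [1, 2, 9, 9], epsilon=0.5))  # 0.5
```

In the call shown, only the first two values of each series match within epsilon, so \(L_{4,4} = 2\) and the sketch returns \(1 - 2/4 = 0.5\).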
Note that series a and b need not be equal length.
LCSS attempts to find the longest common subsequence between two time series and returns a value based on the proportion of the series that this subsequence covers. Originally presented in [1], LCSS is computed by matching indexes whose values are similar up to a defined threshold (epsilon).
The value returned will be between 0.0 and 1.0, where 0.0 means the two time series match at every point to within epsilon and 1.0 means no pair of points matches.
- Parameters:
- x : np.ndarray
First time series, either univariate, shape (n_timepoints,), or multivariate, shape (n_channels, n_timepoints).
- y : np.ndarray
Second time series, either univariate, shape (n_timepoints,), or multivariate, shape (n_channels, n_timepoints).
- window : float, default=None
The window to use for the bounding matrix. If None, no bounding matrix is used.
- epsilon : float, default=1.
Matching threshold that determines whether two values are close enough to be considered ‘common’. The default is 1.
- itakura_max_slope : float, default=None
Maximum slope, as a proportion of the number of time points, used to create an Itakura parallelogram on the bounding matrix. Must be between 0. and 1.
- Returns:
- float
The LCSS distance between x and y.
- Raises:
- ValueError
If x and y are not 1D or 2D arrays.
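To make the effect of the window parameter concrete, the constant band \(|i-j| < w \cdot m\) from the description can be sketched as a boolean mask. This is an illustration under stated assumptions, not aeon's bounding-matrix code: the helper `constant_band` is hypothetical, it ignores itakura_max_slope, and it applies the inequality exactly as written, which may differ from how the library handles series of unequal length.

```python
def constant_band(n, m, window=None):
    """Illustrative mask: True where cell (i, j) may take part in a match."""
    if window is None:
        # No window given: every cell of the n-by-m matrix is allowed.
        return [[True] * m for _ in range(n)]
    # Assumption: band radius is window * m, as in |i - j| < w * m.
    radius = window * m
    return [[abs(i - j) < radius for j in range(m)] for i in range(n)]


mask = constant_band(4, 4, window=0.5)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
# xx..
# xxx.
# .xxx
# ..xx
```

A smaller window narrows the band around the diagonal, so fewer index pairs are candidates for a match.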
References
Examples
>>> import numpy as np
>>> from aeon.distances import lcss_distance
>>> x = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]])
>>> y = np.array([[11, 12, 13, 14, 15, 16, 17, 18, 19, 20]])
>>> dist = lcss_distance(x, y)