msm_distance¶
- msm_distance(x: ndarray, y: ndarray, window: float | None = None, independent: bool = True, c: float = 1.0, itakura_max_slope: float | None = None) float [source]¶
Compute the MSM distance between two time series.
Move-Split-Merge (MSM) [1] is a distance measure that is conceptually similar to other edit distance-based approaches, where similarity is calculated by using a set of operations to transform one series into another. Each operation has an associated cost, and three operations are defined for MSM: move, split, and merge. Move is called match in other distance function terminology and split and merge are equivalent to insert and delete.
For two series, possibly of unequal length, \(\mathbf{x}=\{x_1,x_2,\ldots, x_n\}\) and \(\mathbf{y}=\{y_1,y_2, \ldots,y_m\}\) MSM works by iterating over series lengths \(i = 1 \ldots n\) and \(j = 1 \ldote m\) to find the cost matrix $D$ as follows.
\[\begin{split}move &= D_{i-1,j-1}+d({x_{i},y_{j}}) \\ split &= D_{i-1,j}+cost(y_j,y_{j-1},x_i,c) \\ merge &= D_{i,j-1}+cost(x_i,x_{i-1},y_j,c) \\ D_{i,j} &= min(move, split, merge)\end{split}\]Where \(D_{0,j}\) and \(D_{i,0}\) are initialised to a constant value, and $c$ is a parameter that represents the cost of moving off the diagonal. The pointwise distance function $d$ is the absolute difference rather than the squared distance.
$cost$ is the cost function that calculates the cost of inserting and deleting values. Crucially, the cost depends on the current and adjacent values, rather than treating all insertions and deletions equally (for example, as in ERP).
\[\begin{split}cost(x,y,z,c) &= c & if\;\; & y \leq x \leq z \\ &= c & if\;\; & y \geq x \geq z \\ &= c+min(|x-y|,|x-z|) & & otherwise \\\end{split}\]If \(\mathbf{x}\) and \(\mathbf{y$}\) are multivariate, then there are two ways of calculating the MSM distance. The independent approach is to find the distance for each channel independently, then return the sum. The dependent approach adopts the adaptation described in [2] for computing the pointwise MSM distance over channels. MSM satisfies triangular inequality and is a metric.
- Parameters:
- xnp.ndarray
First time series, either univariate, shape
(n_timepoints,)
, or multivariate, shape(n_channels, n_timepoints)
.- ynp.ndarray
Second time series, either univariate, shape
(n_timepoints,)
, or multivariate, shape(n_channels, n_timepoints)
.- windowfloat, default=None
The window to use for the bounding matrix. If None, no bounding matrix is used.
- independentbool, default=True
Whether to use the independent or dependent MSM distance. The default is True (to use independent).
- cfloat, default=1.
Cost for split or merge operation. Default is 1.
- itakura_max_slopefloat, default=None
Maximum slope as a proportion of the number of time points used to create Itakura parallelogram on the bounding matrix. Must be between 0. and 1.
- Returns:
- float
MSM distance between x and y.
- Raises:
- ValueError
If x and y are not 1D or 2D arrays.
References
[1]Stefan A., Athitsos V., Das G.: The Move-Split-Merge metric for time
series. IEEE Transactions on Knowledge and Data Engineering 25(6), 2013.
[2]Shifaz, C. Pelletier, F. Petitjean, G. Webb: Elastic similarity and
distance measures for multivariate time series. Knowl. Inf. Syst. 65(6), 2023.
Examples
>>> import numpy as np >>> from aeon.distances import msm_distance >>> x = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]) >>> y = np.array([[11, 12, 13, 14, 15, 16, 17, 18, 19, 20]]) >>> dist = msm_distance(x, y)