dtw_distance(x: ndarray, y: ndarray, window: float | None = None, itakura_max_slope: float | None = None) float[source]

Compute the DTW distance between two time series.

DTW is the most widely researched and used elastic distance measure. It mitigates distortions in the time axis by realligning (warping) the series to best match each other. A good background into DTW can be found in [1]. For two series, possibly of unequal length, \(\mathbf{x}=\{x_1,x_2,\ldots,x_n\}\) and \(\mathbf{y}=\{y_1,y_2, \ldots,y_m\}\) DTW first calculates \(M(\mathbf{x},\mathbf{y})\), the \(n \times m\) pointwise distance matrix between series \(\mathbf{x}\) and \(\mathbf{y}\), where \(M_{i,j}= (x_i-y_j)^2\).

A warping path

\[P = <(e_1, f_1), (e_2, f_2), \ldots, (e_s, f_s)>\]

is a set of pairs of indices that define a traversal of matrix \(M\). A valid warping path must start at location \((1,1)\) and end at point \(( n,m)\) and not backtrack, i.e. \(0 \leq e_{i+1}-e_{i} \leq 1\) and \(0 \leq f_{i+1}- f_i \leq 1\) for all \(1< i < m\).

The DTW distance between series is the path through \(M\) that minimizes the total distance. The distance for any path \(P\) of length \(s\) is

\[D_P(\mathbf{x},\mathbf{y}, M) =\sum_{i=1}^s M_{e_i,f_i}\]

If \(\mathcal{P}\) is the space of all possible paths, the DTW path \(P^*\) is the path that has the minimum distance, hence the DTW distance between series is

\[d_{dtw}(\mathbf{x}, \mathbf{x}) =D_{P*}(\mathbf{x},\mathbf{x}, M).\]

The optimal warping path \(P^*\) can be found exactly through a dynamic programming formulation. This can be a time consuming operation, and it is common to put a restriction on the amount of warping allowed. This is implemented through the bounding_matrix structure, that supplies a mask for allowable warpings. The most common bounding strategies include the Sakoe-Chiba band [2]. The width of the allowed warping is controlled through the window parameter which sets the maximum proportion of warping allowed.


First time series, either univariate, shape (n_timepoints,), or multivariate, shape (n_channels, n_timepoints).


Second time series, either univariate, shape (n_timepoints,), or multivariate, shape (n_channels, n_timepoints).

windowfloat or None, default=None

The window to use for the bounding matrix. If None, no bounding matrix is used. window is a percentage deviation, so if window = 0.1 then 10% of the series length is the max warping allowed. is used.

itakura_max_slopefloat, default=None

Maximum slope as a proportion of the number of time points used to create Itakura parallelogram on the bounding matrix. Must be between 0. and 1.


DTW distance between x and y, minimum value 0.


If x and y are not 1D or 2D arrays.



Ratanamahatana C and Keogh E.: Three myths about dynamic time warping data

mining, Proceedings of 5th SIAM International Conference on Data Mining, 2005.


Sakoe H. and Chiba S.: Dynamic programming algorithm optimization for

spoken word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing 26(1):43–49, 1978.


>>> import numpy as np
>>> from aeon.distances import dtw_distance
>>> x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> y = np.array([11, 12, 13, 14, 15, 16, 17, 18, 19, 20])
>>> dtw_distance(x, y) # 1D series
>>> x = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [0, 1, 0, 2, 0]])
>>> y = np.array([[11, 12, 13, 14],[7, 8, 9, 20],[1, 3, 4, 5]] )
>>> dtw_distance(x, y) # 2D series with 3 channels, unequal length