erp_distance

erp_distance(x: ndarray, y: ndarray, window: float | None = None, g: float = 0.0, g_arr: ndarray | None = None, itakura_max_slope: float | None = None) -> float

Compute the ERP distance between two time series.

Edit Distance with Real Penalty (ERP), first proposed in [1], aligns two time series while taking more care over how values are carried forward through the cost matrix. In the DTW cost matrix, when no alignment can be found, a move off the diagonal carries the previous value forward. ERP instead introduces gaps: points, or sequences of points, that have no match. Each gap is penalised according to its distance from the parameter \(g\).

\[\begin{split}match &= D_{i-1,j-1}+ d({x_{i},y_{j}})\\ delete &= D_{i-1,j}+ d({x_{i},g})\\ insert &= D_{i,j-1}+ d({g,y_{j}})\\ D_{i,j} &= min(match,insert, delete)\end{split}\]

where \(D_{0,j}\) and \(D_{i,0}\) are initialised to the sum of distances to \(g\) for each series.
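The recurrence above can be sketched in plain NumPy for the univariate case. This is a minimal illustration, not aeon's implementation: the helper name `erp_reference` is hypothetical, the pointwise distance \(d\) is assumed to be the absolute difference, the boundary row and column are assumed to hold running sums of distances to \(g\), and no bounding window is applied.

```python
import numpy as np


def erp_reference(x, y, g=0.0):
    """Univariate ERP from the recurrence, assuming d(a, b) = |a - b|."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    D = np.zeros((n + 1, m + 1))
    # Boundary: aligning a prefix against nothing costs its summed distance to g.
    D[1:, 0] = np.cumsum(np.abs(x - g))
    D[0, 1:] = np.cumsum(np.abs(y - g))
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = D[i - 1, j - 1] + abs(x[i - 1] - y[j - 1])
            delete = D[i - 1, j] + abs(x[i - 1] - g)  # x_i left unmatched
            insert = D[i, j - 1] + abs(y[j - 1] - g)  # y_j left unmatched
            D[i, j] = min(match, delete, insert)
    return float(D[n, m])


x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.array([2, 2, 2, 2, 5, 6, 7, 8, 9, 10])
print(erp_reference(x, y))  # 4.0
```

With \(g = 0\), leaving a point unmatched costs its absolute value, so for these two series the cheapest alignment is the pure diagonal, which costs 1 + 0 + 1 + 2 = 4.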

The value of \(g\) is 0 by default in aeon, but in [1] it is data-dependent, selected from the range \([\sigma/5, \sigma]\), where \(\sigma\) is the average standard deviation of the training time series. When a series is multivariate (more than one channel), \(g\) is an array where the \(j^{th}\) value is the standard deviation of the \(j^{th}\) channel.
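One way to build data-dependent candidates for \(g\) in the range described above is sketched below. The helper `candidate_g_values`, the `(n_cases, n_timepoints)` layout of the training collection and the evenly spaced grid are all assumptions made for illustration. Neither [1] nor aeon is claimed to select \(g\) this way.

```python
import numpy as np


def candidate_g_values(X_train, n_candidates=5):
    """Evenly spaced g candidates in [sigma / 5, sigma] (hypothetical helper)."""
    X_train = np.asarray(X_train, dtype=float)
    # sigma: the standard deviation of each training series, averaged over cases.
    sigma = float(np.mean(np.std(X_train, axis=1)))
    return np.linspace(sigma / 5, sigma, n_candidates)


# Two toy univariate series with standard deviations 1 and 2, so sigma = 1.5.
X_train = np.array([[0, 2, 0, 2], [0, 4, 0, 4]])
candidates = candidate_g_values(X_train)
```

Each candidate could then be passed as `g` to `erp_distance` and compared on held-out data.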

Parameters:
x : np.ndarray

First time series, either univariate, shape (n_timepoints,), or multivariate, shape (n_channels, n_timepoints).

y : np.ndarray

Second time series, either univariate, shape (n_timepoints,), or multivariate, shape (n_channels, n_timepoints).

window : float, default=None

The window to use for the bounding matrix. If None, no bounding matrix is used.

g : float, default=0.0

The reference constant used to penalise moves off the diagonal. The default is 0.

g_arr : np.ndarray, default=None

Array of shape (n_channels,) holding a separate g value for each channel. Its length must equal the number of channels in x and y.

itakura_max_slope : float, default=None

Maximum slope, as a proportion of the number of time points, used to create the Itakura parallelogram on the bounding matrix. Must be between 0 and 1.

Returns:
float

ERP distance between x and y.

Raises:
ValueError

If x and y are not 1D or 2D arrays.

References

[1] Lei Chen and Raymond Ng. 2004. On the marriage of Lp-norms and edit distance. In Proceedings of the Thirtieth International Conference on Very Large Data Bases - Volume 30 (VLDB '04). VLDB Endowment, 792–803.

Examples

>>> import numpy as np
>>> from aeon.distances import erp_distance
>>> x = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]])
>>> y = np.array([[2, 2, 2, 2, 5, 6, 7, 8, 9, 10]])
>>> erp_distance(x, y)
4.0