
Time Series Similarity search with aeon

The goal of time series similarity search is to assess the similarity between a time series, denoted as a query q of length l, and a collection of time series, denoted as X, whose lengths are greater than or equal to l. In this context, the similarity between q and the other series in X is quantified by similarity functions. These functions are most often defined as distance functions, such as the Euclidean distance. Once the similarity between q and every admissible candidate is known, many other tasks, such as anomaly or motif detection, come almost for “free”.
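To make the idea concrete, here is a minimal sketch of a brute-force Euclidean distance profile written with numpy only. The function name `naive_distance_profile` and the toy data are illustrative assumptions, not part of the aeon API, and aeon's own implementation may work differently.

```python
import numpy as np


def naive_distance_profile(series, query):
    """Euclidean distance between `query` and every window of `series`.

    Both inputs are 2D arrays of shape (n_channels, n_timepoints).
    """
    query_length = query.shape[1]
    n_candidates = series.shape[1] - query_length + 1
    profile = np.empty(n_candidates)
    for start in range(n_candidates):
        window = series[:, start : start + query_length]
        profile[start] = np.sqrt(np.sum((window - query) ** 2))
    return profile


# Toy univariate series in which the query appears exactly once
series = np.array([[0.0, 1.0, 0.0, 2.0, 5.0, 2.0, 0.0, 1.0]])
query = np.array([[2.0, 5.0, 2.0]])

profile = naive_distance_profile(series, query)
best_start = int(np.argmin(profile))  # starting index of the best match
print(best_start, profile[best_start])
```

The positions with the smallest values in the profile are the best matches for the query.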


[18]:
import matplotlib.pyplot as plt


def plot_best_matches(top_k_search, best_matches):
    fig, ax = plt.subplots(figsize=(20, 5), ncols=3)
    for i_k, (id_sample, id_timestamp) in enumerate(best_matches):
        # plot the sample of the best match
        ax[i_k].plot(top_k_search.X_[id_sample, 0], linewidth=2)
        # plot the location of the best match on it
        ax[i_k].plot(
            range(id_timestamp, id_timestamp + q.shape[1]),
            top_k_search.X_[id_sample, 0, id_timestamp : id_timestamp + q.shape[1]],
            linewidth=7,
            alpha=0.5,
            color="green",
            label="best match location",
        )
        # plot the query on the location of the best match
        ax[i_k].plot(
            range(id_timestamp, id_timestamp + q.shape[1]),
            q[0],
            linewidth=5,
            alpha=0.5,
            color="red",
            label="query",
        )
        ax[i_k].set_title(f"best match {i_k}")
        ax[i_k].legend()
    plt.show()


def plot_matrix_profile(X, mp, i_k):
    fig, ax = plt.subplots(figsize=(20, 10), nrows=2)
    ax[0].set_title("series X used to build the matrix profile")
    ax[0].plot(X[0])  # plot first channel only
    # mp is a list of arrays rather than a 2D array: to support unequal
    # length series and threshold-based search, each step can hold a
    # different number of matches.
    ax[1].plot([mp[i][i_k] for i in range(len(mp))])
    ax[1].set_title(f"Top {i_k+1} matrix profile of X")
    ax[1].set_ylabel(f"Dist to top {i_k+1} match")
    ax[1].set_xlabel("Starting index of the query in X")
    plt.show()

Similarity search Notebooks

This notebook gives an overview of the similarity search module and its available estimators. Further notebooks are available that go into more depth on specific subjects of similarity search in aeon.

Expected inputs and format

For both QuerySearch and SeriesSearch, the fit method expects a time series dataset of shape (n_cases, n_channels, n_timepoints). This can be a 3D numpy array, or a list of 2D numpy arrays if n_timepoints varies between cases (i.e. an unequal length dataset).

For QuerySearch, the predict method expects a 2D numpy array of shape (n_channels, query_length). For SeriesSearch, the predict method also expects a 2D numpy array, but of shape (n_channels, n_timepoints) (n_timepoints doesn’t have to be the same as in fit), together with a query_length parameter.
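The shapes described above can be sketched with numpy alone. All variable names and sizes below are illustrative assumptions, and the random data stands in for a real dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Equal-length collection: one 3D array (n_cases, n_channels, n_timepoints)
X_equal = rng.normal(size=(10, 2, 100))

# Unequal-length collection: a list of 2D arrays (n_channels, n_timepoints)
X_unequal = [rng.normal(size=(2, n)) for n in (80, 100, 120)]

# Query for QuerySearch.predict: 2D array (n_channels, query_length)
q_example = X_equal[0, :, 20:55]

# Series for SeriesSearch.predict: 2D array (n_channels, n_timepoints);
# its length may differ from the lengths seen in fit
series_example = rng.normal(size=(2, 150))

print(X_equal.shape, [x.shape for x in X_unequal], q_example.shape)
```

Note that the number of channels is the same in every array, while only the number of time points changes.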

Available estimators

All estimators of the similarity search module in aeon inherit from the BaseSimilaritySearch class, which accepts the following arguments:

- distance : a string indicating which distance function to use as the similarity function. By default this is "euclidean", which means that the Euclidean distance is used.
- normalize : a boolean indicating whether the similarity function should be computed on z-normalized series. This means that the scale of the two series being compared will be ignored, and that, loosely speaking, we will only focus on their shape during the comparison. By default, this parameter is set to False.
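As a rough illustration of what z-normalization does to a comparison, the sketch below compares two series that share the same shape but differ in scale and offset. The helper `z_normalize` is a hypothetical name written for this example, and the handling of constant channels is an assumption, not aeon's documented behaviour.

```python
import numpy as np


def z_normalize(x):
    """Z-normalize each channel of a 2D array (n_channels, n_timepoints)."""
    mean = x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, keepdims=True)
    # Assumption: constant channels are only centred, to avoid dividing by 0
    std[std == 0] = 1.0
    return (x - mean) / std


a = np.array([[0.0, 1.0, 2.0, 1.0, 0.0]])
b = 10.0 * a + 3.0  # same shape as `a`, different scale and offset

raw_dist = np.sqrt(np.sum((a - b) ** 2))
norm_dist = np.sqrt(np.sum((z_normalize(a) - z_normalize(b)) ** 2))
print(raw_dist, norm_dist)
```

The raw distance is large because of the scale difference, while the normalized distance is zero up to floating point error.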

Another parameter, which has no effect on the output of the estimators, is a boolean named store_distance_profile, set to False by default. If set to True, the estimators will expose an attribute named _distance_profile after the predict function is called. This attribute will contain the distance profile computed for the query given as input to the predict function.

To illustrate how to work with similarity search estimators in aeon, we will now present some example use cases.

Using other distance functions

You can use any distance from the aeon distance module in the similarity search module. You can obtain a list of all available distances by using the following:

[9]:
from aeon.distances import get_distance_function_names

get_distance_function_names()
[9]:
['adtw',
 'ddtw',
 'dtw',
 'edr',
 'erp',
 'euclidean',
 'lcss',
 'manhattan',
 'minkowski',
 'msm',
 'sbd',
 'shape_dtw',
 'squared',
 'twe',
 'wddtw',
 'wdtw']

You can also specify keyword arguments using the distance_args parameter of any estimator inheriting from BaseSimilaritySearch, such as the QuerySearch estimator. For example, the w parameter of dtw specifies the size of the warping window:

[10]:
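import numpy as np
from aeon.similarity_search import QuerySearch

# X is assumed to be a 3D dataset (n_cases, n_channels, n_timepoints)
# defined before this cell
top_k_search = QuerySearch(k=3, distance="dtw", distance_args={"w": 0.2})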

q = X[3, :, 20:55]
mask = np.ones(X.shape[0], dtype=bool)
mask[3] = False
# Use this mask to exclude the sample from which we extracted the query
X_train = X[mask]
# Call fit to store X_train as the database to search in
top_k_search.fit(X_train)
distances_to_matches, best_matches = top_k_search.predict(q)
print(best_matches)
[[195  26]
 [128  17]
 [155  20]]
[11]:
plot_best_matches(top_k_search, best_matches)
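To give an intuition for what a warping window does, here is a minimal pure-numpy sketch of DTW restricted to a band around the diagonal (a Sakoe-Chiba band), with the window given as a fraction of the series length. The function `dtw_with_window` is written for this illustration only; it is not the aeon implementation, and details such as the cost function or how the window is rounded may differ there.

```python
import numpy as np


def dtw_with_window(x, y, window=None):
    """Squared-cost DTW between two 1D arrays, restricted to a diagonal band.

    `window` is a fraction of the series length (None means no constraint).
    """
    n, m = len(x), len(y)
    if window is None:
        radius = max(n, m)
    else:
        # The band must be at least |n - m| wide for a warping path to exist
        radius = max(int(window * max(n, m)), abs(n - m))
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - radius), min(m, i + radius) + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]


x = np.array([0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0])
y = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0])  # x shifted by 2 steps

print(dtw_with_window(x, y, window=0.0))   # only the diagonal is allowed
print(dtw_with_window(x, y, window=0.25))  # band wide enough for the shift
print(dtw_with_window(x, y))               # unconstrained warping
```

A window of 0 forces a point-by-point comparison, while a wider window lets the alignment absorb the time shift between the two series.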

Using your own distance functions

You can also create your own distance function and pass it to the distance parameter:

[12]:
def my_dist(x, y):
    return np.sum(np.abs(x - y))


top_k_search = QuerySearch(k=3, distance=my_dist)

q = X[3, :, 20:55]
mask = np.ones(X.shape[0], dtype=bool)
mask[3] = False
# Use this mask to exclude the sample from which we extracted the query
X_train = X[mask]
# Call fit to store X_train as the database to search in
top_k_search.fit(X_train)
distances_to_matches, best_matches = top_k_search.predict(q)
print(best_matches)
[[176  25]
 [ 64  23]
 [195  26]]
[13]:
plot_best_matches(top_k_search, best_matches)

Internally, we will try to compile this function using numba. Errors might appear if your function uses operations unsupported by numba (see the numba docs). You could also disable numba using numba.config.DISABLE_JIT = True, but the code will run a lot slower.
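One way to catch such errors early is to compile the function yourself before handing it to an estimator. The sketch below assumes numba is installed; the fallback branch and the name `my_dist_compiled` are there only so that the snippet still runs without numba, and are not part of aeon.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # fallback so the sketch still runs without numba

    def njit(func):
        return func


def my_dist(x, y):
    # Manhattan distance between two arrays of identical shape
    return np.sum(np.abs(x - y))


# Compilation is triggered on the first call, so an unsupported
# operation would raise an error here rather than inside the estimator
my_dist_compiled = njit(my_dist)

x = np.array([[1.0, 2.0, 3.0]])
y = np.array([[0.0, 0.0, 0.0]])
result = my_dist_compiled(x, y)
print(result)
```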

