binder

Time Series Similarity Search with aeon

The goal of Time Series Similarity Search is to asses the similarities between a time series, denoted as a query q of length l, and a collection of time series, denoted as X, with lengths greater than or equal to l. In this context, the notion of similiarity between q and the other series in X is quantified by similarity functions. Those functions are most of the time defined as distance function, such as the Euclidean distance. Knowing the similarity between q and other admissible candidates, we can then perform many other tasks for “free”, such as anomaly or motif detection.

time series similarity search

[1]:
def plot_best_matches(top_k_search, best_matches):
    """Plot the top best matches of a query in a dataset."""
    fig, ax = plt.subplots(figsize=(20, 5), ncols=3)
    for i_k, (id_sample, id_timestamp) in enumerate(best_matches):
        # plot the sample of the best match
        ax[i_k].plot(top_k_search.X_[id_sample, 0], linewidth=2)
        # plot the location of the best match on it
        ax[i_k].plot(
            range(id_timestamp, id_timestamp + q.shape[1]),
            top_k_search.X_[id_sample, 0, id_timestamp : id_timestamp + q.shape[1]],
            linewidth=7,
            alpha=0.5,
            color="green",
            label="best match location",
        )
        # plot the query on the location of the best match
        ax[i_k].plot(
            range(id_timestamp, id_timestamp + q.shape[1]),
            q[0],
            linewidth=5,
            alpha=0.5,
            color="red",
            label="query",
        )
        ax[i_k].set_title(f"best match {i_k}")
        ax[i_k].legend()
    plt.show()


def plot_matrix_profile(X, mp, i_k):
    """Plot the matrix profile and corresponding time series."""
    fig, ax = plt.subplots(figsize=(20, 10), nrows=2)
    ax[0].set_title("series X used to build the matrix profile")
    ax[0].plot(X[0])  # plot first channel only
    # This is necessary as mp is a list of arrays due to unequal length support
    # as it can have different number of matches for each step when
    # using threshold-based search.
    ax[1].plot([mp[i][i_k] for i in range(len(mp))])
    ax[1].set_title(f"Top {i_k+1} matrix profile of X")
    ax[1].set_ylabel(f"Dist to top {i_k+1} match")
    ax[1].set_xlabel("Starting index of the query in X")
    plt.show()

Similarity search Notebooks

This notebook gives an overview of similarity search module and the available estimators. The following notebooks are avaiable to go more in depth with specific subject of similarity search in aeon:

Expected inputs and format

For both QuerySearch and SeriesSearch, the fit method expects a time series dataset of shape (n_cases, n_channels, n_timepoints). This can be 3D numpy array or a list of 2D numpy arrays if n_timepoints varies between cases (i.e. unequal length dataset).

The predict method expects a 2D numpy array of shape (n_channels, query_length) for QuerySearch. In SeriesSearch, the predict methods also expects a 2D numpy array, but of shape (n_channels, n_timepoints) (n_timepoints doesn’t have to be the same as in fit) and a query_length parameter.

Available estimators

All estimators of the similarity search module in aeon inherit from the BaseSimilaritySearch class, which requires the following arguments: - distance : a string indicating which distance function to use as similarity function. By default this is "euclidean", which means that the Euclidean distance is used. - normalise : a boolean indicating whether this similarity function should be z-normalised. This means that the scale of the two series being compared will be ignored, and that, loosely speaking, we will only focus on their shape during the comparison. By default, this parameter is set False.

Another parameter, which has no effect on the output of the estimators, is a boolean named store_distance_profile, set to False by default. If set to True, the estimators will expose an attribute named _distance_profile after the predict function is called. This attribute will contain the computed distance profile for query given as input to the predict function.

To illustrate how to work with similarity search estimators in aeon, we will now present some example use cases.


Generated using nbsphinx. The Jupyter notebook can be found here.