naive_matrix_profile
- naive_matrix_profile(X: ndarray | List, T: ndarray, length: int, k: int = 1, threshold: float = inf, distance: str = 'euclidean', distance_args: dict | None = None, inverse_distance: bool = False, normalize: bool = False, speed_up: str = 'fastest', n_jobs: int = 1, X_index: int | None = None, exclusion_factor: float = 2.0, apply_exclusion_to_result: bool = True)
Compute a matrix profile in a naive way, by looping through a query search.
- Parameters:
- X: np.ndarray, 3D array of shape (n_cases, n_channels, n_timepoints)
The input samples. If X is an unequal-length collection, expect a TypedList of 2D arrays of shape (n_channels, n_timepoints).
- T: np.ndarray, 2D array of shape (n_channels, series_length)
The series used for similarity search. series_length may be equal to, greater than, or less than n_timepoints.
- length: int
The length of the subsequences considered during the search. This parameter must not exceed either n_timepoints or series_length.
- k: int, default=1
The number of best matches to return for each subsequence.
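- threshold: float, default=np.inf
A threshold on the distance of the matches: a candidate is returned only if its distance to the subsequence is below this value. With the default of np.inf, no candidate is discarded on distance.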
- distance: str, default="euclidean"
Name of the distance function to use. A list of valid strings can be found in the documentation for aeon.distances.get_distance_function. If a callable is passed, it must be either a Python function or a numba function with nopython=True that takes two 1D numpy arrays as input and returns a float.
- distance_args: dict, default=None
Optional keyword arguments for the distance function.
- normalize: bool, default=False
Whether the distance function should be z-normalized.
- speed_up: str, default='fastest'
Which speed-up technique to use for the selected distance function. By default, the fastest algorithm is used. A list of the available algorithms for each distance can be obtained by calling the get_speedup_function_names function.
- inverse_distance: bool, default=False
If True, the matching will be made on the inverse of the distance, and thus, the worst matches to the query will be returned instead of the best ones.
- n_jobs: int, default=1
Number of parallel jobs to use.
- X_index: int, default=None
An int used to specify the index of T in X, if T is part of X. Otherwise, defaults to None, meaning that T is not a sample of X.
- exclusion_factor: float, default=2.
The factor to apply to the query length to define the exclusion zone. The exclusion zone is defined from \(id_timestamp - query_length//exclusion_factor\) to \(id_timestamp + query_length//exclusion_factor\). This also applies to the matching conditions defined by child classes. For example, with TopKSimilaritySearch, the k best matches are also subject to the exclusion zone, but with \(id_timestamp\) the index of one of the k matches.
- apply_exclusion_to_result: bool, default=True
Whether to apply the exclusion factor to the output of the similarity search. If True, two matches of the query from the same sample must be spaced at least +/- \(query_length//exclusion_factor\) apart. This avoids pathological matching: when extracting the best two matches, for example, if the best match is located at \(id_timestamp\), the second best match is very likely to be located at \(id_timestamp\) +/- 1, as the two share all their values except one.
- Returns:
- Tuple(ndarray, ndarray)
The first array, of shape (series_length - length + 1, n_matches), contains the distances between all the queries of size length and their best matches in X_. The second array, of shape (series_length - length + 1, n_matches, 2), contains the indexes of these matches as (id_sample, id_timepoint). The corresponding match can be retrieved as X_[id_sample, :, id_timepoint : id_timepoint + length].
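The looping search and the returned shapes described above can be illustrated with a small NumPy sketch. This is not the aeon implementation: the function name naive_profile_sketch is hypothetical, it only handles k=1 with the Euclidean distance on equal-length input, and it assumes the exclusion zone bounds are inclusive and apply only to the sample designated by X_index.

```python
import numpy as np


def naive_profile_sketch(X, T, length, X_index=None, exclusion_factor=2.0):
    """Brute-force best-match search (illustrative only, k=1, Euclidean).

    X has shape (n_cases, n_channels, n_timepoints) and
    T has shape (n_channels, series_length).
    """
    n_cases, _, n_timepoints = X.shape
    n_queries = T.shape[1] - length + 1
    n_candidates = n_timepoints - length + 1
    # Half-width of the exclusion zone (assumed inclusive on both sides).
    half_zone = length // exclusion_factor

    distances = np.full((n_queries, 1), np.inf)
    indexes = np.zeros((n_queries, 1, 2), dtype=int)

    for i_query in range(n_queries):
        query = T[:, i_query : i_query + length]
        for id_sample in range(n_cases):
            for id_timepoint in range(n_candidates):
                # Skip trivial matches around the query when T is a sample of X.
                if (
                    X_index is not None
                    and id_sample == X_index
                    and abs(id_timepoint - i_query) <= half_zone
                ):
                    continue
                candidate = X[id_sample, :, id_timepoint : id_timepoint + length]
                dist = np.sqrt(np.sum((query - candidate) ** 2))
                if dist < distances[i_query, 0]:
                    distances[i_query, 0] = dist
                    indexes[i_query, 0] = (id_sample, id_timepoint)
    return distances, indexes


# Usage on random data, where T is the first sample of X.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 2, 20))
T = X[0]
length = 5

distances, indexes = naive_profile_sketch(X, T, length, X_index=0)

# Retrieve the best match of the first query, as described under Returns.
id_sample, id_timepoint = indexes[0, 0]
best_match = X[id_sample, :, id_timepoint : id_timepoint + length]
```

In this sketch, omitting X_index when T is itself a sample of X makes every query match itself at distance zero, which is the trivial result the exclusion zone is meant to prevent.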