TimeSeriesKMeans

class TimeSeriesKMeans(n_clusters: int = 8, init_algorithm: str | ndarray = 'random', distance: str | Callable = 'msm', n_init: int = 10, max_iter: int = 300, tol: float = 1e-06, verbose: bool = False, random_state: int | RandomState | None = None, averaging_method: str | Callable[[ndarray], ndarray] = 'ba', distance_params: dict | None = None, average_params: dict | None = None)

Time series K-means clustering implementation.

K-means [5] is a popular clustering algorithm that aims to partition n time series into k clusters, in which each time series belongs to the cluster with the nearest centre. Each centre is represented by an average of the cluster's members, which is computed during the training phase.

K-means using Euclidean distance generally performs poorly on time series. Combined with an elastic distance, however, it performs significantly better (in particular with MSM or TWE [1]). K-means for time series can be improved further by using an elastic averaging method. The most common is dynamic time warping barycenter averaging (DBA) [3]; in recent years, alternatives based on other elastic distances, such as ShapeDBA [4] (Shape DTW DBA) and MBA (MSM DBA) [5], have shown significant performance benefits.
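The assign-then-average loop described above can be sketched in plain NumPy. This is only an illustration under simplifying assumptions, not the aeon implementation: it uses Euclidean distance and the arithmetic mean rather than an elastic distance and barycenter averaging, and the function name `lloyd_kmeans` and its stopping rule are hypothetical.

```python
import numpy as np


def lloyd_kmeans(X, n_clusters, max_iter=300, tol=1e-6, seed=0):
    """Toy k-means for an array of shape (n_cases, n_channels, n_timepoints)."""
    rng = np.random.default_rng(seed)
    # Random initialisation: pick k distinct series as the starting centres.
    centres = X[rng.choice(len(X), size=n_clusters, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        # Assignment step: Euclidean distance from every series to every centre.
        diffs = X[:, None] - centres[None]  # (n_cases, k, n_channels, n_timepoints)
        dists = np.sqrt((diffs ** 2).sum(axis=(2, 3)))
        labels = dists.argmin(axis=1)
        # Update step: each centre becomes the arithmetic mean of its members.
        new_centres = centres.copy()
        for k in range(n_clusters):
            members = X[labels == k]
            if len(members) > 0:  # an empty cluster keeps its previous centre
                new_centres[k] = members.mean(axis=0)
        # Stop once the centres have (almost) stopped moving.
        shift = np.linalg.norm(new_centres - centres)
        centres = new_centres
        if shift < tol:
            break
    # Labels come from the last assignment step that was run.
    return centres, labels


# Two well-separated groups of univariate series.
X = np.concatenate([np.zeros((5, 1, 8)), np.full((5, 1, 8), 10.0)])
centres, labels = lloyd_kmeans(X, n_clusters=2)
```

Swapping the distance and the averaging step for elastic counterparts is what distinguishes the time series variants discussed above from this baseline.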

Parameters:

n_clusters (int, default=8): The number of clusters to form as well as the number of centroids to generate.
init_algorithm (str or np.ndarray, default='random'): Method used to initialise the centroids. 'random' is the default and simply chooses k time series at random as centroids. It is fast but sometimes yields sub-optimal clustering. 'kmeans++' [2] is slower but often more accurate than 'random'; it works by choosing centroids that are distant from one another. 'first' is the fastest method and simply chooses the first k time series as centroids. If a np.ndarray is provided, it must be of shape (n_clusters, n_channels, n_timepoints) and contain the time series to use as centroids.
distance (str or Callable, default='msm'): Distance metric used to compute similarity between time series. A list of valid strings for metrics can be found in the documentation for aeon.distances.get_distance_function. If a callable is passed, it must be a function that takes two 2d numpy arrays as input and returns a float.
n_init (int, default=10): Number of times the k-means algorithm will be run with different centroid seeds. The final result will be the best output of n_init consecutive runs in terms of inertia.
max_iter (int, default=300): Maximum number of iterations of the k-means algorithm for a single run.
tol (float, default=1e-6): Relative tolerance with regard to the Frobenius norm of the difference in the cluster centers of two consecutive iterations to declare convergence.
verbose (bool, default=False): Verbosity mode.
random_state (int or np.random.RandomState instance or None, default=None): Determines random number generation for centroid initialization.
averaging_method (str or Callable, default='ba'): Averaging method used to compute the average of a cluster. Any of the following strings are valid: ['mean', 'ba']. If a Callable is provided, it must take the form Callable[[np.ndarray], np.ndarray]. If you specify 'ba', then by default the distance measure used for averaging is the same as the distance measure used for clustering. To use a different distance measure, pass {"distance": "dtw"} as average_params. BA generally yields better clustering results but is very computationally expensive, so consider setting a bounding window or using a different averaging method if time complexity is a concern.
average_params (dict, default=None): Dictionary containing kwargs for averaging_method. See the documentation of aeon.clustering.averaging and aeon.distances for more details. NOTE: if you want to use custom distance params during averaging, you must specify them in this dict in addition to the custom averaging params. For example, to specify a window as a distance param and verbosity for the averaging, pass average_params={"window": 0.2, "verbose": True}.
distance_params (dict, default=None): Dictionary containing kwargs for the distance being used. For example, to specify a window for DTW, pass distance_params={"window": 0.2}. See the documentation of aeon.distances for more details.
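To make the k-means++ idea from init_algorithm concrete, the sketch below follows the seeding rule of [2]: the first centroid is drawn uniformly at random, and each later one is drawn with probability proportional to its squared distance from the nearest centroid chosen so far. It assumes Euclidean distance on flattened series and at least n_clusters distinct series; the function name `kmeans_plus_plus_init` is hypothetical and this is not aeon's own code.

```python
import numpy as np


def kmeans_plus_plus_init(X, n_clusters, seed=0):
    """Pick n_clusters initial centroids from X of shape (n_cases, n_channels, n_timepoints)."""
    rng = np.random.default_rng(seed)
    n_cases = len(X)
    flat = X.reshape(n_cases, -1)
    # First centroid: chosen uniformly at random.
    chosen = [int(rng.integers(n_cases))]
    # Squared Euclidean distance from every series to its nearest chosen centroid.
    closest_sq = ((flat - flat[chosen[0]]) ** 2).sum(axis=1)
    for _ in range(1, n_clusters):
        # Far-away series are proportionally more likely to be picked next.
        probs = closest_sq / closest_sq.sum()
        idx = int(rng.choice(n_cases, p=probs))
        chosen.append(idx)
        new_sq = ((flat - flat[idx]) ** 2).sum(axis=1)
        closest_sq = np.minimum(closest_sq, new_sq)
    return X[chosen]


X = np.concatenate([np.zeros((5, 1, 4)), np.ones((5, 1, 4))])
init_centres = kmeans_plus_plus_init(X, n_clusters=2)
```

An array built this way has shape (n_clusters, n_channels, n_timepoints), which is the shape the parameter description requires when init_algorithm is given as a np.ndarray.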

Attributes:

cluster_centers_ (3d np.ndarray): Array of shape (n_clusters, n_channels, n_timepoints). Time series that represent each of the cluster centers.
labels_ (1d np.ndarray): Array of shape (n_cases,). Index of the cluster each time series belongs to.
inertia_ (float): Sum of distances of samples to their closest cluster center, weighted by the sample weights if provided.
n_iter_ (int): Number of iterations run.
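As a rough illustration of how labels_ and inertia_ relate to cluster_centers_, the sketch below assigns each series to its closest center and sums those closest distances, following the attribute descriptions above. It assumes Euclidean distance and no sample weights; the helper name `assign_and_inertia` is hypothetical and not part of aeon.

```python
import numpy as np


def assign_and_inertia(X, centres):
    """Return (labels, inertia) for X and centres, both (n, n_channels, n_timepoints)."""
    diffs = X[:, None] - centres[None]
    # Distance from every series to every centre: shape (n_cases, n_clusters).
    dists = np.sqrt((diffs ** 2).sum(axis=(2, 3)))
    labels = dists.argmin(axis=1)             # index of the closest centre
    inertia = float(dists.min(axis=1).sum())  # sum of the closest distances
    return labels, inertia


X = np.array([[[0.0, 0.0]], [[3.0, 4.0]]])        # two series, 1 channel, 2 timepoints
centres = np.array([[[0.0, 0.0]], [[3.0, 0.0]]])  # two candidate centres
labels, inertia = assign_and_inertia(X, centres)
```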

References

[1] Holder, Christopher & Middlehurst, Matthew & Bagnall, Anthony. (2022). A Review and Evaluation of Elastic Distance Functions for Time Series Clustering. 10.48550/arXiv.2205.15181.

[2] Arthur, David & Vassilvitskii, Sergei. (2007). K-Means++: The Advantages of Careful Seeding. Proc. of the Annu. ACM-SIAM Symp. on Discrete Algorithms. 8. 1027-1035. 10.1145/1283383.1283494.

[3] Holder, Christopher & Guijo-Rubio, David & Bagnall, Anthony. (2023). Clustering time series with k-medoids based algorithms. In proceedings of the 8th Workshop on Advanced Analytics and Learning on Temporal Data (AALTD 2023).

[4] Ali Ismail-Fawaz & Hassan Ismail Fawaz & Francois Petitjean & Maxime Devanne & Jonathan Weber & Stefano Berretti & Geoffrey I. Webb & Germain Forestier. ShapeDBA: Generating Effective Time Series Prototypes using ShapeDTW Barycenter Averaging. In proceedings of the 8th Workshop on Advanced Analytics and Learning on Temporal Data (AALTD 2023).

[5] Lloyd, S. P. (1982). Least squares quantization in PCM. IEEE Trans. Inf. Theory, 28:129–136.

Examples

>>> import numpy as np
>>> from aeon.clustering import TimeSeriesKMeans
>>> X = np.random.random(size=(10, 2, 20))
>>> clst = TimeSeriesKMeans(distance="euclidean", n_clusters=2)
>>> clst.fit(X)
TimeSeriesKMeans(distance='euclidean', n_clusters=2)
>>> preds = clst.predict(X)

Methods

`check_is_fitted`()	Check if the estimator has been fitted.
`clone`()	Obtain a clone of the object with same hyper-parameters.
`clone_tags`(estimator[, tag_names])	Clone/mirror tags from another estimator as dynamic override.
`create_test_instance`([parameter_set])	Construct Estimator instance if possible.
`create_test_instances_and_names`([parameter_set])	Create list of all test instances and a list of names for them.
`fit`(X[, y])	Fit time series clusterer to training data.
`fit_predict`(X[, y])	Compute cluster centers and predict cluster index for each time series.
`get_class_tag`(tag_name[, tag_value_default])	Get tag value from estimator class (only class tags).
`get_class_tags`()	Get class tags from estimator class and all its parent classes.
`get_fitted_params`([deep])	Get fitted parameters.
`get_metadata_routing`()	Get metadata routing of this object.
`get_param_defaults`()	Get parameter defaults for the object.
`get_param_names`()	Get parameter names for the object.
`get_params`([deep])	Get parameters for this estimator.
`get_tag`(tag_name[, tag_value_default, ...])	Get tag value from estimator class.
`get_tags`()	Get tags from estimator class.
`get_test_params`([parameter_set])	Return testing parameter settings for the estimator.
`is_composite`()	Check if the object is composite.
`load_from_path`(serial)	Load object from file location.
`load_from_serial`(serial)	Load object from serialized memory container.
`predict`(X[, y])	Predict the closest cluster each sample in X belongs to.
`predict_proba`(X)	Predicts labels probabilities for sequences in X.
`reset`()	Reset the object to a clean post-init state.
`save`([path])	Save serialized self to bytes-like object or to (.zip) file.
`score`(X[, y])	Score the quality of the clusterer.
`set_params`(**params)	Set the parameters of this object.
`set_tags`(**tag_dict)	Set dynamic tags to given values.

classmethod get_test_params(parameter_set='default')

Return testing parameter settings for the estimator.

Parameters:

parameter_set (str, default="default"): Name of the set of test parameters to return, for use in tests. If no special parameters are defined for a value, will return the "default" set.

Returns:

params (dict or list of dict, default={}): Parameters to create testing instances of the class. Each dict contains parameters to construct an "interesting" test instance, i.e., MyClass(**params) or MyClass(**params[i]) creates a valid test instance. create_test_instance uses the first (or only) dictionary in params.

check_is_fitted()

Check if the estimator has been fitted.

Raises:

NotFittedError: If the estimator has not been fitted yet.

clone()

Obtain a clone of the object with same hyper-parameters.

A clone is a different object without shared references, in post-init state. This function is equivalent to returning sklearn.clone of self. Equal in value to type(self)(**self.get_params(deep=False)).

Returns:

instance of type(self), clone of self (see above)
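The identity type(self)(**self.get_params(deep=False)) can be illustrated with a toy class. `ToyEstimator` and `clone_by_params` below are hypothetical stand-ins written for this sketch, not aeon classes; they only show why a clone carries the hyper-parameters but none of the fitted state.

```python
class ToyEstimator:
    """Minimal stand-in for an estimator with two hyper-parameters."""

    def __init__(self, n_clusters=8, distance="msm"):
        self.n_clusters = n_clusters
        self.distance = distance

    def get_params(self, deep=False):
        # Only the arguments of __init__ count as hyper-parameters.
        return {"n_clusters": self.n_clusters, "distance": self.distance}

    def fit(self):
        # Fitted state lives in a trailing-underscore attribute.
        self.labels_ = [0, 1]
        return self


def clone_by_params(estimator):
    # Rebuild a fresh object from the hyper-parameters alone.
    return type(estimator)(**estimator.get_params(deep=False))


fitted = ToyEstimator(n_clusters=2, distance="dtw").fit()
fresh = clone_by_params(fitted)
```

Here `fresh` has the same hyper-parameters as `fitted` but no `labels_` attribute, which is the post-init state described above.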

clone_tags(estimator, tag_names=None)

Clone/mirror tags from another estimator as dynamic override.

Parameters:

estimator (object): Estimator inheriting from BaseEstimator.
tag_names (str or list of str, default=None): Names of tags to clone. If None, all tags in estimator are used as tag_names.

Returns:

Self: Reference to self.

Notes

Changes object state by setting tag values in tag_set from estimator as dynamic tags in self.

classmethod create_test_instance(parameter_set='default')

Construct Estimator instance if possible.

Parameters:

parameter_set (str, default="default"): Name of the set of test parameters to return, for use in tests. If no special parameters are defined for a value, will return the "default" set.

Returns:

instance: Instance of the class with default parameters.

Notes

get_test_params can return a dict or a list of dicts. This function takes the first (or only) dict that get_test_params returns and constructs the object with it.

classmethod create_test_instances_and_names(parameter_set='default')

Create list of all test instances and a list of names for them.

Parameters:

parameter_set (str, default="default"): Name of the set of test parameters to return, for use in tests. If no special parameters are defined for a value, will return the "default" set.

Returns:

objs (list of instances of cls): i-th instance is cls(**cls.get_test_params()[i]).
names (list of str, same length as objs): i-th element is the name of the i-th instance of obj in tests. The convention is {cls.__name__}-{i} if there is more than one instance, otherwise {cls.__name__}.

fit(X, y=None) → BaseCollectionEstimator

Fit time series clusterer to training data.

Parameters:

X (3D np.ndarray): Input data, any number of channels, equal length series of shape (n_cases, n_channels, n_timepoints), or 2D np.array (univariate, equal length series) of shape (n_cases, n_timepoints), or list of numpy arrays (any number of channels, unequal length series) of shape [n_cases], each a 2D np.array (n_channels, n_timepoints_i), where n_timepoints_i is the length of series i. Other types are allowed and converted into one of the above.

y: ignored, exists for API consistency reasons.

Returns:

self: Fitted estimator.

fit_predict(X, y=None) → ndarray

Compute cluster centers and predict cluster index for each time series.

Convenience method; equivalent of calling fit(X) followed by predict(X)

Parameters:

X (np.ndarray): 2d or 3d array of shape (n_cases, n_timepoints) or (n_cases, n_channels, n_timepoints). Time series instances used to train the clusterer and then assigned to clusters.
y: ignored, exists for API consistency reasons.

Returns:

np.ndarray (1d array of shape (n_cases,)): Index of the cluster each time series in X belongs to.

classmethod get_class_tag(tag_name, tag_value_default=None)

Get tag value from estimator class (only class tags).

Parameters:

tag_name (str): Name of tag value.
tag_value_default (any type): Default/fallback value if tag is not found.

Returns:

tag_value: Value of the tag_name tag in self. If not found, returns tag_value_default.

See also

get_tag: Get a single tag from an object.
get_class_tags: Get all tags from a class.
get_class_tag: Get a single tag from a class.

Examples

>>> from aeon.classification import DummyClassifier
>>> d = DummyClassifier()
>>> tags = d.get_tags()

is_composite()

Check if the object is composite.

A composite object is an object which contains objects, as parameters. Called on an instance, since this may differ by instance.

Returns:

composite (bool): Whether self contains a parameter which is a BaseObject.

property is_fitted: Whether fit has been called.

classmethod load_from_path(serial)

Load object from file location.

Parameters:

serial (object): Result of ZipFile(path).open("object").

Returns:

deserialized self resulting in output at path, of cls.save(path)

classmethod load_from_serial(serial)

Load object from serialized memory container.

Parameters:

serial (object): First element of output of cls.save(None).

Returns:

deserialized self resulting in output serial, of cls.save(None).

predict(X, y=None) → ndarray

Predict the closest cluster each sample in X belongs to.

Parameters:

X (3D np.ndarray): Input data, any number of channels, equal length series of shape (n_cases, n_channels, n_timepoints), or 2D np.array (univariate, equal length series) of shape (n_cases, n_timepoints), or list of numpy arrays (any number of channels, unequal length series) of shape [n_cases], each a 2D np.array (n_channels, n_timepoints_i), where n_timepoints_i is the length of series i. Other types are allowed and converted into one of the above.
y: ignored, exists for API consistency reasons.

Returns:

np.array: Array of shape (n_cases,). Index of the cluster each time series in X belongs to.

predict_proba(X) → ndarray

Predict label probabilities for sequences in X.

Default behaviour is to call _predict and set the predicted class probability to 1, other class probabilities to 0. Override if better estimates are obtainable.

Parameters:

X (3D np.ndarray): Input data, any number of channels, equal length series of shape (n_cases, n_channels, n_timepoints), or 2D np.array (univariate, equal length series) of shape (n_cases, n_timepoints), or list of numpy arrays (any number of channels, unequal length series) of shape [n_cases], each a 2D np.array (n_channels, n_timepoints_i), where n_timepoints_i is the length of series i. Other types are allowed and converted into one of the above.

Returns:

y (2D array of shape [n_cases, n_classes]): Predicted class probabilities. 1st dimension indices correspond to instance indices in X; 2nd dimension indices correspond to possible labels (integers). The (i, j)-th entry is the predictive probability that the i-th instance is of class j.
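The default behaviour described above (probability 1 for the predicted label, 0 for all others) amounts to a one-hot encoding of the predicted labels. A minimal NumPy sketch follows; the helper name `one_hot_proba` is hypothetical and this is not the library's internal code.

```python
import numpy as np


def one_hot_proba(labels, n_clusters):
    """Turn hard cluster labels into a (n_cases, n_clusters) probability array."""
    labels = np.asarray(labels)
    proba = np.zeros((len(labels), n_clusters))
    # Row i gets probability 1 in the column of its predicted label.
    proba[np.arange(len(labels)), labels] = 1.0
    return proba


proba = one_hot_proba([0, 2, 1], n_clusters=3)
```

Every row sums to 1, and the (i, j)-th entry is 1 exactly when instance i was assigned label j.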

reset()

Reset the object to a clean post-init state.

Equivalent to sklearn.clone but overwrites self. After a self.reset() call, self is equal in value to type(self)(**self.get_params(deep=False)).

Detailed behaviour: removes all object attributes except hyper-parameters (arguments of __init__) and object attributes containing double underscores, i.e., the string "__"; then runs __init__ with the current values of the hyper-parameters (result of get_params).

Not affected by the reset are: object attributes containing double underscores; class and object methods; class attributes.

save(path=None)

Save serialized self to bytes-like object or to (.zip) file.

Behaviour: if path is None, returns an in-memory serialized self; if path is a file location, stores self at that location as a zip file.

Saved files are zip files with the following contents: _metadata, which contains the class of self, i.e., type(self); and _obj, the serialized self. This class uses the default serialization (pickle).

Parameters:

path (None or file location (str or Path)): If None, self is saved to an in-memory object; if a file location, self is saved to that file location. If path="estimator", a zip file estimator.zip will be made at cwd. If path="/home/stored/estimator", a zip file estimator.zip will be stored in /home/stored/.

Returns:

if path is None - in-memory serialized self
if path is file location - ZipFile with reference to the file.

score(X, y=None) → float

Score the quality of the clusterer.

Parameters:

X (np.ndarray): 2d or 3d array of shape (n_cases, n_timepoints) or (n_cases, n_channels, n_timepoints). Time series instances on which to score the clusterer.
y: ignored, exists for API consistency reasons.

Returns:

score (float): Score of the clusterer.

set_params(**params)

Set the parameters of this object.

The method works on simple estimators as well as on nested objects. The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict): BaseObject parameters.

Returns:

self: Reference to self (after parameters have been set).
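The <component>__<parameter> convention can be illustrated with two toy classes. `ToyComponent` and `ToyComposite` are hypothetical and far simpler than the real base classes; the sketch only shows how a double-underscore key might be split and routed to a nested object.

```python
class ToyComponent:
    """Hypothetical nested object with a single parameter."""

    def __init__(self, window=None):
        self.window = window

    def set_params(self, **params):
        for key, value in params.items():
            setattr(self, key, value)
        return self


class ToyComposite:
    """Hypothetical composite that holds another object as a parameter."""

    def __init__(self, clusterer=None, verbose=False):
        self.clusterer = clusterer
        self.verbose = verbose

    def set_params(self, **params):
        for key, value in params.items():
            # "clusterer__window" splits into component "clusterer" and parameter "window".
            name, sep, sub_key = key.partition("__")
            if sep:
                getattr(self, name).set_params(**{sub_key: value})
            else:
                setattr(self, name, value)
        return self


composite = ToyComposite(clusterer=ToyComponent())
composite.set_params(verbose=True, clusterer__window=0.2)
```

After the call, `composite.verbose` is True and `composite.clusterer.window` is 0.2, so one call updates both the outer object and its component.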

set_tags(**tag_dict)

Set dynamic tags to given values.

Parameters:

**tag_dict (dict): Dictionary of tag name : tag value pairs.

Returns:

Self: Reference to self.

Notes

Changes object state by setting tag values in tag_dict as dynamic tags in self.