TimeSeriesKMedoids¶
- class TimeSeriesKMedoids(n_clusters: int = 8, init: str | ndarray = 'random', distance: str | Callable = 'msm', method: str = 'pam', n_init: int = 10, max_iter: int = 300, tol: float = 1e-06, verbose: bool = False, random_state: int | RandomState | None = None, distance_params: dict | None = None)[source]¶
Bases:
BaseClustererTime series K-medoids implementation.
K-medoids [1] is a clustering algorithm that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest medoid/centroid. This results in a partitioning of the data space into Voronoi cells. The problem of finding k-medoids is known to be NP-hard and has a time complexity of :
\[\mathbf{O}(\mathbf{n}\mathbf{k}(\mathbf{n} - \mathbf{k})^2).\]Where n is the number of time series and k is the number of clusters. There have been a number of algorithms published to solve the problem. The most common is the PAM (Partition Around Medoids)[R9ed115bfe2e7-3]_ algorithm and is the default method used in this implementation. However, an adaptation of lloyds method classically used for k-means is also available by specifying method=’alternate’. Alternate is faster but less accurate than PAM. For a full review of variations of k-medoids for time series see [5].
K-medoids for time series uses a dissimilarity method to compute the distance between time series. The default is ‘msm’ (move split merge) as it was found to significantly outperform the other measures in [2].
- Parameters:
- n_clustersint, default=8
The number of clusters to form as well as the number of centroids to generate.
- initstr or np.ndarray, default=’random’
Method for initialising cluster centers. Any of the following are valid: [‘kmedoids++’, ‘random’, ‘first’, ‘build’]. Random is the default as it is very fast and it was found in [2] to perform about as well as the other methods. Kmedoids++ is a variant of kmeans++ [4] and is slower but often more accurate than random. It works by choosing centroids that are distant from one another. First is the fastest method and simply chooses the first k time series as centroids. Build [1] greedily selects the k medoids by first selecting the medoid that minimizes the sum of distances to all other points(this point is the most centrally located) and then iteratively selects the next k-1 medoids that maximizes the decrease in sum of distances of all other points to their respective medoids selected so far. If a np.ndarray provided it must be of shape (n_clusters,) and contain the indexes of the time series to use as centroids.
- distancestr or Callable, default=’msm’
Distance method to compute similarity between time series. A list of valid strings for measures can be found in the documentation for
aeon.distances.get_distance_function. If a callable is passed it must be a function that takes two 2d numpy arrays as input and returns a float.- methodstr, default=’pam’
Method for computing k-medoids. Any of the following are valid: [‘alternate’, ‘pam’]. Alternate applies lloyds method to k-medoids and is faster but less accurate than PAM. PAM is implemented using the fastpam1 algorithm which gives the same output as PAM but is faster.
- n_initint, default=10
Number of times the k-medoids algorithm will be run with different centroid seeds. The final result will be the best output of n_init consecutive runs in terms of inertia.
- max_iterint, default=300
Maximum number of iterations of the k-medoids algorithm for a single run.
- tolfloat, default=1e-6
Relative tolerance with regards to Frobenius norm of the difference in the cluster centers of two consecutive iterations to declare convergence.
- verbosebool, default=False
Verbosity mode.
- random_stateint, np.random.RandomState instance or None, default=None
Determines random number generation for centroid initialization. If int, random_state is the seed used by the random number generator; If np.random.RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
- distance_params: dict, default=None
Dictionary containing kwargs for the distance method being used.
- Attributes:
- cluster_centers_np.ndarray, of shape (n_cases, n_channels, n_timepoints)
A collection of time series instances that represent the cluster centers.
- labels_np.ndarray (1d array of shape (n_case,))
Labels that is the index each time series belongs to.
- inertia_float
Sum of squared distances of samples to their closest cluster center, weighted by the sample weights if provided.
- n_iter_int
Number of iterations run.
Notes
Capabilities ¶ Missing Values
No
Multithreading
No
Univariate
Yes
Multivariate
Yes
Unequal Length
No
References
[1]Kaufmann, Leonard & Rousseeuw, Peter. (1987). Clustering by Means of Medoids.
Data Analysis based on the L1-Norm and Related Methods. 405-416.
[2]Holder, Christopher & Middlehurst, Matthew & Bagnall, Anthony. (2022).
A Review and Evaluation of Elastic Distance Functions for Time Series Clustering. 10.48550/arXiv.2205.15181.
[3]Kaufman, L. and Rousseeuw, P.J. (1990). Partitioning Around Medoids
(Program PAM). In Finding Groups in Data (eds L. Kaufman and P.J. Rousseeuw). https://doi.org/10.1002/9780470316801.ch2
[4]Arthur, David & Vassilvitskii, Sergei. (2007). K-Means++: The Advantages of
Careful Seeding. Proc. of the Annu. ACM-SIAM Symp. on Discrete Algorithms. 8. 1027-1035. 10.1145/1283383.1283494.
[5]Holder, Christopher & Guijo-Rubio, David & Bagnall, Anthony. (2023).
Clustering time series with k-medoids based algorithms. In proceedings of the 8th Workshop on Advanced Analytics and Learning on Temporal Data (AALTD 2023).
Examples
>>> from aeon.clustering import TimeSeriesKMedoids >>> from aeon.datasets import load_basic_motions >>> # Load data >>> X_train, y_train = load_basic_motions(split="TRAIN")[0:10] >>> X_test, y_test = load_basic_motions(split="TEST")[0:10] >>> # Example of PAM clustering >>> km = TimeSeriesKMedoids(n_clusters=3, distance="euclidean", random_state=1) >>> km.fit(X_train) TimeSeriesKMedoids(distance='euclidean', n_clusters=3, random_state=1) >>> pam_pred = km.predict(X_test) # Example of alternate clustering >>> km = TimeSeriesKMedoids(n_clusters=3, distance="dtw", method="alternate", ... random_state=1) >>> km.fit(X_train) TimeSeriesKMedoids(distance='dtw', method='alternate', n_clusters=3, random_state=1) >>> alternate_pred = km.predict(X_test)
Methods
clone([random_state])Obtain a clone of the object with the same hyperparameters.
fit(X[, y])Fit time series clusterer to training data.
fit_predict(X[, y])Compute cluster centers and predict cluster index for each time series.
get_class_tag(tag_name[, raise_error, ...])Get tag value from estimator class (only class tags).
Get class tags from estimator class and all its parent classes.
get_fitted_params([deep])Get fitted parameters.
get_params([deep])Get parameters for this estimator.
get_tag(tag_name[, raise_error, ...])Get tag value from estimator class.
get_tags()Get tags from estimator.
predict(X)Predict the closest cluster each sample in X belongs to.
Predicts labels probabilities for sequences in X.
reset([keep])Reset the object to a clean post-init state.
set_params(**params)Set the parameters of this estimator.
set_tags(**tag_dict)Set dynamic tags to given values.
- clone(random_state=None)[source]¶
Obtain a clone of the object with the same hyperparameters.
A clone is a different object without shared references, in post-init state. This function is equivalent to returning
sklearn.cloneofself. Equal in value totype(self)(**self.get_params(deep=False)).- Parameters:
- random_stateint, RandomState instance, or None, default=None
Sets the random state of the clone. If
None, the random state is not set. Ifint,random_stateis the seed used by the random number generator. IfRandomStateinstance,random_stateis the random number generator.
- Returns:
- estimatorobject
Instance of
type(self), clone of self (see above)
- fit(X, y=None) BaseCollectionEstimator[source]¶
Fit time series clusterer to training data.
- Parameters:
- X3D np.ndarray (any number of channels, equal length series)
of shape (n_cases, n_channels, n_timepoints)
- or 2D np.array (univariate, equal length series)
of shape (n_cases, n_timepoints)
- or list of numpy arrays (any number of channels, unequal length series)
of shape [n_cases], 2D np.array (n_channels, n_timepoints_i), where n_timepoints_i is length of series i
other types are allowed and converted into one of the above.
- y: ignored, exists for API consistency reasons.
- Returns:
- self:
Fitted estimator.
- fit_predict(X, y=None) ndarray[source]¶
Compute cluster centers and predict cluster index for each time series.
Convenience method; equivalent of calling fit(X) followed by predict(X)
- Parameters:
- Xnp.ndarray (2d or 3d array of shape (n_cases, n_timepoints) or shape
(n_cases, n_channels, n_timepoints)). Time series instances to train clusterer and then have indexes each belong to return.
- y: ignored, exists for API consistency reasons.
- Returns:
- np.ndarray (1d array of shape (n_cases,))
Index of the cluster each time series in X belongs to.
- classmethod get_class_tag(tag_name, raise_error=True, tag_value_default=None)[source]¶
Get tag value from estimator class (only class tags).
- Parameters:
- tag_namestr
Name of tag value.
- raise_errorbool, default=True
Whether a
ValueErroris raised when the tag is not found.- tag_value_defaultany type, default=None
Default/fallback value if tag is not found and error is not raised.
- Returns:
- tag_value
Value of the
tag_nametag in cls. If not found, returns an error ifraise_errorisTrue, otherwise it returnstag_value_default.
- Raises:
- ValueError
if
raise_errorisTrueandtag_nameis not inself.get_tags().keys()
Examples
>>> from aeon.classification import DummyClassifier >>> DummyClassifier.get_class_tag("capability:multivariate") True
- classmethod get_class_tags()[source]¶
Get class tags from estimator class and all its parent classes.
- Returns:
- collected_tagsdict
Dictionary of tag name and tag value pairs. Collected from
_tagsclass attribute via nested inheritance. These are not overridden by dynamic tags set byset_tagsor class__init__calls.
- get_fitted_params(deep=True)[source]¶
Get fitted parameters.
- State required:
Requires state to be “fitted”.
- Parameters:
- deepbool, default=True
If
True, will return the fitted parameters for this estimator and contained subobjects that are estimators.
- Returns:
- fitted_paramsdict
Fitted parameter names mapped to their values.
- get_params(deep=True)¶
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- get_tag(tag_name, raise_error=True, tag_value_default=None)[source]¶
Get tag value from estimator class.
Includes dynamic and overridden tags.
- Parameters:
- tag_namestr
Name of tag to be retrieved.
- raise_errorbool, default=True
Whether a
ValueErroris raised when the tag is not found.- tag_value_defaultany type, default=None
Default/fallback value if tag is not found and error is not raised.
- Returns:
- tag_value
Value of the
tag_nametag in self. If not found, returns an error ifraise_errorisTrue, otherwise it returnstag_value_default.
- Raises:
- ValueError
if raise_error is
Trueandtag_nameis not inself.get_tags().keys()
Examples
>>> from aeon.classification import DummyClassifier >>> d = DummyClassifier() >>> d.get_tag("capability:multivariate") True
- get_tags()[source]¶
Get tags from estimator.
Includes dynamic and overridden tags.
- Returns:
- collected_tagsdict
Dictionary of tag name and tag value pairs. Collected from
_tagsclass attribute via nested inheritance and then any overridden and new tags from__init__orset_tags.
- predict(X) ndarray[source]¶
Predict the closest cluster each sample in X belongs to.
- Parameters:
- X3D np.ndarray
Input data, any number of channels, equal length series of shape
( n_cases, n_channels, n_timepoints)or 2D np.array (univariate, equal length series) of shape(n_cases, n_timepoints)or list of numpy arrays (any number of channels, unequal length series) of shape[n_cases], 2D np.array(n_channels, n_timepoints_i), wheren_timepoints_iis length of seriesi. Other types are allowed and converted into one of the above.
- Returns:
- np.array
shape ``(n_cases)`, index of the cluster each time series in X. belongs to.
- predict_proba(X) ndarray[source]¶
Predicts labels probabilities for sequences in X.
Default behaviour is to call _predict and set the predicted class probability to 1, other class probabilities to 0. Override if better estimates are obtainable.
- Parameters:
- X3D np.ndarray
Input data, any number of channels, equal length series of shape
( n_cases, n_channels, n_timepoints)or 2D np.array (univariate, equal length series) of shape(n_cases, n_timepoints)or list of numpy arrays (any number of channels, unequal length series) of shape[n_cases], 2D np.array(n_channels, n_timepoints_i), wheren_timepoints_iis length of seriesi. Other types are allowed and converted into one of the above.
- Returns:
- y2D array of shape [n_cases, n_classes] - predicted class probabilities
1st dimension indices correspond to instance indices in X 2nd dimension indices correspond to possible labels (integers) (i, j)-th entry is predictive probability that i-th instance is of class j
- reset(keep=None)[source]¶
Reset the object to a clean post-init state.
After a
self.reset()call,selfis equal or similar in value totype(self)(**self.get_params(deep=False)), assuming no other attributes were kept usingkeep.- Detailed behaviour:
- removes any object attributes, except:
hyper-parameters (arguments of
__init__) object attributes containing double-underscores, i.e., the string “__”
runs
__init__with current values of hyperparameters (result ofget_params)- Not affected by the reset are:
object attributes containing double-underscores class and object methods, class attributes any attributes specified in the
keepargument
- Parameters:
- keepNone, str, or list of str, default=None
If
None, all attributes are removed except hyperparameters. Ifstr, only the attribute with this name is kept. Iflistofstr, only the attributes with these names are kept.
- Returns:
- selfobject
Reference to self.
- Raises:
- TypeError
If ‘keep’ is not a string or a list of strings.
- set_params(**params)¶
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as
Pipeline). The latter have parameters of the form<component>__<parameter>so that it’s possible to update each component of a nested object.- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.