Catch22Clusterer

class Catch22Clusterer(features='all', catch24=True, outlier_norm=True, replace_nans=True, use_pycatch22=False, estimator=None, random_state=None, n_jobs=1, parallel_backend=None)

Bases: BaseClusterer

Canonical Time-series Characteristics (catch22) clusterer.

This clusterer simply transforms the input data using the Catch22 [1] transformer and builds a provided estimator using the transformed data.
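The transform-then-cluster pattern described above can be sketched with plain NumPy and scikit-learn. This is only an illustration: the `summary_features` helper is hypothetical and computes a stand-in feature set (per-channel mean and standard deviation), not the actual Catch22 features, and it does not reproduce aeon's internals.

```python
import numpy as np
from sklearn.cluster import KMeans


def summary_features(X):
    """Stand-in transform: per-channel mean and standard deviation.

    This is NOT the Catch22 feature set; it only mimics the shape of the
    output, which is one row of features per case.
    """
    # X has shape (n_cases, n_channels, n_timepoints).
    means = X.mean(axis=2)
    stds = X.std(axis=2)
    # The result has shape (n_cases, 2 * n_channels).
    return np.hstack([means, stds])


rng = np.random.default_rng(0)
X = rng.random(size=(10, 2, 20))

# Step 1: turn each time series into a fixed-length feature vector.
features = summary_features(X)

# Step 2: fit an ordinary tabular clusterer on the feature table.
estimator = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = estimator.fit_predict(features)
```

Because the second step only sees a 2D feature table, any scikit-learn clusterer that accepts tabular data could take the place of KMeans here.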

Parameters:

features : int/str or list of int/str, default="all"
    The Catch22 features to extract, given as a feature index, a feature name, or a list of indices or names for multiple features. If "all", all features are extracted. Valid features are as follows:

    ["DN_HistogramMode_5", "DN_HistogramMode_10", "SB_BinaryStats_diff_longstretch0", "DN_OutlierInclude_p_001_mdrmd", "DN_OutlierInclude_n_001_mdrmd", "CO_f1ecac", "CO_FirstMin_ac", "SP_Summaries_welch_rect_area_5_1", "SP_Summaries_welch_rect_centroid", "FC_LocalSimple_mean3_stderr", "CO_trev_1_num", "CO_HistogramAMI_even_2_5", "IN_AutoMutualInfoStats_40_gaussian_fmmi", "MD_hrv_classic_pnn40", "SB_BinaryStats_mean_longstretch1", "SB_MotifThree_quantile_hh", "FC_LocalSimple_mean1_tauresrat", "CO_Embed2_Dist_tau_d_expfit_meandiff", "SC_FluctAnal_2_dfa_50_1_2_logi_prop_r1", "SC_FluctAnal_2_rsrangefit_50_1_logi_prop_r1", "SB_TransitionMatrix_3ac_sumdiagcov", "PD_PeriodicityWang_th0_01"]
catch24 : bool, default=True
    If True, extract the mean and standard deviation in addition to the 22 Catch22 features. If a list of specific features to extract is provided, "Mean" and/or "StandardDeviation" must be added to the list to extract these features.
outlier_norm : bool, default=True
    If True, each time series is normalized during the computation of the two outlier Catch22 features. These two features can take a while to compute for series with large values, as the computation depends on the maximum value in the time series. Note that this parameter did not exist in the original publication/implementation, as they used time series that were already normalized.
replace_nans : bool, default=True
    Replace NaN or inf values from the Catch22 transform with 0.
use_pycatch22 : bool, default=False
    If True, use the C-based pycatch22 implementation wrapped for aeon (https://github.com/DynamicsAndNeuralSystems/pycatch22). This requires the pycatch22 package to be installed.
estimator : sklearn clusterer, default=None
    An sklearn estimator to be built using the transformed data. Defaults to sklearn KMeans().
random_state : int, RandomState instance or None, default=None
    If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
n_jobs : int, default=1
    The number of jobs to run in parallel for both fit and predict. -1 means using all processors.
parallel_backend : str, ParallelBackendBase instance or None, default=None
    The joblib parallelisation backend to use for Catch22. If None, a "prefer" value of "threads" is used by default. Valid options are "loky", "multiprocessing", "threading" or a custom backend. See the joblib Parallel documentation for more details.
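As a rough picture of what the outlier_norm and replace_nans options do to the data, the following NumPy sketch may help. It assumes that the normalisation is a standard z-normalisation (zero mean, unit standard deviation); the exact formula used internally is not stated above. The arrays are made-up values for illustration.

```python
import numpy as np

# One univariate series with a large offset and one large value.
series = np.array([100.0, 102.0, 98.0, 250.0, 101.0, 99.0])

# Assumed form of the normalisation: zero mean, unit standard deviation.
normalised = (series - series.mean()) / series.std()

# A hypothetical feature table in which some features failed to compute.
features = np.array([[0.5, np.nan, 1.2], [np.inf, 0.1, -np.inf]])

# Replace NaN and +/-inf with 0, as replace_nans=True is documented to do.
cleaned = np.nan_to_num(features, nan=0.0, posinf=0.0, neginf=0.0)
```

Replacing non-finite values matters because most scikit-learn clusterers reject input containing NaN or inf.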

See also

Catch22: Catch22 transformer in aeon/transformations/collection.

Notes

Capabilities
Missing Values	No
Multithreading	Yes
Univariate	Yes
Multivariate	Yes
Unequal Length	Yes

References

[1] Lubba, Carl H., et al. “catch22: Canonical time-series characteristics.” Data Mining and Knowledge Discovery 33.6 (2019): 1821-1852. https://link.springer.com/article/10.1007/s10618-019-00647-x

Examples

>>> import numpy as np
>>> from sklearn.cluster import KMeans
>>> from aeon.clustering.feature_based import Catch22Clusterer
>>> X = np.random.random(size=(10,2,20))
>>> clst = Catch22Clusterer(estimator=KMeans(n_clusters=2))
>>> clst.fit(X)
Catch22Clusterer(...)
>>> preds = clst.predict(X)

Methods

`clone`([random_state])	Obtain a clone of the object with the same hyperparameters.
`fit`(X[, y])	Fit time series clusterer to training data.
`fit_predict`(X[, y])	Compute cluster centers and predict cluster index for each time series.
`get_class_tag`(tag_name[, raise_error, ...])	Get tag value from estimator class (only class tags).
`get_class_tags`()	Get class tags from estimator class and all its parent classes.
`get_fitted_params`([deep])	Get fitted parameters.
`get_params`([deep])	Get parameters for this estimator.
`get_tag`(tag_name[, raise_error, ...])	Get tag value from estimator class.
`get_tags`()	Get tags from estimator.
`predict`(X)	Predict the closest cluster each sample in X belongs to.
`predict_proba`(X)	Predict label probabilities for sequences in X.
`reset`([keep])	Reset the object to a clean post-init state.
`set_params`(**params)	Set the parameters of this estimator.
`set_tags`(**tag_dict)	Set dynamic tags to given values.

clone(random_state=None)

Obtain a clone of the object with the same hyperparameters.

A clone is a different object without shared references, in post-init state. This function is equivalent to returning sklearn.clone of self, and the result is equal in value to type(self)(**self.get_params(deep=False)).

Parameters:

random_state : int, RandomState instance, or None, default=None
    Sets the random state of the clone. If None, the random state is not set. If int, random_state is the seed used by the random number generator. If RandomState instance, random_state is the random number generator.

Returns:

estimator : object
    Instance of type(self), clone of self (see above).
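The equivalence stated above can be checked on any scikit-learn style estimator. The sketch below uses sklearn's KMeans as a stand-in rather than an aeon estimator, and it uses sklearn.base.clone, which does not take the random_state argument that this method accepts.

```python
from sklearn.base import clone
from sklearn.cluster import KMeans

original = KMeans(n_clusters=3, random_state=0)

# Rebuild the estimator from its constructor arguments only.
rebuilt = type(original)(**original.get_params(deep=False))

# sklearn.clone yields an equivalent, unfitted copy.
cloned = clone(original)
```

Both copies are distinct objects that carry the same hyperparameters and no fitted state.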

fit(X, y=None) → BaseCollectionEstimator

Fit time series clusterer to training data.

Parameters:

X : 3D np.ndarray, 2D np.ndarray or list of 2D np.ndarray
    One of the following:

    - 3D np.ndarray of shape (n_cases, n_channels, n_timepoints): any number of channels, equal length series
    - 2D np.ndarray of shape (n_cases, n_timepoints): univariate, equal length series
    - list of n_cases 2D np.ndarray, each of shape (n_channels, n_timepoints_i), where n_timepoints_i is the length of series i: any number of channels, unequal length series

    Other types are allowed and converted into one of the above.
y : ignored, exists for API consistency reasons.

Returns:

self: Fitted estimator.
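The three accepted input layouts for X can be built with NumPy as follows. The sizes are arbitrary and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# 3D array: 10 cases, 2 channels, 20 time points each (equal length).
X_3d = rng.random(size=(10, 2, 20))

# 2D array: 10 univariate cases of 20 time points each.
X_2d = rng.random(size=(10, 20))

# List of 2D arrays: 2 channels per case, lengths differ between cases.
lengths = [15, 20, 25]
X_unequal = [rng.random(size=(2, n)) for n in lengths]
```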

fit_predict(X, y=None) → ndarray

Compute cluster centers and predict cluster index for each time series.

Convenience method; equivalent to calling fit(X) followed by predict(X).

Parameters:

X : np.ndarray
    2D array of shape (n_cases, n_timepoints) or 3D array of shape (n_cases, n_channels, n_timepoints). Time series instances used to train the clusterer and for which the assigned cluster indices are returned.
y : ignored, exists for API consistency reasons.

Returns:

np.ndarray (1d array of shape (n_cases,)): Index of the cluster each time series in X belongs to.

classmethod get_class_tag(tag_name, raise_error=True, tag_value_default=None)

Get tag value from estimator class (only class tags).

Parameters:

tag_name : str
    Name of tag value.
raise_error : bool, default=True
    Whether a ValueError is raised when the tag is not found.
tag_value_default : any type, default=None
    Default/fallback value if tag is not found and error is not raised.

Returns:

tag_value: Value of the tag_name tag in cls. If not found, raises an error if raise_error is True; otherwise returns tag_value_default.

Raises:

ValueError: if raise_error is True and tag_name is not in self.get_tags().keys()

Examples

>>> from aeon.classification import DummyClassifier
>>> DummyClassifier.get_class_tag("capability:multivariate")
True

classmethod get_class_tags()

Get class tags from estimator class and all its parent classes.

Returns:

collected_tags : dict
    Dictionary of tag name and tag value pairs. Collected from the _tags class attribute via nested inheritance. These are not overridden by dynamic tags set by set_tags or class __init__ calls.

get_fitted_params(deep=True)

Get fitted parameters.

State required: Requires state to be "fitted".

Parameters:

deep : bool, default=True
    If True, will return the fitted parameters for this estimator and contained subobjects that are estimators.

Returns:

fitted_params : dict
    Fitted parameter names mapped to their values.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep : bool, default=True
    If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : dict
    Parameter names mapped to their values.

get_tag(tag_name, raise_error=True, tag_value_default=None)

Get tag value from estimator class.

Includes dynamic and overridden tags.

Parameters:

tag_name : str
    Name of tag to be retrieved.
raise_error : bool, default=True
    Whether a ValueError is raised when the tag is not found.
tag_value_default : any type, default=None
    Default/fallback value if tag is not found and error is not raised.

Returns:

tag_value: Value of the tag_name tag in self. If not found, raises an error if raise_error is True; otherwise returns tag_value_default.

Raises:

ValueError: if raise_error is True and tag_name is not in self.get_tags().keys()

Examples

>>> from aeon.classification import DummyClassifier
>>> d = DummyClassifier()
>>> d.get_tag("capability:multivariate")
True

get_tags()

Get tags from estimator.

Includes dynamic and overridden tags.

Returns:

collected_tags : dict
    Dictionary of tag name and tag value pairs. Collected from the _tags class attribute via nested inheritance and then any overridden and new tags from __init__ or set_tags.
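The difference between class tags and instance tags can be illustrated with a toy model. The classes below are hypothetical and do not reproduce aeon's implementation; they only sketch the documented behaviour, in which class tags are collected through inheritance and dynamic tags set on an instance are layered on top.

```python
class ToyBase:
    # Hypothetical class-level tags; the names are illustrative only.
    _tags = {"capability:multivariate": False, "capability:missing_values": False}

    def __init__(self):
        self._tags_dynamic = {}

    @classmethod
    def get_class_tags(cls):
        collected = {}
        # Walk the MRO from the most generic class to the most specific,
        # so that subclasses override their parents.
        for klass in reversed(cls.__mro__):
            collected.update(vars(klass).get("_tags", {}))
        return collected

    def set_tags(self, **tag_dict):
        self._tags_dynamic.update(tag_dict)
        return self

    def get_tags(self):
        # Start from the class tags, then apply instance-level overrides.
        tags = self.get_class_tags()
        tags.update(self._tags_dynamic)
        return tags


class ToyChild(ToyBase):
    _tags = {"capability:multivariate": True}


est = ToyChild().set_tags(**{"capability:missing_values": True})
class_tags = ToyChild.get_class_tags()
instance_tags = est.get_tags()
```

In this toy model, `class_tags` still reports the missing-values capability as False, while `instance_tags` reflects the override made through `set_tags`.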

predict(X) → ndarray

Predict the closest cluster each sample in X belongs to.

Parameters:

X : 3D np.ndarray, 2D np.ndarray or list of 2D np.ndarray
    Input data: any number of channels, equal length series of shape (n_cases, n_channels, n_timepoints); or 2D np.ndarray (univariate, equal length series) of shape (n_cases, n_timepoints); or list of n_cases 2D np.ndarray (any number of channels, unequal length series), each of shape (n_channels, n_timepoints_i), where n_timepoints_i is the length of series i. Other types are allowed and converted into one of the above.

Returns:

np.array: shape (n_cases,), index of the cluster to which each time series in X belongs.

predict_proba(X) → ndarray

Predict label probabilities for sequences in X.

Default behaviour is to call _predict and set the predicted class probability to 1, other class probabilities to 0. Override if better estimates are obtainable.

Parameters:

X : 3D np.ndarray, 2D np.ndarray or list of 2D np.ndarray
    Input data: any number of channels, equal length series of shape (n_cases, n_channels, n_timepoints); or 2D np.ndarray (univariate, equal length series) of shape (n_cases, n_timepoints); or list of n_cases 2D np.ndarray (any number of channels, unequal length series), each of shape (n_channels, n_timepoints_i), where n_timepoints_i is the length of series i. Other types are allowed and converted into one of the above.

Returns:

y : 2D np.ndarray of shape (n_cases, n_classes)
    Predicted class probabilities. First dimension indices correspond to instance indices in X; second dimension indices correspond to possible labels (integers). The (i, j)-th entry is the predicted probability that the i-th instance is of class j.
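The default behaviour described above, where the predicted label gets probability 1 and all others get 0, amounts to a one-hot encoding of the hard assignments. A minimal NumPy sketch, using made-up predictions and an assumed number of clusters:

```python
import numpy as np

# Hypothetical hard assignments for 5 cases over 3 clusters.
preds = np.array([0, 2, 1, 2, 0])
n_clusters = 3

# One row per case, one column per cluster.
proba = np.zeros((preds.shape[0], n_clusters))

# Set the column of the predicted cluster to 1 in each row.
proba[np.arange(preds.shape[0]), preds] = 1.0
```

Each row sums to 1, and taking the argmax along the second axis recovers the original hard assignments.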

reset(keep=None)

Reset the object to a clean post-init state.

After a self.reset() call, self is equal or similar in value to type(self)(**self.get_params(deep=False)), assuming no other attributes were kept using keep.

Detailed behaviour:

- removes any object attributes, except:
    - hyper-parameters (arguments of __init__)
    - object attributes containing double-underscores, i.e., the string "__"
- runs __init__ with current values of hyperparameters (result of get_params)

Not affected by the reset are:

- object attributes containing double-underscores
- class and object methods, class attributes
- any attributes specified in the keep argument

Parameters:

keep : None, str, or list of str, default=None
    If None, all attributes are removed except hyperparameters. If str, only the attribute with this name is kept. If list of str, only the attributes with these names are kept.

Returns:

self : object
    Reference to self.

Raises:

TypeError: If ‘keep’ is not a string or a list of strings.
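The reset semantics can be illustrated with a small toy class. ToyEstimator and all of its attributes are hypothetical; this sketch follows the behaviour documented above and is not aeon's actual implementation.

```python
import inspect


class ToyEstimator:
    """Hypothetical estimator used only to illustrate reset semantics."""

    def __init__(self, n_clusters=2):
        self.n_clusters = n_clusters
        self.is_fitted = False

    def get_params(self):
        # Hyperparameters are the arguments of __init__.
        names = inspect.signature(type(self).__init__).parameters
        return {n: getattr(self, n) for n in names if n != "self"}

    def fit(self):
        self.labels_ = [0, 1]
        self.cache__ = "kept"  # name contains a double underscore
        self.is_fitted = True
        return self

    def reset(self, keep=None):
        if keep is None:
            keep = []
        elif isinstance(keep, str):
            keep = [keep]
        elif not (isinstance(keep, list) and all(isinstance(k, str) for k in keep)):
            raise TypeError("keep must be None, a str, or a list of str")
        params = self.get_params()
        for name in list(vars(self)):
            # Hyperparameters, "__" attributes and kept attributes survive.
            if name in params or "__" in name or name in keep:
                continue
            delattr(self, name)
        # Re-run __init__ with the current hyperparameter values.
        self.__init__(**params)
        return self


est = ToyEstimator(n_clusters=3).fit()
est.reset(keep="labels_")

fresh = ToyEstimator(n_clusters=3).fit().reset()
```

After the first reset, `est` keeps its hyperparameter, the `labels_` attribute named in keep, and the attribute containing a double underscore, while `is_fitted` is back to its post-init value. The second reset, without keep, also removes `labels_`.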

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params : dict
    Estimator parameters.

Returns:

self : estimator instance
    Estimator instance.
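The <component>__<parameter> convention can be seen on a scikit-learn Pipeline, which is used here as a stand-in for a nested object. The step names "scale" and "cluster" are arbitrary labels chosen for this sketch.

```python
from sklearn.cluster import KMeans
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([("scale", StandardScaler()), ("cluster", KMeans(n_clusters=2))])

# Top-level parameter: addressed by its own name.
pipe.set_params(verbose=False)

# Nested parameters: addressed as <component>__<parameter>.
pipe.set_params(cluster__n_clusters=4, scale__with_mean=False)
```

By the same convention, a parameter of the estimator wrapped by this clusterer would presumably be addressed as estimator__<parameter>, since estimator is itself a constructor argument; this is an inference from the convention rather than something stated above.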

set_tags(**tag_dict)

Set dynamic tags to given values.

Parameters:

**tag_dict : dict
    Dictionary of tag name and tag value pairs.

Returns:

self : object
    Reference to self.