STRAY

class STRAY(alpha: float = 0.01, k: int = 10, knn_algorithm: str = 'brute', p: float = 0.5, size_threshold: int = 50, outlier_tail: str = 'max')[source]

STRAY: robust anomaly detection in data streams with concept drift.

This is based on STRAY (Search TRace AnomalY) [1], a modification of HDoutliers [2]. HDoutliers is a powerful algorithm for detecting anomalous observations in a dataset. Among other advantages, it can detect clusters of outliers in multidimensional data without requiring a model of the typical behavior of the system. However, it suffers from some limitations that affect its accuracy. STRAY extends HDoutliers by using extreme value theory to calculate the anomaly threshold, so that it can deal with data streams that exhibit non-stationary behavior.

Capabilities

Input data format

univariate and multivariate

Output data format

binary classification

Learning Type

unsupervised

Parameters:
alphafloat, default=0.01

Threshold for determining the cutoff for outliers. Observations are considered outliers if they fall in the (1 - alpha) tail of the distribution of the nearest-neighbor distances between exemplars.

kint, default=10

Number of neighbours considered.

knn_algorithmstr {“auto”, “ball_tree”, “kd_tree”, “brute”}, default=”brute”

Algorithm used to compute the nearest neighbors, passed to sklearn.neighbors.NearestNeighbors.

pfloat, default=0.5

Proportion of possible candidates for outliers. This defines the starting point for the bottom up searching algorithm.

size_thresholdint, default=50

Sample size used to calculate an empirical threshold.

outlier_tailstr {“min”, “max”}, default=”max”

Direction of the outlier tail.
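To give a feel for how k, knn_algorithm=”brute”, p, alpha and size_threshold might interact, here is a minimal pure-Python sketch. It is not aeon's implementation. The score (distance to the k-th nearest neighbour) and the spacing estimate (a plain mean of the preceding gaps) are simplifying assumptions that stand in for the extreme value theory calculation, and the function names knn_scores and bottom_up_outliers are hypothetical.

```python
import math


def knn_scores(points, k):
    """Score each point by the distance to its k-th nearest neighbour (a stand-in score)."""
    scores = []
    for i, p in enumerate(points):
        # Brute force: measure the distance from p to every other point.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def bottom_up_outliers(scores, alpha=0.01, p=0.5, size_threshold=50):
    """Return sorted indices flagged by a simplified bottom-up gap search."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranked = [scores[i] for i in order]
    gaps = [0.0] + [ranked[i] - ranked[i - 1] for i in range(1, n)]
    # Number of preceding gaps used to estimate the typical spacing (assumption).
    window = max(min(size_threshold, n // 4), 2) - 1
    # Only the top proportion p of the ranked scores are outlier candidates.
    start = max(int(n * (1 - p)), 1)
    factor = math.log(1 / alpha)
    for i in range(start, n):
        recent = gaps[max(i - window, 0):i]
        typical = sum(recent) / len(recent)
        if gaps[i] > factor * typical:
            # Everything at or above the first unusually large gap is flagged.
            return sorted(order[i:])
    return []


# A 5 x 4 grid of regular points plus one far-away point (index 20).
points = [(float(x), float(y)) for x in range(5) for y in range(4)] + [(30.0, 30.0)]
flagged = bottom_up_outliers(knn_scores(points, k=2))
```

In this toy data only the far-away point is flagged. A smaller alpha makes the gap test stricter, and a smaller p restricts the search to fewer of the highest-scoring candidates.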

References

[1]

Talagala, Priyanga Dilini, Rob J. Hyndman, and Kate Smith-Miles. “Anomaly detection in high-dimensional data.” Journal of Computational and Graphical Statistics 30.2 (2021): 360-374.

[2]

Wilkinson, Leland. “Visualizing big data outliers through distributed aggregation.” IEEE transactions on visualization and computer graphics 24.1 (2017): 256-266.

Examples

>>> from aeon.anomaly_detection import STRAY
>>> from aeon.datasets import load_airline
>>> from sklearn.preprocessing import MinMaxScaler
>>> import numpy as np
>>> X = load_airline().to_frame().to_numpy()
>>> scaler = MinMaxScaler()
>>> X = scaler.fit_transform(X)
>>> detector = STRAY(k=3)
>>> y = detector.fit_predict(X, axis=0)
>>> y[:5]
array([False, False, False, False, False])

Methods

check_is_fitted()

Check if the estimator has been fitted.

clone([random_state])

Obtain a clone of the object with the same hyperparameters.

create_test_instance([parameter_set, ...])

Construct Estimator instance if possible.

fit(X[, y, axis])

Fit time series anomaly detector to X.

fit_predict(X[, y, axis])

Fit time series anomaly detector and find anomalies for X.

get_class_tag(tag_name[, tag_value_default, ...])

Get tag value from estimator class (only class tags).

get_class_tags()

Get class tags from estimator class and all its parent classes.

get_fitted_params([deep])

Get fitted parameters.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

get_tag(tag_name[, tag_value_default, ...])

Get tag value from estimator class.

get_tags()

Get tags from estimator.

get_test_params([parameter_set])

Return testing parameter settings for the estimator.

load_from_path(serial)

Load object from file location.

load_from_serial(serial)

Load object from serialized memory container.

predict(X[, axis])

Find anomalies in X.

reset([keep])

Reset the object to a clean post-init state.

save([path])

Save serialized self to bytes-like object or to (.zip) file.

set_fit_request(*[, axis])

Request metadata passed to the fit method.

set_params(**params)

Set the parameters of this estimator.

set_predict_request(*[, axis])

Request metadata passed to the predict method.

set_tags(**tag_dict)

Set dynamic tags to given values.

check_is_fitted()[source]

Check if the estimator has been fitted.

Raises:
NotFittedError

If the estimator has not been fitted yet.

clone(random_state=None)[source]

Obtain a clone of the object with the same hyperparameters.

A clone is a different object without shared references, in post-init state. This function is equivalent to returning sklearn.clone of self. The clone is equal in value to type(self)(**self.get_params(deep=False)).

Parameters:
random_stateint, RandomState instance, or None, default=None

Sets the random state of the clone. If None, the random state is not set. If int, random_state is the seed used by the random number generator. If RandomState instance, random_state is the random number generator.

Returns:
estimatorobject

Instance of type(self), clone of self (see above)
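The equivalence with type(self)(**self.get_params(deep=False)) can be illustrated with a toy class. ToyDetector below is a hypothetical stand-in, not an aeon class, and the sketch ignores random_state handling.

```python
class ToyDetector:
    """Hypothetical stand-in for an estimator; not part of aeon."""

    def __init__(self, alpha=0.01, k=10):
        self.alpha = alpha
        self.k = k

    def get_params(self, deep=True):
        # Hyperparameters are exactly the arguments of __init__.
        return {"alpha": self.alpha, "k": self.k}

    def fit(self, X):
        # Fitted state lives in attributes that __init__ does not set.
        self.threshold_ = max(X)
        return self


original = ToyDetector(k=3).fit([1.0, 2.0, 5.0])

# Rebuild a fresh object from the hyperparameters only.
duplicate = type(original)(**original.get_params(deep=False))
```

The duplicate carries the same hyperparameters as the original but none of its fitted state, which is what "post-init state" means here.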

classmethod create_test_instance(parameter_set='default', return_first=True)[source]

Construct Estimator instance if possible.

Calls the get_test_params method and returns an instance or list of instances using the returned dict or list of dict.

Parameters:
parameter_setstr, default=”default”

Name of the set of test parameters to return, for use in tests. If no special parameters are defined for a value, the “default” set is returned.

return_firstbool, default=True

If True, return the first instance of the list of instances. If False, return the list of instances.

Returns:
instanceBaseAeonEstimator or list of BaseAeonEstimator

Instance of the class with default parameters. If return_first is False, returns list of instances.

fit(X, y=None, axis=1)[source]

Fit time series anomaly detector to X.

If the tag fit_is_empty is true, this just sets the is_fitted tag to true. Otherwise, it checks self can handle X, formats X into the structure required by self then passes X (and possibly y) to _fit.

Parameters:
Xone of aeon.base._base_series.VALID_INPUT_TYPES

The time series to fit the model to. A valid aeon time series data structure. See aeon.base._base_series.VALID_INPUT_TYPES for aeon supported types.

yone of aeon.base._base_series.VALID_INPUT_TYPES, default=None

The target values for the time series. A valid aeon time series data structure. See aeon.base._base_series.VALID_INPUT_TYPES for aeon supported types.

axisint, default=1

The time point axis of the input series if it is 2D. If axis==0, it is assumed each column is a time series and each row is a time point. i.e. the shape of the data is (n_timepoints, n_channels). axis==1 indicates the time series are in rows, i.e. the shape of the data is (n_channels, n_timepoints).

Returns:
BaseAnomalyDetector

The fitted estimator, reference to self.
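The two layouts that axis distinguishes can be shown with plain NumPy arrays. The array values and the helper n_timepoints are illustrative assumptions and are not part of aeon.

```python
import numpy as np

# Hypothetical series with 5 time points and 2 channels.
X_time_major = np.arange(10, dtype=float).reshape(5, 2)  # (n_timepoints, n_channels): use axis=0
X_channel_major = X_time_major.T                         # (n_channels, n_timepoints): use axis=1


def n_timepoints(X, axis):
    """Length of the series along the axis that indexes time."""
    return X.shape[axis]
```

Both arrays hold the same series. Only the axis value that should accompany them differs.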

fit_predict(X, y=None, axis=1) → ndarray[source]

Fit time series anomaly detector and find anomalies for X.

Parameters:
Xone of aeon.base._base_series.VALID_INPUT_TYPES

The time series to fit the model to. A valid aeon time series data structure. See aeon.base._base_series.VALID_INPUT_TYPES for aeon supported types.

yone of aeon.base._base_series.VALID_INPUT_TYPES, default=None

The target values for the time series. A valid aeon time series data structure. See aeon.base._base_series.VALID_INPUT_TYPES for aeon supported types.

axisint, default=1

The time point axis of the input series if it is 2D. If axis==0, it is assumed each column is a time series and each row is a time point. i.e. the shape of the data is (n_timepoints, n_channels). axis==1 indicates the time series are in rows, i.e. the shape of the data is (n_channels, n_timepoints).

Returns:
np.ndarray

A boolean, int or float array of length len(X), where each element indicates whether the corresponding subsequence is anomalous or its anomaly score.
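When the returned array is boolean, as in the class example above, it can be turned into anomaly positions with standard NumPy calls. The array y below is made-up output, not a real detector result.

```python
import numpy as np

# Hypothetical boolean output of a detector for 5 time points.
y = np.array([False, False, True, False, True])

anomaly_idx = np.flatnonzero(y)  # positions of the time points flagged as anomalous
n_anomalies = int(y.sum())       # True counts as 1, so the sum is the number of anomalies
```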

classmethod get_class_tag(tag_name, tag_value_default=None, raise_error=False)[source]

Get tag value from estimator class (only class tags).

Parameters:
tag_namestr

Name of tag value.

tag_value_defaultany type, default=None

Default/fallback value if tag is not found.

raise_errorbool, default=False

Whether a ValueError is raised when the tag is not found.

Returns:
tag_value

Value of the tag_name tag in self. If not found, raises an error if raise_error is True, otherwise returns tag_value_default.

Raises:
ValueError

if raise_error is True and tag_name is not in self.get_tags().keys()

Examples

>>> from aeon.classification import DummyClassifier
>>> DummyClassifier.get_class_tag("capability:multivariate")
True

classmethod get_class_tags()[source]

Get class tags from estimator class and all its parent classes.

Returns:
collected_tagsdict

Dictionary of tag name and tag value pairs. Collected from _tags class attribute via nested inheritance. These are not overridden by dynamic tags set by set_tags or class __init__ calls.

get_fitted_params(deep=True)[source]

Get fitted parameters.

State required:

Requires state to be “fitted”.

Parameters:
deepbool, default=True

Whether to return fitted parameters of components.

  • If True, will return a dict of parameter name : value for this object, including fitted parameters of fittable components (= BaseAeonEstimator-valued parameters).

  • If False, will return a dict of parameter name : value for this object, but not include fitted parameters of components.

Returns:
fitted_paramsdict with str-valued keys

Dictionary of fitted parameters as paramname : paramvalue key-value pairs. It includes:

  • always: all fitted parameters of this object

  • if deep=True, also contains key-value pairs of component parameters. Parameters of components are indexed as [componentname]__[paramname], and all parameters of componentname appear as paramname with their values

  • if deep=True, also contains arbitrary levels of component recursion, e.g., [componentname]__[componentcomponentname]__[paramname], etc.
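The double-underscore indexing described above can be sketched by flattening a nested dictionary. The helper flatten_params and the example values are hypothetical, and the sketch only mimics the key naming scheme, not what aeon actually returns.

```python
def flatten_params(params, prefix=""):
    """Flatten nested dicts into [componentname]__[paramname] style keys."""
    flat = {}
    for name, value in params.items():
        key = f"{prefix}{name}"
        if isinstance(value, dict):
            # Recurse into the component, extending the prefix by one level.
            flat.update(flatten_params(value, prefix=f"{key}__"))
        else:
            flat[key] = value
    return flat


# Hypothetical fitted parameters with two levels of components.
nested = {"threshold": 0.8, "scaler": {"min": 0.0, "inner": {"scale": 2.0}}}
flat = flatten_params(nested)
```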

get_metadata_routing()[source]

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routingMetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]

Get parameters for this estimator.

Parameters:
deepbool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
paramsdict

Parameter names mapped to their values.

get_tag(tag_name, tag_value_default=None, raise_error=True)[source]

Get tag value from estimator class.

Includes dynamic and overridden tags.

Parameters:
tag_namestr

Name of tag to be retrieved.

tag_value_defaultany type, default=None

Default/fallback value if tag is not found.

raise_errorbool, default=True

Whether a ValueError is raised when the tag is not found.

Returns:
tag_value

Value of the tag_name tag in self. If not found, raises an error if raise_error is True, otherwise returns tag_value_default.

Raises:
ValueError

if raise_error is True and tag_name is not in self.get_tags().keys()

Examples

>>> from aeon.classification import DummyClassifier
>>> d = DummyClassifier()
>>> d.get_tag("capability:multivariate")
True

get_tags()[source]

Get tags from estimator.

Includes dynamic and overridden tags.

Returns:
collected_tagsdict

Dictionary of tag name and tag value pairs. Collected from _tags class attribute via nested inheritance and then any overridden and new tags from __init__ or set_tags.
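The difference between class tags and dynamic tags can be sketched with two toy classes. ToyBase, ToyDetector and the attribute _dynamic_tags are hypothetical names chosen for illustration. Only the _tags class attribute and the method names come from the text above.

```python
class ToyBase:
    """Hypothetical base class; not part of aeon."""

    _tags = {"capability:multivariate": False, "fit_is_empty": False}

    def __init__(self):
        self._dynamic_tags = {}  # hypothetical attribute name

    @classmethod
    def get_class_tags(cls):
        collected = {}
        # Walk from the most generic class to the most specific one so that
        # subclasses override the tags of their parents.
        for klass in reversed(cls.__mro__):
            collected.update(vars(klass).get("_tags", {}))
        return collected

    def set_tags(self, **tag_dict):
        self._dynamic_tags.update(tag_dict)
        return self

    def get_tags(self):
        # Dynamic tags take precedence over class tags.
        return {**self.get_class_tags(), **self._dynamic_tags}


class ToyDetector(ToyBase):
    _tags = {"capability:multivariate": True}


detector = ToyDetector().set_tags(fit_is_empty=True)
```

Here detector.get_tags() reflects the dynamic override, while ToyDetector.get_class_tags() still reports the values collected from the class hierarchy.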

classmethod get_test_params(parameter_set='default')[source]

Return testing parameter settings for the estimator.

Parameters:
parameter_setstr, default=”default”

Name of the set of test parameters to return, for use in tests. If no special parameters are defined for a value, the “default” set is returned.

Returns:
paramsdict or list of dict, default = {}

Parameters to create testing instances of the class. Each dict contains the parameters to construct an “interesting” test instance, i.e., MyClass(**params) or MyClass(**params[i]) creates a valid test instance. create_test_instance uses the first (or only) dictionary in params.

classmethod load_from_path(serial)[source]

Load object from file location.

Parameters:
serialobject

Result of ZipFile(path).open(“object”).

Returns:
Deserialized self, reconstructed from the output that cls.save(path) stored at path.

classmethod load_from_serial(serial)[source]

Load object from serialized memory container.

Parameters:
serialobject

First element of output of cls.save(None).

Returns:
Deserialized self, reconstructed from the output serial of cls.save(None).

predict(X, axis=1) → ndarray[source]

Find anomalies in X.

Parameters:
Xone of aeon.base._base_series.VALID_INPUT_TYPES

The time series to find anomalies in. A valid aeon time series data structure. See aeon.base._base_series.VALID_INPUT_TYPES for aeon supported types.

axisint, default=1

The time point axis of the input series if it is 2D. If axis==0, it is assumed each column is a time series and each row is a time point. i.e. the shape of the data is (n_timepoints, n_channels). axis==1 indicates the time series are in rows, i.e. the shape of the data is (n_channels, n_timepoints).

Returns:
np.ndarray

A boolean, int or float array of length len(X), where each element indicates whether the corresponding subsequence is anomalous or its anomaly score.

reset(keep=None)[source]

Reset the object to a clean post-init state.

After a self.reset() call, self is equal or similar in value to type(self)(**self.get_params(deep=False)), assuming no other attributes were kept using keep.

Detailed behaviour:

Removes any object attributes, except:

  • hyper-parameters (arguments of __init__)

  • object attributes containing double-underscores, i.e., the string “__”

Runs __init__ with current values of hyperparameters (result of get_params).

Not affected by the reset are:

  • object attributes containing double-underscores

  • class and object methods, class attributes

  • any attributes specified in the keep argument

Parameters:
keepNone, str, or list of str, default=None

If None, all attributes are removed except hyper-parameters. If str, only the attribute with this name is kept. If list of str, only the attributes with these names are kept.

Returns:
self

Reference to self.
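The reset behaviour can be approximated on a toy class as follows. ToyDetector is hypothetical and the sketch is a simplified reading of the description above, not aeon's code.

```python
class ToyDetector:
    """Hypothetical stand-in for an estimator; not part of aeon."""

    def __init__(self, k=10):
        self.k = k

    def get_params(self, deep=True):
        return {"k": self.k}

    def fit(self, X):
        self.threshold_ = max(X)
        self.n_seen_ = len(X)
        return self

    def reset(self, keep=None):
        # Normalise keep to a list of attribute names.
        keep = [keep] if isinstance(keep, str) else list(keep or [])
        params = self.get_params(deep=False)
        kept = {name: getattr(self, name) for name in keep if hasattr(self, name)}
        # Attributes whose name contains "__" also survive the reset.
        kept.update({n: v for n, v in vars(self).items() if "__" in n})
        self.__dict__.clear()
        self.__init__(**params)  # rerun __init__ with the current hyperparameters
        self.__dict__.update(kept)
        return self


detector = ToyDetector(k=3).fit([1.0, 2.0, 5.0])
detector.reset(keep="n_seen_")
```

After the call, the hyperparameter k and the kept attribute n_seen_ remain, while threshold_ is gone.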

save(path=None)[source]

Save serialized self to bytes-like object or to (.zip) file.

Behaviour:

  • if path is None, returns an in-memory serialized self

  • if path is a file location, stores self at that location as a zip file

Saved files are zip files with the following contents:

  • _metadata - contains class of self, i.e., type(self)

  • _obj - serialized self. This class uses the default serialization (pickle).

Parameters:
pathNone or file location (str or Path).

If None, self is saved to an in-memory object. If a file location, self is saved to that file location. For example:

  • path=”estimator”: a zip file estimator.zip will be made at cwd.

  • path=”/home/stored/estimator”: a zip file estimator.zip will be stored in /home/stored/.

Returns:
if path is None - in-memory serialized self
if path is file location - ZipFile with reference to the file.
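The zip layout described for save (a _metadata entry holding the class and an _obj entry holding the pickled object) can be reproduced with the standard library alone. The functions save_to_bytes and load_from_bytes are hypothetical helpers written for this sketch, and a plain dict stands in for an estimator. How aeon actually writes and reads these entries may differ.

```python
import io
import pickle
import zipfile


def save_to_bytes(obj):
    """Write obj into an in-memory zip with _metadata and _obj entries."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode="w") as zf:
        zf.writestr("_metadata", pickle.dumps(type(obj)))  # class of the object
        zf.writestr("_obj", pickle.dumps(obj))             # the serialized object itself
    return buffer.getvalue()


def load_from_bytes(data):
    """Read the zip back and check the object against the stored class."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        cls = pickle.loads(zf.read("_metadata"))
        obj = pickle.loads(zf.read("_obj"))
    if not isinstance(obj, cls):
        raise TypeError(f"expected {cls.__name__}, got {type(obj).__name__}")
    return obj


# A plain dict standing in for a fitted estimator.
state = {"k": 3, "alpha": 0.01}
restored = load_from_bytes(save_to_bytes(state))
```

As with any pickle-based format, such files should only be loaded from trusted sources.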
set_fit_request(*, axis: bool | None | str = '$UNCHANGED$') → STRAY[source]

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
axisstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for axis parameter in fit.

Returns:
selfobject

The updated object.

set_params(**params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**paramsdict

Estimator parameters.

Returns:
selfestimator instance

Estimator instance.
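The <component>__<parameter> naming can be illustrated by splitting keys on the first double underscore. The helper group_nested_params and the parameter names are hypothetical. This shows the naming convention only, not the actual set_params code.

```python
def group_nested_params(params):
    """Separate an estimator's own parameters from those addressed to components."""
    own, nested = {}, {}
    for key, value in params.items():
        # Split on the first "__" only, so deeper levels stay in the remainder.
        name, sep, sub = key.partition("__")
        if sep:
            nested.setdefault(name, {})[sub] = value
        else:
            own[name] = value
    return own, nested


own, nested = group_nested_params({"k": 5, "scaler__clip": True, "scaler__copy": False})
```

The own parameters would be set directly on the estimator, while each group in nested would be forwarded to the component of that name.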

set_predict_request(*, axis: bool | None | str = '$UNCHANGED$') → STRAY[source]

Request metadata passed to the predict method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
axisstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for axis parameter in predict.

Returns:
selfobject

The updated object.

set_tags(**tag_dict)[source]

Set dynamic tags to given values.

Parameters:
**tag_dictdict

Dictionary of tag name and tag value pairs.

Returns:
self

Reference to self.