STRAY
- class STRAY(alpha: float = 0.01, k: int = 10, knn_algorithm: str = 'brute', p: float = 0.5, size_threshold: int = 50, outlier_tail: str = 'max')
STRAY: robust anomaly detection in data streams with concept drift.
This is based on STRAY (Search TRace AnomalY) [1], which is a modification of HDoutliers [2]. HDoutliers is an algorithm for detecting anomalous observations in a dataset. Among other advantages, it can detect clusters of outliers in multidimensional data without requiring a model of the typical behavior of the system. However, it has some limitations that affect its accuracy. STRAY extends HDoutliers by using extreme value theory to calculate the anomaly threshold, so that it can deal with data streams that exhibit non-stationary behavior.
Input data format: univariate and multivariate
Output data format: binary classification
Learning type: unsupervised
- Parameters:
- alpha : float, default=0.01
Threshold for determining the cutoff for outliers. Observations are considered outliers if they fall in the (1 - alpha) tail of the distribution of the nearest-neighbor distances between exemplars.
- k : int, default=10
Number of neighbours considered.
- knn_algorithm : str {"auto", "ball_tree", "kd_tree", "brute"}, default="brute"
Algorithm used to compute the nearest neighbors, from sklearn.neighbors.NearestNeighbors.
- p : float, default=0.5
Proportion of possible candidates for outliers. This defines the starting point for the bottom-up searching algorithm.
- size_threshold : int, default=50
Sample size used to calculate an empirical threshold.
- outlier_tail : str {"min", "max"}, default="max"
Direction of the outlier tail.
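To make the role of `k` more concrete, the sketch below scores each point from its sorted k-nearest-neighbour distances: it looks for the largest gap between successive distances and uses the distance just past that gap as the score. This is a simplified, dependency-free illustration of the idea described in [1], not the aeon implementation. The function name `knn_gap_scores` is hypothetical, the point itself is excluded from its own neighbours by assumption, and the extreme-value threshold controlled by `alpha`, `p` and `size_threshold` is left out entirely.

```python
import math


def knn_gap_scores(points, k=3):
    """Score each point by the k-NN distance found just past its widest gap."""
    scores = []
    for i, p in enumerate(points):
        # Sorted distances from p to its k nearest neighbours (p itself excluded).
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)[:k]
        if len(dists) < 2:
            # With fewer than two neighbours there is no gap to inspect.
            scores.append(dists[0] if dists else 0.0)
            continue
        # Successive differences between the sorted neighbour distances.
        gaps = [b - a for a, b in zip(dists, dists[1:])]
        # Position of the widest gap; the score is the distance after that gap.
        widest = max(range(len(gaps)), key=gaps.__getitem__)
        scores.append(dists[widest + 1])
    return scores


# Four points on a unit square plus one point far away from them.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
scores = knn_gap_scores(points, k=3)
most_isolated = scores.index(max(scores))
```

In this toy data the distant point receives the largest score, which is the behaviour the thresholding step would then act on.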
References
[1] Talagala, Priyanga Dilini, Rob J. Hyndman, and Kate Smith-Miles. "Anomaly detection in high-dimensional data." Journal of Computational and Graphical Statistics 30.2 (2021): 360-374.
[2] Wilkinson, Leland. "Visualizing big data outliers through distributed aggregation." IEEE Transactions on Visualization and Computer Graphics 24.1 (2017): 256-266.
Examples
>>> from aeon.anomaly_detection import STRAY
>>> from aeon.datasets import load_airline
>>> from sklearn.preprocessing import MinMaxScaler
>>> import numpy as np
>>> X = load_airline().to_frame().to_numpy()
>>> scaler = MinMaxScaler()
>>> X = scaler.fit_transform(X)
>>> detector = STRAY(k=3)
>>> y = detector.fit_predict(X, axis=0)
>>> y[:5]
array([False, False, False, False, False])
Methods
- check_is_fitted(): Check if the estimator has been fitted.
- clone([random_state]): Obtain a clone of the object with the same hyperparameters.
- create_test_instance([parameter_set, ...]): Construct Estimator instance if possible.
- fit(X[, y, axis]): Fit time series anomaly detector to X.
- fit_predict(X[, y, axis]): Fit time series anomaly detector and find anomalies for X.
- get_class_tag(tag_name[, tag_value_default, ...]): Get tag value from estimator class (only class tags).
- get_class_tags(): Get class tags from estimator class and all its parent classes.
- get_fitted_params([deep]): Get fitted parameters.
- get_metadata_routing(): Get metadata routing of this object.
- get_params([deep]): Get parameters for this estimator.
- get_tag(tag_name[, tag_value_default, ...]): Get tag value from estimator class.
- get_tags(): Get tags from estimator.
- get_test_params([parameter_set]): Return testing parameter settings for the estimator.
- load_from_path(serial): Load object from file location.
- load_from_serial(serial): Load object from serialized memory container.
- predict(X[, axis]): Find anomalies in X.
- reset([keep]): Reset the object to a clean post-init state.
- save([path]): Save serialized self to bytes-like object or to (.zip) file.
- set_fit_request(*[, axis]): Request metadata passed to the fit method.
- set_params(**params): Set the parameters of this estimator.
- set_predict_request(*[, axis]): Request metadata passed to the predict method.
- set_tags(**tag_dict): Set dynamic tags to given values.
- check_is_fitted()
Check if the estimator has been fitted.
- Raises:
- NotFittedError
If the estimator has not been fitted yet.
- clone(random_state=None)
Obtain a clone of the object with the same hyperparameters.
A clone is a different object without shared references, in post-init state. This function is equivalent to returning sklearn.clone of self. Equal in value to type(self)(**self.get_params(deep=False)).
- Parameters:
- random_state : int, RandomState instance, or None, default=None
Sets the random state of the clone. If None, the random state is not set. If int, random_state is the seed used by the random number generator. If RandomState instance, random_state is the random number generator.
- Returns:
- estimator : object
Instance of type(self), clone of self (see above).
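The "equal in value to type(self)(**self.get_params(deep=False))" statement can be illustrated with a toy class. `ToyDetector` below is hypothetical and stands in for any estimator with a `get_params` method; it is a minimal sketch of the semantics, not how aeon or scikit-learn implement cloning.

```python
class ToyDetector:
    """Hypothetical estimator, used only to illustrate clone semantics."""

    def __init__(self, alpha=0.01, k=10):
        self.alpha = alpha
        self.k = k

    def get_params(self, deep=True):
        # Hyperparameters are exactly the arguments of __init__.
        return {"alpha": self.alpha, "k": self.k}

    def fit(self, X):
        # Fitted state is stored in an attribute that is not a hyperparameter.
        self.n_seen_ = len(X)
        return self


fitted = ToyDetector(k=3).fit([1.0, 2.0, 3.0])

# Rebuild from hyperparameters only: same settings, no fitted state.
copy = type(fitted)(**fitted.get_params(deep=False))
```

The copy carries the same hyperparameters as the original but none of its fitted attributes, which is what "post-init state" means here.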
- classmethod create_test_instance(parameter_set='default', return_first=True)
Construct Estimator instance if possible.
Calls the get_test_params method and returns an instance or list of instances using the returned dict or list of dict.
- Parameters:
- parameter_set : str, default="default"
Name of the set of test parameters to return, for use in tests. If no special parameters are defined for a value, will return the "default" set.
- return_first : bool, default=True
If True, return the first instance of the list of instances. If False, return the list of instances.
- Returns:
- instance : BaseAeonEstimator or list of BaseAeonEstimator
Instance of the class with default parameters. If return_first is False, returns list of instances.
- fit(X, y=None, axis=1)
Fit time series anomaly detector to X.
If the tag fit_is_empty is true, this just sets the is_fitted tag to true. Otherwise, it checks that self can handle X, formats X into the structure required by self, then passes X (and possibly y) to _fit.
- Parameters:
- X : one of aeon.base._base_series.VALID_INPUT_TYPES
The time series to fit the model to. A valid aeon time series data structure. See aeon.base._base_series.VALID_INPUT_TYPES for aeon supported types.
- y : one of aeon.base._base_series.VALID_INPUT_TYPES, default=None
The target values for the time series. A valid aeon time series data structure. See aeon.base._base_series.VALID_INPUT_TYPES for aeon supported types.
- axis : int, default=1
The time point axis of the input series if it is 2D. If axis==0, it is assumed each column is a time series and each row is a time point, i.e. the shape of the data is (n_timepoints, n_channels). axis==1 indicates the time series are in rows, i.e. the shape of the data is (n_channels, n_timepoints).
- Returns:
- BaseAnomalyDetector
The fitted estimator, reference to self.
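The two layouts that `axis` distinguishes can be seen with a small two-channel series. The sketch below uses nested lists as a dependency-free stand-in for a 2D array; the variable names and the `shape` helper are illustrative only.

```python
# Two channels observed at four time points, laid out as
# (n_channels, n_timepoints): the layout assumed by the default axis=1.
channels_first = [
    [1.0, 2.0, 3.0, 4.0],      # channel 0
    [10.0, 20.0, 30.0, 40.0],  # channel 1
]

# Transposing gives (n_timepoints, n_channels): the layout that needs axis=0.
time_first = [list(row) for row in zip(*channels_first)]


def shape(arr):
    """Return (rows, columns) of a rectangular nested list."""
    return (len(arr), len(arr[0]))
```

With `time_first`, each row holds the values of every channel at one time point, so it would be passed with `axis=0`; `channels_first` matches the default `axis=1`.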
- fit_predict(X, y=None, axis=1) -> ndarray
Fit time series anomaly detector and find anomalies for X.
- Parameters:
- X : one of aeon.base._base_series.VALID_INPUT_TYPES
The time series to fit the model to. A valid aeon time series data structure. See aeon.base._base_series.VALID_INPUT_TYPES for aeon supported types.
- y : one of aeon.base._base_series.VALID_INPUT_TYPES, default=None
The target values for the time series. A valid aeon time series data structure. See aeon.base._base_series.VALID_INPUT_TYPES for aeon supported types.
- axis : int, default=1
The time point axis of the input series if it is 2D. If axis==0, it is assumed each column is a time series and each row is a time point, i.e. the shape of the data is (n_timepoints, n_channels). axis==1 indicates the time series are in rows, i.e. the shape of the data is (n_channels, n_timepoints).
- Returns:
- np.ndarray
A boolean, int or float array of length len(X), where each element indicates whether the corresponding subsequence is anomalous or its anomaly score.
- classmethod get_class_tag(tag_name, tag_value_default=None, raise_error=False)
Get tag value from estimator class (only class tags).
- Parameters:
- tag_name : str
Name of tag value.
- tag_value_default : any type, default=None
Default/fallback value if tag is not found.
- raise_error : bool, default=False
Whether a ValueError is raised when the tag is not found.
- Returns:
- tag_value
Value of the tag_name tag in self. If the tag is not found, a ValueError is raised when raise_error is True; otherwise tag_value_default is returned.
- Raises:
- ValueError
If raise_error is True and tag_name is not in self.get_tags().keys().
Examples
>>> from aeon.classification import DummyClassifier
>>> DummyClassifier.get_class_tag("capability:multivariate")
True
- classmethod get_class_tags()
Get class tags from estimator class and all its parent classes.
- Returns:
- collected_tags : dict
Dictionary of tag name and tag value pairs. Collected from the _tags class attribute via nested inheritance. These are not overridden by dynamic tags set by set_tags or class __init__ calls.
- get_fitted_params(deep=True)
Get fitted parameters.
- State required:
Requires state to be "fitted".
- Parameters:
- deep : bool, default=True
Whether to return fitted parameters of components.
If True, will return a dict of parameter name : value for this object, including fitted parameters of fittable components (= BaseAeonEstimator-valued parameters).
If False, will return a dict of parameter name : value for this object, but not include fitted parameters of components.
- Returns:
- fitted_params : dict with str-valued keys
Dictionary of fitted parameters. The paramname : paramvalue key-value pairs include:
- always: all fitted parameters of this object
- if deep=True, also contains key/value pairs of component parameters. Parameters of components are indexed as [componentname]__[paramname]; all parameters of componentname appear as paramname with its value
- if deep=True, also contains arbitrary levels of component recursion, e.g., [componentname]__[componentcomponentname]__[paramname], etc.
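The double-underscore key convention for nested components can be illustrated by flattening a nested dictionary. This is a minimal sketch of the naming scheme only: `flatten_params` and the example parameter names are hypothetical, and the real method may return additional keys (for instance the component objects themselves).

```python
def flatten_params(params, prefix=""):
    """Flatten nested dicts into '[componentname]__[paramname]' style keys."""
    flat = {}
    for name, value in params.items():
        key = f"{prefix}{name}"
        if isinstance(value, dict):
            # Recurse into a component, extending the key with a double underscore.
            flat.update(flatten_params(value, prefix=f"{key}__"))
        else:
            flat[key] = value
    return flat


# A hypothetical detector with one component that itself has a component.
nested = {"threshold": 0.5, "scaler": {"min": 0.0, "inner": {"scale": 2.0}}}
flat = flatten_params(nested)
```

Here `flat` contains the keys `threshold`, `scaler__min` and `scaler__inner__scale`, mirroring the one-level and multi-level cases listed above.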
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- get_tag(tag_name, tag_value_default=None, raise_error=True)
Get tag value from estimator class.
Includes dynamic and overridden tags.
- Parameters:
- tag_name : str
Name of tag to be retrieved.
- tag_value_default : any type, default=None
Default/fallback value if tag is not found.
- raise_error : bool, default=True
Whether a ValueError is raised when the tag is not found.
- Returns:
- tag_value
Value of the tag_name tag in self. If the tag is not found, a ValueError is raised when raise_error is True; otherwise tag_value_default is returned.
- Raises:
- ValueError
If raise_error is True and tag_name is not in self.get_tags().keys().
Examples
>>> from aeon.classification import DummyClassifier
>>> d = DummyClassifier()
>>> d.get_tag("capability:multivariate")
True
- get_tags()
Get tags from estimator.
Includes dynamic and overridden tags.
- Returns:
- collected_tags : dict
Dictionary of tag name and tag value pairs. Collected from the _tags class attribute via nested inheritance and then any overridden and new tags from __init__ or set_tags.
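The difference between class tags and the tags returned here can be sketched with two toy classes. Everything below is hypothetical (the class names, the `_tags_dynamic` attribute and the tag values) and only mirrors the described behaviour: class tags are collected through inheritance, and dynamic tags are layered on top per instance.

```python
class Base:
    _tags = {"capability:multivariate": False, "fit_is_empty": False}


class Child(Base):
    _tags = {"capability:multivariate": True}

    def __init__(self):
        # Per-instance overrides, kept separate from the class-level tags.
        self._tags_dynamic = {}

    def set_tags(self, **tag_dict):
        self._tags_dynamic.update(tag_dict)
        return self

    @classmethod
    def get_class_tags(cls):
        collected = {}
        # Walk from the most general class to the most specific one,
        # so that subclasses override their parents.
        for klass in reversed(cls.__mro__):
            collected.update(vars(klass).get("_tags", {}))
        return collected

    def get_tags(self):
        # Class tags first, then dynamic tags on top.
        return {**self.get_class_tags(), **self._tags_dynamic}


child = Child().set_tags(fit_is_empty=True)
```

After `set_tags`, `child.get_tags()` reports the overridden value while `Child.get_class_tags()` still reports the class-level one.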
- classmethod get_test_params(parameter_set='default')
Return testing parameter settings for the estimator.
- Parameters:
- parameter_set : str, default="default"
Name of the set of test parameters to return, for use in tests. If no special parameters are defined for a value, will return the "default" set.
- Returns:
- params : dict or list of dict, default = {}
Parameters to create testing instances of the class. Each dict holds the parameters to construct an "interesting" test instance, i.e., MyClass(**params) or MyClass(**params[i]) creates a valid test instance. create_test_instance uses the first (or only) dictionary in params.
- classmethod load_from_path(serial)
Load object from file location.
- Parameters:
- serial : object
Result of ZipFile(path).open("object").
- Returns:
- deserialized self resulting in output at path, of cls.save(path)
- classmethod load_from_serial(serial)
Load object from serialized memory container.
- Parameters:
- serial : object
First element of output of cls.save(None).
- Returns:
- deserialized self resulting in output serial, of cls.save(None).
- predict(X, axis=1) -> ndarray
Find anomalies in X.
- Parameters:
- X : one of aeon.base._base_series.VALID_INPUT_TYPES
The time series to find anomalies in. A valid aeon time series data structure. See aeon.base._base_series.VALID_INPUT_TYPES for aeon supported types.
- axis : int, default=1
The time point axis of the input series if it is 2D. If axis==0, it is assumed each column is a time series and each row is a time point, i.e. the shape of the data is (n_timepoints, n_channels). axis==1 indicates the time series are in rows, i.e. the shape of the data is (n_channels, n_timepoints).
- Returns:
- np.ndarray
A boolean, int or float array of length len(X), where each element indicates whether the corresponding subsequence is anomalous or its anomaly score.
- reset(keep=None)
Reset the object to a clean post-init state.
After a self.reset() call, self is equal or similar in value to type(self)(**self.get_params(deep=False)), assuming no other attributes were kept using keep.
- Detailed behaviour:
- removes any object attributes, except:
  - hyper-parameters (arguments of __init__)
  - object attributes containing double-underscores, i.e., the string "__"
- runs __init__ with current values of hyperparameters (result of get_params)
- Not affected by the reset are:
  - object attributes containing double-underscores
  - class and object methods, class attributes
  - any attributes specified in the keep argument
- Parameters:
- keep : None, str, or list of str, default=None
If None, all attributes are removed except hyper-parameters. If str, only the attribute with this name is kept. If list of str, only the attributes with these names are kept.
- Returns:
- self
Reference to self.
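The reset rules listed above can be sketched on a toy class. `ToyEstimator` is hypothetical and the logic is a simplified reading of the description (hyperparameters, double-underscore attributes and `keep` names survive, everything else is removed); it is not the aeon implementation.

```python
import inspect


class ToyEstimator:
    """Hypothetical estimator, used only to illustrate reset semantics."""

    def __init__(self, k=10):
        self.k = k

    def get_params(self):
        # Hyperparameters are the arguments of __init__ (other than self).
        names = inspect.signature(type(self).__init__).parameters
        return {n: getattr(self, n) for n in names if n != "self"}

    def reset(self, keep=None):
        keep = [keep] if isinstance(keep, str) else list(keep or [])
        params = self.get_params()
        for name in list(vars(self)):
            # Keep hyperparameters, explicitly kept names and "__" attributes.
            if name not in params and name not in keep and "__" not in name:
                delattr(self, name)
        # Re-run __init__ with the current hyperparameter values.
        self.__init__(**params)
        return self


est = ToyEstimator(k=3)
est.fitted_ = True      # ordinary fitted state: removed by reset
est.cache__x = 1        # contains "__": not affected by reset
est.note = "keep me"    # preserved because it is named in keep
est.reset(keep="note")
```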
- save(path=None)
Save serialized self to bytes-like object or to (.zip) file.
Behaviour:
- if path is None, returns an in-memory serialized self
- if path is a file location, stores self at that location as a zip file
Saved files are zip files with the following contents:
- _metadata: contains class of self, i.e., type(self)
- _obj: serialized self. This class uses the default serialization (pickle).
- Parameters:
- path : None or file location (str or Path)
If None, self is saved to an in-memory object. If a file location, self is saved to that file location. For example:
- path="estimator": a zip file estimator.zip will be made at cwd.
- path="/home/stored/estimator": a zip file estimator.zip will be stored in /home/stored/.
- Returns:
- if path is None: in-memory serialized self
- if path is a file location: ZipFile with reference to the file
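The zip layout described above (a `_metadata` entry and a pickled `_obj` entry) can be reproduced with the standard library. This is a sketch under assumptions: the helper names are hypothetical, the archive is built in memory rather than at a path, and pickling the class for `_metadata` is a guess at the encoding, not a statement about what aeon writes.

```python
import io
import pickle
import zipfile


def save_to_bytes(obj):
    """Write obj into an in-memory zip with _metadata and _obj entries."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("_metadata", pickle.dumps(type(obj)))  # class of the object
        zf.writestr("_obj", pickle.dumps(obj))             # the object itself
    return buffer.getvalue()


def load_from_bytes(data):
    """Read the pickled object back out of the zip archive."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return pickle.loads(zf.read("_obj"))


# A plain dict stands in for an estimator so the example stays self-contained.
payload = {"k": 3, "alpha": 0.01}
archive = save_to_bytes(payload)
restored = load_from_bytes(archive)
```

As with any pickle-based format, such an archive should only be loaded from a trusted source.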
- set_fit_request(*, axis: bool | None | str = '$UNCHANGED$') -> STRAY
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- axis : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the axis parameter in fit.
- Returns:
- self : object
The updated object.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
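The `<component>__<parameter>` form can be illustrated with a toy nested object. `Node` is a hypothetical stand-in for an estimator or pipeline; the sketch shows only how a double-underscore key is split and routed to the named component, and skips the validation a real implementation would perform.

```python
class Node:
    """Hypothetical estimator-like object whose attributes are its parameters."""

    def __init__(self, **params):
        self.__dict__.update(params)

    def set_params(self, **params):
        for key, value in params.items():
            # Split "component__parameter" at the first double underscore.
            name, sep, rest = key.partition("__")
            if sep:
                # Delegate the remainder of the key to the named component.
                getattr(self, name).set_params(**{rest: value})
            else:
                setattr(self, name, value)
        return self


pipe = Node(scaler=Node(clip=False), detector=Node(k=10, alpha=0.01))
pipe.set_params(detector__k=3, scaler__clip=True)
```

After the call, `pipe.detector.k` is 3 and `pipe.scaler.clip` is True, while `pipe.detector.alpha` is unchanged.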
- set_predict_request(*, axis: bool | None | str = '$UNCHANGED$') -> STRAY
Request metadata passed to the predict method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to predict.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- axis : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the axis parameter in predict.
- Returns:
- self : object
The updated object.