GreedyGaussianSegmentation

class GreedyGaussianSegmentation(k_max: int = 10, lamb: float = 1.0, max_shuffles: int = 250, verbose: bool = False, random_state=None)

Greedy Gaussian Segmentation Estimator.

The method approximates solutions to the problem of breaking a multivariate time series into segments, where the data in each segment can be modeled as independent samples from a multivariate Gaussian distribution. It uses a dynamic programming search algorithm with a heuristic that finds an approximate solution in time linear in the data length and always makes a locally optimal choice.

Greedy Gaussian Segmentation (GGS) fits a segmented Gaussian model (SGM) to the data by computing an approximate solution to the combinatorial problem of maximizing the covariance-regularized log-likelihood for a fixed number of change points and a given regularization strength. It follows an iterative procedure in which a new breakpoint is added and all breakpoints are then adjusted to (approximately) maximize the objective. It is similar to the top-down search used in other change point detection problems.
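The step of adding one breakpoint can be illustrated with a short sketch. It is not the library's implementation: the score below is an assumed form of the covariance-regularized log-likelihood with constants dropped (check it against Eq. (1) in [1]), and `segment_score` and `best_single_split` are hypothetical helper names. The sketch scans every candidate index, so it is slower than the method described here; it only shows how a single new breakpoint is chosen to maximize the objective.

```python
import numpy as np


def segment_score(segment: np.ndarray, lamb: float) -> float:
    """Assumed regularized Gaussian log-likelihood of one segment.

    `segment` has shape (m, n): m time points, n variables.
    Terms that do not depend on the breakpoints are dropped.
    """
    m, n = segment.shape
    # Biased empirical covariance around the segment mean.
    centered = segment - segment.mean(axis=0)
    cov = centered.T @ centered / m
    # Shrink the covariance towards the identity; lamb = 0 disables this.
    reg = cov + (lamb / m) * np.eye(n)
    _, logdet = np.linalg.slogdet(reg)
    return -0.5 * (m * logdet - lamb * np.trace(np.linalg.inv(reg)))


def best_single_split(X: np.ndarray, lamb: float, min_size: int = 2):
    """Return (index, gain) of the split that most increases the score."""
    T = X.shape[0]
    base = segment_score(X, lamb)
    best_t, best_gain = None, 0.0
    for t in range(min_size, T - min_size + 1):
        gain = segment_score(X[:t], lamb) + segment_score(X[t:], lamb) - base
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain


# Synthetic series with a mean shift after 50 samples (illustrative data).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(50, 2)),
    rng.normal(5.0, 1.0, size=(50, 2)),
])
split, gain = best_single_split(X, lamb=1.0)
print(split, gain)
```

In a full top-down search this step would be repeated on the resulting segments, followed by an adjustment pass over all existing breakpoints.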

Parameters:
k_max: int, default=10

Maximum number of change points to find. With k change points the series is split into k + 1 segments, so the number of segments is at most k_max + 1.

lamb: float, default=1.0

Regularization parameter lambda (>= 0), which controls the amount of (inverse) covariance regularization, see Eq (1) in [1]. Regularization is introduced to reduce issues for high-dimensional problems. Setting `lamb` to zero will ignore regularization, whereas large values of lambda will favour simpler models.

max_shuffles: int, default=250

Maximum number of shuffles, i.e. passes that adjust the existing breakpoints after a new breakpoint has been added.

verbose: bool, default=False

If `True` verbose output is enabled.

random_state: int or np.random.RandomState, default=None

Either a random seed or an instance of `np.random.RandomState`.

Attributes:
change_points_: array_like, default=[]

Locations of change points as integer indexes. By convention, change points include the identity segmentation, i.e. the first index and the last index + 1.

_intermediate_change_points: List[List[int]], default=[]

Intermediate values of change points for each value of k = 1…k_max

_intermediate_ll: List[float], default=[]

Intermediate values for log-likelihood for each value of k = 1…k_max

Notes

Based on the work from [1].

References

[1]

Hallac, D., Nystrup, P. & Boyd, S., “Greedy Gaussian segmentation of multivariate time series.”, Adv Data Anal Classif 13, 727–751 (2019). https://doi.org/10.1007/s11634-018-0335-0

Methods

`check_is_fitted`() Check if the estimator has been fitted.
`clone`() Obtain a clone of the object with same hyper-parameters.
`clone_tags`(estimator[, tag_names]) Clone/mirror tags from another estimator as dynamic override.
`create_test_instance`([parameter_set]) Construct Estimator instance if possible.
`create_test_instances_and_names`([parameter_set]) Create list of all test instances and a list of names for them.
`fit`(X[, y]) Fit method for compatibility with sklearn-type estimator interface.
`fit_predict`(X[, y]) Perform segmentation.
`get_class_tag`(tag_name[, tag_value_default]) Get tag value from estimator class (only class tags).
`get_class_tags`() Get class tags from estimator class and all its parent classes.
`get_fitted_params`([deep]) Get fitted parameters.
`get_param_defaults`() Get parameter defaults for the object.
`get_param_names`() Get parameter names for the object.
`get_params`([deep]) Return initialization parameters.
`get_tag`(tag_name[, tag_value_default, ...]) Get tag value from estimator class and dynamic tag overrides.
`get_tags`() Get tags from estimator class and dynamic tag overrides.
`get_test_params`([parameter_set]) Return testing parameter settings for the estimator.
`is_composite`() Check if the object is composite.
`load_from_path`(serial) Load object from file location.
`load_from_serial`(serial) Load object from serialized memory container.
`predict`(X[, y]) Perform segmentation.
`reset`() Reset the object to a clean post-init state.
`save`([path]) Save serialized self to bytes-like object or to (.zip) file.
`set_params`(**parameters) Set the parameters of this object.
`set_tags`(**tag_dict) Set dynamic tags to given values.

fit(X: ArrayLike, y: Optional[ArrayLike] = None)

Fit method for compatibility with sklearn-type estimator interface.

It sets the internal state of the estimator and returns the initialized instance.

Parameters:
X: array_like

2D array_like representing time series with sequence index along the first dimension and value series as columns.

y: array_like

Placeholder for compatibility with sklearn-api, not used, default=None.

predict(X: ArrayLike, y: Optional[ArrayLike] = None) → ArrayLike

Perform segmentation.

Parameters:
X: array_like

2D array_like representing time series with sequence index along the first dimension and value series as columns.

y: array_like

Placeholder for compatibility with sklearn-api, not used, default=None.

Returns:
y_pred: array_like

1D array with predicted segmentation of the same size as the first dimension of X. The numerical values represent distinct segment labels for each of the data points.
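The relation between a list of change points and the per-sample labels returned here can be sketched as follows. This is an illustration only: `labels_from_change_points` is a hypothetical helper, and it assumes the change points are sorted and follow the convention described for `change_points_`, i.e. they start at the first index and end at the last index + 1.

```python
import numpy as np


def labels_from_change_points(change_points, n_samples):
    """Assign a segment id to every time index.

    Assumes `change_points` is sorted, starts at 0 and ends at n_samples.
    """
    labels = np.zeros(n_samples, dtype=int)
    # Each consecutive pair (start, stop) delimits one half-open segment.
    pairs = zip(change_points[:-1], change_points[1:])
    for seg_id, (start, stop) in enumerate(pairs):
        labels[start:stop] = seg_id
    return labels


labels = labels_from_change_points([0, 3, 5, 8], n_samples=8)
print(labels)
```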

fit_predict(X: ArrayLike, y: Optional[ArrayLike] = None) → ArrayLike

Perform segmentation.

Parameters:
X: array_like

2D array_like representing time series with sequence index along the first dimension and value series as columns.

y: array_like

Placeholder for compatibility with sklearn-api, not used, default=None.

Returns:
y_pred: array_like

1D array with predicted segmentation of the same size as the first dimension of X. The numerical values represent distinct segment labels for each of the data points.

get_params(deep: bool = True) → Dict

Return initialization parameters.

Parameters:
deep: bool

Dummy argument for compatibility with sklearn-api, not used.

Returns:
params: dict

Dictionary with the estimator’s initialization parameters, with keys being argument names and values being argument values.

set_params(**parameters)

Set the parameters of this object.

Parameters:
parameters: dict

Initialization parameters for the estimator.

Returns:
self: reference to self (after parameters have been set)

check_is_fitted()

Check if the estimator has been fitted.

Raises:
NotFittedError

If the estimator has not been fitted yet.

clone()

Obtain a clone of the object with same hyper-parameters.

A clone is a different object without shared references, in post-init state. This function is equivalent to returning sklearn.clone of self. Equal in value to type(self)(**self.get_params(deep=False)).

Returns:
instance of type(self), clone of self (see above)
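The equivalence `type(self)(**self.get_params(deep=False))` can be seen on a toy class. `ToyEstimator` below is a hypothetical stand-in written for this illustration, not the library class; it only mimics the `get_params` and `clone` contract described above.

```python
class ToyEstimator:
    """Hypothetical stand-in for an estimator; not the library class."""

    def __init__(self, k_max=10, lamb=1.0):
        self.k_max = k_max
        self.lamb = lamb

    def get_params(self, deep=True):
        # Only the arguments of __init__ count as hyper-parameters.
        return {"k_max": self.k_max, "lamb": self.lamb}

    def fit(self, X):
        # Fitted state; a clone is in post-init state and does not carry it.
        self.change_points_ = [0, len(X)]
        return self

    def clone(self):
        return type(self)(**self.get_params(deep=False))


fitted = ToyEstimator(k_max=3, lamb=0.5).fit([1.0, 2.0, 3.0])
copy = fitted.clone()
print(copy.get_params(), hasattr(copy, "change_points_"))
```

The clone has the same hyper-parameters as the original but none of its fitted attributes.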

clone_tags(estimator, tag_names=None)

Clone/mirror tags from another estimator as dynamic override.

Parameters:
estimator: estimator inheriting from BaseEstimator
tag_names: str or list of str, default=None

Names of tags to clone. If None then all tags in estimator are used as tag_names.

Returns:
Self

Reference to self.

Notes

Changes object state by setting tag values in tag_names from estimator as dynamic tags in self.

classmethod create_test_instance(parameter_set='default')

Construct Estimator instance if possible.

Parameters:
parameter_set: str, default="default"

Name of the set of test parameters to return, for use in tests. If no special parameters are defined for a value, will return “default” set.

Returns:
instance: instance of the class with default parameters

Notes

get_test_params can return dict or list of dict. This function takes first or single dict that get_test_params returns, and constructs the object with that.

classmethod create_test_instances_and_names(parameter_set='default')

Create list of all test instances and a list of names for them.

Parameters:
parameter_set: str, default="default"

Name of the set of test parameters to return, for use in tests. If no special parameters are defined for a value, will return “default” set.

Returns:
objs: list of instances of cls

i-th instance is cls(**cls.get_test_params()[i])

names: list of str, same length as objs

i-th element is the name of the i-th instance of obj in tests. The convention is {cls.__name__}-{i} if there is more than one instance, otherwise {cls.__name__}.
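The naming convention can be written out as a small helper. `make_test_names` and `ToySegmenter` are hypothetical names used only for this sketch, and the index is assumed to start at 0, matching the indexing of `cls.get_test_params()[i]`.

```python
def make_test_names(cls, n_instances: int) -> list:
    """Build test names following the convention described above."""
    if n_instances > 1:
        # Several instances: suffix the class name with the index.
        return [f"{cls.__name__}-{i}" for i in range(n_instances)]
    # A single instance is named after the class alone.
    return [cls.__name__]


class ToySegmenter:
    """Hypothetical class used only to obtain a class name."""


print(make_test_names(ToySegmenter, 1))
print(make_test_names(ToySegmenter, 3))
```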

classmethod get_class_tag(tag_name, tag_value_default=None)

Get tag value from estimator class (only class tags).

Parameters:
tag_name: str

Name of tag value.

tag_value_default: any type

Default/fallback value if tag is not found.

Returns:
tag_value

Value of the tag_name tag in self. If not found, returns tag_value_default.

classmethod get_class_tags()

Get class tags from estimator class and all its parent classes.

Returns:
collected_tags: dict

Dictionary of tag name : tag value pairs. Collected from _tags class attribute via nested inheritance. NOT overridden by dynamic tags set by set_tags or mirror_tags.

get_fitted_params(deep=True)

Get fitted parameters.

State required:

Requires state to be “fitted”.

Parameters:
deep: bool, default=True

Whether to return fitted parameters of components.

• If True, will return a dict of parameter name : value for this object, including fitted parameters of fittable components (= BaseEstimator-valued parameters).

• If False, will return a dict of parameter name : value for this object, but not include fitted parameters of components.

Returns:
fitted_params: dict with str-valued keys

Dictionary of fitted parameters, paramname : paramvalue key-value pairs include:

• always: all fitted parameters of this object, as via get_param_names. Values are the fitted parameter value for that key, of this object.

• if deep=True, also contains key/value pairs of component parameters. Parameters of components are indexed as [componentname]__[paramname]. All parameters of componentname appear as paramname with its value.

• if deep=True, also contains arbitrary levels of component recursion, e.g., [componentname]__[componentcomponentname]__[paramname], etc
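The `[componentname]__[paramname]` indexing can be illustrated by flattening a nested dictionary. This is a toy sketch: nested dicts stand in for component estimators, `flatten_params` is a hypothetical helper, and the parameter names are made up for the example.

```python
def flatten_params(params: dict, prefix: str = "") -> dict:
    """Flatten nested parameter dicts using double-underscore keys."""
    flat = {}
    for name, value in params.items():
        key = f"{prefix}{name}"
        if isinstance(value, dict):
            # Recurse into the component, extending the key prefix.
            flat.update(flatten_params(value, prefix=f"{key}__"))
        else:
            flat[key] = value
    return flat


nested = {
    "change_points": [0, 5, 10],
    "scaler": {"mean": 0.1, "inner": {"scale": 2.0}},
}
flat = flatten_params(nested)
print(flat)
```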

classmethod get_param_defaults()

Get parameter defaults for the object.

Returns:
default_dict: dict with str keys

Keys are all parameters of cls that have a default defined in __init__. Values are the defaults, as defined in __init__.

classmethod get_param_names()

Get parameter names for the object.

Returns:
param_names: list of str, alphabetically sorted list of parameter names of cls
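One way to obtain such names and defaults is to inspect the signature of `__init__`. The sketch below assumes this approach and is not taken from the library source; `param_names`, `param_defaults`, and `ToySegmenter` are hypothetical.

```python
import inspect


def param_names(cls) -> list:
    """Alphabetically sorted names of the __init__ parameters of cls."""
    sig = inspect.signature(cls.__init__)
    return sorted(name for name in sig.parameters if name != "self")


def param_defaults(cls) -> dict:
    """Defaults of the __init__ parameters of cls that define one."""
    sig = inspect.signature(cls.__init__)
    return {
        name: p.default
        for name, p in sig.parameters.items()
        if name != "self" and p.default is not inspect.Parameter.empty
    }


class ToySegmenter:
    """Hypothetical class; `data_name` has no default on purpose."""

    def __init__(self, data_name, k_max=10, lamb=1.0):
        self.data_name = data_name
        self.k_max = k_max
        self.lamb = lamb


print(param_names(ToySegmenter))
print(param_defaults(ToySegmenter))
```

Parameters without a default, such as `data_name` here, appear in the list of names but not in the dictionary of defaults.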

get_tag(tag_name, tag_value_default=None, raise_error=True)

Get tag value from estimator class and dynamic tag overrides.

Parameters:
tag_name: str

Name of tag to be retrieved

tag_value_default: any type, optional; default=None

Default/fallback value if tag is not found

raise_error: bool

whether a ValueError is raised when the tag is not found

Returns:
tag_value

Value of the tag_name tag in self. If not found, raises an error if raise_error is True, otherwise returns tag_value_default.

Raises:
ValueError if raise_error is True and tag_name is not in self.get_tags().keys()

get_tags()

Get tags from estimator class and dynamic tag overrides.

Returns:
collected_tags: dict

Dictionary of tag name : tag value pairs. Collected from _tags class attribute via nested inheritance and then any overrides and new tags from _tags_dynamic object attribute.
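The lookup order described here (class tags collected through inheritance, then dynamic overrides) can be sketched with two toy classes. `TaggedBase`, `TaggedChild`, and the tag names are hypothetical; only the attribute names `_tags` and `_tags_dynamic` come from the text above, and the code is an illustration rather than the library's implementation.

```python
class TaggedBase:
    _tags = {"fit_is_empty": False, "task": "segmentation"}


class TaggedChild(TaggedBase):
    _tags = {"fit_is_empty": True}

    def __init__(self):
        self._tags_dynamic = {}

    def set_tags(self, **tag_dict):
        self._tags_dynamic.update(tag_dict)
        return self

    def get_tags(self):
        collected = {}
        # Walk from the most generic parent to the class itself, so that
        # child classes override the tags of their parents.
        for klass in reversed(type(self).__mro__):
            collected.update(vars(klass).get("_tags", {}))
        # Dynamic tags take precedence over all class tags.
        collected.update(self._tags_dynamic)
        return collected

    def get_tag(self, tag_name, tag_value_default=None, raise_error=True):
        tags = self.get_tags()
        if tag_name not in tags:
            if raise_error:
                raise ValueError(f"Tag {tag_name!r} not found")
            return tag_value_default
        return tags[tag_name]


obj = TaggedChild()
print(obj.get_tags())
obj.set_tags(task="change_point_detection")
print(obj.get_tag("task"))
```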

classmethod get_test_params(parameter_set='default')

Return testing parameter settings for the estimator.

Parameters:
parameter_set: str, default="default"

Name of the set of test parameters to return, for use in tests. If no special parameters are defined for a value, will return “default” set.

Returns:
params: dict or list of dict, default={}

Parameters to create testing instances of the class. Each dict contains parameters to construct an “interesting” test instance, i.e., MyClass(**params) or MyClass(**params[i]) creates a valid test instance. create_test_instance uses the first (or only) dictionary in params.

is_composite()

Check if the object is composite.

A composite object is an object which contains objects, as parameters. Called on an instance, since this may differ by instance.

Returns:
composite: bool, whether self contains a parameter which is BaseObject

property is_fitted

Whether fit has been called.

classmethod load_from_path(serial)

Load object from file location.

Parameters:
serial: result of ZipFile(path).open("object")

Returns:
deserialized self resulting in output at path, of cls.save(path)

classmethod load_from_serial(serial)

Load object from serialized memory container.

Parameters:
serial: 1st element of output of cls.save(None)

Returns:
deserialized self resulting in output serial, of cls.save(None)

reset()

Reset the object to a clean post-init state.

Equivalent to sklearn.clone but overwrites self. After self.reset() call, self is equal in value to type(self)(**self.get_params(deep=False))

Detail behaviour: removes any object attributes, except:

• hyper-parameters = arguments of __init__

• object attributes containing double-underscores, i.e., the string “__”

runs __init__ with current values of hyper-parameters (result of get_params)

Not affected by the reset are:

• object attributes containing double-underscores

• class and object methods, class attributes
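The reset behaviour can be mimicked on a toy class. `ResettableToy` is a hypothetical stand-in, not the library class; it follows the rules listed above: keep hyper-parameters and attributes whose names contain a double underscore, drop everything else, then run `__init__` again.

```python
class ResettableToy:
    """Hypothetical stand-in that mimics the reset rules listed above."""

    def __init__(self, k_max=10):
        self.k_max = k_max

    def get_params(self, deep=True):
        return {"k_max": self.k_max}

    def reset(self):
        params = self.get_params(deep=False)
        for name in list(vars(self)):
            # Keep hyper-parameters and attributes containing "__".
            if name not in params and "__" not in name:
                delattr(self, name)
        # Re-run __init__ with the current hyper-parameter values.
        self.__init__(**params)
        return self


est = ResettableToy(k_max=4)
est.change_points_ = [0, 10]   # fitted state, removed by reset
est.cache__note = "kept"       # contains "__", survives reset
est.reset()
print(vars(est))
```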

save(path=None)

Save serialized self to bytes-like object or to (.zip) file.

Behaviour:

• if path is None, returns an in-memory serialized self

• if path is a file location, stores self at that location as a zip file

Saved files are zip files with the following contents:

• _metadata - contains class of self, i.e., type(self)

• _obj - serialized self. This class uses the default serialization (pickle).

Parameters:
path: None or file location (str or Path)

If None, self is saved to an in-memory object. If a file location, self is saved to that file location. For example:

• path="estimator": a zip file estimator.zip will be made at cwd.

• path="/home/stored/estimator": a zip file estimator.zip will be stored in /home/stored/.

Returns:
if path is None - in-memory serialized self
if path is file location - ZipFile with reference to the file
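The zip layout described above (`_metadata` holding the class, `_obj` holding the pickled object) can be reproduced with the standard library. This sketch only illustrates that layout: `save_to_zip_bytes` and `load_from_zip_bytes` are hypothetical helpers, the archive is kept in memory instead of on disk, and the library's actual byte format and return values may differ.

```python
import io
import pickle
import zipfile


def save_to_zip_bytes(obj) -> bytes:
    """Write a zip archive with `_metadata` and `_obj` entries."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode="w") as zf:
        zf.writestr("_metadata", pickle.dumps(type(obj)))  # class of obj
        zf.writestr("_obj", pickle.dumps(obj))             # pickled obj
    return buffer.getvalue()


def load_from_zip_bytes(data: bytes):
    """Read the pickled object back from the `_obj` entry."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return pickle.loads(zf.read("_obj"))


# A plain dict stands in for an estimator in this illustration.
payload = {"k_max": 10, "lamb": 1.0}
archive = save_to_zip_bytes(payload)
restored = load_from_zip_bytes(archive)
print(restored)
```

As with any pickle-based format, such archives should only be loaded from trusted sources.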

set_tags(**tag_dict)

Set dynamic tags to given values.

Parameters:
tag_dict: dict

Dictionary of tag name : tag value pairs.

Returns:
Self

Reference to self.

Notes

Changes object state by setting tag values in tag_dict as dynamic tags in self.