Loading data with unequal length series or missing values
Some of the archive datasets have variable length series or missing values. Some algorithms can handle this type of data internally, but many cannot. You can find out an estimator's capabilities through its tags. For example, the ability to handle unequal length series internally is indicated by the tag capability:unequal_length. You can find out which estimators have this capability by using all_estimators.
[1]:
from aeon.utils.discovery import all_estimators
all_estimators(type_filter="classifier", tag_filter={"capability:unequal_length": True})
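The idea behind tag_filter can be illustrated with a toy sketch in plain Python. The sketch keeps only the entries whose tags match every requested key/value pair. The registry, the estimator names ClassifierA to ClassifierC and the helper filter_by_tags are all hypothetical. In practice, all_estimators is the function to use, since it works on aeon's real estimator classes and their tags.

```python
# Hypothetical registry mapping made-up estimator names to a few capability tags.
# None of these names or values are read from aeon; they only illustrate filtering.
REGISTRY = {
    "ClassifierA": {"capability:unequal_length": True, "capability:multivariate": True},
    "ClassifierB": {"capability:unequal_length": False, "capability:multivariate": True},
    "ClassifierC": {"capability:unequal_length": True, "capability:multivariate": False},
}


def filter_by_tags(registry, tag_filter):
    """Return the sorted names whose tags match every key/value pair in tag_filter."""
    return sorted(
        name
        for name, tags in registry.items()
        if all(tags.get(tag) == value for tag, value in tag_filter.items())
    )


print(filter_by_tags(REGISTRY, {"capability:unequal_length": True}))
# ['ClassifierA', 'ClassifierC']
```

Combining several tags in one filter narrows the result further. For example, both unequal length and multivariate capability leaves only ClassifierA here.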
Collections of unequal length series are stored as a list of 2D arrays, one array of shape (n_channels, n_timepoints) per case. There are two unequal length example problems in aeon, JapaneseVowels and PickupGestureWiimoteZ:
[2]:
from aeon.datasets import load_japanese_vowels, load_pickup_gesture_wiimoteZ
j_vowels, j_labels = load_japanese_vowels()
p_vowels, p_labels = load_pickup_gesture_wiimoteZ()
print(type(j_vowels), type(p_vowels))
print("shape first =", j_vowels[0].shape, "shape 11th =", j_vowels[10].shape)
<class 'list'> <class 'list'>
shape first = (12, 20) shape 11th = (12, 23)
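The list-of-2D-arrays layout can be reproduced with synthetic data. The following is a minimal NumPy sketch in which the 12 channels mirror JapaneseVowels, but the lengths and values are made up for illustration. It shows why such a collection cannot be held in a single 3D array.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic unequal length collection: each case is a 2D array of shape
# (n_channels, n_timepoints). The lengths below are illustrative only.
n_channels = 12
lengths = [20, 23, 15, 29]
X = [rng.normal(size=(n_channels, n)) for n in lengths]

print(type(X), type(X[0]))  # <class 'list'> <class 'numpy.ndarray'>
series_lengths = [x.shape[1] for x in X]
is_unequal = len(set(series_lengths)) > 1
print(min(series_lengths), max(series_lengths), is_unequal)  # 15 29 True

try:
    np.stack(X)
    stackable = True
except ValueError:
    # np.stack requires identical shapes, so unequal cases cannot form one 3D array
    stackable = False
print(stackable)  # False
```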
The TSML archive at timeseriesclassification.com (TSC.com) contains several unequal length datasets, including 11 from the UCR univariate archive and seven from the multivariate archive.
[3]:
from aeon.datasets.tsc_datasets import (
multivariate_unequal_length,
univariate_variable_length,
)
print(univariate_variable_length)
print(multivariate_unequal_length)
{'PickupGestureWiimoteZ', 'GestureMidAirD2', 'ShakeGestureWiimoteZ', 'PLAID', 'GesturePebbleZ1', 'AllGestureWiimoteZ', 'GestureMidAirD1', 'GestureMidAirD3', 'GesturePebbleZ2', 'AllGestureWiimoteY', 'AllGestureWiimoteX'}
{'AsphaltObstaclesCoordinates', 'SpokenArabicDigits', 'InsectWingbeat', 'CharacterTrajectories', 'JapaneseVowels', 'AsphaltPavementTypeCoordinates', 'AsphaltRegularityCoordinates'}
It is common to preprocess unequal length series before classification, regression or clustering, and aeon provides tools to do this directly. For example, if a collection is unequal length, you can pad the series to the length of the longest one or truncate them to the length of the shortest one:
[4]:
from aeon.transformations.collection.unequal_length import Padder, Truncator
padder = Padder()
truncator = Truncator()
padded_j_vowels = padder.fit_transform(j_vowels)
truncated_j_vowels = truncator.fit_transform(j_vowels)
print(padded_j_vowels.shape, truncated_j_vowels.shape)
(640, 12, 29) (640, 12, 7)
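The following is a minimal NumPy sketch of what padding and truncation do to a list of 2D arrays. It assumes that padding appends a constant value at the end of each series and that truncation keeps the leading time points. The helpers pad_to_longest and truncate_to_shortest are hypothetical and are not aeon's Padder and Truncator implementations, whose defaults and options may differ.

```python
import numpy as np


def pad_to_longest(X, fill_value=0.0):
    """Pad each (n_channels, n_timepoints) case at the end up to the longest length."""
    max_len = max(x.shape[1] for x in X)
    return np.stack(
        [
            np.pad(x, ((0, 0), (0, max_len - x.shape[1])), constant_values=fill_value)
            for x in X
        ]
    )


def truncate_to_shortest(X):
    """Keep only the first min_len time points of every case."""
    min_len = min(x.shape[1] for x in X)
    return np.stack([x[:, :min_len] for x in X])


rng = np.random.default_rng(1)
X = [rng.normal(size=(12, n)) for n in (20, 29, 7)]
padded = pad_to_longest(X)
truncated = truncate_to_shortest(X)
print(padded.shape, truncated.shape)  # (3, 12, 29) (3, 12, 7)
```

Padding keeps all the observed data but introduces artificial values. Truncation introduces nothing artificial but can discard most of a long series, as the drop from 29 to 7 time points above illustrates.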
There is no single best way of dealing with unequal length series. TSC.com provides an equal length version of every unequal length dataset, and you can load these directly with the load_classification and load_regression functions. The equalising operation is bespoke to each problem. For the classification problems, the data was padded with the series mean with low level Gaussian noise added. Loading the equal length version is the default behaviour:
[14]:
from aeon.datasets import load_classification
j_equal, _ = load_classification("JapaneseVowels")
j_unequal, _ = load_classification("JapaneseVowels", load_equal_length=False)
print(type(j_equal))
print(j_equal.shape)
print(type(j_unequal))
<class 'numpy.ndarray'>
(640, 12, 25)
<class 'list'>
This is the case for both the classification and regression problems. When a dataset is downloaded, the zip file contains both versions.
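The equal length classification versions were padded with the series mean plus low level Gaussian noise. Below is a minimal NumPy sketch of that kind of padding, under two stated assumptions: the mean is taken per channel, and the noise standard deviation is a small fraction (the hypothetical parameter noise_scale) of each channel's standard deviation. The helper pad_with_mean_and_noise is hypothetical and is not the exact procedure used to build the archive files.

```python
import numpy as np


def pad_with_mean_and_noise(X, noise_scale=0.01, seed=0):
    """Pad each case to the longest length with its per-channel mean plus small noise.

    Assumption: noise standard deviation = noise_scale * channel standard deviation.
    """
    rng = np.random.default_rng(seed)
    max_len = max(x.shape[1] for x in X)
    out = []
    for x in X:
        n_pad = max_len - x.shape[1]
        mean = x.mean(axis=1, keepdims=True)  # shape (n_channels, 1)
        std = x.std(axis=1, keepdims=True)
        noise = rng.normal(size=(x.shape[0], n_pad)) * noise_scale * std
        # mean broadcasts across the padded time points
        out.append(np.concatenate([x, mean + noise], axis=1))
    return np.stack(out)


rng = np.random.default_rng(2)
X = [rng.normal(loc=5.0, size=(2, n)) for n in (10, 25)]
X_eq = pad_with_mean_and_noise(X)
print(X_eq.shape)  # (2, 2, 25)
```

Compared with zero padding, this fill stays at the level of each series. The small noise also avoids long perfectly constant tails, which may matter for methods sensitive to flat segments.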

Unequal length problems that have been made equal length have the suffix _eq, and those with missing values imputed have the suffix _nmv. At the moment there are no problems with both missing values and unequal length.
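This section does not say how the missing values in the _nmv versions were imputed. The following is therefore only one simple option, sketched with NumPy: detect NaNs and fill them per channel by linear interpolation over time. The helper impute_linear is hypothetical. Leading or trailing gaps take the nearest observed value (np.interp's default edge behaviour), and a channel that is entirely missing is left unchanged.

```python
import numpy as np


def impute_linear(x):
    """Fill NaNs in each channel of a (n_channels, n_timepoints) array by linear interpolation."""
    x = x.copy()  # do not modify the caller's array
    t = np.arange(x.shape[1])
    for c in range(x.shape[0]):
        missing = np.isnan(x[c])
        if missing.any() and not missing.all():
            x[c, missing] = np.interp(t[missing], t[~missing], x[c, ~missing])
    return x


x = np.array([[1.0, np.nan, 3.0, np.nan], [np.nan, 2.0, 2.0, 4.0]])
print(np.isnan(x).any())  # True
filled = impute_linear(x)
print(filled.tolist())  # [[1.0, 2.0, 3.0, 3.0], [2.0, 2.0, 2.0, 4.0]]
```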
Generated using nbsphinx. The Jupyter notebook can be found here.