Feature extraction with the tsfresh transformer
This tutorial shows how to use aeon with tsfresh to extract features from time series, so that the resulting tabular features can be passed to any scikit-learn estimator.
Preliminaries
You need tsfresh installed. If it is not installed yet, uncomment and run the cell below:
[1]:
# !pip install --upgrade tsfresh
[2]:
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from aeon.datasets import load_arrow_head, load_basic_motions
from aeon.transformations.collection.feature_based import TSFreshFeatureExtractor
Univariate time series classification data
For more details on the data set, see the univariate time series classification notebook.
[3]:
X, y = load_arrow_head(return_X_y=True, return_type="nested_univ")
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
(158, 1) (158,) (53, 1) (53,)
[4]:
X_train.head()
[4]:
index | dim_0 |
---|---|
69 | 0 -1.7998 1 -1.7987 2 -1.7942 3 ... |
103 | 0 -1.8091 1 -1.8067 2 -1.7866 3 ... |
34 | 0 -2.0417 1 -2.0572 2 -2.0522 3 ... |
14 | 0 -2.1888 1 -2.1855 2 -2.1765 3 ... |
121 | 0 -1.9586 1 -1.9371 2 -1.8798 3 ... |
[5]:
# multiclass classification task with three classes
np.unique(y_train)
[5]:
array(['0', '1', '2'], dtype=object)
Using tsfresh to extract features
[6]:
# extract the "efficient" tsfresh feature set from every series
t = TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False)
Xt = t.fit_transform(X_train)
Xt.head()
/Users/mloning/Documents/Research/software/aeon/aeon/aeon/transformations/panel/tsfresh.py:164: UserWarning: tsfresh requires a unique index, but found non-unique. To avoid this warning, please make sure the index of X contains only unique values.
"tsfresh requires a unique index, but found "
Feature Extraction: 100%|██████████| 5/5 [00:10<00:00, 2.05s/it]
[6]:
index | dim_0__variance_larger_than_standard_deviation | dim_0__has_duplicate_max | dim_0__has_duplicate_min | dim_0__has_duplicate | dim_0__sum_values | dim_0__abs_energy | dim_0__mean_abs_change | dim_0__mean_change | dim_0__mean_second_derivative_central | dim_0__median | ... | dim_0__fourier_entropy__bins_2 | dim_0__fourier_entropy__bins_3 | dim_0__fourier_entropy__bins_5 | dim_0__fourier_entropy__bins_10 | dim_0__fourier_entropy__bins_100 | dim_0__permutation_entropy__dimension_3__tau_1 | dim_0__permutation_entropy__dimension_4__tau_1 | dim_0__permutation_entropy__dimension_5__tau_1 | dim_0__permutation_entropy__dimension_6__tau_1 | dim_0__permutation_entropy__dimension_7__tau_1 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.0 | 0.0 | 0.0 | 1.0 | -0.000080 | 249.998516 | 0.052357 | -0.000001 | -0.000005 | -0.024066 | ... | 0.046288 | 0.092513 | 0.092513 | 0.092513 | 0.250609 | 1.323194 | 1.819631 | 2.183824 | 2.463220 | 2.707387 |
1 | 0.0 | 0.0 | 1.0 | 1.0 | -0.000525 | 250.000756 | 0.049118 | 0.000000 | -0.000006 | -0.031622 | ... | 0.046288 | 0.046288 | 0.092513 | 0.092513 | 0.184769 | 1.213529 | 1.668744 | 2.081159 | 2.418614 | 2.707518 |
2 | 0.0 | 0.0 | 0.0 | 1.0 | -0.000034 | 249.998998 | 0.069971 | 0.000084 | 0.000025 | 0.018880 | ... | 0.081510 | 0.092513 | 0.092513 | 0.138673 | 0.311663 | 1.116706 | 1.545256 | 1.889777 | 2.155644 | 2.374722 |
3 | 0.0 | 0.0 | 0.0 | 1.0 | 0.000202 | 249.999702 | 0.067601 | -0.000002 | -0.000010 | 0.384770 | ... | 0.046288 | 0.092513 | 0.092513 | 0.204643 | 0.414263 | 1.323315 | 1.915330 | 2.406197 | 2.794719 | 3.117007 |
4 | 0.0 | 0.0 | 0.0 | 1.0 | -0.000146 | 249.998674 | 0.050355 | -0.000004 | -0.000046 | -0.045353 | ... | 0.046288 | 0.092513 | 0.092513 | 0.092513 | 0.230801 | 1.173933 | 1.628543 | 2.003443 | 2.303091 | 2.559695 |
5 rows × 773 columns
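The column names above follow the pattern `<dimension>__<feature>`, with optional `__<parameter>` suffixes. As a rough illustration of what a few of the simpler features measure, here is a minimal plain-Python sketch. The helper `simple_features` and the toy series are hypothetical and written only for this example; they follow the usual definitions of these statistics and are not tsfresh's own code.

```python
from statistics import median


def simple_features(series, prefix="dim_0"):
    """Compute a few summary statistics for one series (illustrative only)."""
    # Differences between consecutive values.
    diffs = [b - a for a, b in zip(series, series[1:])]
    return {
        f"{prefix}__sum_values": sum(series),
        f"{prefix}__abs_energy": sum(x * x for x in series),
        f"{prefix}__mean_abs_change": sum(abs(d) for d in diffs) / len(diffs),
        f"{prefix}__mean_change": sum(diffs) / len(diffs),
        f"{prefix}__median": median(series),
    }


toy_series = [1.0, 3.0, 2.0, 5.0]
features = simple_features(toy_series)
for name, value in features.items():
    print(name, value)
```

Each series, whatever its length, is mapped to one fixed-length row of numbers, which is what allows a tabular estimator to consume the result.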
Using tsfresh with aeon
[7]:
classifier = make_pipeline(
    TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False),
    RandomForestClassifier(),
)
classifier.fit(X_train, y_train)
classifier.score(X_test, y_test)
/Users/mloning/Documents/Research/software/aeon/aeon/aeon/transformations/panel/tsfresh.py:164: UserWarning: tsfresh requires a unique index, but found non-unique. To avoid this warning, please make sure the index of X contains only unique values.
"tsfresh requires a unique index, but found "
Feature Extraction: 100%|██████████| 5/5 [00:11<00:00, 2.21s/it]
/Users/mloning/Documents/Research/software/aeon/aeon/aeon/transformations/panel/tsfresh.py:164: UserWarning: tsfresh requires a unique index, but found non-unique. To avoid this warning, please make sure the index of X contains only unique values.
"tsfresh requires a unique index, but found "
Feature Extraction: 100%|██████████| 5/5 [00:03<00:00, 1.45it/s]
[7]:
0.8490566037735849
Multivariate time series classification data
[8]:
X, y = load_basic_motions(return_X_y=True, return_type="nested_univ")
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
(60, 6) (60,) (20, 6) (20,)
[9]:
# multivariate input data
X_train.head()
[9]:
index | dim_0 | dim_1 | dim_2 | dim_3 | dim_4 | dim_5 |
---|---|---|---|---|---|---|
20 | 0 -0.294498 1 -0.294498 2 -0.050044 3... | 0 0.540218 1 0.540218 2 -0.515245 3... | 0 0.218114 1 0.218114 2 -0.301108 3... | 0 -0.045277 1 -0.045277 2 0.103872 3... | 0 -0.002663 1 -0.002663 2 -0.183773 3... | 0 0.031960 1 0.031960 2 0.037287 3... |
26 | 0 -0.761604 1 -0.761604 2 0.121078 3... | 0 0.260125 1 0.260125 2 -1.423255 3... | 0 -0.064487 1 -0.064487 2 0.075600 3... | 0 0.069248 1 0.069248 2 -0.282318 3... | 0 0.242367 1 0.242367 2 -0.332922 3... | 0 -0.007990 1 -0.007990 2 0.239704 3... |
7 | 0 -0.352746 1 -0.352746 2 -1.354561 3... | 0 0.316845 1 0.316845 2 0.490525 3... | 0 -0.473779 1 -0.473779 2 1.454261 3... | 0 -0.327595 1 -0.327595 2 -0.269001 3... | 0 0.106535 1 0.106535 2 0.021307 3... | 0 0.197090 1 0.197090 2 0.460763 3... |
8 | 0 -0.342233 1 -0.342233 2 -0.298542 3... | 0 0.327415 1 0.327415 2 -0.527154 3... | 0 0.157229 1 0.157229 2 0.248585 3... | 0 0.394179 1 0.394179 2 -0.037287 3... | 0 0.074574 1 0.074574 2 -0.087891 3... | 0 -0.037287 1 -0.037287 2 -0.050604 3... |
10 | 0 0.206148 1 0.206148 2 6.53436... | 0 -0.658294 1 -0.658294 2 4.597327 3... | 0 0.469612 1 0.469612 2 -2.723661 3... | 0 -0.106535 1 -0.106535 2 -0.439456 3... | 0 0.306288 1 0.306288 2 1.717875 3... | 0 0.950824 1 0.950824 2 -1.041379 3... |
[10]:
t = TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False)
Xt = t.fit_transform(X_train)
Xt.head()
/Users/mloning/Documents/Research/software/aeon/aeon/aeon/transformations/panel/tsfresh.py:164: UserWarning: tsfresh requires a unique index, but found non-unique. To avoid this warning, please make sure the index of X contains only unique values.
"tsfresh requires a unique index, but found "
Feature Extraction: 100%|██████████| 5/5 [00:18<00:00, 3.69s/it]
[10]:
index | dim_0__variance_larger_than_standard_deviation | dim_0__has_duplicate_max | dim_0__has_duplicate_min | dim_0__has_duplicate | dim_0__sum_values | dim_0__abs_energy | dim_0__mean_abs_change | dim_0__mean_change | dim_0__mean_second_derivative_central | dim_0__median | ... | dim_5__fourier_entropy__bins_2 | dim_5__fourier_entropy__bins_3 | dim_5__fourier_entropy__bins_5 | dim_5__fourier_entropy__bins_10 | dim_5__fourier_entropy__bins_100 | dim_5__permutation_entropy__dimension_3__tau_1 | dim_5__permutation_entropy__dimension_4__tau_1 | dim_5__permutation_entropy__dimension_5__tau_1 | dim_5__permutation_entropy__dimension_6__tau_1 | dim_5__permutation_entropy__dimension_7__tau_1 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.0 | 0.0 | 0.0 | 1.0 | 33.334188 | 110.735119 | 0.822452 | 0.000639 | 0.001751 | 0.164096 | ... | 0.165443 | 0.165443 | 0.165443 | 0.192626 | 0.545824 | 1.279774 | 1.910772 | 2.565051 | 3.096812 | 3.567632 |
1 | 1.0 | 0.0 | 0.0 | 1.0 | 73.888480 | 220.949429 | 0.964075 | -0.002087 | -0.003908 | 0.613719 | ... | 0.096509 | 0.096509 | 0.261160 | 0.261160 | 0.451359 | 1.313299 | 1.987599 | 2.593635 | 3.173890 | 3.696247 |
2 | 0.0 | 0.0 | 0.0 | 1.0 | -17.428760 | 7.940863 | 0.170422 | 0.002326 | -0.000244 | -0.152038 | ... | 0.223718 | 0.261160 | 0.356468 | 0.545824 | 1.821690 | 1.438857 | 2.291659 | 3.140440 | 3.819994 | 4.207710 |
3 | 0.0 | 0.0 | 0.0 | 1.0 | -18.154841 | 5.568890 | 0.135705 | 0.001051 | 0.000688 | -0.196623 | ... | 0.399949 | 0.705356 | 1.127853 | 1.742820 | 3.274497 | 1.683010 | 2.766048 | 3.748502 | 4.303872 | 4.449241 |
4 | 1.0 | 0.0 | 0.0 | 1.0 | 395.985445 | 11192.658970 | 6.583700 | 0.099344 | 0.000000 | 8.608970 | ... | 0.165443 | 0.165443 | 0.165443 | 0.165443 | 0.706253 | 1.483926 | 2.279149 | 3.014130 | 3.525453 | 3.919983 |
5 rows × 4638 columns
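The multivariate output has 4638 columns because the same features are computed for each of the six dimensions separately: 6 × 773 = 4638. Since every column name starts with its dimension, columns can be grouped by splitting on the `__` separator. A small sketch follows, using a hand-copied subset of the column names shown above; the `columns` list and the `count_by_dimension` helper are illustrative and not part of aeon or tsfresh.

```python
from collections import Counter


def count_by_dimension(columns):
    """Count feature columns per dimension from names like 'dim_0__median'."""
    return Counter(name.split("__", 1)[0] for name in columns)


# A hand-copied subset of the column names from the output above.
columns = [
    "dim_0__sum_values",
    "dim_0__abs_energy",
    "dim_0__median",
    "dim_5__fourier_entropy__bins_2",
    "dim_5__permutation_entropy__dimension_3__tau_1",
]
counts = count_by_dimension(columns)
print(dict(counts))

# Consistency check on the shapes reported in this tutorial:
# 773 features per dimension times 6 dimensions.
n_total = 773 * 6
print(n_total)
```

Applied to the real output, `count_by_dimension(Xt.columns)` should report the same count for every dimension.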
Using tsfresh for forecasting
You can also use tsfresh for univariate forecasting, by reducing the forecasting task to time series regression over sliding windows. To find out more about forecasting, see the forecasting tutorial notebook.
[11]:
from sklearn.ensemble import RandomForestRegressor
from aeon.datasets import load_airline
from aeon.forecasting.base import ForecastingHorizon
from aeon.forecasting.compose import make_reduction
from aeon.forecasting.model_selection import temporal_train_test_split
y = load_airline()
y_train, y_test = temporal_train_test_split(y)
regressor = make_pipeline(
    TSFreshFeatureExtractor(show_warnings=False, disable_progressbar=True),
    RandomForestRegressor(),
)
forecaster = make_reduction(
    regressor, scitype="time-series-regressor", window_length=12
)
forecaster.fit(y_train)
fh = ForecastingHorizon(y_test.index, is_relative=False)
y_pred = forecaster.predict(fh)
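Reduction turns forecasting into a regression problem: each window of `window_length` past values becomes one training instance, and the value that follows it becomes the target. With the `time-series-regressor` scitype, each window is handed to the regressor as a short time series, which is why the tsfresh transformer can sit at the front of the pipeline. The plain-Python sketch below shows only the windowing idea; `make_windows` is a hypothetical helper written for this example, not aeon's implementation.

```python
def make_windows(series, window_length):
    """Split a series into (window, next value) training pairs."""
    windows, targets = [], []
    for start in range(len(series) - window_length):
        end = start + window_length
        windows.append(series[start:end])  # the past values
        targets.append(series[end])        # the value to predict
    return windows, targets


toy_series = [10, 12, 15, 11, 18, 20, 17]
windows, targets = make_windows(toy_series, window_length=3)
for window, target in zip(windows, targets):
    print(window, "->", target)
```

In the cell above, `window_length=12` means each training instance covers twelve consecutive observations of the airline series, which is monthly data, so one window spans a year.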