RotationForestRegressor

class RotationForestRegressor(n_estimators: int = 200, min_group: int = 3, max_group: int = 3, remove_proportion: float = 0.5, base_estimator: BaseEstimator | None = None, pca_solver: str = 'full', time_limit_in_minutes: float = 0.0, contract_max_n_estimators: int = 500, n_jobs: int = 1, random_state: int | RandomState | None = None)

Bases: RegressorMixin, BaseEstimator

A Rotation Forest (RotF) vector regressor.

Implementation of the Rotation Forest regressor, based on the ensemble described in Rodriguez et al. (2006) [1]. Builds a forest of trees, each trained on a random portion of the data transformed using PCA.

Intended as a benchmark for time series data and as a base regressor for transformation-based approaches such as FreshPRINCERegressor. This aeon implementation only works with continuous attributes.

Parameters:

n_estimators (int, default=200): Number of estimators to build for the ensemble.
min_group (int, default=3): The minimum size of an attribute subsample group.
max_group (int, default=3): The maximum size of an attribute subsample group.
remove_proportion (float, default=0.5): The proportion of cases to be removed per group.
base_estimator (BaseEstimator or None, default=None): Base estimator for the ensemble. By default, uses the sklearn DecisionTreeRegressor with MSE as the splitting measure.
pca_solver (str, default="full"): Solver to use for the PCA svd_solver parameter. See the scikit-learn PCA implementation for options.
time_limit_in_minutes (float, default=0.0): Time contract to limit build time in minutes, overriding n_estimators. The default of 0 means n_estimators is used.
contract_max_n_estimators (int, default=500): Maximum number of estimators to build when time_limit_in_minutes is set.
n_jobs (int, default=1): The number of jobs to run in parallel for both fit and predict. -1 means using all processors.
random_state (int, RandomState instance or None, default=None): If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.

Attributes:

n_cases_ (int): The number of cases in the training set.
n_atts_ (int): The number of attributes in the training set.
estimators_ (list of shape (n_estimators) of BaseEstimator): The collection of estimators trained in fit.

Notes

Predictions may differ slightly between scikit-learn versions. In particular, scikit-learn 1.8 fixed decision-tree handling of almost constant features, which can change the fitted trees used by RotationForestRegressor and therefore its output, even when using the same random state.

References

[1] Rodriguez, Juan José, Ludmila I. Kuncheva, and Carlos J. Alonso. “Rotation forest: A new classifier ensemble method.” IEEE transactions on pattern analysis and machine intelligence 28.10 (2006).

[2] Bagnall, A., et al. “Is rotation forest the best classifier for problems with continuous features?.” arXiv preprint arXiv:1809.06705 (2018).

Examples

>>> from aeon.regression.sklearn import RotationForestRegressor
>>> from aeon.testing.data_generation import make_example_2d_numpy_collection
>>> X, y = make_example_2d_numpy_collection(n_cases=10, n_timepoints=12,
...                              regression_target=True, random_state=0)
>>> reg = RotationForestRegressor(n_estimators=10)
>>> reg.fit(X, y)
RotationForestRegressor(n_estimators=10)
>>> reg.predict(X)
array([0.7252543 , 1.50132442, 0.95608366, 1.64399016, 0.42385504,
       0.60639322, 1.01919317, 1.30157483, 1.66017354, 0.2900776 ])
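
As a rough illustration of the rotation idea in the class description (attribute groups, a random subsample of cases, and PCA), the sketch below builds one rotated copy of a data set using NumPy only. It is not aeon's implementation: the helper name `rotate_once`, the fixed group size, and the use of `np.linalg.svd` in place of scikit-learn's PCA are all assumptions made for illustration, and no tree is fitted.

```python
import numpy as np


def rotate_once(X, group_size=3, remove_proportion=0.5, seed=0):
    """Return one rotated copy of X (hypothetical helper, illustration only)."""
    rng = np.random.default_rng(seed)
    n_cases, n_atts = X.shape

    # Randomly partition the attribute indices into groups of at most group_size.
    perm = rng.permutation(n_atts)
    groups = [perm[i:i + group_size] for i in range(0, n_atts, group_size)]

    # Number of cases kept when fitting each group's PCA.
    n_keep = max(2, int(round(n_cases * (1.0 - remove_proportion))))

    rotated = []
    for group in groups:
        # Fit the PCA on a random subsample of cases, restricted to this group.
        rows = rng.choice(n_cases, size=n_keep, replace=False)
        sample = X[np.ix_(rows, group)]
        mean = sample.mean(axis=0)
        # Right singular vectors of the centred sample are the principal axes.
        _, _, vt = np.linalg.svd(sample - mean, full_matrices=True)
        # Project all cases onto the axes learned from the subsample.
        rotated.append((X[:, group] - mean) @ vt.T)
    return np.hstack(rotated)


rng = np.random.default_rng(42)
X = rng.normal(size=(20, 12))
X_rot = rotate_once(X)
print(X_rot.shape)  # (20, 12)
```

In a full forest, each estimator would presumably draw its own groups and subsample before fitting its base estimator on the rotated data. Because each group's rotation in this sketch is orthogonal, distances between cases are unchanged; only the axes seen by the tree differ.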

Methods

`fit`(X, y)	Fit a forest of trees on cases (X,y), where y is the target variable.
`get_params`([deep])	Get parameters for this estimator.
`predict`(X)	Predict for all cases in X.
`score`(X, y[, sample_weight])	Return coefficient of determination on test data.
`set_params`(**params)	Set the parameters of this estimator.
`set_score_request`(*[, sample_weight])	Configure whether metadata should be requested to be passed to the `score` method.

fit_predict

fit(X, y)

Fit a forest of trees on cases (X,y), where y is the target variable.

Parameters:

X (2d ndarray or DataFrame of shape = [n_cases, n_attributes]): The training data.
y (array-like, shape = [n_cases]): The output values.

Returns:

self: Reference to self.

Notes

Changes state by creating a fitted model that updates attributes ending in “_”.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True): If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params (dict): Parameter names mapped to their values.

predict(X) → ndarray

Predict for all cases in X.

Parameters:

X (2d ndarray or DataFrame of shape = [n_cases, n_attributes]): The data to make predictions for.

Returns:

y (array-like, shape = [n_cases]): Predicted output values.

score(X, y, sample_weight=None)

Return coefficient of determination on test data.

The coefficient of determination, $R^2$, is defined as $1 - \frac{u}{v}$, where $u$ is the residual sum of squares ((y_true - y_pred) ** 2).sum() and $v$ is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0, and the score can be negative because the model can be arbitrarily worse than a constant prediction. A constant model that always predicts the expected value of y, disregarding the input features, would get an $R^2$ score of 0.0.

Parameters:

X (array-like of shape (n_samples, n_features)): Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)): True values for X.
sample_weight (array-like of shape (n_samples,), default=None): Sample weights.

Returns:

score (float): $R^2$ of self.predict(X) w.r.t. y.

Notes

The $R^2$ score used when calling score on a regressor uses multioutput='uniform_average' from scikit-learn version 0.23 onwards, to stay consistent with the default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
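
The $R^2$ definition given for score can be checked by hand with a few lines of NumPy. This is a minimal sketch for the single-output case only: the function name `r2` is made up for illustration, and sample weights are ignored.

```python
import numpy as np


def r2(y_true, y_pred):
    """Coefficient of determination, 1 - u / v (unweighted, single output)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    u = ((y_true - y_pred) ** 2).sum()         # residual sum of squares
    v = ((y_true - y_true.mean()) ** 2).sum()  # total sum of squares
    return 1.0 - u / v


y_true = np.array([1.0, 2.0, 3.0, 4.0])

perfect = r2(y_true, y_true)                      # 1.0: predictions match exactly
constant = r2(y_true, np.full(4, y_true.mean()))  # 0.0: always predict the mean
reversed_ = r2(y_true, y_true[::-1])              # -3.0: worse than the mean
print(perfect, constant, reversed_)
```

The three cases correspond to the statements in the description: a perfect fit scores 1.0, the constant mean predictor scores 0.0, and a sufficiently bad model scores below zero.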

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict): Estimator parameters.

Returns:

self (estimator instance): Estimator instance.
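
The `<component>__<parameter>` form mentioned for set_params can be seen with a plain scikit-learn Pipeline. The step names "scale" and "tree" below are arbitrary labels chosen for this sketch, and the example uses scikit-learn estimators only rather than RotationForestRegressor itself.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Step names ("scale", "tree") are illustrative; any unique names would do.
pipe = Pipeline([("scale", StandardScaler()), ("tree", DecisionTreeRegressor())])

# Update parameters of nested components through the double-underscore form.
pipe.set_params(tree__max_depth=3, scale__with_mean=False)

print(pipe.named_steps["tree"].max_depth)   # 3
print(pipe.named_steps["scale"].with_mean)  # False
```

Since RotationForestRegressor derives from BaseEstimator, the same pattern should apply to an estimator object passed as base_estimator (for example a base_estimator__max_depth key), although that is not shown here.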

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → RotationForestRegressor

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED): Metadata routing for sample_weight parameter in score.

Returns:

self (object): The updated object.