load_regression

load_regression(name: str, split=None, extract_path=None, return_metadata: bool = False, load_equal_length: bool = True, load_no_missing: bool = True)

Download or load a regression problem.

Download from either https://timeseriesclassification.com or, if that fails, http://tseregression.org/.

If you want to load a problem from a local file, specify its location in extract_path. This function assumes the data is stored as <extract_path>/<name>/<name>_TRAIN.ts and <extract_path>/<name>/<name>_TEST.ts. To load a file directly from a full path, use the function load_from_tsfile instead. If you do not specify extract_path, or if the problem is not present in extract_path, the function will attempt to download the data from https://timeseriesclassification.com or, if that fails, http://tseregression.org/.
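As a rough illustration of the local layout described above, the sketch below checks whether both partition files exist under extract_path. The helper name local_problem_files is hypothetical and this is not aeon's implementation; it only mirrors the <extract_path>/<name>/<name>_TRAIN.ts and <name>_TEST.ts naming convention.

```python
from __future__ import annotations

import os


def local_problem_files(extract_path: str, name: str) -> dict[str, str] | None:
    """Hypothetical helper: return the TRAIN/TEST paths if both exist locally.

    Mirrors the <extract_path>/<name>/<name>_TRAIN.ts layout; not aeon code.
    """
    paths = {
        split: os.path.join(extract_path, name, f"{name}_{split}.ts")
        for split in ("TRAIN", "TEST")
    }
    if all(os.path.isfile(p) for p in paths.values()):
        return paths
    # Problem not present locally: a loader would fall back to downloading.
    return None
```

A None result corresponds to the case where the data would instead be fetched from the download URLs.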

The list of problems that this function can download from the website is tser_soton, defined in datasets/tser_lists.py. This function can load timestamped data, but it does not store the timestamps. Timestamp loading is fragile: it only works if all data are floats.

Data is assumed to be in the standard .ts format: each row is a (possibly multivariate) time series. Dimensions are separated by colons, and values within a series are comma separated. For an example TSER problem, see aeon.datasets.data.Covid3Month.

Some of the original problems have unequal length series and missing values. By default, this function loads the equal length, no missing value versions of these files that have been used in experimental studies. These have the suffix _eq or _nmv after the name. To load the original versions instead, set load_equal_length and/or load_no_missing to False. aeon supports loading series with missing values and/or unequal length between series, but it does not support loading multivariate series where lengths differ between channels. The original PGDALIA is in this format; PGDALIA_eq contains length-normalised series. If a problem has both unequal length series and missing values, its files are assumed to be named <name>_eq_nmv_TRAIN.ts and <name>_eq_nmv_TEST.ts. There are currently no problems in the archive with both missing values and unequal length.
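To make the row format concrete, here is a minimal parsing sketch for a single data row. It assumes, as in the .ts format, that the regression target is the last colon-separated field and that missing values are written as "?"; parse_ts_line is a hypothetical helper, not part of aeon's API, and it ignores header lines and timestamps.

```python
from __future__ import annotations

import numpy as np


def parse_ts_line(line: str) -> tuple[list[np.ndarray], float]:
    """Hypothetical helper: parse one .ts data row into channels and a target.

    Assumes the target is the last colon-separated field and '?' marks a
    missing value (mapped to NaN here).
    """
    *channels, target = line.strip().split(":")
    series = [
        np.array([np.nan if v == "?" else float(v) for v in ch.split(",")])
        for ch in channels
    ]
    return series, float(target)


# Two channels of length 3, one missing value, target 12.5.
series, target = parse_ts_line("1.0,2.0,3.0:4.0,?,6.0:12.5")
print(len(series), series[1], target)
```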

Parameters:
name : string

Name of the problem to load or download.

extract_path : None or str, default=None

Path of the location for the data file. If None, data is written to os.path.dirname(__file__)/local_data/<name>/.

split : None or str {"train", "test"}, default=None

Whether to load the train or test partition of the problem. If None, both partitions are loaded into a single dataset; otherwise the function looks only for a file of the form <name>_TRAIN.ts or <name>_TEST.ts.
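When split is None, the two partitions end up in one dataset. A minimal sketch of that combination step, assuming equal length data arrives as 3D arrays of shape (n_cases, n_channels, n_timepoints) and unequal length data as lists of 2D arrays; combine_splits is hypothetical and not aeon's code.

```python
import numpy as np


def combine_splits(X_train, y_train, X_test, y_test):
    """Hypothetical helper: stack train and test partitions into one dataset."""
    if isinstance(X_train, np.ndarray) and isinstance(X_test, np.ndarray):
        # Equal length: stack along the case axis (axis 0).
        X = np.concatenate([X_train, X_test], axis=0)
    else:
        # Unequal length: keep a list of 2D (n_channels, n_timepoints) arrays.
        X = list(X_train) + list(X_test)
    y = np.concatenate([y_train, y_test])
    return X, y
```

Train cases come first, so the original partition can be recovered by slicing at len(y_train).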

return_metadata : boolean, default=False

If True, returns a tuple (X, y, metadata) instead of (X, y).

load_equal_length : boolean, default=True

This is for the case when the standard release has unequal length series. The downloaded zip for these problems contains a version made equal length through truncation. These versions all have the suffix _eq after the name. If this flag is set to True, the function first attempts to load files called <name>_eq_TRAIN.ts and <name>_eq_TEST.ts. If these are not present, it loads the normal version.

load_no_missing : boolean, default=True

This is for the case when the standard release has missing values. The downloaded zip for these problems contains a version with imputed missing values. These versions all have the suffix _nmv after the name. If this flag is set to True, the function first attempts to load files called <name>_nmv_TRAIN.ts and <name>_nmv_TEST.ts. If these are not present, it loads the normal version.
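The two flags amount to a preference order over file names. The sketch below is one plausible reading of that order, trying <name>_eq_nmv first when both flags are set and falling back to the original release; the exact ordering and the helper name resolve_ts_file are assumptions, not aeon's implementation. Here dir_path stands for <extract_path>/<name>.

```python
import os


def resolve_ts_file(
    dir_path: str,
    name: str,
    split: str,
    load_equal_length: bool = True,
    load_no_missing: bool = True,
) -> str:
    """Hypothetical helper: return the first existing .ts file for one split.

    The suffix preference order below is an assumption, not aeon code.
    """
    suffixes = []
    if load_equal_length and load_no_missing:
        suffixes.append("_eq_nmv")
    if load_equal_length:
        suffixes.append("_eq")
    if load_no_missing:
        suffixes.append("_nmv")
    suffixes.append("")  # fall back to the original release
    for suffix in suffixes:
        path = os.path.join(dir_path, f"{name}{suffix}_{split.upper()}.ts")
        if os.path.isfile(path):
            return path
    raise FileNotFoundError(f"No {split} file for {name} in {dir_path}")
```

Setting both flags to False in this sketch skips every suffixed variant and goes straight to the original files.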

Returns:
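X : np.ndarray or list of np.ndarray

The time series. Equal length problems are returned as a 3D np.ndarray of shape (n_cases, n_channels, n_timepoints); unequal length problems as a list of 2D np.ndarray, one per case.

y : np.ndarray

The target response variable for each case in X.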

metadata : dict, optional

Only returned if return_metadata is True. Contains the metadata 'problemname', 'timestamps', 'missing', 'univariate' and 'equallength'. For a regression problem, 'targetlabel' should be True and 'classlabel' False.
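A small sanity check on the returned metadata, assuming it is a dict keyed by the names listed above; check_regression_metadata is a hypothetical helper, and the example dict is made up for illustration rather than taken from a real problem.

```python
def check_regression_metadata(metadata: dict) -> bool:
    """Hypothetical helper: True if metadata describes a regression problem.

    Assumes the dict uses the 'targetlabel' and 'classlabel' keys listed above.
    """
    return bool(metadata.get("targetlabel")) and not metadata.get("classlabel")


# Illustrative values only; real ones come from load_regression(..., return_metadata=True).
example = {
    "problemname": "example",
    "timestamps": False,
    "missing": False,
    "univariate": True,
    "equallength": True,
    "targetlabel": True,
    "classlabel": False,
}
print(check_regression_metadata(example))
```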