.ts
File Format¶
This document formalises the .ts
file format used by aeon
.
Encoded in utf-8
, a .ts
file stores a time-series dataset and its corresponding
metadata (specified via string identifiers). String identifiers refer to strings
beginning with @
in the file.
.ts
files contains information blocks in the following order:
A description block. Contains any number of continuous lines starting with
#
. Each#
is followed by an arbitrary (utf-8
) sequence of symbols. The.ts
specification does not prescribe any content for the description block, but it is common to include a description of the dataset contained in the file i.e. a full data dictionary, citations, etc.A metadata block. Contains continuous lines starting with
@
. Each@
is directly followed a string identifier without whitespace (@<identifier>
), followed by an appropriate value for the identifier where the value depends on type of identifier. There is no strict order of occurrence for all string identifiers, except@data
which must be at the end of this block. The number of lines in this block depends on certain properties of the dataset (e.g: if the dataset is multivariate, an additional line is required to specify number of channels)A dataset block. Contains a multiple collections of float values that represent the dataset. There are
n
cases each its own time series, delimited by new lines. The values for a series are expressed in a comma,
seperated list and the index of each value is relative to its position in said list (0, 1, …,m
). An instance may contain 1 tod
channels, where each channel for a case is delimited using a colon:
. In case timestamps are present, each value in a series is enclosed within round brackets i.e.(YYYY-MM-DD HH:mm:ss,<value>)
. The response variable is at the end of each case and is seperated via a colon.
Here is an extract from an examlple .ts
file that shows portions of all three
blocks:
#The data was generated from students wearing a smart watch.
#Consists of four classes, which are walking, resting, running and badminton.
...
@problemName BasicMotions
@timeStamps false
@missing false
...
@data
-0.740653,-0.740653,10.208449,2.867009:-0.194301,-0.194301,-0.249618,0.516079:Standing
-0.247409,-0.247409,-0.77129,-0.576154:-0.368484,-0.020851,-0.020851,-0.465607:Walking
...
For example files, see the files in aeon/datasets/data/
or the
tsml Zenodo community.
Metadata¶
aeon
’s core loader/writer functions relies on the existence of metadata to
correctly load data into memory.
It is also helpful to provide information about the dataset to a different user not
familiar with the dataset.
The format of individual string identifier is: @<identifier> [value]
,
except for @data
where there is no trailing information. There should be a new
line for each metadata entry.
Identifier |
Description |
Value |
Additional Comments |
Example |
---|---|---|---|---|
|
The name of the dataset. |
any |
Value cannot be space separated |
|
|
Whether timestamps are present. |
|
|
|
|
Whether there are missing values. |
|
|
|
|
Whether there is only one dimension for the time series. |
|
|
|
|
The number of channels. |
integer > 0 |
Only present when |
6 |
|
Whether all cases are equal length. |
|
|
|
|
Number of timepoints in each case. |
integer > 0 |
Only present if |
100 |
|
Whether there is a target label. |
|
Exclusive to regression data; |
|
|
Whether class labels are present. |
|
Exclusive to classification data; when |
|
|
Marks the beginning of data. |
- |
The data begins from the next line. |
- |