Mentoring and Projects#

aeon runs a range of short projects interacting with the community and the code base. These projects are designed for internships, usage as part of undergraduate/postgraduate projects at academic institutions, and as options for programs such as Google Summer of Code (GSoC).

For those interested in undertaking a project outside these scenarios, we recommend joining the Slack and discussing with the project mentors. We aim to run schemes to help new contributors to become more familiar with aeon, time series machine learning research, and open-source software development.

All the projects listed will require knowledge of Python 3 and Git/GitHub. The majority of them will require some knowledge of machine learning and time series.

Current aeon projects#

This is a list of some of the projects we are interested in running in 2024. Feel free to propose your own project ideas, but please discuss them with us first. We have an active community of researchers and students who work on aeon. Please get in touch via Slack if you are interested in any of these projects or have any questions.

We will more widely advertise funding opportunities as and when they become available.

Forecasting#

1. Machine Learning for Time Series Forecasting#

Mentors: Tony Bagnall (@TonyBagnall) and TBC.

Description#

This project will investigate algorithms for forecasting based on traditional machine learning (tree based) and time series machine learning (transformation based). Note this project will not involve deep learning based forecasting. It will involve helping develop the aeon framework to work more transparently with ML algorithms, evaluating regression algorithms already in aeon[1] for forecasting problems and implementing at least one algorithm from the literature not already in aeon, such as SETAR-Tree [3].

Project Stages#

Learn about aeon best practices, coding standards and testing policies.
Work through existing forecasting workflow and experimental reproduction.
Adapt the M competition set up [2] for ML experimental framework to assess time series regression algorithms [1].
Implement a machine learning forecasting algorithm [3]

Expected Outcomes#

Contributions to the aeon forecasting module.
Implementation of a machine learning forecasting algortihms.
Help write up results for a technical report/academic paper (depending on outcomes).

Skills Required#

Python 3
Git and GitHub
Some machine learning and/or forecasting background (e.g. taught courses or practical experience)

References#

Guijo-Rubio, D.,Middlehurst, M., Arcencio, G., Furtado, D. and Bagnall, A. Unsupervised Feature Based Algorithms for Time Series Extrinsic Regression, arXiv2305.01429, 2023
https://forecasters.org/resources/time-series-data/
Godahewa, R., Webb, G.I., Schmidt, D. et al. SETAR-Tree: a novel and accurate tree algorithm for global time series forecasting. Mach Learn 112, 2555–2591 (2023). https://link.springer.com/article/10.1007/s10994-023-06316-x

2. Deep Learning for Time Series Forecasting#

Mentors: Ali Ismail-Fawaz (@hadifawaz1999)

Description#

Implement and evaluate some models from the literature, maybe benchmark them as well to non-deep models

Project Stages#

TBC

Expected Outcomes#

TBC

References#

TBC

Classification#

1. Optimizing the Shapelet Transform for Classification and Similarity Search#

Mentors : Antoine Guillaume (@baraline) and Tony Bagnall (@TonyBagnall)

Description#

A shapelet is defined as a time series subsequence representing a pattern of interest that we wish to search for in time series data. Shapelet-based algorithms can be used for a wide range of time series tasks. In this project, we will focus on its core application, which is to create an embedding of the input time series.

Our goal in this project will be to optimize the code related to the shapelet transform method, which takes as input a set of shapelets and a time series dataset, and give as output a tabular dataset containing the features characterizing the presence (or absence) of each shapelet in the input time series (more information in [1] and [2]).

Similarity search is another field of time series, which has proposed greatly optimized algorithms (see [3] and [4]) for the task of finding the best matches of a subsequence inside another time series. As this task is extremely similar to what is done in the shapelet transform, we want to adapt these algorithms to the context of shapelets, in order to achieve significant speed-ups.

Project stages#

To achieve this goal, with the assistance of the mentor, we identify the following steps for the mentee:

Learn about aeon best practices, coding standards and testing policies.
Study the shapelet transform algorithm and how it is related to the task of similarity search.
Study the similarity search algorithms for the Euclidean distance and the computational optimization they use.
Propose a candidate implementation for to increase the performance of the computations made by a single shapelet. This can be made with the help of the existing implementation of the similarity search module in aeon.
Measure the performance of this first candidate implementation against the current approach.
Implement this solution to the shapelet transform algorithm, which uses multiple shapelets.
Benchmark the implementation against the original shapelet transform algorithm.
If time, generalize this new algorithm to the case of dilated shapelets (see [5]).

Expected Outcomes#

We expect the mentee engage with the aeon community and produce a performance games for the We Based on the benchmark of the different implementations, we will evaluate the performance gains of the new shapelet transform and the success of this project.

References#

Hills, J., Lines, J., Baranauskas, E., Mapp, J. and Bagnall, A., 2014. Classification of time series by shapelet transformation. Data mining and knowledge discovery, 28, pp.851-881.
Bostrom, A. and Bagnall, A., 2017. Binary shapelet transform for multiclass time series classification. Transactions on Large-Scale Data-and Knowledge-Centered Systems XXXII: Special Issue on Big Data Analytics and Knowledge Discovery, pp.24-46.
Yeh, C.C.M., Zhu, Y., Ulanova, L., Begum, N., Ding, Y., Dau, H.A., Silva, D.F., Mueen, A. and Keogh, E., 2016, December. Matrix profile I: all pairs similarity joins for time series: a unifying view that includes motifs, discords and shapelets. In 2016 IEEE 16th international conference on data mining (ICDM) (pp. 1317-1322). Ieee.
Zhu, Y., Zimmerman, Z., Senobari, N.S., Yeh, C.C.M., Funning, G., Mueen, A., Brisk, P. and Keogh, E., 2016, December. Matrix profile ii: Exploiting a novel algorithm and gpus to break the one hundred million barrier for time series motifs and joins. In 2016 IEEE 16th international conference on data mining (ICDM) (pp. 739-748). IEEE.
Guillaume, A., Vrain, C. and Elloumi, W., 2022, June. Random dilated shapelet transform: A new approach for time series shapelets. In International Conference on Pattern Recognition and Artificial Intelligence (pp. 653-664). Cham: Springer International Publishing.

2. EEG classification with aeon-neuro#

Mentors: Tony Bagnall (@TonyBagnall) and Aiden Rushbrooke

Description#

EEG (Electroencephalogram) data are high dimensional time series that are used in medical, psychology and brain computer interface research. For example, EEG are used to detect epilepsy and to control decvices such as mice. There is a huge body of work on analysing and learning from EEG, but there is a wide disparity of tools, practices and systems used. This project will help members of the aeon team who are currently researching techniques for EEG classification [1] and developing an aeon sister toolkit, aeon-neuro [LINK]. We will work together to improve the structure and documentation for aeon-neuro, help integrate the toolkit with existing EEG toolkits such as NM [2], provide interfaces to standard data formats such as BIDS [3] and help develop and assess a range of EEG classification algorithms.

Project stages#

Learn about aeon best practices, coding standards and testing policies.
Study the existing techniques for EEG classification.
Implement or wrap standard EEG processing algorithms.
Evaluate aeon classifiers for EEG problems.
Implement alternatives transformations for preprocessing EEG data.
Help write up results for a technical report/academic paper (depending on outcomes).

Expected Outcomes#

We would expect a better documented and more integrated aeon-neuro toolkit with better functionality and a wider appeal.

References#

Aiden Rushbrooke, Jordan Tsigarides, Saber Sami, Anthony Bagnall, Time Series Classification of Electroencephalography Data, IWANN 2023.
MNE Toolkit, https://mne.tools/stable/index.html
The Brain Imaging Data Structure (BIDS) standard, https://bids.neuroimaging.io/

3. Improved Proximity Forest for classification#

Mentors: Matthew Middlehurst (@MatthewMiddlehurst) and Tony Bagnall (@TonyBagnall)

Description#

Distance-based classifiers such as k-Nearest Neighbours are popular approaches to time series classification. They primarily use elastic distance measures such as Dynamic Time Warping (DTW) to compare two series. The Proximity Forest algorithm [1] is a distance-based classifier for time series. The classifier creates a forest of decision trees, where the tree splits are based on the distance between time series using various distance measures. A recent review of time series classification algorithms [2] found that Proximity Forest was the most accurate distance-based algorithm of those compared.

aeon previously had an implementation of the Proximity Forest algorithm, but it was not as accurate as the original implementation (the one used in the study) and was unstable on benchmark datasets. The goal of this project is to significantly overhaul the previous implementation or completely re-implement Proximity Forest in aeon to match the accuracy of the original algorithm. This will involve comparing against the authors’ Java implementation of the algorithm as well as alternate Python versions. The mentors will provide results for both for alternative methods. While knowing Java is not a requirement for this project, it could be beneficial.

Recently, the group which published the algorithm has proposed a new version of the Proximity Forest algorithm, Proximity Forest 2.0 [3]. This algorithm is more accurate than the original Proximity Forest algorithm, and does not currently have an implementation in aeon or elsewhere in Python. If time allows, the project could also involve implementing and evaluating the Proximity Forest 2.0 algorithm.

Project stages#

Learn about aeon best practices, coding standards and testing policies.
Study the Proximity Forest algorithm and previous aeon implementation.
Improve/re-implement the Proximity Forest implementation in aeon, with the aim being to have an implementation that is as accurate as the original algorithm, while remaining feasible to run.
Evaluate the improved implementation against the original aeon Proximity Forest and the authors’ Java implementation.
If time, implement the Proximity Forest 2.0 algorithm and repeat the above evaluation.

Expected Outcomes#

We expect the mentee engage with the aeon community and produce a high quality implementation of the Proximity Forest algorithm(s) that gets accepted into the toolkit.

References#

Lucas, B., Shifaz, A., Pelletier, C., O’Neill, L., Zaidi, N., Goethals, B., Petitjean, F. and Webb, G.I., 2019. Proximity forest: an effective and scalable distance-based classifier for time series. Data Mining and Knowledge Discovery, 33(3), pp.607-635.
Middlehurst, M., Schäfer, P. and Bagnall, A., 2023. Bake off redux: a review and experimental evaluation of recent time series classification algorithms. arXiv preprint arXiv:2304.13029.
Herrmann, M., Tan, C.W., Salehi, M. and Webb, G.I., 2023. Proximity Forest 2.0: A new effective and scalable similarity-based classifier for time series. arXiv preprint arXiv:2304.05800.

Clustering#

1. Feature based or deep learning based algorithms#

Mentors: Tony Bagnall (@TonyBagnall), Ali Ismail-Fawaz (@hadifawaz1999) and @Chris?

Description#

Implement and evaluate some of the recently proposed clustering algorithms

The clustering module in aeon, up until now, primarily consists of distance-based algorithms like K-Means, K-Medoids, and Clara, among others. Recently, we introduced an initial deep clustering module featuring an FCN auto-encoder, incorporating distance-based algorithms in the latent space. However, there is currently a shortage of feature-based clustering algorithms.

The objective of this project is to enhance aeon by incorporating more deep learning approaches for time series clustering. This involves adapting the FCN auto-encoder to leverage the ResNet model. Additionally, the project aims to integrate feature-based algorithms for time series clustering into the system.

Project Stages#

TBC

Expected Outcomes#

TBC

References#

Lafabregue, B., Weber, J., Gançarski, P. and Forestier, G., 2022. End-to-end deep representation learning for time series clustering: a comparative study. Data Mining and Knowledge Discovery, 36(1), pp.29-81.

Anomaly detection#

1. Anomaly detection with the Matrix Profile and MERLIN#

Mentors: Matthew Middlehurst (@MatthewMiddlehurst)

Description#

aeon is looking to extend its module for time series anomaly detection. The end goal of this project is to implement the Matrix Profile [1][2] and MERLIN [3] algorithms, but suitable framework for anomaly detection in aeon will need to be designed first. The mentee will help design the API for the anomaly detection module and implement the Matrix Profile and MERLIN algorithms.

Usage of external libraries such as stumpy [4] is possible for the algorithm implementations, or the mentee can implement the algorithms from scratch using numba. There is also scope to benchmark the implementations, but as there is no existing anomaly detection module in aeon, this will require some infrastructure to be developed and is subject to time and interest.

Project stages#

Learn about aeon best practices, coding standards and testing policies.
Familiarise yourself with similar single series experimental modules in aeon such as segmentation and similarity search.
Help design the API for the anomaly detection module.
Study and implement the Matrix Profile for anomaly detection and MERLIN algorithms using the new API.
If time allows and there is interest, benchmark the implementations against the original implementations or other anomaly detection algorithms.

Project Outcome#

As the anomaly detection is a new module in aeon, there is very little existing code to compare against and little infrastructure to evluate anomaly detection algorithms. The success of the project will be evaluated by the quality of the code produced and engagement with the project and the aeon community.

References#

Yeh, C.C.M., Zhu, Y., Ulanova, L., Begum, N., Ding, Y., Dau, H.A., Silva, D.F., Mueen, A. and Keogh, E., 2016, December. Matrix profile I: all pairs similarity joins for time series: a unifying view that includes motifs, discords and shapelets. In 2016 IEEE 16th international conference on data mining (ICDM) (pp. 1317-1322). Ieee.
Lu, Y., Wu, R., Mueen, A., Zuluaga, M.A. and Keogh, E., 2022, August. Matrix profile XXIV: scaling time series anomaly detection to trillions of datapoints and ultra-fast arriving data streams. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (pp. 1173-1182).
Nakamura, T., Imamura, M., Mercer, R. and Keogh, E., 2020, November. Merlin: Parameter-free discovery of arbitrary length anomalies in massive time series archives. In 2020 IEEE international conference on data mining (ICDM) (pp. 1190-1195). IEEE.
Law, S.M., 2019. STUMPY: A powerful and scalable Python library for time series data mining. Journal of Open Source Software, 4(39), p.1504.

Segmentation#

1. Time series segmentation#

Mentors: Tony Bagnall (@TonyBagnall) and TBC

Description#

The time series segmentation module contains a range of algorithms for segmenting time series. The goal of this project is to extend the functionality of segmentation in aeon and develop tools for comparing segmentation algorithms.

Project stages#

Learn about aeon best practices, coding standards and testing policies.
Study the existing segmentation algorithms in aeon.
Implement existing segmentation algorithms, e.g. https://github.com/aeon-toolkit/aeon/issues/948
Implement tools for comparing segmentation algorithms
Conduct a bake off of segmentation algorithms on a range of datasets.

Project Outcome#

As with all research programming based projects, progress can be hindered by many unforseen circumstances. Success will be measured by engagement, effort and willingness to join the community rather than performance of the algorithms.

References#

Allegra, M., Facco, E., Denti, F., Laio, A. and Mira, A., 2020. Data segmentation based on the local intrinsic dimension. Scientific Reports, 10(1), p.16449.
Ermshaus, A., Schäfer, P. and Leser, U., 2023. ClaSP: parameter-free time series segmentation. Data Mining and Knowledge Discovery, 37(3), pp.1262-1300.
Hallac, D., Nystrup, P. and Boyd, S., 2019. Greedy Gaussian segmentation of multivariate time series. Advances in Data Analysis and Classification, 13(3), pp.727-751.
Matteson, D.S. and James, N.A., 2014. A nonparametric approach for multiple change point analysis of multivariate data. Journal of the American Statistical Association, 109(505), pp.334-345.
Sadri, A., Ren, Y. and Salim, F.D., 2017. Information gain-based metric for recognizing transitions in human activities. Pervasive and Mobile Computing, 38, pp.92-109.

Transformation#

1. Improve ROCKET transformers#

Mentors: Ali Ismail-Fawaz (@hadifawaz1999) and Matthew Middlehurst (@MatthewMiddlehurst)

Description#

The ROCKET algorithm [1] is a very fast and accurate transformation designed for time series classification. It is based on a randomly initialised convolutional kernels that are applied to the time series and used to extract summary statistics. ROCKET has applications to time series classification, extrinsic regression and anomaly detection, but as a fast and unsupervised transformation, it has potential to a wide range of other time series tasks.

aeon has implementations of the ROCKET transformation and its variants, including MiniROCKET [2] and MultiROCKET [3]. However, these implementations have room for improvement (#208). There is scope to speed up the implementations, and the amount of varients is likely unnecessary and could be condensed into higher quality estimators.

This projects involves improving the existing ROCKET implementations in aeon or implementing new ROCKET variants. The project will involve benchmarking to ensure that the new implementations are as fast and accurate as the original ROCKET algorithm and potentially to compare to other implementations (#214). Besides improving the existing implementations, there is scope to implement the HYDRA algorithm [4] or implement GPU compatible versions of the algorithms.

Project Stages#

Learn about aeon best practices, coding standards and testing policies.
Study the ROCKET, MiniROCKET, MultiROCKET algorithms.
Study the existing ROCKET implementations in aeon.
Merge and tidy the ROCKET implementations, with the aim being to familiarise the mentee with the aeon pull request process.
Implement one (or more) of the proposed ROCKET implementation improvements:
- Significantly alter the current ROCKET implementations with the goal of speeding up the implementation on CPU processing.
- Implement a GPU version of some of the ROCKET transformers, using either tensorflow or pytorch.
- Extend the existing ROCKET implementations to allow for the use of unequal length series.
- Implement the HYDRA algorithm.
Benchmark the implementation against the original ROCKET implementations, looking at booth speed of the transform and accuracy in a classification setting.

Project Outcomes#

Success of the project will be assessed by the quality of the code produced and an evaluation of the transformers in a classification setting. None of the implementations should significantly degrade the performance of the original ROCKET algorithm in terms of accuracy and speed. Regardless, effort and engagement with the project and the aeon community are more important factors in evaluating success.

References#

Dempster, A., Petitjean, F. and Webb, G.I., 2020. ROCKET: exceptionally fast and accurate time series classification using random convolutional kernels. Data Mining and Knowledge Discovery, 34(5), pp.1454-1495.
Dempster, A., Schmidt, D.F. and Webb, G.I., 2021, August. Minirocket: A very fast (almost) deterministic transform for time series classification. In Proceedings of the 27th ACM SIGKDD conference on knowledge discovery & data mining (pp. 248-257).
Tan, C.W., Dempster, A., Bergmeir, C. and Webb, G.I., 2022. MultiRocket: multiple pooling operators and transformations for fast and effective time series classification. Data Mining and Knowledge Discovery, 36(5), pp.1623-1646.
Dempster, A., Schmidt, D.F. and Webb, G.I., 2023. Hydra: Competing convolutional kernels for fast and accurate time series classification. Data Mining and Knowledge Discovery, pp.1-27.

Documentation#

1. Improve automated API documentation#

Mentors: Matthew Middlehurst (@MatthewMiddlehurst)

Description#

aeon uses sphinx and numpydoc to generate API documentation from docstrings. Many of the docstrings are incomplete or missing sections, and could be improved to make the API documentation more useful. The goal of this project is to generally improve the API documentation. A specific goal is to automatically generate links to examples which use the function/class, similar to the scikit-learn documentation. The way this is achieved is up to the mentee, but should include a new section in the relevant API page. I.e., the API page for aeon.transformers.collection.convolution_based.Rocket should have a section called “Examples” which links to the examples which use the class (such as the Rocket notebook).

Project Stages#

Learn about aeon best practices and project documentation.
Familiarise with sphinx documentation generation and numpydoc docstring standards.
Improve the API documentation for a few classes/functions and go through the Pull Request and review process.
Implement a function or improve the API template to automatically generate links to examples which use the function/class.
The main bulk of work is done, but the API documentation is vast and can always be improved! If time allows, continue to enhance the API documentation through individual docstrings, API landing page and template improvements at the mentees discretion.

Project Outcomes#

Success of the project will be assessed by the quality of the documentation produced and engagement with the project and the aeon community. Automatically generating links to examples is the primary goal.