kasba_average

kasba_average(X: ndarray, init_barycenter: ndarray | None = 'mean', previous_cost: float | None = None, previous_distance_to_center: ndarray | None = None, distance: str = 'msm', max_iters: int = 30, tol: float = 1e-05, ba_subset_size: float = 0.5, weights: ndarray | None = None, precomputed_medoids_pairwise_distance: ndarray | None = None, initial_step_size: float = 0.05, decay_rate: float = 0.1, verbose: bool = False, n_jobs: int = 1, random_state: int | None = None, return_distances_to_center: bool = False, return_cost: bool = False, **kwargs)

Compute the KASBA barycenter average of time series using an elastic distance [1].

KASBA adapts the Stochastic Subgradient Elastic Barycenter Average by iterating randomly over the dataset. On the first iteration, all series are used; on subsequent iterations, only a subset (controlled by ba_subset_size) is used. If there are fewer than 10 series, all available data are used on every iteration.
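The subset rule described above can be sketched in plain Python. This is an illustration only, not the library's implementation: the helper name `select_indices`, the zero-based iteration counter, and the rounding of the subset size (proportion rounded down, at least one series) are assumptions made for the example.

```python
import random


def select_indices(n_cases, ba_subset_size, iteration, rng):
    """Return the indices of the series visited on one iteration, in random order."""
    indices = list(range(n_cases))
    rng.shuffle(indices)
    # First iteration, or fewer than 10 series: every series is used.
    if iteration == 0 or n_cases < 10:
        return indices
    # Assumption: subset size is the proportion rounded down, but at least 1.
    n_subset = max(1, int(n_cases * ba_subset_size))
    return indices[:n_subset]


rng = random.Random(0)
first = select_indices(20, 0.5, iteration=0, rng=rng)
later = select_indices(20, 0.5, iteration=1, rng=rng)
small = select_indices(8, 0.5, iteration=1, rng=rng)
print(len(first), len(later), len(small))  # 20 10 8
```

With 20 series and ba_subset_size=0.5, the first pass visits all 20 series and later passes visit 10; with 8 series every pass visits all 8.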

Parameters:
X : np.ndarray of shape (n_cases, n_channels, n_timepoints) or (n_cases, n_timepoints)

Collection of time series to average. If 2D, it is internally reshaped to (n_cases, 1, n_timepoints).

init_barycenter : {"mean", "medoids", "random"} or np.ndarray of shape (n_channels, n_timepoints), default="mean"

Initial barycenter. If a string is provided, it specifies the initialisation strategy. If an array is provided, it is used directly as the starting barycenter.

previous_cost : float, default=None

The total cost (sum of distances from all series in X to the current barycenter). If None, it is computed in the first iteration.

previous_distance_to_center : np.ndarray of shape (n_cases,), default=None

Distances from each series in X to the current barycenter. If None, they are computed in the first iteration.

distance : str, default="msm"

Distance function used during averaging. See aeon.distances.get_distance_function for valid options.

max_iters : int, default=30

Maximum number of iterations to update the barycenter.

tol : float, default=1e-5

Early-stopping tolerance: if the decrease in cost between iterations is smaller than this value, the procedure stops.

ba_subset_size : float, default=0.5

Proportion of the data to use on each iteration after the first. The first iteration always uses all data. If X has fewer than 10 series, all are used on every iteration.

weights : np.ndarray of shape (n_cases,), default=None

Weights for each time series. If None, all series receive weight 1.

precomputed_medoids_pairwise_distance : np.ndarray of shape (n_cases, n_cases), default=None

Optional precomputed pairwise distance matrix (used when relevant, e.g., for "medoids" initialisation). If None, distances are computed on the fly.

initial_step_size : float, default=0.05

Initial step size for the stochastic subgradient update.

decay_rate : float, default=0.1

Exponential decay rate for the step size; the step size at iteration i is initial_step_size * exp(-decay_rate * i).

verbose : bool, default=False

If True, prints progress information.

n_jobs : int, default=1

Number of jobs to run in parallel. If 1, the function runs in a single thread. If greater than 1, it runs in parallel with that many jobs. If -1, the number of jobs is set to the number of CPU cores.

random_state : int or None, default=None

Random seed used where applicable.

return_distances_to_center : bool, default=False

If True, also return the distances between each time series and the barycenter.

return_cost : bool, default=False

If True, also return the total cost.

**kwargs

Additional keyword arguments forwarded to the chosen distance function.
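As a rough illustration of how initial_step_size, decay_rate and tol interact, the sketch below evaluates the documented step-size formula and a simple early-stopping test. The helper names `step_size` and `has_converged` are hypothetical, and reading "decrease in cost" as previous cost minus current cost is an assumption; the library's internal check may differ.

```python
import math


def step_size(i, initial_step_size=0.05, decay_rate=0.1):
    """Step size at iteration i: initial_step_size * exp(-decay_rate * i)."""
    return initial_step_size * math.exp(-decay_rate * i)


def has_converged(previous_cost, cost, tol=1e-5):
    """Assumed stopping rule: stop when the cost decreased by less than tol."""
    return previous_cost - cost < tol


# Step sizes over the default 30 iterations.
schedule = [step_size(i) for i in range(30)]
print(round(schedule[0], 4), round(schedule[10], 4))  # 0.05 0.0184

print(has_converged(10.0, 9.0))        # False: the cost dropped by 1.0
print(has_converged(10.0, 9.999999))   # True: the drop is about 1e-6, below tol
```

With the defaults, the step size falls to roughly 37% of its initial value after 10 iterations, so later updates move the barycenter far less than early ones.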

Returns:
barycenter : np.ndarray of shape (n_channels, n_timepoints)

The barycenter (KASBA average) of the input time series.

distances_to_center : np.ndarray of shape (n_cases,), optional

Returned if return_distances_to_center=True. Distances between each time series and the barycenter.

cost : float, optional

Returned if return_cost=True. The total cost (sum of distances to barycenter).

References

[1] Holder, C. & Bagnall, A. (2024). Rock the KASBA: Blazingly Fast and Accurate Time Series Clustering. arXiv:2411.17838.