kasba_average
- kasba_average(X: ndarray, init_barycenter: ndarray | None = 'mean', previous_cost: float | None = None, previous_distance_to_center: ndarray | None = None, distance: str = 'msm', max_iters: int = 30, tol: float = 1e-05, ba_subset_size: float = 0.5, weights: ndarray | None = None, precomputed_medoids_pairwise_distance: ndarray | None = None, initial_step_size: float = 0.05, decay_rate: float = 0.1, verbose: bool = False, n_jobs: int = 1, random_state: int | None = None, return_distances_to_center: bool = False, return_cost: bool = False, **kwargs)
Compute the KASBA barycenter average of time series using an elastic distance [1].
KASBA adapts the Stochastic Subgradient Elastic Barycenter Average by iterating randomly over the dataset. On the first iteration, all series are used; on subsequent iterations, only a subset (controlled by ba_subset_size) is used. If there are fewer than 10 series, all available data are used on every iteration.
- Parameters:
- X : np.ndarray of shape (n_cases, n_channels, n_timepoints) or (n_cases, n_timepoints)
Collection of time series to average. If 2D, it is internally reshaped to (n_cases, 1, n_timepoints).
- init_barycenter : {"mean", "medoids", "random"} or np.ndarray of shape (n_channels, n_timepoints), default="mean"
Initial barycenter. If a string is provided, it specifies the initialisation strategy. If an array is provided, it is used directly as the starting barycenter.
- previous_cost : float, default=None
The total cost (sum of distances from all series in X to the current barycenter). If None, it is computed in the first iteration.
- previous_distance_to_center : np.ndarray of shape (n_cases,), default=None
Distances from each series in X to the current barycenter. If None, they are computed in the first iteration.
- distance : str, default="msm"
Distance function used during averaging. See aeon.distances.get_distance_function for valid options.
- max_iters : int, default=30
Maximum number of iterations to update the barycenter.
- tol : float, default=1e-5
Early-stopping tolerance: if the decrease in cost between iterations is smaller than this value, the procedure stops.
- ba_subset_size : float, default=0.5
Proportion of the data to use on each iteration after the first. The first iteration always uses all data. If X has fewer than 10 series, all are used on every iteration.
- weights : np.ndarray of shape (n_cases,), default=None
Weights for each time series. If None, all series receive weight 1.
- precomputed_medoids_pairwise_distance : np.ndarray of shape (n_cases, n_cases), default=None
Optional precomputed pairwise distance matrix (used when relevant, e.g., for "medoids" initialisation). If None, distances are computed on the fly.
- initial_step_size : float, default=0.05
Initial step size for the stochastic subgradient update.
- decay_rate : float, default=0.1
Exponential decay rate for the step size; the step size at iteration i is initial_step_size * exp(-decay_rate * i).
- verbose : bool, default=False
If True, prints progress information.
- n_jobs : int, default=1
The number of jobs to run in parallel. If 1, the function runs in a single thread; if greater than 1, it runs in parallel with that many jobs. If -1, the number of jobs is set to the number of CPU cores.
- random_state : int or None, default=None
Random seed used where applicable.
- return_distances_to_center : bool, default=False
If True, also return the distances between each time series and the barycenter.
- return_cost : bool, default=False
If True, also return the total cost.
- **kwargs
Additional keyword arguments forwarded to the chosen distance function.
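The step-size schedule and the subset rule described in the parameters above can be sketched in plain Python. This is an illustration only, not the library's code: the helper names `step_size` and `n_series_used` are hypothetical, iterations are assumed to be counted from 0, and the rounding of the subset count (floor, with a minimum of 1) is an assumption that the library may not share.

```python
import math


def step_size(iteration, initial_step_size=0.05, decay_rate=0.1):
    """Step size at a given iteration, following the documented formula."""
    return initial_step_size * math.exp(-decay_rate * iteration)


def n_series_used(iteration, n_cases, ba_subset_size=0.5):
    """Number of series visited in one iteration.

    Assumption: the subset count is floor(ba_subset_size * n_cases),
    with a minimum of 1. The library's exact rounding may differ.
    """
    if iteration == 0 or n_cases < 10:
        # First iteration, or fewer than 10 series: use all data.
        return n_cases
    return max(1, int(ba_subset_size * n_cases))


# Step sizes for the first three iterations with the default settings.
schedule = [step_size(i) for i in range(3)]
```

With the defaults, the step size starts at 0.05 and shrinks by a factor of exp(-0.1) on each iteration, so later updates move the barycenter less.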
- Returns:
- barycenter : np.ndarray of shape (n_channels, n_timepoints)
The barycenter (KASBA average) of the input time series.
- distances_to_center : np.ndarray of shape (n_cases,), optional
Returned if return_distances_to_center=True. Distances between each time series and the barycenter.
- cost : float, optional
Returned if return_cost=True. The total cost (sum of distances to barycenter).
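As a rough illustration of how the total cost relates to the per-series distances, and how tol and max_iters bound the number of iterations, consider the sketch below. The helpers `total_cost` and `iterations_run` and the cost sequence are hypothetical and are not part of aeon; the actual implementation may handle a cost increase differently.

```python
def total_cost(distances_to_center):
    """Total cost: the sum of distances from every series to the barycenter."""
    return sum(distances_to_center)


def iterations_run(costs, tol=1e-5, max_iters=30):
    """Count the iterations performed before the tolerance check stops the loop.

    `costs` is a hypothetical sequence of total costs, one per iteration.
    """
    previous = float("inf")
    done = 0
    for cost in costs[:max_iters]:
        done += 1
        if previous - cost < tol:
            break  # the cost decreased by less than tol: stop early
        previous = cost
    return done


# Hypothetical cost trace: the fourth iteration brings no improvement.
costs = [12.0, 9.5, 9.1, 9.1, 9.0]
n_done = iterations_run(costs)
```

In this trace the loop stops on the fourth iteration, because the cost did not decrease by at least tol, and the fifth cost is never evaluated.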
References
- [1] Holder, C. & Bagnall, A. (2024). Rock the KASBA: Blazingly Fast and Accurate Time Series Clustering. arXiv:2411.17838.