create_multi_comparison_matrix
- create_multi_comparison_matrix(df_results, save_path='./mcm', formats=None, used_statistic='Accuracy', plot_1v1_comparisons=False, higher_stat_better=True, include_pvalue=True, pvalue_test='wilcoxon', pvalue_test_params=None, pvalue_correction=None, pvalue_threshold=0.05, use_mean='mean-difference', order_stats='average-statistic', order_stats_increasing=False, dataset_column=None, precision=4, load_analysis=False, row_comparates=None, col_comparates=None, excluded_row_comparates=None, excluded_col_comparates=None, colormap='coolwarm', fig_size='auto', font_size='auto', colorbar_orientation='vertical', colorbar_value=None, win_tie_loss_labels=None, include_legend=True, show_symetry=True)
Generate the Multi-Comparison Matrix (MCM) [1].
MCM summarises a set of results for multiple estimators evaluated on multiple datasets. The MCM is a heatmap that shows absolute performance and tests for significant differences. It is configurable in many ways.
- Parameters:
- df_results: str or pd.DataFrame
Either the path to a CSV file or a DataFrame of results in (n_problems, n_estimators) format. The first row should contain the names of the estimators. If dataset_column is set, the column with that name contains the names of the problems.
- save_path: str, default = ‘./mcm’
The output path for the results. To save the results under a different filename, include the filename in the path (e.g. ‘./your_filename’).
- formats: str or list of str, default = None
File formats to save in save_path. If None, no files are saved. Valid formats are ‘pdf’, ‘png’, ‘json’, ‘csv’ and ‘tex’.
- used_statistic: str, default = ‘Accuracy’
Name of the metric being assessed (e.g. accuracy, error, mse).
- plot_1v1_comparisons: bool, default = False
Whether to plot the 1v1 scatter plots of the results.
- higher_stat_better: bool, default = True
Whether higher values of used_statistic are better. This determines whether a higher or a lower value counts as a win (e.g. True for accuracy, False for error).
- include_pvalue: bool, default = True
Whether to include p-value statistics.
- pvalue_test: str, default = ‘wilcoxon’
The statistical test used to compute the p-values. Currently only ‘wilcoxon’ is supported.
- pvalue_test_params: dict, default = None
Parameters passed to pvalue_test. For ‘wilcoxon’, see the parameters of scipy.stats.wilcoxon. If pvalue_test is ‘wilcoxon’ and this parameter is None, the default is {“zero_method”: “pratt”, “alternative”: “greater”}.
- pvalue_correction: str, default = None
Multiple-comparison correction applied to the p-values: None or “Holm”.
- pvalue_threshold: float, default = 0.05
Significance threshold. A comparison is considered significant if its p-value is less than pvalue_threshold.
- use_mean: str, default = ‘mean-difference’
The mean used to compare two estimators. The only available option is ‘mean-difference’, which is the difference between the arithmetic means over all datasets.
- order_stats: str, default = ‘average-statistic’
How to order the comparates. By default, they are ordered by the average used_statistic over all datasets. The options are:

================== ==========================================
method             what it does
================== ==========================================
average-statistic  average used_statistic over all datasets
average-rank       average rank over all datasets
max-wins           maximum number of wins over all datasets
amean-amean        average over difference of use_mean
pvalue             average pvalue over all comparates
================== ==========================================
- order_stats_increasing: bool, default = False
If True, the comparates are sorted by order_stats in increasing order. Otherwise, they are sorted in decreasing order.
- dataset_column: str, default = None
The name of the column in df_results that contains the dataset names.
- precision: int, default = 4
The number of digits shown after the decimal point.
- load_analysis: bool, default = False
If True, attempts to load the analysis JSON file.
- row_comparates: list of str, default = None
The comparates to include in the rows. If None, all comparates in the study are placed in the rows.
- col_comparates: list of str, default = None
The comparates to include in the columns. If None, all comparates in the study are placed in the columns.
- excluded_row_comparates: list of str, default = None
A list of excluded row comparates. If None, all comparates are included.
- excluded_col_comparates: list of str, default = None
A list of excluded col comparates. If None, all comparates are included.
- colormap: str, default = ‘coolwarm’
The matplotlib colormap to use. If None, no colormap is applied. The heatmap coloring is turned off and the cells are not colored.
- fig_size: str or tuple of two int, default = ‘auto’
The figure size in inches, given as (width, height) as for matplotlib's figsize. If ‘auto’, the size is computed with the _get_fig_size function in utils.py.
- font_size: int or str, default = ‘auto’
The font size of the text. If ‘auto’, the font size is chosen automatically.
- colorbar_orientation: str, default = ‘vertical’
Orientation of the colorbar, either ‘horizontal’ or ‘vertical’.
- colorbar_value: str, default = None
The values on which the heatmap colors are based.
- win_tie_loss_labels: tuple of str or None, default = None
Custom labels for heatmap cells, in the form (win_label, tie_label, loss_label). The tuple must contain exactly three strings, representing the win, tie and loss outcomes of the row comparate (r) against the column comparate (c). If win_tie_loss_labels=None, default labels are chosen based on higher_stat_better:
  - If higher_stat_better=True, defaults to (‘r>c’, ‘r=c’, ‘r<c’).
  - If higher_stat_better=False, defaults to (‘r<c’, ‘r=c’, ‘r>c’).
- include_legend: bool, default = True
Whether or not to show the legend on the MCM.
- show_symetry: bool, default = True
Whether or not to show the symmetrical part of the heatmap.
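The per-cell quantities described by higher_stat_better, use_mean and win_tie_loss_labels can be illustrated with a small standard-library sketch. It covers the wins, ties and losses of the row comparate, the ‘mean-difference’, and the default labels. This is not aeon's implementation. The helper names load_results, pairwise_summary and resolve_labels, the toy CSV and its ‘dataset_name’ column are hypothetical. Ties are assumed to mean exactly equal scores.

```python
import csv
import io
from statistics import mean

# Hypothetical results: one row per dataset, one column per estimator,
# plus a dataset-name column (what dataset_column would point to).
RESULTS_CSV = """dataset_name,EstA,EstB
D1,0.90,0.85
D2,0.70,0.70
D3,0.60,0.65
D4,0.80,0.75
"""


def load_results(text, dataset_column="dataset_name"):
    """Return {estimator: [score per dataset]}, dropping the dataset column."""
    rows = list(csv.DictReader(io.StringIO(text)))
    estimators = [c for c in rows[0] if c != dataset_column]
    return {e: [float(r[e]) for r in rows] for e in estimators}


def pairwise_summary(row_scores, col_scores, higher_stat_better=True):
    """Wins, ties and losses of the row comparate, plus its mean-difference."""
    wins = ties = losses = 0
    for r, c in zip(row_scores, col_scores):
        if r == c:
            ties += 1
        elif (r > c) == higher_stat_better:
            wins += 1
        else:
            losses += 1
    # 'mean-difference': difference between the arithmetic means over datasets.
    mean_diff = mean(row_scores) - mean(col_scores)
    return wins, ties, losses, mean_diff


def resolve_labels(win_tie_loss_labels=None, higher_stat_better=True):
    """Apply the documented defaults and checks for win_tie_loss_labels."""
    if win_tie_loss_labels is None:
        return ("r>c", "r=c", "r<c") if higher_stat_better else ("r<c", "r=c", "r>c")
    labels = tuple(win_tie_loss_labels)
    if len(labels) != 3 or not all(isinstance(s, str) for s in labels):
        raise ValueError("win_tie_loss_labels must contain exactly three strings")
    return labels


scores = load_results(RESULTS_CSV)
wins, ties, losses, mean_diff = pairwise_summary(scores["EstA"], scores["EstB"])
win_label, tie_label, loss_label = resolve_labels()
print(f"EstA vs EstB: {win_label} {wins}, {tie_label} {ties}, "
      f"{loss_label} {losses}, mean-diff {mean_diff:.4f}")
```

With higher_stat_better=False, the same scores flip wins and losses. The default labels also flip, so ‘r<c’ still names the winning outcome.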
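pvalue_correction=“Holm” can be understood through the textbook Holm step-down procedure. It sorts the m p-values in increasing order and compares the k-th smallest (k = 1, ..., m) with pvalue_threshold / (m - k + 1). It stops at the first comparison that is not significant. The minimal sketch below assumes the strict “<” comparison documented for pvalue_threshold. The function names are hypothetical. Which set of pairwise p-values aeon treats as one family is not stated here.

```python
def uncorrected_significant(pvalues, pvalue_threshold=0.05):
    """pvalue_correction=None: each comparison is tested on its own."""
    return [p < pvalue_threshold for p in pvalues]


def holm_significant(pvalues, pvalue_threshold=0.05):
    """Holm step-down procedure; returns one flag per p-value, in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    significant = [False] * m
    for k, idx in enumerate(order):  # k = 0 for the smallest p-value
        if pvalues[idx] < pvalue_threshold / (m - k):
            significant[idx] = True
        else:
            break  # every larger p-value is also declared not significant
    return significant


pvalues = [0.01, 0.04, 0.03, 0.20]
print(uncorrected_significant(pvalues))
print(holm_significant(pvalues))
```

Holm controls the family-wise error rate. As a result, comparisons that pass the raw threshold (here 0.03 and 0.04) can become non-significant after correction.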
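The ‘average-statistic’ and ‘average-rank’ options of order_stats can be sketched as below. The sketch assumes that the best score on each dataset gets rank 1 and that tied comparates share the average of the ranks they span. This is a common convention; aeon's exact tie handling is an assumption here. The function average_ranks and the toy scores are hypothetical.

```python
from statistics import mean


def average_ranks(scores, higher_stat_better=True):
    """Map {estimator: [score per dataset]} to {estimator: mean rank}.

    Rank 1 is the best score on a dataset; tied estimators share the
    average of the ranks they span (an assumed convention).
    """
    names = list(scores)
    n_datasets = len(scores[names[0]])
    totals = dict.fromkeys(names, 0.0)
    for d in range(n_datasets):
        vals = {n: scores[n][d] for n in names}
        # Best first: descending if higher is better, ascending otherwise.
        ordered = sorted(names, key=vals.get, reverse=higher_stat_better)
        i = 0
        while i < len(ordered):
            # Extend j over the run of estimators tied with ordered[i].
            j = i
            while j + 1 < len(ordered) and vals[ordered[j + 1]] == vals[ordered[i]]:
                j += 1
            shared_rank = (i + 1 + j + 1) / 2  # mean of ranks i+1 .. j+1
            for k in range(i, j + 1):
                totals[ordered[k]] += shared_rank
            i = j + 1
    return {n: totals[n] / n_datasets for n in names}


scores = {
    "EstA": [0.90, 0.70, 0.60, 0.80],
    "EstB": [0.85, 0.70, 0.65, 0.75],
    "EstC": [0.80, 0.60, 0.70, 0.75],
}

# 'average-statistic': mean used_statistic per estimator (higher is better here).
avg_stat = {n: mean(v) for n, v in scores.items()}
# 'average-rank': a lower mean rank is better.
ranks = average_ranks(scores)
best_first = sorted(ranks, key=ranks.get)
print(avg_stat, ranks, best_first)
```

The two criteria need not agree in general. Averaging ranks discards the size of each margin, while averaging the statistic keeps it.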
- Returns:
- fig: plt.Figure
The figure object of the heatmap.
Notes
Developed from the code in https://github.com/MSD-IRIMAS/Multi_Comparison_Matrix
References
[1] Ismail-Fawaz A. et al., An Approach To Multiple Comparison Benchmark Evaluations That Is Stable Under Manipulation Of The Comparate Set, arXiv preprint arXiv:2305.11921, 2023.