create_multi_comparison_matrix
- create_multi_comparison_matrix(df_results, output_dir='./', pdf_savename=None, png_savename=None, csv_savename=None, tex_savename=None, used_statistic='Accuracy', save_as_json=False, plot_1v1_comparisons=False, order_win_tie_loss='higher', include_pvalue=True, pvalue_test='wilcoxon', pvalue_test_params=None, pvalue_correction=None, pvalue_threshold=0.05, use_mean='mean-difference', order_stats='average-statistic', order_better='decreasing', dataset_column=None, precision=4, load_analysis=False, row_comparates=None, col_comparates=None, excluded_row_comparates=None, excluded_col_comparates=None, colormap='coolwarm', fig_size='auto', font_size='auto', colorbar_orientation='vertical', colorbar_value=None, win_label='r>c', tie_label='r=c', loss_label='r<c', include_legend=True, show_symetry=True)
Generate the Multi-Comparison Matrix (MCM) [1].
MCM summarises a set of results for multiple estimators evaluated on multiple datasets. The MCM is a heatmap that shows absolute performance and tests for significant differences. It is configurable in many ways.
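For illustration, a minimal sketch of the expected input and a basic call is given below. The import path (aeon.visualisation) and the dataset and estimator names are assumptions chosen for the example, not taken from this page.

```python
# Minimal sketch: results in (n_problems, n_estimators) format, one row per
# dataset, one column per estimator, values are the chosen statistic.
# The import path and the dataset/estimator names are illustrative assumptions.
import pandas as pd

from aeon.visualisation import create_multi_comparison_matrix

df_results = pd.DataFrame(
    {
        "dataset_name": ["D1", "D2", "D3", "D4", "D5"],
        "EstimatorA": [0.90, 0.85, 0.92, 0.88, 0.95],
        "EstimatorB": [0.88, 0.87, 0.91, 0.90, 0.93],
        "EstimatorC": [0.86, 0.80, 0.89, 0.85, 0.90],
    }
)

fig = create_multi_comparison_matrix(
    df_results,
    used_statistic="Accuracy",
    dataset_column="dataset_name",
)
```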
- Parameters:
- df_results: str or pd.DataFrame
A path to a csv file, or a pd.DataFrame, containing results in (n_problems, n_estimators) format. The first row should contain the names of the estimators, and the first column can contain the names of the problems if dataset_column is set.
- output_dir: str, default = ‘./’
The output directory for the results.
- pdf_savename: str, default = None
The name of the saved file in pdf format. If None, it will not be saved in this format.
- png_savename: str, default = None
The name of the saved file in png format. If None, it will not be saved in this format.
- csv_savename: str, default = None
The name of the saved file in csv format. If None, it will not be saved in this format.
- tex_savename: str, default = None
The name of the saved file in tex format. If None, it will not be saved in this format.
- used_statistic: str, default = ‘Accuracy’
Name of the metric being assessed (e.g. accuracy, error, mse).
- save_as_json: bool, default = False
Whether or not to save the analysis dict in json file format.
- plot_1v1_comparisons: bool, default = False
Whether or not to plot the 1v1 scatter results.
- order_win_tie_loss: str, default = ‘higher’
The ordering used when counting wins and losses for a given statistic, i.e. whether higher or lower values are better.
- include_pvalue: bool, default = True
Whether or not to include the p-value statistics.
- pvalue_test: str, default = ‘wilcoxon’
The statistical test used to produce the p-values. Currently only wilcoxon is supported.
- pvalue_test_params: dict, default = None
The parameters for the pvalue_test used. If pvalue_test is set to wilcoxon, see the scipy.stats.wilcoxon parameters. If wilcoxon is used and this parameter is None, the default setup is {“zero_method”: “pratt”, “alternative”: “greater”}; a small sketch of this test is given below the parameter list.
- pvalue_correction: str, default = None
Correction to use for the p-value significance test, either None or “Holm”.
- pvalue_threshold: float, default = 0.05
Threshold for considering a comparison significant. If pvalue < pvalue_threshold, the comparison is significant.
- use_mean: str, default = ‘mean-difference’
The mean used to compare two estimators. The only option available is ‘mean-difference’, which is the difference between the arithmetic means over all datasets.
- order_stats: str, default = ‘average-statistic’
The way to order the used_statistic. The default setup orders by the average statistic over all datasets. The options are:
- average-statistic: average used_statistic over all datasets.
- average-rank: average rank over all datasets.
- max-wins: maximum number of wins over all datasets.
- amean-amean: average over the difference of use_mean.
- pvalue: average p-value over all comparates.
- order_better: str, default = ‘decreasing’
The order in which to sort the statistics, from best to worst.
- dataset_column: str, default = None
The name of the datasets column in the csv file, if any.
- precision: int, default = 4
The number of digits after the decimal point.
- load_analysis: bool, default = False
If True, attempts to load the analysis json file.
- row_comparates: list of str, default = None
A list of included row comparates. If None, all of the comparates in the study are placed in the rows.
- col_comparates: list of str, default = None
A list of included column comparates. If None, all of the comparates in the study are placed in the columns.
- excluded_row_comparates: list of str, default = None
A list of excluded row comparates. If None, all comparates are included.
- excluded_col_comparates: list of str, default = None
A list of excluded col comparates. If None, all comparates are included.
- colormap: str, default = ‘coolwarm’
The colormap used in matplotlib. If None, no colormap is used, the heatmap is turned off and no colors are shown.
- fig_size: str or tuple of two int, default = ‘auto’
The height and width of the figure. If ‘auto’, the _get_fig_size function in utils.py is used. Note that the figure size values are in matplotlib units.
- font_size: int or str, default = ‘auto’
The font size of the text.
- colorbar_orientation: str, default = ‘vertical’
The orientation of the colorbar, either horizontal or vertical.
- colorbar_value: str, default = None
The values on which the heatmap colors are based.
- win_label: str, default = “r>c”
The winning label to be set on the MCM.
- tie_label: str, default = “r=c”
The tie label to be set on the MCM.
- loss_label: str, default = “r<c”
The loss label to be set on the MCM.
- include_legend: bool, default = True
Whether or not to show the legend on the MCM.
- show_symetry: bool, default = True
Whether or not to show the symmetrical part of the heatmap.
- Returns:
- fig: plt.Figure
The figure object of the heatmap.
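To make the include_pvalue and pvalue_test_params behaviour concrete, the sketch below reproduces the kind of pairwise Wilcoxon test described above with its default setup. It uses synthetic scores and scipy.stats.wilcoxon directly; it is an illustration, not the library's internal code.

```python
# Sketch of the pairwise significance test described for include_pvalue,
# using the default pvalue_test_params. Synthetic scores for illustration.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
row_scores = rng.uniform(0.80, 1.00, size=30)  # row comparate, one value per dataset
col_scores = rng.uniform(0.80, 1.00, size=30)  # column comparate, one value per dataset

# Default setup: {"zero_method": "pratt", "alternative": "greater"}
stat, pvalue = wilcoxon(
    row_scores, col_scores, zero_method="pratt", alternative="greater"
)

# A comparison is significant when pvalue < pvalue_threshold (0.05 by default).
print(f"p-value: {pvalue:.4f}, significant: {pvalue < 0.05}")
```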
Notes
Developed from the code in https://github.com/MSD-IRIMAS/Multi_Comparison_Matrix
References
[1] Ismail-Fawaz, A. et al., “An Approach To Multiple Comparison Benchmark Evaluations That Is Stable Under Manipulation Of The Comparate Set”, arXiv preprint arXiv:2305.11921, 2023.
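As a further hedged sketch, the call below reads results from a csv file, saves the MCM to png and pdf, restricts the rows to a subset of comparates and applies a Holm correction. The import path, file names and comparate names are assumptions for illustration, and the csv is assumed to exist in the (n_problems, n_estimators) format described above.

```python
# Hedged example: saving the MCM and restricting the row comparates.
# The import path, csv path and comparate names are assumptions.
from aeon.visualisation import create_multi_comparison_matrix

fig = create_multi_comparison_matrix(
    "results.csv",                    # csv in (n_problems, n_estimators) format
    output_dir="./",
    png_savename="mcm",
    pdf_savename="mcm",
    used_statistic="Accuracy",
    dataset_column="dataset_name",
    row_comparates=["EstimatorA", "EstimatorB"],  # only these appear as rows
    pvalue_correction="Holm",
)
```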