Diversity
This file contains the implementation of key diversity measures found in the ensemble literature:
 Double fault
 Negative double fault
 Q-statistic
 Ratio of errors
The implementations follow the specifications in the book “Combining Pattern Classifiers”.

deslib.util.diversity.Q_statistic(y, y_pred1, y_pred2)
Calculates the Q-statistic diversity measure between a pair of classifiers. The Q value lies in the range [-1, 1]. Classifiers that tend to classify the same objects correctly will have positive values of Q, and Q = 0 for two independent classifiers.
Parameters:  y : array of shape (n_samples)
Class labels of each sample.
 y_pred1 : array of shape (n_samples)
Class labels predicted by classifier 1 for each sample.
 y_pred2 : array of shape (n_samples)
Class labels predicted by classifier 2 for each sample.
Returns:  Q : The Q-statistic measure between the two classifiers.
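As a rough illustration of the measure, the sketch below uses the usual oracle-output definition Q = (N11*N00 - N01*N10) / (N11*N00 + N01*N10), where N11 counts samples both classifiers get right, N00 samples both get wrong, and N10 and N01 samples only one of them gets right. The names `oracle_counts` and `q_statistic_sketch` are hypothetical helpers written for this example, not the library's own code.

```python
def oracle_counts(y, y_pred1, y_pred2):
    """Count samples by which of the two classifiers predicted correctly."""
    n11 = n10 = n01 = n00 = 0
    for label, p1, p2 in zip(y, y_pred1, y_pred2):
        ok1, ok2 = (p1 == label), (p2 == label)
        if ok1 and ok2:
            n11 += 1      # both correct
        elif ok1:
            n10 += 1      # only classifier 1 correct
        elif ok2:
            n01 += 1      # only classifier 2 correct
        else:
            n00 += 1      # both wrong
    return n11, n10, n01, n00


def q_statistic_sketch(y, y_pred1, y_pred2):
    """Illustrative Q-statistic; undefined when the denominator is zero."""
    n11, n10, n01, n00 = oracle_counts(y, y_pred1, y_pred2)
    return (n11 * n00 - n01 * n10) / (n11 * n00 + n01 * n10)


y = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred1 = [1, 1, 0, 1, 0, 0, 1, 1]
y_pred2 = [1, 0, 0, 1, 1, 0, 0, 1]
q = q_statistic_sketch(y, y_pred1, y_pred2)
print(q)  # 0.5
```

Here the counts are N11 = 3, N10 = 2, N01 = 1 and N00 = 2, so the positive value reflects two classifiers that tend to be right and wrong on the same samples.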

deslib.util.diversity.agreement_measure(y, y_pred1, y_pred2)
Calculates the agreement measure between a pair of classifiers. This measure is the frequency with which both classifiers are either correct or incorrect on the same sample.
Parameters:  y : array of shape (n_samples)
Class labels of each sample.
 y_pred1 : array of shape (n_samples)
Class labels predicted by classifier 1 for each sample.
 y_pred2 : array of shape (n_samples)
Class labels predicted by classifier 2 for each sample.
Returns:  agreement : The frequency at which the two classifiers agree.
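The description above can be sketched in a few lines: a sample counts as an agreement when the two classifiers are both correct or both incorrect. The function name `agreement_sketch` is hypothetical and the code is only an illustration of the definition, not the library's implementation.

```python
def agreement_sketch(y, y_pred1, y_pred2):
    """Fraction of samples on which both classifiers are right or both are wrong."""
    same = sum(
        (p1 == label) == (p2 == label)
        for label, p1, p2 in zip(y, y_pred1, y_pred2)
    )
    return same / len(y)


y = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred1 = [1, 1, 0, 1, 0, 0, 1, 1]
y_pred2 = [1, 0, 0, 1, 1, 0, 0, 1]
agreement = agreement_sketch(y, y_pred1, y_pred2)
print(agreement)  # 0.625
```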

deslib.util.diversity.compute_pairwise_diversity(targets, prediction_matrix, diversity_func)
Computes the pairwise diversity matrix.
Parameters:  targets : array of shape (n_samples)
Class labels of each sample.
 prediction_matrix : array of shape (n_samples, n_classifiers)
Predicted class labels for each classifier in the pool.
 diversity_func : function
Function used to estimate the pairwise diversity.
Returns:  diversity : array of shape (n_classifiers)
The average pairwise diversity calculated for each classifier in the pool.
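One way to picture this computation is sketched below: every pair of columns in the prediction matrix is passed to the diversity function, and each classifier accumulates the values of the pairs it takes part in. The names `pairwise_diversity_sketch` and `disagreement` are hypothetical, the matrix is a plain list of rows rather than an array, and dividing by the number of partners is an assumption made to match the word "average" in the description. The library's own normalization may differ.

```python
def disagreement(y, y_pred1, y_pred2):
    """Fraction of samples where exactly one classifier is correct."""
    return sum(
        (p1 == t) != (p2 == t) for t, p1, p2 in zip(y, y_pred1, y_pred2)
    ) / len(y)


def pairwise_diversity_sketch(targets, prediction_matrix, diversity_func):
    """Average diversity of each classifier against all others (assumed normalization)."""
    n_classifiers = len(prediction_matrix[0])
    # One list of predictions per classifier (the columns of the matrix).
    columns = [[row[j] for row in prediction_matrix] for j in range(n_classifiers)]
    totals = [0.0] * n_classifiers
    for i in range(n_classifiers):
        for j in range(i + 1, n_classifiers):
            value = diversity_func(targets, columns[i], columns[j])
            totals[i] += value
            totals[j] += value
    return [total / (n_classifiers - 1) for total in totals]


targets = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred1 = [1, 1, 0, 1, 0, 0, 1, 1]
y_pred2 = [1, 0, 0, 1, 1, 0, 0, 1]
y_pred3 = [1, 1, 1, 0, 1, 0, 1, 0]
# Rows are samples, columns are classifiers.
prediction_matrix = list(zip(y_pred1, y_pred2, y_pred3))
diversity = pairwise_diversity_sketch(targets, prediction_matrix, disagreement)
print(diversity)  # [0.4375, 0.5, 0.5625]
```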

deslib.util.diversity.correlation_coefficient(y, y_pred1, y_pred2)
Calculates the correlation between two classifiers using oracle outputs. The coefficient lies in the range [-1, 1].
Parameters:  y : array of shape (n_samples)
Class labels of each sample.
 y_pred1 : array of shape (n_samples)
Class labels predicted by classifier 1 for each sample.
 y_pred2 : array of shape (n_samples)
Class labels predicted by classifier 2 for each sample.
Returns:  rho : The correlation coefficient measured between the two classifiers.
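For intuition, the sketch below computes the correlation between the two binary "oracle" vectors (1 where a classifier is correct, 0 otherwise) using the standard form rho = (N11*N00 - N01*N10) / sqrt((N11+N10)(N01+N00)(N11+N01)(N10+N00)). The function name `correlation_sketch` is hypothetical, and the code is an illustration under that definition rather than the library's implementation.

```python
import math


def correlation_sketch(y, y_pred1, y_pred2):
    """Correlation between oracle outputs; undefined if a classifier is always right or always wrong."""
    ok1 = [p == t for p, t in zip(y_pred1, y)]
    ok2 = [p == t for p, t in zip(y_pred2, y)]
    n11 = sum(a and b for a, b in zip(ok1, ok2))
    n10 = sum(a and not b for a, b in zip(ok1, ok2))
    n01 = sum(not a and b for a, b in zip(ok1, ok2))
    n00 = sum(not a and not b for a, b in zip(ok1, ok2))
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n01 * n10) / denom


y = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred1 = [1, 1, 0, 1, 0, 0, 1, 1]
y_pred2 = [1, 0, 0, 1, 1, 0, 0, 1]
rho = correlation_sketch(y, y_pred1, y_pred2)
print(round(rho, 4))  # 0.2582
```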

deslib.util.diversity.disagreement_measure(y, y_pred1, y_pred2)
Calculates the disagreement measure between a pair of classifiers. This measure is the frequency with which exactly one of the two classifiers makes the correct prediction.
Parameters:  y : array of shape (n_samples)
Class labels of each sample.
 y_pred1 : array of shape (n_samples)
Class labels predicted by classifier 1 for each sample.
 y_pred2 : array of shape (n_samples)
Class labels predicted by classifier 2 for each sample.
Returns:  disagreement : The frequency at which the two classifiers disagree.
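The definition above translates almost directly into code: count the samples where one classifier is correct and the other is not, then divide by the number of samples. The function name `disagreement_sketch` is hypothetical and the snippet is only an illustration of the definition.

```python
def disagreement_sketch(y, y_pred1, y_pred2):
    """Fraction of samples on which exactly one of the two classifiers is correct."""
    differs = sum(
        (p1 == label) != (p2 == label)
        for label, p1, p2 in zip(y, y_pred1, y_pred2)
    )
    return differs / len(y)


y = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred1 = [1, 1, 0, 1, 0, 0, 1, 1]
y_pred2 = [1, 0, 0, 1, 1, 0, 0, 1]
disagreement = disagreement_sketch(y, y_pred1, y_pred2)
print(disagreement)  # 0.375
```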

deslib.util.diversity.double_fault(y, y_pred1, y_pred2)
Calculates the double fault (df) measure. This measure represents the probability that both classifiers make the wrong prediction. A lower value of df means the base classifiers are less likely to make the same error. This measure must be minimized to increase diversity.
Parameters:  y : array of shape (n_samples)
Class labels of each sample.
 y_pred1 : array of shape (n_samples)
Class labels predicted by classifier 1 for each sample.
 y_pred2 : array of shape (n_samples)
Class labels predicted by classifier 2 for each sample.
Returns:  df : The double fault measure between the two classifiers.
References
Giacinto, Giorgio, and Fabio Roli. “Design of effective neural network ensembles for image classification purposes.” Image and Vision Computing 19.9 (2001): 699-707.
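A minimal sketch of the double fault, read as the fraction of samples that both classifiers misclassify, might look like the following. The function name `double_fault_sketch` is hypothetical and the code is not taken from the library.

```python
def double_fault_sketch(y, y_pred1, y_pred2):
    """Fraction of samples on which both classifiers predict the wrong label."""
    both_wrong = sum(
        (p1 != label) and (p2 != label)
        for label, p1, p2 in zip(y, y_pred1, y_pred2)
    )
    return both_wrong / len(y)


y = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred1 = [1, 1, 0, 1, 0, 0, 1, 1]
y_pred2 = [1, 0, 0, 1, 1, 0, 0, 1]
df = double_fault_sketch(y, y_pred1, y_pred2)
print(df)  # 0.25
```

Both classifiers fail on two of the eight samples here, so a pair with a smaller value would share fewer errors.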

deslib.util.diversity.negative_double_fault(y, y_pred1, y_pred2)
The negative of the double fault measure. This measure should be maximized for higher diversity.
Parameters:  y : array of shape (n_samples)
Class labels of each sample.
 y_pred1 : array of shape (n_samples)
Class labels predicted by classifier 1 for each sample.
 y_pred2 : array of shape (n_samples)
Class labels predicted by classifier 2 for each sample.
Returns:  df : The negative double fault measure between the two classifiers.
References
Giacinto, Giorgio, and Fabio Roli. “Design of effective neural network ensembles for image classification purposes.” Image and Vision Computing 19.9 (2001): 699-707.
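Negating the double fault turns a quantity to be minimized into one to be maximized, which can be convenient when every diversity measure is ranked in the same direction. The sketch below illustrates this with a hypothetical function `negative_double_fault_sketch`. It is an assumption-laden illustration, not the library's code.

```python
def negative_double_fault_sketch(y, y_pred1, y_pred2):
    """Negative of the fraction of samples misclassified by both classifiers."""
    both_wrong = sum(
        (p1 != label) and (p2 != label)
        for label, p1, p2 in zip(y, y_pred1, y_pred2)
    )
    return -(both_wrong / len(y))


y = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred1 = [1, 1, 0, 1, 0, 0, 1, 1]
y_pred2 = [1, 0, 0, 1, 1, 0, 0, 1]
ndf = negative_double_fault_sketch(y, y_pred1, y_pred2)
print(ndf)  # -0.25
```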

deslib.util.diversity.ratio_errors(y, y_pred1, y_pred2)
Calculates the ratio of errors diversity measure between a pair of classifiers. A higher value means that the base classifiers are less likely to make the same errors. The ratio must be maximized for higher diversity.
Parameters:  y : array of shape (n_samples)
Class labels of each sample.
 y_pred1 : array of shape (n_samples)
Class labels predicted by classifier 1 for each sample.
 y_pred2 : array of shape (n_samples)
Class labels predicted by classifier 2 for each sample.
Returns:  ratio : The ratio of errors measure between the two classifiers.
References
Aksela, Matti. “Comparison of classifier selection methods for improving committee performance.” Multiple Classifier Systems (2003): 159-159.
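The section does not spell out the formula, so the sketch below rests on an assumption: the ratio is taken as the number of samples where exactly one classifier errs divided by the number where both err. The function name `ratio_errors_sketch` and the choice to raise `ValueError` when there are no shared errors are both hypothetical.

```python
def ratio_errors_sketch(y, y_pred1, y_pred2):
    """Assumed ratio: (samples with exactly one error) / (samples with two errors)."""
    one_wrong = 0
    both_wrong = 0
    for label, p1, p2 in zip(y, y_pred1, y_pred2):
        wrong1, wrong2 = (p1 != label), (p2 != label)
        if wrong1 and wrong2:
            both_wrong += 1
        elif wrong1 or wrong2:
            one_wrong += 1
    if both_wrong == 0:
        # Assumed behaviour: the ratio is undefined without shared errors.
        raise ValueError("ratio of errors is undefined when there are no double faults")
    return one_wrong / both_wrong


y = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred1 = [1, 1, 0, 1, 0, 0, 1, 1]
y_pred2 = [1, 0, 0, 1, 1, 0, 0, 1]
ratio = ratio_errors_sketch(y, y_pred1, y_pred2)
print(ratio)  # 1.5
```

Under this reading, three samples have exactly one error and two have errors from both classifiers, and a larger ratio indicates that errors are shared less often.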