DES-Minimum Difference

class deslib.des.probabilistic.MinimumDifference(pool_classifiers=None, k=None, DFP=False, with_IH=False, safe_k=None, IH_rate=0.3, mode='selection', random_state=None, knn_classifier='knn', DSEL_perc=0.5)

Computes the competence level of the classifiers based on the difference between the supports obtained by each class. The competence level at a data point \(\mathbf{x}_{k}\) is equal to the minimum difference between the support obtained for the correct class and the support obtained for each of the other classes.

The influence of each sample \(\mathbf{x}_{k}\) is defined according to a Gaussian function model [2]. Samples that are closer to the query have a higher influence on the competence estimation.

Parameters:

pool_classifiers : list of classifiers (Default = None)

The generated pool of classifiers trained for the corresponding classification problem. Each base classifier should support the method “predict”. If None, the pool of classifiers is a bagging classifier.

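k : int (Default = None)

Number of neighbors used to estimate the competence of the base classifiers. If k is None, the whole dynamic selection dataset is used, and the influence of each sample is based on its distance to the query.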

DFP : Boolean (Default = False)

Determines if the dynamic frienemy pruning is applied.

with_IH : Boolean (Default = False)

Whether the hardness level of the region of competence is used to decide between using the DS algorithm or the KNN for classification of a given query sample.

safe_k : int (default = None)

The size of the indecision region.

IH_rate : float (default = 0.3)

Hardness threshold. If the hardness level of the competence region is lower than IH_rate, the KNN classifier is used. Otherwise, the DS algorithm is used for classification.

mode : String (Default = “selection”)

Whether the technique will perform dynamic selection, dynamic weighting or a hybrid approach for classification.

random_state : int, RandomState instance or None, optional (default=None)

If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

knn_classifier : {‘knn’, ‘faiss’, None} (Default = ‘knn’)

The algorithm used to estimate the region of competence:

‘knn’ will use KNeighborsClassifier from sklearn
‘faiss’ will use Facebook’s Faiss similarity search through the class FaissKNNClassifier
None, will use sklearn KNeighborsClassifier.

DSEL_perc : float (Default = 0.5)

Percentage of the input data used to fit DSEL. Note: This parameter is only used if the pool of classifiers is None or unfitted.
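The three values of mode can be pictured with a small standard-library sketch for a single query. It is an illustration only, not the library's code: the function name combine, its arguments, and the exact weighting rule are assumptions made here (selection: only selected classifiers vote; weighting: every classifier votes with its competence as weight; hybrid: selected classifiers vote with their competence as weight).

```python
from collections import defaultdict


def combine(predictions, competences, selected, mode="selection"):
    """Aggregate one query's base predictions (hypothetical helper).

    predictions: class label predicted by each base classifier
    competences: competence level of each base classifier
    selected:    booleans marking the classifiers that passed selection
    """
    votes = defaultdict(float)
    for label, competence, is_selected in zip(predictions, competences, selected):
        if mode == "selection":
            # Assumed rule: plain majority vote among selected classifiers.
            weight = 1.0 if is_selected else 0.0
        elif mode == "weighting":
            # Assumed rule: all classifiers vote, weighted by competence.
            weight = competence
        elif mode == "hybrid":
            # Assumed rule: only selected classifiers vote, weighted by competence.
            weight = competence if is_selected else 0.0
        else:
            raise ValueError("mode must be 'selection', 'weighting' or 'hybrid'")
        votes[label] += weight
    # Return the label with the largest accumulated vote.
    return max(votes, key=votes.get)


predictions = [0, 1, 1, 1]
competences = [0.9, 0.3, 0.2, 0.5]
selected = [True, True, True, False]

by_selection = combine(predictions, competences, selected, mode="selection")
by_weighting = combine(predictions, competences, selected, mode="weighting")
by_hybrid = combine(predictions, competences, selected, mode="hybrid")
```

With these made-up numbers the hybrid vote differs from the other two, because dropping the unselected classifier removes most of the weight behind class 1.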

References

[1] B. Antosik, M. Kurzynski, “New measures of classifier competence – heuristics and application to the design of multiple classifier systems,” in: Computer Recognition Systems 4, 2011, pp. 197–206.

[2] T. Woloszynski, M. Kurzynski, “A probabilistic model of classifier competence for dynamic ensemble selection,” Pattern Recognition 44 (10), 2011, pp. 2656–2668.

estimate_competence(query, neighbors, distances, predictions=None)

Estimate the competence of each base classifier \(c_{i}\) using the source of competence \(C_{src}\) and the potential function model. The source of competence \(C_{src}\) for all data points in DSEL is pre-computed in the fit() step.

\[\delta_{i,j} = \frac{\sum_{k=1}^{N} C_{src} \: \exp(-d (\mathbf{x}_{k}, \mathbf{x}_{q})^{2})} {\sum_{k=1}^{N} \exp( -d (\mathbf{x}_{k}, \mathbf{x}_{q})^{2} )}\]

Parameters:

query : array of shape = [n_samples, n_features]: The test examples.
neighbors : array of shape = [n_samples, n_neighbors]: Indices of the k nearest neighbors for each test sample.
distances : array of shape = [n_samples, n_neighbors]: Distances of the k nearest neighbors for each test sample.
predictions : array of shape = [n_samples, n_classifiers]: Predictions of the base classifiers for all test examples.

Returns:

competences : array of shape = [n_samples, n_classifiers]: Competence level estimated for each base classifier and test example.
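The weighted average in the formula above can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: the standalone function, the nested-list layout, and the toy values are assumptions made here. Each neighbor's pre-computed C_src is weighted by the Gaussian potential exp(-d^2) of its distance to the query and normalized by the sum of those weights.

```python
import math


def estimate_competence(c_src, neighbors, distances):
    """Distance-weighted average of C_src over each query's neighbors.

    c_src:     [n_dsel][n_classifiers] pre-computed source of competence
    neighbors: [n_queries][k] indices into c_src
    distances: [n_queries][k] distance from each query to each neighbor
    """
    n_classifiers = len(c_src[0])
    competences = []
    for idx_row, dist_row in zip(neighbors, distances):
        # Gaussian potential: closer samples receive weights nearer to 1.
        weights = [math.exp(-d ** 2) for d in dist_row]
        total = sum(weights)
        row = [
            sum(w * c_src[k][i] for k, w in zip(idx_row, weights)) / total
            for i in range(n_classifiers)
        ]
        competences.append(row)
    return competences


# Toy data: three DSEL samples, two base classifiers, one query.
c_src = [[0.6, -0.2], [0.1, 0.4], [-0.3, 0.5]]
neighbors = [[0, 1]]
distances = [[0.0, 1.0]]
competences = estimate_competence(c_src, neighbors, distances)
```

Because the neighbor at distance 0 has weight 1 and the neighbor at distance 1 has weight exp(-1), the result for each classifier sits closer to the C_src value of the nearer sample.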

fit(X, y)

Train the DS model by setting the KNN algorithm and pre-processing the information required to apply the DS methods. In the case of probabilistic techniques, the source of competence (C_src) is calculated for each data point in DSEL in order to speed up the testing phase.

C_src is estimated with the source_competence() function that is overridden by each DS method based on this paradigm.

Parameters:

X : array of shape = [n_samples, n_features]: Data used to fit the model.
y : array of shape = [n_samples]: Class labels of each example in X.

Returns:

self : object: Returns self.

predict(X)

Predict the class label for each sample in X.

Parameters:

X : array of shape = [n_samples, n_features]: The input data.

Returns:

predicted_labels : array of shape = [n_samples]: Predicted class label for each sample in X.

predict_proba(X)

Estimates the posterior probabilities for each sample in X.

Parameters:

X : array of shape = [n_samples, n_features]: The input data.

Returns:

predicted_proba : array of shape = [n_samples, n_classes]: Probability estimates for each sample in X.

score(X, y, sample_weight=None)

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:

X : array-like, shape = (n_samples, n_features): Test samples.
y : array-like, shape = (n_samples) or (n_samples, n_outputs): True labels for X.
sample_weight : array-like, shape = [n_samples], optional: Sample weights.

Returns:

score : float: Mean accuracy of self.predict(X) with respect to y.

select(competences)

Selects the base classifiers that obtained a competence level higher than the predefined threshold. In this case, the threshold indicates the competence of the random classifier.

Parameters:

competences : array of shape = [n_samples, n_classifiers]: Competence level estimated for each base classifier and test example.

Returns:

selected_classifiers : array of shape = [n_samples, n_classifiers]: Boolean matrix containing True if the base classifier is selected, False otherwise.
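The thresholding step can be sketched as follows. This is a hedged illustration rather than the library's code: the default threshold of 0.0 (taken here as the competence of a random classifier under the minimum-difference measure) and the fallback that keeps every classifier when none passes are both assumptions made for this example.

```python
def select(competences, threshold=0.0):
    """Return a boolean mask per query marking classifiers above the threshold.

    competences: [n_samples][n_classifiers] estimated competence levels
    threshold:   assumed competence of a random classifier (hypothetical default)
    """
    selected = []
    for row in competences:
        mask = [c > threshold for c in row]
        if not any(mask):
            # Assumed fallback: no classifier beat the threshold, so keep all
            # of them instead of leaving the query without any classifier.
            mask = [True] * len(row)
        selected.append(mask)
    return selected


# Two queries, three base classifiers.
selected = select([[0.47, -0.04, 0.10], [-0.20, -0.05, -0.30]])
```

For the first query only the classifiers with positive competence are kept; for the second query every competence is negative, so the assumed fallback selects the whole pool.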

source_competence()

Calculates the source of competence using the Minimum Difference method.

The source of competence C_src at the validation point \(\mathbf{x}_{k}\) is calculated as the minimum difference between the support obtained for the correct class and the supports obtained for the other classes.

Returns:

C_src : array of shape = [n_samples, n_classifiers]: The competence source for each base classifier at each data point.
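A minimal sketch of the minimum-difference computation, assuming the class supports of each base classifier on DSEL are available as nested lists. The function name, argument layout, and toy values are illustrative assumptions, not the library's internals.

```python
def minimum_difference_source(supports, targets):
    """Compute C_src as the smallest margin of the correct class.

    supports: [n_samples][n_classifiers][n_classes] class supports
    targets:  [n_samples] index of the correct class for each sample
    """
    c_src = []
    for sample_supports, target in zip(supports, targets):
        row = []
        for clf_support in sample_supports:
            correct = clf_support[target]
            # Smallest gap between the correct class and any other class;
            # it is negative when another class receives more support.
            row.append(
                min(correct - s for j, s in enumerate(clf_support) if j != target)
            )
        c_src.append(row)
    return c_src


supports = [
    [[0.7, 0.2, 0.1], [0.3, 0.5, 0.2]],  # sample 0, correct class 0
    [[0.1, 0.6, 0.3], [0.4, 0.4, 0.2]],  # sample 1, correct class 1
]
targets = [0, 1]
c_src = minimum_difference_source(supports, targets)
```

In this toy data the first classifier is correct on both samples and receives positive values, the second classifier misclassifies sample 0 and receives a negative value, and a tie with another class on sample 1 gives it a value of zero.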