DES-Minimum Difference
-
class deslib.des.probabilistic.MinimumDifference(pool_classifiers=None, k=None, DFP=False, with_IH=False, safe_k=None, IH_rate=0.3, mode='selection', random_state=None, knn_classifier='knn', DSEL_perc=0.5)
Computes the competence level of the classifiers based on the difference between the supports obtained by each class. The competence level at a data point \(\mathbf{x}_{k}\) is equal to the minimum difference between the support obtained for the correct class and the supports obtained for the other classes.
The influence of each sample \(\mathbf{x}_{k}\) is defined according to a Gaussian function model [2]. Samples that are closer to the query have a higher influence on the competence estimation.
Parameters: - pool_classifiers : list of classifiers (Default = None)
The pool of classifiers trained for the corresponding classification problem. Each base classifier should support the method “predict”. If None, then the pool of classifiers is a bagging classifier.
- k : int (Default = None)
Number of neighbors used to estimate the competence of the base classifiers.
- DFP : Boolean (Default = False)
Determines if the dynamic frienemy pruning is applied.
- with_IH : Boolean (Default = False)
Whether the hardness level of the region of competence is used to decide between using the DS algorithm or the KNN for classification of a given query sample.
- safe_k : int (default = None)
The size of the indecision region.
- IH_rate : float (default = 0.3)
Hardness threshold. If the hardness level of the competence region is lower than IH_rate, the KNN classifier is used. Otherwise, the DS algorithm is used for classification.
- mode : String (Default = “selection”)
Whether the technique will perform dynamic selection, dynamic weighting or a hybrid approach for classification.
- random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
- knn_classifier : {‘knn’, ‘faiss’, None} (Default = ‘knn’)
The algorithm used to estimate the region of competence:
- ‘knn’ will use KNeighborsClassifier from sklearn.
- ‘faiss’ will use Facebook’s Faiss similarity search through the class FaissKNNClassifier.
- None will use sklearn KNeighborsClassifier.
- DSEL_perc : float (Default = 0.5)
Percentage of the input data used to fit DSEL. Note: This parameter is only used if the pool of classifiers is None or unfitted.
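The interaction between with_IH and IH_rate can be illustrated with a minimal sketch. It assumes the hardness of a region of competence is the fraction of neighbors that do not belong to the majority class, and that labels are encoded as non-negative integers; both are assumptions made for illustration. `route_queries` is a hypothetical helper, not part of the library.

```python
import numpy as np

def route_queries(neighbor_labels, ih_rate=0.3):
    """Decide, per query, whether KNN or the DS algorithm would be used.

    neighbor_labels : [n_samples, n_neighbors] integer labels of the neighbors.
    """
    neighbor_labels = np.asarray(neighbor_labels)
    # Size of the majority class inside each region of competence.
    majority = np.array([np.bincount(row).max() for row in neighbor_labels])
    # Assumed hardness measure: share of neighbors outside the majority class.
    hardness = 1.0 - majority / neighbor_labels.shape[1]
    # Easy regions (hardness below the threshold) go to KNN, the rest to DS.
    use_knn = hardness < ih_rate
    return hardness, use_knn

neighbor_labels = np.array([[0, 0, 0, 0, 0],
                            [0, 1, 1, 0, 1]])
hardness, use_knn = route_queries(neighbor_labels)
```

In this example the first query lies in a homogeneous region and would be handled by KNN, while the second exceeds the threshold and would be handled by the DS algorithm.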
References
[1] B. Antosik, M. Kurzynski, “New measures of classifier competence – heuristics and application to the design of multiple classifier systems,” in: Computer Recognition Systems 4, 2011, pp. 197–206.
[2] T. Woloszynski, M. Kurzynski, “A probabilistic model of classifier competence for dynamic ensemble selection,” Pattern Recognition 44.10 (2011), pp. 2656–2668.
-
estimate_competence(query, neighbors, distances, predictions=None)
Estimates the competence of each base classifier \(c_{i}\) using the source of competence \(C_{src}\) and the potential function model. The source of competence \(C_{src}\) for all data points in DSEL is pre-computed in the fit() step.
\[\delta_{i,j} = \frac{\sum_{k=1}^{N} C_{src} \: \exp(-d(\mathbf{x}_{k}, \mathbf{x}_{q})^{2})}{\sum_{k=1}^{N} \exp(-d(\mathbf{x}_{k}, \mathbf{x}_{q})^{2})}\]
Parameters: - query : array of shape = [n_samples, n_features]
The test examples.
- neighbors : array of shape = [n_samples, n_neighbors]
Indices of the k nearest neighbors for each test sample.
- distances : array of shape = [n_samples, n_neighbors]
Distances of the k nearest neighbors for each test sample.
- predictions : array of shape = [n_samples, n_classifiers]
Predictions of the base classifiers for all test examples.
Returns: - competences : array of shape = [n_samples, n_classifiers]
Competence level estimated for each base classifier and test example.
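The potential function model can be sketched in NumPy as a Gaussian-weighted average of the pre-computed source of competence. This is an illustrative sketch rather than the library's code: it assumes the weights are normalised by their sum, and `estimate_competence_sketch` and the toy arrays are hypothetical.

```python
import numpy as np

def estimate_competence_sketch(c_src, neighbors, distances):
    """Gaussian-weighted average of the source of competence.

    c_src     : [n_dsel, n_classifiers] pre-computed source of competence.
    neighbors : [n_samples, n_neighbors] indices into the rows of c_src.
    distances : [n_samples, n_neighbors] distances to those neighbors.
    """
    # Gaussian potential: closer neighbors receive weights nearer to 1.
    potential = np.exp(-(distances ** 2))
    # c_src[neighbors] has shape [n_samples, n_neighbors, n_classifiers];
    # the einsum sums the weighted sources over the neighbor axis.
    weighted = np.einsum("ijk,ij->ik", c_src[neighbors], potential)
    # Normalise by the total weight of each query's neighborhood.
    return weighted / potential.sum(axis=1, keepdims=True)

c_src = np.array([[0.8, -0.2],
                  [0.4, 0.6],
                  [-0.5, 0.1]])
neighbors = np.array([[0, 1], [1, 2]])
distances = np.array([[0.0, 1.0], [0.5, 0.5]])
competences = estimate_competence_sketch(c_src, neighbors, distances)
```

When all neighbors of a query are equally distant, as in the second row, the result reduces to the plain mean of their source-of-competence values.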
-
fit(X, y)
Train the DS model by setting the KNN algorithm and pre-processing the information required to apply the DS methods. In the case of probabilistic techniques, the source of competence (C_src) is calculated for each data point in DSEL in order to speed up the process during the testing phase.
C_src is estimated with the source_competence() function that is overridden by each DS method based on this paradigm.
Parameters: - X : array of shape = [n_samples, n_features]
Data used to fit the model.
- y : array of shape = [n_samples]
Class labels of each example in X.
Returns: - self : object
Returns self.
-
predict(X)
Predict the class label for each sample in X.
Parameters: - X : array of shape = [n_samples, n_features]
The input data.
Returns: - predicted_labels : array of shape = [n_samples]
Predicted class label for each sample in X.
-
predict_proba(X)
Estimates the posterior probabilities for each sample in X.
Parameters: - X : array of shape = [n_samples, n_features]
The input data.
Returns: - predicted_proba : array of shape = [n_samples, n_classes]
Probability estimates for each sample in X.
-
score(X, y, sample_weight=None)
Returns the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
Parameters: - X : array-like, shape = (n_samples, n_features)
Test samples.
- y : array-like, shape = (n_samples) or (n_samples, n_outputs)
True labels for X.
- sample_weight : array-like, shape = [n_samples], optional
Sample weights.
Returns: - score : float
Mean accuracy of self.predict(X) with respect to y.
-
select(competences)
Selects the base classifiers that obtained a competence level higher than the predefined threshold. In this case, the threshold indicates the competence of the random classifier.
Parameters: - competences : array of shape = [n_samples, n_classifiers]
Competence level estimated for each base classifier and test example.
Returns: - selected_classifiers : array of shape = [n_samples, n_classifiers]
Boolean matrix containing True if the base classifier is selected, False otherwise.
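The thresholding step can be sketched as follows. The threshold value of 0.0 is an assumption (a random classifier gives equal support to every class, so its minimum difference is zero), and so is the fallback that keeps the whole pool when no classifier passes the threshold. `select_sketch` is a hypothetical helper used only for illustration.

```python
import numpy as np

def select_sketch(competences, threshold=0.0):
    """Boolean mask of the classifiers whose competence exceeds the threshold."""
    competences = np.atleast_2d(competences)
    selected = competences > threshold
    # Assumed fallback: if no classifier passes for a query, keep all of them
    # so that a prediction can still be made.
    selected[~selected.any(axis=1), :] = True
    return selected

competences = np.array([[0.3, -0.1, 0.05],
                        [-0.2, -0.4, -0.1]])
mask = select_sketch(competences)
```

For the first query only the first and third classifiers are selected; for the second query no classifier beats the threshold, so the sketch falls back to the whole pool.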
-
source_competence()
Calculates the source of competence using the Minimum Difference method.
The source of competence C_src at the validation point \(\mathbf{x}_{k}\) is calculated as the minimum difference between the support obtained for the correct class and the supports obtained for the other classes.
Returns: - C_src : array of shape = [n_samples, n_classifiers]
The competence source for each base classifier at each data point.
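For a single base classifier, the Minimum Difference source of competence can be sketched as below. The function name and the toy data are hypothetical, and the sketch assumes the supports are class probabilities and the labels are integer class indices.

```python
import numpy as np

def min_difference_source(supports, y_true):
    """Minimum difference between the correct-class support and the others.

    supports : [n_samples, n_classes] supports given by one base classifier.
    y_true   : [n_samples] integer index of the correct class.
    """
    supports = np.asarray(supports, dtype=float)
    rows = np.arange(supports.shape[0])
    # Support assigned to the correct class of each sample.
    correct = supports[rows, y_true]
    # Difference between the correct-class support and every class support.
    diffs = correct[:, None] - supports
    # Exclude the comparison of the correct class with itself.
    diffs[rows, y_true] = np.inf
    return diffs.min(axis=1)

supports = np.array([[0.7, 0.2, 0.1],
                     [0.3, 0.5, 0.2]])
y_true = np.array([0, 0])
c_src_one = min_difference_source(supports, y_true)
```

The value is positive when the correct class receives the highest support and negative when some other class receives more; repeating the computation for every base classifier gives one column of C_src each.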