DES-KNN
-
class deslib.des.des_knn.DESKNN(pool_classifiers, k=7, DFP=False, with_IH=False, safe_k=None, IH_rate=0.3, mode='selection', pct_accuracy=0.5, pct_diversity=0.3, more_diverse=True, metric='DF')
Dynamic Ensemble Selection KNN (DES-KNN). This method selects an ensemble of classifiers taking into account both the accuracy and the diversity of the base classifiers. First, the most accurate classifiers are selected. Next, the most diverse classifiers, in relation to the selected classifiers, are added to the ensemble.
Parameters: - pool_classifiers : list of classifiers
The generated pool of classifiers trained for the corresponding classification problem.
- k : int (Default = 7)
Number of neighbors used to estimate the competence of the base classifiers.
- DFP : Boolean (Default = False)
Determines if the dynamic frienemy pruning is applied.
- with_IH : Boolean (Default = False)
Whether the hardness level of the region of competence is used to decide between using the DS algorithm or the KNN for classification of a given query sample.
- safe_k : int (default = None)
The size of the indecision region.
- IH_rate : float (default = 0.3)
Hardness threshold. If the hardness level of the competence region is lower than the IH_rate the KNN classifier is used. Otherwise, the DS algorithm is used for classification.
- mode : String (Default = “selection”)
Whether the technique performs dynamic selection, dynamic weighting, or a hybrid approach for classification.
- pct_accuracy : float (Default = 0.5)
Percentage of base classifiers selected based on accuracy
- pct_diversity : float (Default = 0.3)
Percentage of base classifiers selected based on diversity
- more_diverse : Boolean (Default = True)
Whether we select the most or the least diverse classifiers to add to the pre-selected ensemble
- metric : String (Default = ‘DF’)
Diversity measure used to estimate the diversity of the base classifiers. Can be either the double fault (DF), Q-statistics (Q), or error correlation (corr)
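How pct_accuracy, pct_diversity and more_diverse interact can be illustrated with a minimal sketch. This is not DESlib's implementation. The function name `select_ensemble`, the rounding rules, and the convention that a larger diversity score means "more diverse" are all assumptions made for illustration. The sketch also assumes that the diversity filter is applied to the subset already selected by accuracy.

```python
import math


def select_ensemble(accuracy, diversity, pct_accuracy=0.5,
                    pct_diversity=0.3, more_diverse=True):
    """Return the sorted indices of the selected base classifiers.

    accuracy[i] and diversity[i] are per-classifier scores estimated in
    the region of competence (hypothetical inputs; larger diversity is
    assumed to mean "more diverse").
    """
    n = len(accuracy)
    # Assumed rounding: at least one classifier survives each stage.
    n_acc = max(1, round(n * pct_accuracy))
    n_div = max(1, math.ceil(n * pct_diversity))

    # Stage 1: keep the n_acc most accurate classifiers.
    most_accurate = sorted(range(n), key=lambda i: accuracy[i],
                           reverse=True)[:n_acc]

    # Stage 2: among those, keep the n_div most (or least) diverse ones.
    kept = sorted(most_accurate, key=lambda i: diversity[i],
                  reverse=more_diverse)[:n_div]
    return sorted(kept)


accuracy = [0.9, 0.2, 0.8, 0.7, 0.1, 0.6, 0.3, 0.4]
diversity = [0.1, 0.9, 0.5, 0.3, 0.8, 0.7, 0.2, 0.6]
print(select_ensemble(accuracy, diversity, pct_accuracy=0.5,
                      pct_diversity=0.25))  # [2, 5]
```

With more_diverse=False the second stage keeps the least diverse of the accurate classifiers instead, which for the data above gives indices 0 and 3.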
References
Soares, R. G., Santana, A., Canuto, A. M., & de Souto, M. C. P. “Using accuracy and diversity to select classifiers to build ensembles.” International Joint Conference on Neural Networks (IJCNN), 2006.
Britto, Alceu S., Robert Sabourin, and Luiz ES Oliveira. “Dynamic selection of classifiers—a comprehensive review.” Pattern Recognition 47.11 (2014): 3665-3680.
R. M. O. Cruz, R. Sabourin, and G. D. Cavalcanti, “Dynamic classifier selection: Recent advances and perspectives,” Information Fusion, vol. 41, pp. 195–216, 2018.
-
estimate_competence(query)
Get the competence estimates of each base classifier ci for the classification of the query sample x.
The competence is estimated using the accuracy and diversity criteria. First the classification accuracy of the base classifiers in the region of competence is estimated. Then the diversity of the base classifiers in the region of competence is estimated.
The method returns two arrays: One containing the accuracy and the other the diversity of each base classifier.
Parameters: - query : array of shape = [n_features]
The query sample
Returns: - competences : array of shape = [n_classifiers]
The competence level estimated for each base classifier
- diversity : array of shape = [n_classifiers]
The diversity estimated for each base classifier
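The two estimates can be sketched in plain Python. The sketch assumes a hypothetical boolean matrix `hits`, where `hits[i][j]` is True when base classifier i correctly labels the j-th neighbor in the region of competence. It uses the double-fault measure (the fraction of neighbors misclassified by both classifiers of a pair). Averaging the pairwise values into one score per classifier is an assumption made here for illustration, and the library may aggregate them differently.

```python
def local_accuracy(hits):
    """Fraction of neighbors each classifier labels correctly."""
    return [sum(row) / len(row) for row in hits]


def double_fault(hits_a, hits_b):
    """Fraction of neighbors misclassified by both classifiers."""
    both_wrong = sum(1 for a, b in zip(hits_a, hits_b) if not a and not b)
    return both_wrong / len(hits_a)


def mean_double_fault(hits):
    """Average double fault of each classifier against all the others.

    The averaging step is an illustrative assumption. A lower value
    means the classifier shares fewer errors with the rest of the pool.
    """
    n = len(hits)
    scores = []
    for i in range(n):
        pairwise = [double_fault(hits[i], hits[j])
                    for j in range(n) if j != i]
        scores.append(sum(pairwise) / len(pairwise))
    return scores


# Three classifiers evaluated on four neighbors.
hits = [
    [True, True, False, False],
    [True, False, False, True],
    [False, False, False, True],
]
print(local_accuracy(hits))     # [0.5, 0.5, 0.25]
print(mean_double_fault(hits))  # [0.25, 0.375, 0.375]
```

Note that the double fault measures shared errors, so a lower score indicates a more diverse classifier under this measure.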
-
fit(X, y)
Prepare the DS model by setting up the KNN algorithm and pre-processing the information required to apply the DS methods.
Parameters: - X : matrix of shape = [n_samples, n_features] with the data.
- y : class labels of each sample in X.
Returns: - self
-
predict(X)
Predict the class label for each sample in X.
Parameters: - X : array of shape = [n_samples, n_features]
The input data.
Returns: - predicted_labels : array of shape = [n_samples]
Predicted class label for each sample in X.
-
predict_proba(X)
Estimate the posterior probabilities for each sample in X.
Parameters: - X : array of shape = [n_samples, n_features]
The input data.
Returns: - predicted_proba : array of shape = [n_samples, n_classes]
The probability estimates for each class in the classifier model.
-
score(X, y, sample_weight=None)
Returns the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
Parameters: - X : array-like, shape = (n_samples, n_features)
Test samples.
- y : array-like, shape = (n_samples) or (n_samples, n_outputs)
True labels for X.
- sample_weight : array-like, shape = [n_samples], optional
Sample weights.
Returns: - score : float
Mean accuracy of self.predict(X) wrt. y.
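For the single-label case, the quantity returned by score can be sketched as a weighted fraction of correct predictions. The helper name `mean_accuracy` is hypothetical and the function is only an illustration of the metric, not the library's code. It takes the predictions directly rather than calling self.predict(X).

```python
def mean_accuracy(y_true, y_pred, sample_weight=None):
    """Weighted fraction of samples whose predicted label is correct."""
    if sample_weight is None:
        # Unweighted case: every sample counts equally.
        sample_weight = [1.0] * len(y_true)
    correct = sum(w for t, p, w in zip(y_true, y_pred, sample_weight)
                  if t == p)
    return correct / sum(sample_weight)


y_true = [0, 1, 1, 0]
y_pred = [0, 1, 0, 0]
print(mean_accuracy(y_true, y_pred))                # 0.75
print(mean_accuracy(y_true, y_pred, [1, 1, 2, 0]))  # 0.5
```

In the weighted call, the misclassified third sample carries half of the total weight, so the score drops from 0.75 to 0.5.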