DES Clustering
class deslib.des.des_clustering.DESClustering(pool_classifiers, k=5, mode='selection', pct_accuracy=0.5, pct_diversity=0.33, more_diverse=True, metric='DF', rng=<mtrand.RandomState object>)
Dynamic ensemble selection-Clustering (DES-Clustering). This method selects an ensemble of classifiers taking into account the accuracy and diversity of the base classifiers. The K-means algorithm is used to define the regions of competence. First, the most accurate classifiers are selected. Next, the most diverse classifiers, in relation to the selected classifiers, are added to the ensemble.
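The two-stage selection described above can be sketched for a single cluster as follows. This is a minimal illustration and not DESlib's implementation: the function `select_for_cluster`, its arguments, and the way diversity toward the pre-selected classifiers is aggregated (a sum of pairwise scores in which a larger value means more diverse) are assumptions made for the example.

```python
def select_for_cluster(accuracies, diversity, n_accurate, n_diverse,
                       more_diverse=True):
    """Pick the n_accurate most accurate classifiers, then add n_diverse
    others ranked by their total pairwise diversity to those already chosen.

    accuracies: per-classifier accuracy inside one cluster.
    diversity:  square matrix; diversity[i][j] is a hypothetical pairwise
                score where a larger value means i and j disagree more.
    """
    # Stage 1: rank all classifiers by accuracy, best first.
    order = sorted(range(len(accuracies)),
                   key=lambda i: accuracies[i], reverse=True)
    accurate = order[:n_accurate]
    remaining = order[n_accurate:]

    # Stage 2: score each remaining classifier against the accurate ones.
    def score(i):
        return sum(diversity[i][j] for j in accurate)

    ranked = sorted(remaining, key=score, reverse=more_diverse)
    return accurate + ranked[:n_diverse]


accuracies = [0.90, 0.85, 0.60, 0.70]
diversity = [
    [0.0, 0.1, 0.5, 0.2],
    [0.1, 0.0, 0.4, 0.3],
    [0.5, 0.4, 0.0, 0.6],
    [0.2, 0.3, 0.6, 0.0],
]
ensemble = select_for_cluster(accuracies, diversity, n_accurate=2, n_diverse=1)
print(ensemble)  # [0, 1, 2]
```

Passing `more_diverse=False` in this sketch reverses the second ranking, so the least diverse remaining classifier (index 3 here) is added instead.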
Parameters: - pool_classifiers : list of classifiers
The pool of classifiers trained for the corresponding classification problem. The classifiers should support the methods “predict” and “predict_proba”.
- k : int (Default = 5)
Number of clusters used by the K-means algorithm to define the regions of competence in which the competence of the base classifiers is estimated.
- mode : String (Default = “selection”)
Whether the technique will perform dynamic selection, dynamic weighting, or a hybrid approach for classification.
- pct_accuracy : float (Default = 0.5)
Percentage of base classifiers selected based on accuracy
- pct_diversity : float (Default = 0.33)
Percentage of base classifiers selected based on diversity
- more_diverse : Boolean (Default = True)
Whether we select the most or the least diverse classifiers to add to the pre-selected ensemble
- metric : String (Default = ‘df’)
Diversity measure used to estimate the diversity of the base classifiers. Can be either the double fault (df), the Q-statistic (Q), or the error correlation (corr).
- rng : numpy.random.RandomState instance
Random number generator to assure reproducible results.
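Two of the diversity measures named for the metric parameter can be computed from the agreement counts of a pair of classifiers. The sketch below uses the textbook definitions of the double fault and the Q-statistic; the helper names are hypothetical and the code is not taken from DESlib. Note that a lower double fault means a more diverse pair.

```python
def pairwise_counts(y_true, pred_a, pred_b):
    """Count samples where both are right (n11), only a is right (n10),
    only b is right (n01), and both are wrong (n00)."""
    n11 = n10 = n01 = n00 = 0
    for t, a, b in zip(y_true, pred_a, pred_b):
        a_ok, b_ok = a == t, b == t
        if a_ok and b_ok:
            n11 += 1
        elif a_ok:
            n10 += 1
        elif b_ok:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00


def double_fault(y_true, pred_a, pred_b):
    """Proportion of samples misclassified by both classifiers."""
    _, _, _, n00 = pairwise_counts(y_true, pred_a, pred_b)
    return n00 / len(y_true)


def q_statistic(y_true, pred_a, pred_b):
    """Q = (n11*n00 - n01*n10) / (n11*n00 + n01*n10), in [-1, 1]."""
    n11, n10, n01, n00 = pairwise_counts(y_true, pred_a, pred_b)
    denominator = n11 * n00 + n01 * n10
    if denominator == 0:
        raise ValueError("Q-statistic is undefined for these counts")
    return (n11 * n00 - n01 * n10) / denominator


y_true = [0, 1, 1, 0, 1, 0]
pred_a = [0, 1, 0, 0, 0, 1]
pred_b = [0, 0, 1, 0, 0, 1]
df_value = double_fault(y_true, pred_a, pred_b)  # 2 of 6 samples
q_value = q_statistic(y_true, pred_a, pred_b)    # (4 - 1) / (4 + 1)
```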
References
Soares, R. G., Santana, A., Canuto, A. M., & de Souto, M. C. P. “Using accuracy and diversity to select classifiers to build ensembles.” International Joint Conference on Neural Networks (IJCNN), 2006.
Britto, Alceu S., Robert Sabourin, and Luiz ES Oliveira. “Dynamic selection of classifiers—a comprehensive review.” Pattern Recognition 47.11 (2014): 3665-3680.
R. M. O. Cruz, R. Sabourin, and G. D. Cavalcanti, “Dynamic classifier selection: Recent advances and perspectives,” Information Fusion, vol. 41, pp. 195 – 216, 2018.
estimate_competence(query)
Get the competence estimates of each base classifier ci for the classification of the query sample x.
In this case, the competences are pre-calculated for each cluster. This method therefore finds the cluster nearest to the query sample and returns the pre-calculated competences of the base classifiers for that cluster.
Parameters: - query : array of shape = [n_features]
The query sample
Returns: - competences : array of shape = [n_classifiers]
The competence level estimated for each base classifier
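The nearest-cluster lookup described for this method can be sketched as below. The function `lookup_competences`, the Euclidean distance, and the data layout (one centroid and one row of competences per cluster) are assumptions chosen for illustration, not the library's internals.

```python
import math


def lookup_competences(query, centroids, cluster_competences):
    """Return the pre-calculated competences of the cluster whose centroid
    is closest (Euclidean distance) to the query sample."""
    nearest = min(range(len(centroids)),
                  key=lambda c: math.dist(query, centroids[c]))
    return cluster_competences[nearest]


centroids = [[0.0, 0.0], [5.0, 5.0]]
# Hypothetical per-cluster competences for three base classifiers.
cluster_competences = [
    [0.90, 0.60, 0.70],
    [0.50, 0.80, 0.65],
]
competences = lookup_competences([4.0, 4.5], centroids, cluster_competences)
print(competences)  # [0.5, 0.8, 0.65]
```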
fit(X, y)
Train the DS model by setting up the clustering algorithm and pre-processing the information required to apply the DS method. In this case, after fitting roc_algorithm (the clustering algorithm that defines the regions of competence), the ensemble containing the most competent classifiers, taking into account accuracy and diversity, is estimated for each cluster.
Parameters: - X : array of shape = [n_samples, n_features]
The input data.
- y : class labels of each sample in X.
Returns: - self
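One part of the per-cluster pre-processing described for fit is measuring how accurate each base classifier is inside each cluster. The sketch below assumes the cluster label of every training sample is already known (for example from K-means) and that each classifier's predictions are available; `accuracy_per_cluster` is a hypothetical helper, not a DESlib function.

```python
def accuracy_per_cluster(cluster_labels, y, predictions, n_clusters):
    """Return table[cluster][classifier] = accuracy of that classifier on
    the training samples assigned to that cluster.

    predictions[c][i] is the label predicted by classifier c for sample i.
    """
    table = []
    for cluster in range(n_clusters):
        members = [i for i, label in enumerate(cluster_labels)
                   if label == cluster]
        row = []
        for preds in predictions:
            hits = sum(preds[i] == y[i] for i in members)
            # An empty cluster gets accuracy 0.0 in this sketch.
            row.append(hits / len(members) if members else 0.0)
        table.append(row)
    return table


cluster_labels = [0, 0, 1, 1, 1]
y = [0, 1, 1, 0, 1]
predictions = [
    [0, 1, 0, 0, 1],  # classifier 0
    [1, 1, 1, 0, 1],  # classifier 1
]
table = accuracy_per_cluster(cluster_labels, y, predictions, n_clusters=2)
```

Each row of the resulting table could then feed an accuracy-then-diversity selection for its cluster.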
predict(X)
Predict the class label for each sample in X.
Parameters: - X : array of shape = [n_samples, n_features]
The input data.
Returns: - predicted_labels : array of shape = [n_samples]
Predicted class label for each sample in X.
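This page does not state how the outputs of the selected classifiers are combined into one label. Assuming a plain majority vote over the selected ensemble (an assumption for illustration only), prediction for a single sample could look like this; `predict_one` and its arguments are hypothetical names.

```python
from collections import Counter


def predict_one(sample_predictions, selected):
    """Majority vote over the labels predicted by the selected classifiers.

    sample_predictions[i] is the label predicted by base classifier i for
    one sample; selected holds the indices of the chosen classifiers.
    """
    votes = [sample_predictions[i] for i in selected]
    # most_common(1) returns the label with the highest vote count.
    return Counter(votes).most_common(1)[0][0]


# Four base classifiers voted 1, 0, 1, 0; only the first three are selected.
label = predict_one([1, 0, 1, 0], selected=[0, 1, 2])
print(label)  # 1
```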
predict_proba(X)
Estimate the posterior probabilities for each sample in X.
Parameters: - X : array of shape = [n_samples, n_features]
The input data.
Returns: - predicted_proba : array of shape = [n_samples, n_classes]
The probability estimates for each class in the classifier model.
score(X, y, sample_weight=None)
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires the entire label set of each sample to be predicted correctly.
Parameters: - X : array-like, shape = (n_samples, n_features)
Test samples.
- y : array-like, shape = (n_samples) or (n_samples, n_outputs)
True labels for X.
- sample_weight : array-like, shape = [n_samples], optional
Sample weights.
Returns: - score : float
Mean accuracy of self.predict(X) with respect to y.
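As a worked illustration of the returned value, the mean accuracy with optional sample weights can be computed as the weighted fraction of correct predictions. The function below is a stand-alone sketch with a hypothetical name; it mirrors the description above rather than reproducing the library code.

```python
def mean_accuracy(y_true, y_pred, sample_weight=None):
    """Weighted fraction of samples whose predicted label equals the
    true label; all weights default to 1."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    correct = sum(w for t, p, w in zip(y_true, y_pred, sample_weight)
                  if t == p)
    return correct / sum(sample_weight)


y_true = [0, 1, 1, 0]
y_pred = [0, 1, 0, 0]
plain = mean_accuracy(y_true, y_pred)                   # 3 of 4 correct
weighted = mean_accuracy(y_true, y_pred, [1, 1, 2, 1])  # 3 of 5 weight units
print(plain, weighted)  # 0.75 0.6
```

Giving the misclassified third sample a weight of 2 lowers the score from 0.75 to 0.6, which shows how sample_weight shifts the metric toward heavily weighted samples.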
select(query)
Select an ensemble with the most accurate and most diverse classifiers for the classification of the query.
Since the method is based on roc_algorithm (the clustering algorithm), the ensemble for each cluster is already pre-calculated. So, we only need to find the nearest cluster and then get the classifiers that were pre-selected for this cluster.
Parameters: - query : array of shape = [n_features]
The query sample
Returns: - indices : list
The indices of the selected base classifiers.