DES Clustering

class deslib.des.des_clustering.DESClustering(pool_classifiers, k=5, mode='selection', pct_accuracy=0.5, pct_diversity=0.33, more_diverse=True, metric='DF', rng=<mtrand.RandomState object>)[source]

Dynamic ensemble selection-Clustering (DES-Clustering). This method selects an ensemble of classifiers taking into account the accuracy and diversity of the base classifiers. The K-means algorithm is used to define the regions of competence. First, the most accurate classifiers are selected. Next, the most diverse classifiers, in relation to the selected classifiers, are added to the ensemble.

Parameters:
pool_classifiers : list of classifiers

The pool of classifiers trained for the corresponding classification problem. The classifiers should support the methods “predict” and “predict_proba”.

k : int (Default = 5)

Number of neighbors used to estimate the competence of the base classifiers.

mode : String (Default = “selection”)

Whether the technique will perform dynamic selection, dynamic weighting, or a hybrid approach for classification.

pct_accuracy : float (Default = 0.5)

Percentage of base classifiers selected based on accuracy.

pct_diversity : float (Default = 0.33)

Percentage of base classifiers selected based on diversity.

more_diverse : Boolean (Default = True)

Whether we select the most or the least diverse classifiers to add to the pre-selected ensemble

metric : String (Default = ‘df’)

Diversity measure used to estimate the diversity of the base classifiers. Can be either the double fault (df), Q-statistic (Q), or error correlation (corr).

rng : numpy.random.RandomState instance

Random number generator to ensure reproducible results.
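The pairwise diversity measures named under metric can be illustrated with a small standalone sketch. It shows the textbook definitions of the double fault and the Q-statistic computed from two classifiers' predictions. The function names and the handling of a zero denominator are assumptions made for illustration and are not taken from the library.

```python
def double_fault(y_true, pred_a, pred_b):
    """Fraction of samples that both classifiers misclassify."""
    both_wrong = sum(
        1 for t, a, b in zip(y_true, pred_a, pred_b) if a != t and b != t
    )
    return both_wrong / len(y_true)


def q_statistic(y_true, pred_a, pred_b):
    """Yule's Q from the 2x2 table of correct/incorrect outcomes."""
    n11 = n00 = n10 = n01 = 0
    for t, a, b in zip(y_true, pred_a, pred_b):
        a_ok, b_ok = a == t, b == t
        if a_ok and b_ok:
            n11 += 1  # both correct
        elif not a_ok and not b_ok:
            n00 += 1  # both wrong
        elif a_ok:
            n10 += 1  # only the first classifier is correct
        else:
            n01 += 1  # only the second classifier is correct
    numerator = n11 * n00 - n01 * n10
    denominator = n11 * n00 + n01 * n10
    # Assumption: return 0.0 when the statistic is undefined.
    return numerator / denominator if denominator else 0.0


y_true = [0, 0, 1, 1, 0, 1]
pred_a = [0, 1, 1, 0, 0, 1]
pred_b = [0, 1, 0, 1, 0, 1]
df_value = double_fault(y_true, pred_a, pred_b)
q_value = q_statistic(y_true, pred_a, pred_b)
```

For both measures, lower values indicate that the two classifiers tend to fail on different samples.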

References

Soares, R. G., Santana, A., Canuto, A. M., & de Souto, M. C. P. “Using accuracy and diversity to select classifiers to build ensembles.” International Joint Conference on Neural Networks (IJCNN), 2006.

Britto, Alceu S., Robert Sabourin, and Luiz ES Oliveira. “Dynamic selection of classifiers—a comprehensive review.” Pattern Recognition 47.11 (2014): 3665-3680.

R. M. O. Cruz, R. Sabourin, and G. D. Cavalcanti, “Dynamic classifier selection: Recent advances and perspectives,” Information Fusion, vol. 41, pp. 195 – 216, 2018.
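The two-stage selection described at the top of this class (most accurate first, then most diverse relative to those) can be sketched as follows. This is one possible reading of that description, not the library's implementation, which may differ. The function name select_ensemble, the truncating rounding of the percentages, and the use of mean pairwise diversity to rank candidates are all assumptions.

```python
def select_ensemble(accuracy, diversity, pct_accuracy=0.5,
                    pct_diversity=0.33, more_diverse=True):
    """Return indices of classifiers chosen for one cluster.

    accuracy  : list with one accuracy value per classifier.
    diversity : square list of lists; diversity[i][j] is assumed to be
                larger when classifiers i and j are more diverse.
    """
    n = len(accuracy)
    # Assumption: percentages are converted to counts by truncation.
    n_accurate = max(1, int(n * pct_accuracy))
    n_diverse = int(n * pct_diversity)

    # Stage 1: keep the most accurate classifiers.
    ranked = sorted(range(n), key=lambda i: accuracy[i], reverse=True)
    accurate = ranked[:n_accurate]
    candidates = ranked[n_accurate:]

    # Stage 2: rank the rest by mean diversity to the accurate ones.
    def mean_diversity(i):
        return sum(diversity[i][j] for j in accurate) / len(accurate)

    candidates = sorted(candidates, key=mean_diversity, reverse=more_diverse)
    return accurate + candidates[:n_diverse]


accuracy = [0.9, 0.6, 0.8, 0.7, 0.5, 0.85]
# Toy diversity matrix for illustration only.
diversity = [[abs(i - j) for j in range(6)] for i in range(6)]
selected = select_ensemble(accuracy, diversity)
```

With six classifiers and the default percentages, this sketch keeps three classifiers by accuracy and adds one by diversity.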

estimate_competence(query)[source]

Get the competence estimates of each base classifier ci for the classification of the query sample x.

In this case, the competences are pre-calculated for each cluster. This method therefore finds the cluster nearest to the query sample and returns the pre-calculated competences of the base classifiers for that cluster.

Parameters:
query : array of shape = [n_features]

The query sample

Returns:
competences : array = [n_classifiers]

The competence level estimated for each base classifier
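The lookup described for estimate_competence can be sketched as a nearest-centroid search followed by a table read. The helper names, the Euclidean distance, and the list-based data layout below are assumptions for illustration. The same cluster index could be used to read a table of pre-selected classifier indices, as select does.

```python
import math


def nearest_cluster(query, centroids):
    """Index of the centroid closest to the query (Euclidean distance)."""
    distances = [math.dist(query, centroid) for centroid in centroids]
    return distances.index(min(distances))


def lookup_competences(query, centroids, cluster_competences):
    """Return the pre-calculated competences for the query's nearest cluster."""
    return cluster_competences[nearest_cluster(query, centroids)]


centroids = [[0.0, 0.0], [10.0, 10.0]]
# One row per cluster, one column per base classifier (toy values).
cluster_competences = [[1.0, 0.5], [0.33, 0.67]]
competences = lookup_competences([9.0, 8.0], centroids, cluster_competences)
```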

fit(X, y)[source]

Train the DS model by fitting the clustering algorithm and pre-processing the information required to apply the DS method. In this case, after the clustering algorithm is fitted, the ensemble containing the most competent classifiers, taking into account accuracy and diversity, is estimated for each cluster.
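The per-cluster pre-processing can be sketched as follows, assuming the cluster label of every training sample and the predictions of every base classifier are already available. The function name, the argument layout, and the zero competence assigned to an empty cluster are assumptions and do not describe the library's internals.

```python
def per_cluster_accuracy(cluster_labels, y_true, predictions, n_clusters):
    """Accuracy of each classifier restricted to each cluster.

    predictions[c][i] is the label predicted by classifier c for sample i.
    Returns a table with one row per cluster and one column per classifier.
    """
    table = []
    for cluster in range(n_clusters):
        members = [i for i, label in enumerate(cluster_labels) if label == cluster]
        row = []
        for preds in predictions:
            if not members:
                row.append(0.0)  # Assumption: empty cluster gets zero competence.
                continue
            hits = sum(1 for i in members if preds[i] == y_true[i])
            row.append(hits / len(members))
        table.append(row)
    return table


cluster_labels = [0, 0, 1, 1, 1]
y_true = [0, 1, 1, 0, 1]
predictions = [[0, 1, 0, 0, 0], [1, 1, 1, 1, 1]]
accuracy_table = per_cluster_accuracy(cluster_labels, y_true, predictions, 2)
```

A table like this, together with a diversity matrix computed on the same cluster members, would be enough to choose an ensemble for each cluster once at fit time.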

Parameters:
X : array of shape = [n_samples, n_features]

The input data.

y : array of shape = [n_samples]

Class labels of each sample in X.

Returns:
self

predict(X)[source]

Predict the class label for each sample in X.

Parameters:
X : array of shape = [n_samples, n_features]

The input data.

Returns:
predicted_labels : array of shape = [n_samples]

Predicted class label for each sample in X.

predict_proba(X)[source]

Estimate the posterior probabilities for each sample in X.

Parameters:
X : array of shape = [n_samples, n_features]

The input data.

Returns:
predicted_proba : array of shape = [n_samples, n_classes]

Probability estimates for each class in the classifier model.

score(X, y, sample_weight=None)[source]

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:
X : array-like, shape = (n_samples, n_features)

Test samples.

y : array-like, shape = (n_samples) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like, shape = [n_samples], optional

Sample weights.

Returns:
score : float

Mean accuracy of self.predict(X) wrt. y.
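The quantity returned by score can be illustrated with a minimal sketch of (optionally weighted) mean accuracy. The function name mean_accuracy is hypothetical, and the sketch covers only the single-label case.

```python
def mean_accuracy(y_true, y_pred, sample_weight=None):
    """Weighted fraction of samples whose predicted label matches the true label."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    # Sum the weights of the correctly classified samples.
    correct = sum(
        w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p
    )
    return correct / sum(sample_weight)


y_true = [0, 1, 1, 0]
y_pred = [0, 1, 0, 0]
unweighted = mean_accuracy(y_true, y_pred)
weighted = mean_accuracy(y_true, y_pred, sample_weight=[1, 1, 2, 1])
```

Giving the misclassified sample a larger weight lowers the score, which is the intended effect of sample_weight.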

select(query)[source]

Select an ensemble with the most accurate and most diverse classifiers for the classification of the query.

Since the method is based on clustering, the ensemble for each cluster is already pre-calculated. We therefore only need to find the nearest cluster and return the classifiers that were pre-selected for it.

Parameters:
query : array of shape = [n_features]

The query sample

Returns:
indices : list

List containing the indices of the selected base classifiers.