META-DES¶

class deslib.des.meta_des.METADES(pool_classifiers=None, meta_classifier=None, k=7, Kp=5, Hc=1.0, selection_threshold=0.5, mode='selection', DFP=False, with_IH=False, safe_k=None, IH_rate=0.3, random_state=None, knn_classifier='knn', DSEL_perc=0.5)[source]¶

Meta learning for dynamic ensemble selection (META-DES).

The META-DES framework is based on the assumption that the dynamic ensemble selection problem can be considered as a meta-problem. This meta-problem uses different criteria regarding the behavior of a base classifier \(c_{i}\), in order to decide whether it is competent enough to classify a given test sample.

The framework performs a meta-training stage, in which, the meta-features are extracted from each instance belonging to the training and the dynamic selection dataset (DSEL). Then, the extracted meta-features are used to train the meta-classifier \(\lambda\). The meta-classifier is trained to predict whether or not a base classifier \(c_{i}\) is competent enough to classify a given input sample.

When an unknown sample is presented to the system, the meta-features for each base classifier \(c_{i}\) in relation to the input sample are calculated and presented to the meta-classifier. The meta-classifier estimates the competence level of the base classifier \(c_{i}\) for the classification of the query sample. Base classifiers with competence level higher than a pre-defined threshold are selected. If no base classifier is selected, the whole pool is used for classification.

Parameters:

pool_classifiers : list of classifiers (Default = None)

The generated_pool of classifiers trained for the corresponding classification problem. Each base classifiers should support the method “predict”. If None, then the pool of classifiers is a bagging classifier.

meta_classifier : sklearn.estimator (Default = None)

Classifier model used for the meta-classifier. If None, a Multinomial naive Bayes classifier is used.

k : int (Default = 7)

Number of neighbors used to estimate the competence of the base classifiers.

Kp : int (Default = 5)

Number of output profiles used to estimate the competence of the base classifiers.

Hc : float (Default = 1.0)

Sample selection threshold.

selection_threshold : float(Default = 0.5)

Threshold used to select the base classifier. Only the base classifiers with competence level higher than the selection_threshold are selected to compose the ensemble.

mode : String (Default = “selection”)

Determines the mode of META-des that is used (selection, weighting or hybrid).

DFP : Boolean (Default = False)

Determines if the dynamic frienemy pruning is applied.

with_IH : Boolean (Default = False)

Whether the hardness level of the region of competence is used to decide between using the DS algorithm or the KNN for classification of a given query sample.

safe_k : int (default = None)

The size of the indecision region.

IH_rate : float (default = 0.3)

Hardness threshold. If the hardness level of the competence region is lower than the IH_rate the KNN classifier is used. Otherwise, the DS algorithm is used for classification.

random_state : int, RandomState instance or None, optional (default=None)

If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

knn_classifier : {‘knn’, ‘faiss’, None} (Default = ‘knn’)

The algorithm used to estimate the region of competence:

‘knn’ will use KNeighborsClassifier from sklearn
‘faiss’ will use Facebook’s Faiss similarity search through the class FaissKNNClassifier
None, will use sklearn KNeighborsClassifier.

DSEL_perc : float (Default = 0.5)

Percentage of the input data used to fit DSEL. Note: This parameter is only used if the pool of classifier is None or unfitted.

References

Cruz, R.M., Sabourin, R., Cavalcanti, G.D. and Ren, T.I., 2015. META-DES: A dynamic ensemble selection framework using meta-learning. Pattern Recognition, 48(5), pp.1925-1935.

Cruz, R.M., Sabourin, R. and Cavalcanti, G.D., 2015, July. META-des. H: a dynamic ensemble selection technique using meta-learning and a dynamic weighting approach. In Neural Networks (IJCNN), 2015 International Joint Conference on (pp. 1-8).

R. M. O. Cruz, R. Sabourin, and G. D. Cavalcanti, “Dynamic classifier selection: Recent advances and perspectives,” Information Fusion, vol. 41, pp. 195 – 216, 2018.

estimate_competence_from_proba(query, neighbors, probabilities, distances=None)[source]¶

Estimate the competence of each base classifier \(c_i\) the classification of the query sample. This method received an array with the pre-calculated probability estimates for each query.

First, the meta-features of each base classifier \(c_i\) for the classification of the query sample are estimated. These meta-features are passed down to the meta-classifier \(\lambda\) for the competence level estimation.

Parameters:

query : array of shape = [n_samples, n_features]: The test examples.
neighbors : array of shale = [n_samples, n_neighbors]: Indices of the k nearest neighbors according for each test sample.
distances : array of shale = [n_samples, n_neighbors]: Distances of the k nearest neighbors according for each test sample.
probabilities : array of shape = [n_samples, n_classifiers, n_classes]: Probabilities estimates obtained by each each base classifier for each query sample.

Returns:

competences : array of shape = [n_samples, n_classifiers]: The competence level estimated for each base classifier and test example.

fit(X, y)[source]¶

Prepare the DS model by setting the KNN algorithm and pre-processing the information required to apply the DS method.

This method also extracts the meta-features and trains the meta-classifier \(\lambda\) if the meta-classifier was not yet trained.

Parameters:	X : array of shape = [n_samples, n_features] Data used to fit the model. y : array of shape = [n_samples] class labels of each example in X.
Returns:	self

predict(X)[source]¶

Predict the class label for each sample in X.

Parameters:	X : array of shape = [n_samples, n_features] The input data.
Returns:	predicted_labels : array of shape = [n_samples] Predicted class label for each sample in X.

predict_proba(X)[source]¶

Estimates the posterior probabilities for sample in X.

Parameters:	X : array of shape = [n_samples, n_features] The input data.
Returns:	predicted_proba : array of shape = [n_samples, n_classes] Probabilities estimates for each sample in X.

score(X, y, sample_weight=None)[source]¶

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:	X : array-like, shape = (n_samples, n_features) Test samples. y : array-like, shape = (n_samples) or (n_samples, n_outputs) True labels for X. sample_weight : array-like, shape = [n_samples], optional Sample weights.
Returns:	score : float Mean accuracy of self.predict(X) wrt. y.

select(competences)[source]¶

Selects the base classifiers that obtained a competence level higher than the predefined threshold defined in self.selection_threshold.

Parameters:	competences : array of shape = [n_samples, n_classifiers] The competence level estimated for each base classifier and test example.
Returns:	selected_classifiers : array of shape = [n_samples, n_classifiers] Boolean matrix containing True if the base classifier is selected, False otherwise.