META-DES

class deslib.des.meta_des.METADES(pool_classifiers=None, meta_classifier=None, k=7, Kp=5, Hc=1.0, selection_threshold=0.5, mode='selection', DFP=False, with_IH=False, safe_k=None, IH_rate=0.3, random_state=None, knn_classifier='knn', DSEL_perc=0.5)[source]

Meta learning for dynamic ensemble selection (META-DES).

The META-DES framework is based on the assumption that the dynamic ensemble selection problem can be considered as a meta-problem. This meta-problem uses different criteria regarding the behavior of a base classifier \(c_{i}\), in order to decide whether it is competent enough to classify a given test sample.

The framework performs a meta-training stage, in which, the meta-features are extracted from each instance belonging to the training and the dynamic selection dataset (DSEL). Then, the extracted meta-features are used to train the meta-classifier \(\lambda\). The meta-classifier is trained to predict whether or not a base classifier \(c_{i}\) is competent enough to classify a given input sample.

When an unknown sample is presented to the system, the meta-features for each base classifier \(c_{i}\) in relation to the input sample are calculated and presented to the meta-classifier. The meta-classifier estimates the competence level of the base classifier \(c_{i}\) for the classification of the query sample. Base classifiers with competence level higher than a pre-defined threshold are selected. If no base classifier is selected, the whole pool is used for classification.

Parameters:
pool_classifiers : list of classifiers (Default = None)

The generated_pool of classifiers trained for the corresponding classification problem. Each base classifiers should support the method “predict”. If None, then the pool of classifiers is a bagging classifier.

meta_classifier : sklearn.estimator (Default = None)

Classifier model used for the meta-classifier. If None, a Multinomial naive Bayes classifier is used.

k : int (Default = 7)

Number of neighbors used to estimate the competence of the base classifiers.

Kp : int (Default = 5)

Number of output profiles used to estimate the competence of the base classifiers.

Hc : float (Default = 1.0)

Sample selection threshold.

selection_threshold : float(Default = 0.5)

Threshold used to select the base classifier. Only the base classifiers with competence level higher than the selection_threshold are selected to compose the ensemble.

mode : String (Default = “selection”)

Determines the mode of META-des that is used (selection, weighting or hybrid).

DFP : Boolean (Default = False)

Determines if the dynamic frienemy pruning is applied.

with_IH : Boolean (Default = False)

Whether the hardness level of the region of competence is used to decide between using the DS algorithm or the KNN for classification of a given query sample.

safe_k : int (default = None)

The size of the indecision region.

IH_rate : float (default = 0.3)

Hardness threshold. If the hardness level of the competence region is lower than the IH_rate the KNN classifier is used. Otherwise, the DS algorithm is used for classification.

random_state : int, RandomState instance or None, optional (default=None)

If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

knn_classifier : {‘knn’, ‘faiss’, None} (Default = ‘knn’)

The algorithm used to estimate the region of competence:

  • ‘knn’ will use KNeighborsClassifier from sklearn
  • ‘faiss’ will use Facebook’s Faiss similarity search through the class FaissKNNClassifier
  • None, will use sklearn KNeighborsClassifier.
DSEL_perc : float (Default = 0.5)

Percentage of the input data used to fit DSEL. Note: This parameter is only used if the pool of classifier is None or unfitted.

References

Cruz, R.M., Sabourin, R., Cavalcanti, G.D. and Ren, T.I., 2015. META-DES: A dynamic ensemble selection framework using meta-learning. Pattern Recognition, 48(5), pp.1925-1935.

Cruz, R.M., Sabourin, R. and Cavalcanti, G.D., 2015, July. META-des. H: a dynamic ensemble selection technique using meta-learning and a dynamic weighting approach. In Neural Networks (IJCNN), 2015 International Joint Conference on (pp. 1-8).

R. M. O. Cruz, R. Sabourin, and G. D. Cavalcanti, “Dynamic classifier selection: Recent advances and perspectives,” Information Fusion, vol. 41, pp. 195 – 216, 2018.

estimate_competence_from_proba(query, neighbors, probabilities, distances=None)[source]

Estimate the competence of each base classifier \(c_i\) the classification of the query sample. This method received an array with the pre-calculated probability estimates for each query.

First, the meta-features of each base classifier \(c_i\) for the classification of the query sample are estimated. These meta-features are passed down to the meta-classifier \(\lambda\) for the competence level estimation.

Parameters:
query : array of shape = [n_samples, n_features]

The test examples.

neighbors : array of shale = [n_samples, n_neighbors]

Indices of the k nearest neighbors according for each test sample.

distances : array of shale = [n_samples, n_neighbors]

Distances of the k nearest neighbors according for each test sample.

probabilities : array of shape = [n_samples, n_classifiers, n_classes]

Probabilities estimates obtained by each each base classifier for each query sample.

Returns:
competences : array of shape = [n_samples, n_classifiers]

The competence level estimated for each base classifier and test example.

fit(X, y)[source]

Prepare the DS model by setting the KNN algorithm and pre-processing the information required to apply the DS method.

This method also extracts the meta-features and trains the meta-classifier \(\lambda\) if the meta-classifier was not yet trained.

Parameters:
X : array of shape = [n_samples, n_features]

Data used to fit the model.

y : array of shape = [n_samples]

class labels of each example in X.

Returns:
self
predict(X)[source]

Predict the class label for each sample in X.

Parameters:
X : array of shape = [n_samples, n_features]

The input data.

Returns:
predicted_labels : array of shape = [n_samples]

Predicted class label for each sample in X.

predict_proba(X)[source]

Estimates the posterior probabilities for sample in X.

Parameters:
X : array of shape = [n_samples, n_features]

The input data.

Returns:
predicted_proba : array of shape = [n_samples, n_classes]

Probabilities estimates for each sample in X.

score(X, y, sample_weight=None)[source]

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:
X : array-like, shape = (n_samples, n_features)

Test samples.

y : array-like, shape = (n_samples) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like, shape = [n_samples], optional

Sample weights.

Returns:
score : float

Mean accuracy of self.predict(X) wrt. y.

select(competences)[source]

Selects the base classifiers that obtained a competence level higher than the predefined threshold defined in self.selection_threshold.

Parameters:
competences : array of shape = [n_samples, n_classifiers]

The competence level estimated for each base classifier and test example.

Returns:
selected_classifiers : array of shape = [n_samples, n_classifiers]

Boolean matrix containing True if the base classifier is selected, False otherwise.