META-DES

class deslib.des.meta_des.METADES(pool_classifiers=None, meta_classifier=None, k=7, Kp=5, Hc=1.0, selection_threshold=0.5, mode='selection', DFP=False, with_IH=False, safe_k=None, IH_rate=0.3, random_state=None, knn_classifier='knn', knne=False, knn_metric='minkowski', DSEL_perc=0.5, n_jobs=-1, voting='hard')

Meta learning for dynamic ensemble selection (META-DES).

The META-DES framework is based on the assumption that the dynamic ensemble selection problem can be considered as a meta-problem. This meta-problem uses different criteria regarding the behavior of a base classifier \(c_{i}\), in order to decide whether it is competent enough to classify a given test sample.

The framework performs a meta-training stage in which the meta-features are extracted from each instance belonging to the training set and the dynamic selection dataset (DSEL). The extracted meta-features are then used to train the meta-classifier \(\lambda\), which learns to predict whether or not a base classifier \(c_{i}\) is competent enough to classify a given input sample.

When an unknown sample is presented to the system, the meta-features for each base classifier \(c_{i}\) in relation to the input sample are calculated and presented to the meta-classifier. The meta-classifier estimates the competence level of the base classifier \(c_{i}\) for the classification of the query sample. Base classifiers with competence level higher than a pre-defined threshold are selected. If no base classifier is selected, the whole pool is used for classification.
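The selection rule described above can be sketched for a single query as follows. This is a minimal illustration in plain Python, not DESlib's implementation: the function name `select_ensemble` and the example competence values are hypothetical, and the competence levels are assumed to have already been produced by the meta-classifier.

```python
def select_ensemble(competences, threshold=0.5):
    """Return the indices of base classifiers whose competence exceeds threshold.

    Falls back to the whole pool when no classifier passes the threshold.
    """
    selected = [i for i, c in enumerate(competences) if c > threshold]
    if not selected:
        # No classifier was deemed competent: use every classifier in the pool.
        selected = list(range(len(competences)))
    return selected


# Hypothetical competence levels for a pool of four base classifiers.
print(select_ensemble([0.2, 0.7, 0.55, 0.4]))  # [1, 2]
print(select_ensemble([0.2, 0.3, 0.1, 0.4]))   # [0, 1, 2, 3]
```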

Parameters:

pool_classifiers : list of classifiers (Default = None)

The generated pool of classifiers trained for the corresponding classification problem. Each base classifier should support the method “predict”. If None, then the pool of classifiers is a bagging classifier.

meta_classifier : sklearn.estimator (Default = None)

Classifier model used for the meta-classifier. If None, a Multinomial naive Bayes classifier is used.

k : int (Default = 7)

Number of neighbors used to estimate the competence of the base classifiers.

Kp : int (Default = 5)

Number of output profiles used to estimate the competence of the base classifiers.

Hc : float (Default = 1.0)

Sample selection threshold.

selection_threshold : float (Default = 0.5)

Threshold used to select the base classifier. Only the base classifiers with competence level higher than the selection_threshold are selected to compose the ensemble.

mode : String (Default = “selection”)

Determines the mode of META-DES that is used (selection, weighting or hybrid).
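The description above does not spell out how the three modes differ. The sketch below shows one plausible reading, based on the META-DES.H reference: selection keeps the classifiers above the threshold with equal votes, weighting lets every classifier vote with a weight equal to its competence, and hybrid applies the threshold first and then weights the remaining classifiers. Treat this as an assumption rather than DESlib's exact behavior; `ensemble_weights` is a hypothetical helper.

```python
def ensemble_weights(competences, mode="selection", threshold=0.5):
    """Return one voting weight per base classifier (0.0 means excluded)."""
    if mode not in {"selection", "weighting", "hybrid"}:
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "weighting":
        # Every classifier votes, weighted by its estimated competence.
        return list(competences)
    selected = [c > threshold for c in competences]
    if not any(selected):
        # Fall back to the whole pool when nothing passes the threshold.
        selected = [True] * len(competences)
    if mode == "selection":
        return [1.0 if keep else 0.0 for keep in selected]
    # hybrid: threshold first, then weight the kept classifiers by competence.
    return [c if keep else 0.0 for c, keep in zip(competences, selected)]


competences = [0.2, 0.7, 0.6]  # hypothetical competence levels
print(ensemble_weights(competences, "selection"))  # [0.0, 1.0, 1.0]
print(ensemble_weights(competences, "weighting"))  # [0.2, 0.7, 0.6]
print(ensemble_weights(competences, "hybrid"))     # [0.0, 0.7, 0.6]
```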

DFP : Boolean (Default = False)

Determines if the dynamic frienemy pruning is applied.

with_IH : Boolean (Default = False)

Whether the hardness level of the region of competence is used to decide between using the DS algorithm or the KNN for classification of a given query sample.

safe_k : int (default = None)

The size of the indecision region.

IH_rate : float (default = 0.3)

Hardness threshold. If the hardness level of the competence region is lower than IH_rate, the KNN classifier is used. Otherwise, the DS algorithm is used for classification.
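The decision rule behind with_IH and IH_rate can be written out as below. This is only a sketch: `choose_predictor` is a hypothetical name, and the hardness value is assumed to be a number already computed for the query's region of competence (how it is computed is not covered here).

```python
def choose_predictor(hardness, ih_rate=0.3):
    """Pick which predictor handles a query, given the region's hardness."""
    # Easy regions (hardness below the threshold) go to the plain KNN
    # classifier; all other regions go through the dynamic selection (DS)
    # algorithm.
    return "knn" if hardness < ih_rate else "ds"


print(choose_predictor(0.1))  # knn
print(choose_predictor(0.3))  # ds (not strictly lower than ih_rate)
print(choose_predictor(0.8))  # ds
```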

random_state : int, RandomState instance or None, optional (default=None)

If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

knn_classifier : {‘knn’, ‘faiss’, None} (Default = ‘knn’)

The algorithm used to estimate the region of competence:

‘knn’ will use KNeighborsClassifier from sklearn (KNNE is available in deslib.utils.knne).
‘faiss’ will use Facebook’s Faiss similarity search through the class FaissKNNClassifier.
None will use sklearn KNeighborsClassifier.

knn_metric : {‘minkowski’, ‘cosine’, ‘mahalanobis’} (Default = ‘minkowski’)

The metric used by the k-NN classifier to estimate distances.

‘minkowski’ will use the Minkowski distance.
‘cosine’ will use the cosine distance.
‘mahalanobis’ will use the Mahalanobis distance.

Note: This parameter only affects the neighborhood search applied in the feature space.

knne : bool (Default=False)

Whether to use K-Nearest Neighbor Equality (KNNE) for the region of competence estimation.

DSEL_perc : float (Default = 0.5)

Percentage of the input data used to fit DSEL. Note: This parameter is only used if the pool of classifiers is None or unfitted.

voting : {‘hard’, ‘soft’}, default=’hard’

If ‘hard’, uses predicted class labels for majority rule voting. Else if ‘soft’, predicts the class label based on the argmax of the sums of the predicted probabilities, which is recommended for an ensemble of well-calibrated classifiers.
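The difference between the two voting schemes is easiest to see on a case where they disagree. The following is a minimal sketch in plain Python, not the library's code: `hard_vote`, `soft_vote`, and the probability values are hypothetical, and each inner list is assumed to hold one selected base classifier's class probabilities for a single query.

```python
from collections import Counter


def hard_vote(probas):
    """Majority vote over each classifier's most probable class."""
    labels = [max(range(len(p)), key=p.__getitem__) for p in probas]
    # On a tie, Counter returns the label that was encountered first.
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probas):
    """Argmax of the per-class sums of the predicted probabilities."""
    sums = [sum(column) for column in zip(*probas)]
    return max(range(len(sums)), key=sums.__getitem__)


# Hypothetical class probabilities from three base classifiers, two classes.
probas = [[0.90, 0.10], [0.40, 0.60], [0.45, 0.55]]
print(hard_vote(probas))  # 1: two of the three classifiers predict class 1
print(soft_vote(probas))  # 0: the summed probabilities are 1.75 vs 1.25
```

One confident classifier can outweigh two marginal ones under soft voting, which is why that scheme depends on the probabilities being well calibrated.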

n_jobs : int, default=-1

The number of parallel jobs to run. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. Doesn’t affect fit method.

References

Cruz, R.M., Sabourin, R., Cavalcanti, G.D. and Ren, T.I., 2015. META-DES: A dynamic ensemble selection framework using meta-learning. Pattern Recognition, 48(5), pp.1925-1935.

Cruz, R.M., Sabourin, R. and Cavalcanti, G.D., 2015, July. META-DES.H: a dynamic ensemble selection technique using meta-learning and a dynamic weighting approach. In Neural Networks (IJCNN), 2015 International Joint Conference on (pp. 1-8).

R. M. O. Cruz, R. Sabourin, and G. D. Cavalcanti, “Dynamic classifier selection: Recent advances and perspectives,” Information Fusion, vol. 41, pp. 195 – 216, 2018.

estimate_competence_from_proba(neighbors, probabilities, distances=None)

Estimate the competence of each base classifier \(c_i\) for the classification of the query sample. This method receives an array with the pre-calculated probability estimates for each query.

First, the meta-features of each base classifier \(c_i\) for the classification of the query sample are estimated. These meta-features are passed down to the meta-classifier \(\lambda\) for the competence level estimation.

Parameters:

neighbors : array of shape (n_samples, n_neighbors)

Indices of the k nearest neighbors for each test sample.

distances : array of shape (n_samples, n_neighbors)

Distances from the k nearest neighbors to the query.

probabilities : array of shape (n_samples, n_classifiers, n_classes)

Probability estimates obtained by each base classifier for each query sample.

Returns:

competences : array of shape (n_samples, n_classifiers)

The competence level estimated for each base classifier and test example.

fit(X, y)

Prepare the DS model by setting the KNN algorithm and pre-processing the information required to apply the DS method.

This method also extracts the meta-features and trains the meta-classifier \(\lambda\) if the meta-classifier was not yet trained.

Parameters:

X : array of shape (n_samples, n_features)

Data used to fit the model.

y : array of shape (n_samples)

Class labels of each example in X.

Returns:

self

predict(X)

Predict the class label for each sample in X.

Parameters:

X : array of shape (n_samples, n_features)

The input data.

Returns:

predicted_labels : array of shape (n_samples)

Predicted class label for each sample in X.

predict_proba(X)

Estimates the posterior probabilities for each sample in X.

Parameters:

X : array of shape (n_samples, n_features)

The input data.

Returns:

predicted_proba : array of shape (n_samples, n_classes)

Probability estimates for each sample in X.

score(X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:

X : array-like of shape (n_samples, n_features)

Test samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights.

Returns:

score : float

Mean accuracy of `self.predict(X)` with respect to y.

select(competences)

Selects the base classifiers whose competence level is higher than the threshold defined in self.selection_threshold.

Parameters:

competences : array of shape (n_samples, n_classifiers)

The competence level estimated for each base classifier and test example.

Returns:

selected_classifiers : array of shape (n_samples, n_classifiers)

Boolean matrix containing True if the base classifier is selected, False otherwise.
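To illustrate how a Boolean matrix of this shape might be consumed downstream, the sketch below builds the mask for a batch of queries and then takes a majority vote over the selected classifiers only. It uses nested lists instead of arrays and is not the library's implementation: `select_mask`, `masked_majority_vote`, and all values are hypothetical, and the fallback to every classifier for an all-False row is an assumption made here.

```python
from collections import Counter


def select_mask(competences, selection_threshold=0.5):
    """Boolean matrix: True where competence exceeds the threshold."""
    return [[c > selection_threshold for c in row] for row in competences]


def masked_majority_vote(predictions, mask):
    """For each sample, majority vote over the selected classifiers only."""
    results = []
    for preds, keep in zip(predictions, mask):
        # Assumption: if no classifier is selected, every classifier votes.
        votes = [p for p, k in zip(preds, keep) if k] or list(preds)
        results.append(Counter(votes).most_common(1)[0][0])
    return results


# Two query samples (rows) and three base classifiers (columns).
competences = [[0.2, 0.7, 0.6],
               [0.9, 0.1, 0.5]]
predictions = [[0, 1, 1],
               [2, 0, 0]]
mask = select_mask(competences)
print(mask)  # [[False, True, True], [True, False, False]]
print(masked_majority_vote(predictions, mask))  # [1, 2]
```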