DCS base class

class deslib.dcs.base.BaseDCS(pool_classifiers=None, k=7, DFP=False, safe_k=None, with_IH=False, IH_rate=0.3, selection_method='best', diff_thresh=0.1, random_state=None, knn_classifier='knn', DSEL_perc=0.5)

Base class for a Dynamic Classifier Selection (DCS) method. All dynamic classifier selection classes should inherit from this class.

Warning: This class should not be used directly, use derived classes instead.

Parameters:
pool_classifiers : list of classifiers (Default = None)

The pool of classifiers trained for the corresponding classification problem. Each base classifier should support the method “predict”. If None, then the pool of classifiers is a bagging classifier.

k : int (Default = 7)

Number of neighbors used to estimate the competence of the base classifiers.

DFP : Boolean (Default = False)

Determines if the dynamic frienemy pruning is applied.

with_IH : Boolean (Default = False)

Whether the hardness level of the region of competence is used to decide between using the DS algorithm or the KNN for classification of a given query sample.

safe_k : int (default = None)

The size of the indecision region.

IH_rate : float (default = 0.3)

Hardness threshold. If the hardness level of the competence region is lower than the IH_rate the KNN classifier is used. Otherwise, the DS algorithm is used for classification.

selection_method : String (Default = “best”)

Determines which method is used to select the base classifier after the competences are estimated.

diff_thresh : float (Default = 0.1)

Threshold to measure the difference between the competence levels of the base classifiers for the random and diff selection schemes. If the difference is lower than the threshold, their performances are considered equivalent.

random_state : int, RandomState instance or None, optional (default=None)

If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

knn_classifier : {‘knn’, ‘faiss’, None} (Default = ‘knn’)

The algorithm used to estimate the region of competence:

  • ‘knn’ will use KNeighborsClassifier from sklearn
  • ‘faiss’ will use Facebook’s Faiss similarity search through the class FaissKNNClassifier
  • None will use sklearn KNeighborsClassifier.

DSEL_perc : float (Default = 0.5)

Percentage of the input data used to fit DSEL. Note: This parameter is only used if the pool of classifiers is None or unfitted.
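The switch controlled by with_IH, safe_k and IH_rate can be pictured with a small NumPy sketch. This is an illustration only, not the library's code: it assumes hardness is measured as the fraction of neighbors in the region of competence whose label differs from the majority label, and the helper name region_hardness and the sample data are hypothetical.

```python
import numpy as np

def region_hardness(neighbor_labels):
    """Fraction of neighbors outside the majority class, per query sample.

    neighbor_labels: int array of shape [n_samples, n_neighbors] holding the
    class labels of the neighbors in each region of competence.
    Assumption: this definition of hardness is chosen for illustration.
    """
    n_neighbors = neighbor_labels.shape[1]
    majority_counts = np.array(
        [np.bincount(row).max() for row in neighbor_labels]
    )
    return (n_neighbors - majority_counts) / n_neighbors

IH_rate = 0.3
neighbor_labels = np.array([
    [0, 0, 0, 0, 0],  # all neighbors agree: easy region
    [0, 1, 0, 1, 1],  # classes overlap: harder region
])
hardness = region_hardness(neighbor_labels)

# As described for IH_rate: below the threshold the KNN classifier is used,
# otherwise the DS algorithm is used.
use_knn = hardness < IH_rate
```

With these toy labels the first query falls below the threshold and would be handled by the KNN, while the second would go through the DS algorithm.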

References

Woods, Kevin, W. Philip Kegelmeyer, and Kevin Bowyer. “Combination of multiple classifiers using local accuracy estimates.” IEEE transactions on pattern analysis and machine intelligence 19.4 (1997): 405-410.

Britto, Alceu S., Robert Sabourin, and Luiz ES Oliveira. “Dynamic selection of classifiers—a comprehensive review.” Pattern Recognition 47.11 (2014): 3665-3680.

G. Giacinto and F. Roli, Methods for Dynamic Classifier Selection. 10th Int. Conference on Image Analysis and Proc., Venice, Italy (1999), 659-664.

R. M. O. Cruz, R. Sabourin, and G. D. Cavalcanti, “Dynamic classifier selection: Recent advances and perspectives,” Information Fusion, vol. 41, pp. 195 – 216, 2018.

classify_with_ds(query, predictions, probabilities=None, neighbors=None, distances=None, DFP_mask=None)

Predicts the class label of the corresponding query sample.

If self.selection_method == “all”, the majority voting scheme is used to aggregate the predictions of all classifiers with the max competence level estimates for each test example.

Parameters:
query : array of shape = [n_samples, n_features]

The test examples.

predictions : array of shape = [n_samples, n_classifiers]

Predictions of the base classifiers for all test examples

probabilities : array of shape = [n_samples, n_classifiers, n_classes]

Probabilities estimates of each base classifier for all test examples (For methods that always require probabilities from the base classifiers)

neighbors : array of shape = [n_samples, n_neighbors]

Indices of the k nearest neighbors for each test sample

distances : array of shape = [n_samples, n_neighbors]

Distances of the k nearest neighbors for each test sample

DFP_mask : array of shape = [n_samples, n_classifiers]

Mask containing 1 for the selected base classifier and 0 otherwise.

Returns:
predicted_label : array of shape = [n_samples]

The predicted label for each query
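For the “all” case, the aggregation step can be sketched as a majority vote restricted to the selected classifiers. The snippet below is a minimal illustration in plain NumPy, not the library's implementation; the helper masked_majority_vote is hypothetical, and it assumes integer class labels, at least one selected classifier per sample, and ties resolved in favor of the lowest class label.

```python
import numpy as np

def masked_majority_vote(predictions, selected, n_classes):
    """Majority vote over the selected classifiers only.

    predictions: int array of shape [n_samples, n_classifiers]
    selected:    bool array of shape [n_samples, n_classifiers]
    """
    votes = np.zeros((predictions.shape[0], n_classes), dtype=int)
    for label in range(n_classes):
        # Count the selected classifiers that voted for this label.
        votes[:, label] = np.sum((predictions == label) & selected, axis=1)
    # argmax returns the first maximum, so ties go to the lowest label.
    return votes.argmax(axis=1)

predictions = np.array([[0, 1, 1],
                        [2, 0, 0]])
selected = np.array([[True, True, True],
                     [True, False, False]])
labels = masked_majority_vote(predictions, selected, n_classes=3)
```

In the second row only the first classifier is selected, so its prediction decides the label even though the other two classifiers agree with each other.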

estimate_competence(query, neighbors, distances=None, predictions=None)

Estimate the competence of each base classifier for the classification of the query sample.

Parameters:
query : array of shape = [n_samples, n_features]

The test examples.

neighbors : array of shape = [n_samples, n_neighbors]

Indices of the k nearest neighbors for each test sample

distances : array of shape = [n_samples, n_neighbors]

Distances of the k nearest neighbors for each test sample

predictions : array of shape = [n_samples, n_classifiers]

Predictions of the base classifiers for the test examples.

Returns:
competences : array of shape = [n_samples, n_classifiers]

Competence level estimated for each base classifier and test example.
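How the competence is computed depends on the derived technique. As one possible illustration, the sketch below uses a local accuracy estimate: the fraction of neighbors in the region of competence that each base classifier labels correctly. It is not the library's code; the matrix dsel_correct (one row per DSEL sample, one column per classifier) and the helper local_accuracy are hypothetical names introduced for the example.

```python
import numpy as np

def local_accuracy(dsel_correct, neighbors):
    """Mean correctness of each classifier over each query's neighbors.

    dsel_correct: bool array of shape [n_dsel_samples, n_classifiers],
                  True where a classifier labels a DSEL sample correctly
                  (hypothetical precomputed input).
    neighbors:    int array of shape [n_samples, n_neighbors].
    Returns an array of shape [n_samples, n_classifiers].
    """
    # Fancy indexing yields shape [n_samples, n_neighbors, n_classifiers].
    return dsel_correct[neighbors].mean(axis=1)

dsel_correct = np.array([[1, 0],
                         [1, 1],
                         [0, 1],
                         [1, 0]], dtype=bool)
neighbors = np.array([[0, 1],
                      [2, 3]])
competences = local_accuracy(dsel_correct, neighbors)
```

The result has the documented shape [n_samples, n_classifiers], so it can be passed directly to a selection step.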

predict_proba_with_ds(query, predictions, probabilities, neighbors=None, distances=None, DFP_mask=None)

Predicts the posterior probabilities of the corresponding query sample.

If self.selection_method == “all”, get the probability estimates of the selected ensemble. Otherwise, the technique gets the probability estimates from the selected base classifier.

Parameters:
query : array of shape = [n_samples, n_features]

The test examples.

predictions : array of shape = [n_samples, n_classifiers]

Predictions of the base classifiers for all test examples

probabilities : array of shape = [n_samples, n_classifiers, n_classes]

Probabilities estimates of each base classifier for all test examples (For methods that always require probabilities from the base classifiers).

neighbors : array of shape = [n_samples, n_neighbors]

Indices of the k nearest neighbors for each test sample

distances : array of shape = [n_samples, n_neighbors]

Distances of the k nearest neighbors for each test sample

DFP_mask : array of shape = [n_samples, n_classifiers]

Mask containing 1 for the selected base classifier and 0 otherwise.

Returns:
predicted_proba : array of shape = [n_samples, n_classes]

Posterior probabilities estimates for each test example.
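The two cases can be illustrated with NumPy indexing. This is a sketch under stated assumptions rather than the library's implementation: for the single-classifier case the probabilities of the selected classifier are read out directly, and for the “all” case the example assumes the ensemble estimate is the plain average over the selected classifiers. All variable names and values are made up for the example.

```python
import numpy as np

# Shape [n_samples=2, n_classifiers=3, n_classes=2].
probabilities = np.array([
    [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]],
    [[0.3, 0.7], [0.5, 0.5], [0.6, 0.4]],
])

# Case 1: one selected classifier index per sample.
selected_idx = np.array([0, 2])
proba_single = probabilities[np.arange(len(selected_idx)), selected_idx]

# Case 2 ("all"): boolean mask of selected classifiers per sample.
# Assumption: the selected classifiers are combined by simple averaging.
selected_mask = np.array([[True, True, False],
                          [False, False, True]])
weights = selected_mask[:, :, np.newaxis]
proba_all = (probabilities * weights).sum(axis=1) / weights.sum(axis=1)
```

Both results have shape [n_samples, n_classes], matching the documented return value.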

select(competences)

Select the most competent classifier for the classification of the query sample given the competence level estimates. Four selection schemes are available.

Best : The base classifier with the highest competence level is selected. In cases where more than one base classifier achieves the same competence level, the one with the lowest index is selected. This method is the standard for the LCA, OLA, MLA techniques.

Diff : Selects the base classifier that is significantly better than the others in the pool (when the difference between its competence level and the competence level of the other base classifiers is higher than a predefined threshold). If no base classifier is significantly better, the base classifier is selected randomly among the members with equivalent competence level.

Random : Selects a random base classifier among all base classifiers that achieved the same competence level.

All : All base classifiers with the max competence level estimates are selected (note that in this case the DCS technique becomes a DES technique).

Parameters:
competences : array of shape = [n_samples, n_classifiers]

Competence level estimated for each base classifier and test example.

Returns:
selected_classifiers : array of shape = [n_samples]

Indices of the selected base classifier for each sample. If the selection_method is set to ‘all’, a boolean matrix of shape [n_samples, n_classifiers] is returned, containing True for the selected base classifiers and False otherwise.
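The four schemes can be sketched in a few lines of NumPy. This is a minimal illustration, not the library's implementation. The function select_sketch is hypothetical, and it makes one simplifying assumption: for both “diff” and “random”, classifiers count as equivalent when their competence is within diff_thresh of the best one, so a classifier that is significantly better than the rest is the only candidate and is always chosen.

```python
import numpy as np

def select_sketch(competences, selection_method="best", diff_thresh=0.1, rng=None):
    """Illustrative selection step (hypothetical helper).

    competences: array of shape [n_samples, n_classifiers].
    """
    competences = np.asarray(competences, dtype=float)
    best = competences.max(axis=1, keepdims=True)

    if selection_method == "best":
        # argmax returns the lowest index when several classifiers tie.
        return competences.argmax(axis=1)
    if selection_method == "all":
        # Boolean matrix: True for every classifier at the max competence.
        return competences == best
    if selection_method in ("diff", "random"):
        rng = np.random.default_rng() if rng is None else rng
        # Assumption: "equivalent" means within diff_thresh of the best.
        equivalent = (best - competences) < diff_thresh
        return np.array([rng.choice(np.flatnonzero(row)) for row in equivalent])
    raise ValueError(f"unknown selection_method: {selection_method!r}")

competences = np.array([[0.5, 0.9, 0.9],
                        [0.2, 0.8, 0.3]])
picked_best = select_sketch(competences, "best")
picked_all = select_sketch(competences, "all")
picked_diff = select_sketch(competences, "diff", rng=np.random.default_rng(0))
```

In the first row two classifiers tie at the top, so “best” returns the lower index, “all” marks both, and “diff” picks one of the two at random. In the second row one classifier is clearly ahead and every scheme selects it. Passing a seeded generator keeps the random schemes reproducible.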