DES Multiclass Imbalance (DES-MI)

class deslib.des.des_mi.DESMI(pool_classifiers=None, k=7, pct_accuracy=0.4, alpha=0.9, DFP=False, with_IH=False, safe_k=None, IH_rate=0.3, random_state=None, knn_classifier='knn', DSEL_perc=0.5)[source]

Dynamic Ensemble Selection for Multi-class Imbalanced datasets (DES-MI).

Parameters:
pool_classifiers : list of classifiers (Default = None)

The generated pool of classifiers trained for the corresponding classification problem. Each base classifier should support the method “predict”. If None, then the pool of classifiers is a bagging classifier.

k : int (Default = 7)

Number of neighbors used to estimate the competence of the base classifiers.

pct_accuracy : float (Default = 0.4)

Percentage of the base classifiers in the pool that are selected based on accuracy.

DFP : Boolean (Default = False)

Determines if the dynamic frienemy pruning is applied.

with_IH : Boolean (Default = False)

Whether the hardness level of the region of competence is used to decide between using the DS algorithm or the KNN for classification of a given query sample.

safe_k : int (default = None)

The size of the indecision region.

IH_rate : float (default = 0.3)

Hardness threshold. If the hardness level of the competence region is lower than the IH_rate, the KNN classifier is used. Otherwise, the DS algorithm is used for classification.

alpha : float (Default = 0.9)

Scaling coefficient used to regulate the weight value.

random_state : int, RandomState instance or None, optional (default=None)

If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

knn_classifier : {‘knn’, ‘faiss’, None} (Default = ‘knn’)

The algorithm used to estimate the region of competence:

  • ‘knn’ will use KNeighborsClassifier from sklearn
  • ‘faiss’ will use Facebook’s Faiss similarity search through the class FaissKNNClassifier
  • None will use sklearn’s KNeighborsClassifier (equivalent to ‘knn’).

DSEL_perc : float (Default = 0.5)

Percentage of the input data used to fit DSEL. Note: This parameter is only used if the pool of classifiers is None or unfitted.

References

García, S., Zhang, Z.-L., Altalhi, A., Alshomrani, S. & Herrera, F., “Dynamic ensemble selection for multi-class imbalanced datasets,” Information Sciences, vol. 445-446, pp. 22-37, 2018.

Britto, A. S., Sabourin, R. & Oliveira, L. E. S., “Dynamic selection of classifiers—a comprehensive review,” Pattern Recognition, vol. 47, no. 11, pp. 3665-3680, 2014.

Cruz, R. M. O., Sabourin, R. & Cavalcanti, G. D., “Dynamic classifier selection: Recent advances and perspectives,” Information Fusion, vol. 41, pp. 195-216, 2018.
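
A minimal usage sketch (the dataset, splits, and bagging pool below are illustrative choices, not part of the documented API):

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from deslib.des.des_mi import DESMI

# Illustrative imbalanced three-class problem; any labeled dataset works.
X, y = make_classification(n_samples=1000, n_classes=3, n_informative=6,
                           weights=[0.7, 0.2, 0.1], random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5,
                                                    random_state=0)
X_dsel, X_test, y_dsel, y_test = train_test_split(X_rest, y_rest,
                                                  test_size=0.5,
                                                  random_state=0)

# Train a pool of base classifiers, then fit DES-MI on the held-out
# DSEL split that is used to estimate local competence.
pool = BaggingClassifier(n_estimators=10, random_state=0)
pool.fit(X_train, y_train)

desmi = DESMI(pool_classifiers=pool, k=7, pct_accuracy=0.4, alpha=0.9)
desmi.fit(X_dsel, y_dsel)

print(desmi.predict(X_test)[:10])
print(desmi.score(X_test, y_test))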

estimate_competence(query, neighbors, distances=None, predictions=None)[source]

Estimate the competence level of each base classifier \(c_{i}\) for the classification of the query sample. Returns an ndarray containing the competence level of each base classifier.

The competence is estimated using the accuracy criterion: the accuracy is computed from the weighted votes of the classifiers that correctly classify the members of the region of competence. The weight of member \(x_i\) is related to the number of samples of the same class as \(x_i\) in the training dataset, so members of under-represented classes count more. For details, see Algorithm 2 in the first reference; a schematic sketch follows the Returns description below.

Parameters:
query : array of shape = [n_samples, n_features]

The query sample.

neighbors : array of shape = [n_samples, n_neighbors]

Indices of the k nearest neighbors of each test sample.

distances : array of shape = [n_samples, n_neighbors]

Distances to the k nearest neighbors of each test sample.

predictions : array of shape = [n_samples, n_classifiers]

Predictions of the base classifiers for all test examples.

Returns:
accuracy : array of shape = [n_samples, n_classifiers]

Local Accuracy estimates (competences) of the base classifiers for all query samples.
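
As a rough illustration of the weighting scheme above, the sketch below assumes a sigmoid-style weight driven by the training frequency of each neighbor’s class; the function name, array layout, and exact formula are illustrative, not the library’s internals:

import numpy as np

def weighted_accuracy_sketch(hit_miss, neighbor_labels, class_frequency,
                             alpha=0.9):
    """Illustrative weighted-accuracy competence (not the library code).

    hit_miss        : bool array [n_samples, n_neighbors, n_classifiers],
                      True where a classifier labeled a neighbor correctly.
    neighbor_labels : int array [n_samples, n_neighbors], class label of
                      each member of the region of competence.
    class_frequency : int array [n_classes], training samples per class.
    """
    # Members of under-represented classes receive larger weights;
    # alpha scales how quickly the weight decays with class frequency.
    counts = class_frequency[neighbor_labels]   # [n_samples, n_neighbors]
    e = np.exp(-alpha * counts)                 # stable for counts >= 0
    weights = e / (1.0 + e)                     # = 1 / (1 + exp(alpha * counts))
    weights /= np.clip(weights.sum(axis=1, keepdims=True), 1e-12, None)
    # Competence = weighted fraction of correctly classified neighbors.
    return np.einsum('qn,qnc->qc', weights, hit_miss.astype(float))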

fit(X, y)[source]

Prepare the DS model by setting the KNN algorithm and pre-processing the information required to apply the DS methods.

Parameters:
X : array of shape = [n_samples, n_features]

The input data.

y : array of shape = [n_samples]

Class labels of each example in X.

Returns:
self
predict(X)[source]

Predict the class label for each sample in X.

Parameters:
X : array of shape = [n_samples, n_features]

The input data.

Returns:
predicted_labels : array of shape = [n_samples]

Predicted class label for each sample in X.

predict_proba(X)[source]

Estimates the posterior probabilities for each sample in X.

Parameters:
X : array of shape = [n_samples, n_features]

The input data.

Returns:
predicted_proba : array of shape = [n_samples, n_classes]

Probability estimates for each sample in X.
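
Note that soft outputs require every base classifier in the pool to implement predict_proba itself. A short illustrative check, reusing the fitted estimator from the usage example above:

# Assumes `desmi` and `X_test` from the usage example; the bagged
# decision trees there expose predict_proba, so this call is valid.
proba = desmi.predict_proba(X_test)
print(proba.shape)            # (n_samples, n_classes)
print(proba.sum(axis=1)[:5])  # each row sums to 1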

score(X, y, sample_weight=None)[source]

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires that each label set be correctly predicted for each sample.

Parameters:
X : array-like, shape = (n_samples, n_features)

Test samples.

y : array-like, shape = (n_samples) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like, shape = [n_samples], optional

Sample weights.

Returns:
score : float

Mean accuracy of self.predict(X) w.r.t. y.

select(competences)[source]

Select an ensemble containing the N most accurate classifiers for the classification of the query sample.

Parameters:
competences : array of shape = [n_samples, n_classifiers]

Competence estimates of each base classifier for all query samples.

Returns:
selected_classifiers : array of shape = [n_samples, self.N]

Matrix containing the indices of the N selected base classifiers for each test example.
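
A rough sketch of this selection rule, assuming N is derived from pct_accuracy and the pool size (the function name is illustrative; the library computes N internally):

import numpy as np

def select_sketch(competences, pct_accuracy=0.4):
    """Illustrative top-N selection by competence (not the library code)."""
    n_classifiers = competences.shape[1]
    n_selected = max(1, int(pct_accuracy * n_classifiers))
    # Sort ascending per query and keep the last n_selected columns,
    # i.e. the indices of the most competent base classifiers.
    return np.argsort(competences, axis=1)[:, -n_selected:]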