k-Nearest Output Profiles (KNOP)

class deslib.des.knop.KNOP(pool_classifiers=None, k=7, DFP=False, with_IH=False, safe_k=None, IH_rate=0.3, random_state=None, knn_classifier='knn', knne=False, DSEL_perc=0.5, n_jobs=-1, voting='hard')[source]

k-Nearest Output Profiles (KNOP).

This method selects all classifiers that correctly classified at least one sample belonging to the region of competence of the query sample. In this case, the region of competence is estimated using the decisions of the base classifier (output profiles). Thus, the similarity between the query and the validation samples are measured in the decision space rather than the feature space. Each selected classifier has a number of votes equals to the number of samples in the region of competence that it predicts the correct label. The votes obtained by all base classifiers are aggregated to obtain the final ensemble decision.

Parameters:
pool_classifiers : list of classifiers (Default = None)

The generated_pool of classifiers trained for the corresponding classification problem. Each base classifiers should support the method “predict”. If None, then the pool of classifiers is a bagging classifier.

k : int (Default = 7)

Number of neighbors used to estimate the competence of the base classifiers.

DFP : Boolean (Default = False)

Determines if the dynamic frienemy pruning is applied.

with_IH : Boolean (Default = False)

Whether the hardness level of the region of competence is used to decide between using the DS algorithm or the KNN for classification of a given query sample.

safe_k : int (default = None)

The size of the indecision region.

IH_rate : float (default = 0.3)

Hardness threshold. If the hardness level of the competence region is lower than the IH_rate the KNN classifier is used. Otherwise, the DS algorithm is used for classification.

random_state : int, RandomState instance or None, optional (default=None)

If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

knn_classifier : {‘knn’, ‘faiss’, None} (Default = ‘knn’)

The algorithm used to estimate the region of competence:

  • ‘knn’ will use KNeighborsClassifier from sklearn

KNNE available on deslib.utils.knne

  • ‘faiss’ will use Facebook’s Faiss similarity search through the class FaissKNNClassifier
  • None, will use sklearn KNeighborsClassifier.
knne : bool (Default=False)

Whether to use K-Nearest Neighbor Equality (KNNE) for the region of competence estimation.

DSEL_perc : float (Default = 0.5)

Percentage of the input data used to fit DSEL. Note: This parameter is only used if the pool of classifier is None or unfitted.

voting : {‘hard’, ‘soft’}, default=’hard’

If ‘hard’, uses predicted class labels for majority rule voting. Else if ‘soft’, predicts the class label based on the argmax of the sums of the predicted probabilities, which is recommended for an ensemble of well-calibrated classifiers.

n_jobs : int, default=-1

The number of parallel jobs to run. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. Doesn’t affect fit method.

References

Cavalin, Paulo R., Robert Sabourin, and Ching Y. Suen. “LoGID: An adaptive framework combining local and global incremental learning for dynamic selection of ensembles of HMMs.” Pattern Recognition 45.9 (2012): 3544-3556.

Cavalin, Paulo R., Robert Sabourin, and Ching Y. Suen. “Dynamic selection approaches for multiple classifier systems.” Neural Computing and Applications 22.3-4 (2013): 673-688.

Ko, Albert HR, Robert Sabourin, and Alceu Souza Britto Jr. “From dynamic classifier selection to dynamic ensemble selection.” Pattern Recognition 41.5 (2008): 1718-1731.

Britto, Alceu S., Robert Sabourin, and Luiz ES Oliveira. “Dynamic selection of classifiers—a comprehensive review.” Pattern Recognition 47.11 (2014): 3665-3680

R. M. O. Cruz, R. Sabourin, and G. D. Cavalcanti, “Dynamic classifier selection: Recent advances and perspectives,” Information Fusion, vol. 41, pp. 195 – 216, 2018.

estimate_competence_from_proba(probabilities, neighbors=None, distances=None)[source]

The competence of the base classifiers is simply estimated as the number of samples in the region of competence that it correctly classified. This method received an array with the pre-calculated probability estimates for each query.

This information is later used to determine the number of votes obtained for each base classifier.

Parameters:
neighbors : array of shape (n_samples, n_neighbors)

Indices of the k nearest neighbors.

distances : array of shape (n_samples, n_neighbors)

Distances from the k nearest neighbors to the query.

probabilities : array of shape (n_samples, n_classifiers, n_classes)

Probabilities estimates obtained by each each base classifier for each query sample.

Returns:
competences : array of shape (n_samples, n_classifiers)

Competence level estimated for each base classifier and test example.

fit(X, y)[source]

Train the DS model by setting the KNN algorithm and pre-process the information required to apply the DS methods. In this case, the scores of the base classifiers for the dynamic selection dataset (DSEL) are pre-calculated to transform each sample in DSEL into an output profile.

Parameters:
X : array of shape (n_samples, n_features)

Data used to fit the model.

y : array of shape (n_samples)

class labels of each example in X.

Returns:
self
predict(X)[source]

Predict the class label for each sample in X.

Parameters:
X : array of shape (n_samples, n_features)

The input data.

Returns:
predicted_labels : array of shape (n_samples)

Predicted class label for each sample in X.

predict_proba(X)[source]

Estimates the posterior probabilities for sample in X.

Parameters:
X : array of shape (n_samples, n_features)

The input data.

Returns:
predicted_proba : array of shape (n_samples, n_classes)

Probabilities estimates for each sample in X.

score(X, y, sample_weight=None)[source]

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:
X : array-like of shape (n_samples, n_features)

Test samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights.

Returns:
score : float

Mean accuracy of self.predict(X) wrt. y.

select(competences)[source]

Select the base classifiers for the classification of the query sample.

Each base classifier can be selected more than once. The number of times a base classifier is selected (votes) is equals to the number of samples it correctly classified in the region of competence.

Parameters:
competences : array of shape (n_samples, n_classifiers)

Competence level estimated for each base classifier and test example.

Returns:
selected_classifiers : array of shape (n_samples, n_classifiers)

Boolean matrix containing True if the base classifier is selected, False otherwise.