Single Best

class deslib.static.single_best.SingleBest(pool_classifiers=None, scoring=None, random_state=None, n_jobs=-1)[source]

Classification method that selects the classifier in the pool with the highest score to be used for classification. Usually, the performance of the single best classifier is estimated on validation data. See the usage example below.

Parameters:
pool_classifiers : list of classifiers (Default = None)

The generated pool of classifiers trained for the corresponding classification problem. Each base classifier should support the method “predict”. If None, the pool of classifiers is generated internally as a bagging classifier.

scoring : string, callable (default = None)

A single string or a callable to evaluate the predictions on the validation set.

random_state : int, RandomState instance or None, optional (default=None)

If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.

n_jobs : int, default=-1

The number of parallel jobs to run. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. Does not affect the fit method.

References

Britto, Alceu S., Robert Sabourin, and Luiz ES Oliveira. “Dynamic selection of classifiers—a comprehensive review.” Pattern Recognition 47.11 (2014): 3665-3680.

Kuncheva, Ludmila I. Combining pattern classifiers: methods and algorithms. John Wiley & Sons, 2004.

R. M. O. Cruz, R. Sabourin, and G. D. Cavalcanti, “Dynamic classifier selection: Recent advances and perspectives,” Information Fusion, vol. 41, pp. 195-216, 2018.
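Examples

A minimal usage sketch (not part of the original docstring), assuming a scikit-learn-style workflow: the pool is trained beforehand on training data, and SingleBest then picks its best member on separate validation data. The pool composition and the split below are illustrative choices, not requirements.

>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.naive_bayes import GaussianNB
>>> from sklearn.tree import DecisionTreeClassifier
>>> from deslib.static.single_best import SingleBest
>>> X, y = make_classification(n_samples=500, random_state=0)
>>> X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.5,
...                                                   random_state=0)
>>> # Illustrative pool: any fitted classifiers exposing "predict" will do.
>>> pool = [LogisticRegression(random_state=0).fit(X_train, y_train),
...         DecisionTreeClassifier(random_state=0).fit(X_train, y_train),
...         GaussianNB().fit(X_train, y_train)]
>>> sb = SingleBest(pool_classifiers=pool, random_state=0)
>>> _ = sb.fit(X_val, y_val)        # selects the best classifier on this data
>>> acc = sb.score(X_val, y_val)    # mean accuracy of the selected classifier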

fit(X, y)[source]

Fit the model by selecting the base classifier with the highest score on the given data, according to the scoring metric. The single best classifier is kept in self.best_clf and its index in the pool is kept in self.best_clf_index.

Parameters:
X : array of shape (n_samples, n_features)

Data used to fit the model.

y : array of shape (n_samples,)

Class labels of each example in X.
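The selection step can be pictured with the following standalone sketch (illustrative only; select_single_best is a hypothetical helper, not part of DESlib). It scores every fitted pool member on the data passed to fit and keeps the one with the highest score:

import numpy as np

def select_single_best(pool, X, y, scorer=None):
    # Hypothetical helper mirroring the selection described above:
    # evaluate every fitted base classifier on (X, y) and keep the best one.
    if scorer is None:
        scores = [clf.score(X, y) for clf in pool]      # default: mean accuracy
    else:
        scores = [scorer(clf, X, y) for clf in pool]    # custom scoring callable
    best_clf_index = int(np.argmax(scores))
    return pool[best_clf_index], best_clf_index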

predict(X)[source]

Predict the class label of each sample in X and return the predicted labels.

Parameters:
X : array of shape (n_samples, n_features)

The data to be classified.

Returns:
predicted_labels : array of shape (n_samples,)

Predicted class for each sample in X.
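Continuing the example above, prediction is delegated to the selected classifier and returns one label per sample:

>>> y_pred = sb.predict(X_val)      # labels come from the single best classifier
>>> y_pred.shape == y_val.shape
True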

predict_proba(X)[source]

Estimate the posterior probability of each class for each sample in X. The returned probability estimates for all classes are ordered by class label.

Parameters:
X : array of shape (n_samples, n_features)

The data to be classified.

Returns:
predicted_proba : array of shape (n_samples, n_classes)

Posterior probability estimates for each class.
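Continuing the same example (and assuming the base classifiers implement predict_proba, as the ones in the pool above do):

>>> proba = sb.predict_proba(X_val)
>>> proba.shape == (X_val.shape[0], 2)  # one column per class, ordered by label
True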

score(X, y, sample_weight=None)[source]

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires that each label set be correctly predicted for every sample.

Parameters:
X : array-like of shape (n_samples, n_features)

Test samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights.

Returns:
score : float

Mean accuracy of self.predict(X) with respect to y.
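Continuing the example, score computes the usual scikit-learn mean accuracy and accepts optional per-sample weights; the weighting scheme below is arbitrary and purely illustrative:

>>> import numpy as np
>>> acc = sb.score(X_val, y_val)
>>> weights = np.where(y_val == 1, 2.0, 1.0)    # arbitrary example weights
>>> weighted_acc = sb.score(X_val, y_val, sample_weight=weights)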