class deslib.des.probabilistic.BaseProbabilistic(pool_classifiers=None, k=None, DFP=False, with_IH=False, safe_k=None, IH_rate=0.3, mode='selection', voting='hard', selection_threshold=None, random_state=None, knn_classifier='knn', DSEL_perc=0.5, n_jobs=-1)[source]

Base class for DS methods based on the potential function model. All DS methods based on the potential function should inherit from this class.

Warning: This class should not be used directly. Use derived classes instead.

estimate_competence(competence_region, distances, predictions=None)[source]

Estimate the competence of each base classifier \(c_{i}\) using the source of competence \(C_{src}\) and the potential function model. The source of competence \(C_{src}\) for every data point in DSEL is pre-computed in fit().

\[\delta_{i,j} = \frac{\sum_{k=1}^{N} C_{src} \: \exp(-d(\mathbf{x}_{k}, \mathbf{x}_{q})^{2})}{\sum_{k=1}^{N} \exp(-d(\mathbf{x}_{k}, \mathbf{x}_{q})^{2})}\]

competence_region : array of shape (n_samples, n_neighbors)

Indices of the k nearest neighbors for each test sample.

distances : array of shape (n_samples, n_neighbors)

Distances from the k nearest neighbors to the query.

predictions : array of shape (n_samples, n_classifiers)

Predictions of the base classifiers for all test examples.

competences : array of shape (n_samples, n_classifiers)

Competence level estimated for each base classifier and test example.
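The computation described above can be illustrated with a minimal plain-Python sketch. It is not DESlib's implementation: the function name `estimate_competence_sketch` is hypothetical, nested lists stand in for arrays, and it assumes the competence is normalized by the sum of the potentials over the same neighbors.

```python
import math


def estimate_competence_sketch(c_src, competence_region, distances):
    """Potential-weighted average of C_src over each query's neighbors.

    c_src             : list of shape (n_dsel_samples, n_classifiers)
    competence_region : list of shape (n_samples, n_neighbors), DSEL indices
    distances         : list of shape (n_samples, n_neighbors)
    """
    n_classifiers = len(c_src[0])
    competences = []
    for neighbor_ids, neighbor_dists in zip(competence_region, distances):
        # Gaussian potential of every neighbor of this query.
        weights = [math.exp(-d ** 2) for d in neighbor_dists]
        total = sum(weights)  # assumed normalizer: sum of the potentials
        row = [
            sum(w * c_src[k][i] for k, w in zip(neighbor_ids, weights)) / total
            for i in range(n_classifiers)
        ]
        competences.append(row)
    return competences


# Toy C_src for three DSEL points and two base classifiers.
c_src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# One query whose neighbors are DSEL points 0 and 1, at distances 0 and 1.
competences = estimate_competence_sketch(c_src, [[0, 1]], [[0.0, 1.0]])
print(competences)
```

In this toy case the first classifier ends up with the higher competence, because the only neighbor where it has a non-zero source of competence is also the closest one.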

fit(X, y)[source]

Train the DS model by setting up the KNN algorithm and pre-processing the information required to apply the DS methods. For probabilistic techniques, the source of competence (C_src) is computed for each data point in DSEL during fit() so that it does not have to be recomputed in the testing phase.

C_src is estimated with the source_competence() function that is overridden by each DS method based on this paradigm.

X : array of shape (n_samples, n_features)

Data used to fit the model.

y : array of shape (n_samples,)

Class labels of each example in X.

self : object

Returns self.
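The pattern described for fit(), caching C_src once through a hook that each derived method overrides, can be sketched as follows. Every name here (`PotentialFunctionSketch`, `HitMissSketch`, the attributes with a trailing underscore) is hypothetical and the hit/miss source of competence is only a toy choice; this is an illustration of the idea, not DESlib code.

```python
from abc import ABC, abstractmethod


class PotentialFunctionSketch(ABC):
    """Hypothetical stand-in for the base class; not DESlib code."""

    def __init__(self, pool_classifiers):
        # Each "classifier" is just a callable mapping one sample to a label.
        self.pool_classifiers = pool_classifiers

    def fit(self, X, y):
        # Store DSEL and cache the source of competence once, so that
        # query-time estimation only needs look-ups into c_src_.
        self.dsel_data_ = X
        self.dsel_target_ = y
        self.c_src_ = self.source_competence()
        return self

    @abstractmethod
    def source_competence(self):
        """Return a list of shape (n_samples, n_classifiers)."""


class HitMissSketch(PotentialFunctionSketch):
    """Toy source of competence: 1.0 for a correct prediction, else 0.0."""

    def source_competence(self):
        return [
            [1.0 if clf(x) == label else 0.0 for clf in self.pool_classifiers]
            for x, label in zip(self.dsel_data_, self.dsel_target_)
        ]


# Two toy classifiers on 1-D inputs: a threshold rule and a constant rule.
pool = [lambda x: int(x[0] > 0.5), lambda x: 0]
model = HitMissSketch(pool).fit([[0.1], [0.9], [0.4]], [0, 1, 0])
print(model.c_src_)  # [[1.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
```

Keeping the per-method logic in a single overridable hook means the base class can own the caching and the query-time weighting, while each derived method only decides what the source of competence is.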

static potential_func(dist)[source]

Gaussian potential function to decrease the influence of the source of competence as the distance between \(\mathbf{x}_{k}\) and the query \(\mathbf{x}_{q}\) increases. The function is computed using the following equation:

\[potential = exp( -dist (\mathbf{x}_{k}, \mathbf{x}_{q})^{2} )\]

where dist represents the Euclidean distance between \(\mathbf{x}_{k}\) and \(\mathbf{x}_{q}\).

dist : array of shape (n_samples,)

Distance between each sample and the query.

Returns the value of the potential function for each entry in dist.
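As a quick illustration of how fast the Gaussian potential decays, the following plain-Python sketch applies the equation above element-wise. The name `potential_func_sketch` is hypothetical, and a list is used in place of an array.

```python
import math


def potential_func_sketch(dist):
    """Return exp(-d^2) for every distance d in dist."""
    return [math.exp(-d ** 2) for d in dist]


potentials = potential_func_sketch([0.0, 0.5, 1.0, 2.0])
print([round(p, 4) for p in potentials])  # [1.0, 0.7788, 0.3679, 0.0183]
```

A sample at distance 2 from the query therefore contributes less than 2% of the weight of a sample that coincides with the query.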

select(competences)[source]

Selects the base classifiers that obtained a competence level higher than the predefined threshold. In this case, the threshold indicates the competence of the random classifier.

competences : array of shape (n_samples, n_classifiers)

Competence level estimated for each base classifier and test example.

selected_classifiers : array of shape (n_samples, n_classifiers)

Boolean matrix containing True if the base classifier is selected, False otherwise.
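The selection rule described above amounts to an element-wise comparison against the threshold. A minimal sketch, assuming a strict "greater than" comparison and a hypothetical function name `select_sketch`; the threshold of 0.5 is an arbitrary illustrative value, and the case where no classifier passes the threshold is not handled here.

```python
def select_sketch(competences, selection_threshold):
    """Return a boolean matrix of shape (n_samples, n_classifiers)."""
    return [[c > selection_threshold for c in row] for row in competences]


# Two test samples, three base classifiers.
selected = select_sketch([[0.7, 0.5, 0.2], [0.1, 0.9, 0.6]], 0.5)
print(selected)  # [[True, False, False], [False, True, True]]
```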


source_competence()[source]

Method used to estimate the source of competence at each data point.

Each DS technique based on this paradigm must define how C_src is computed.

C_src : array of shape (n_samples, n_classifiers)

The competence source for each base classifier at each data point.