Probabilistic

class deslib.des.probabilistic.BaseProbabilistic(pool_classifiers=None, k=None, DFP=False, with_IH=False, safe_k=None, IH_rate=0.3, mode='selection', voting='hard', selection_threshold=None, random_state=None, knn_classifier='knn', knn_metric='minkowski', DSEL_perc=0.5, n_jobs=-1)

Base class for DS methods based on the potential function model. All DS methods based on the potential function should inherit from this class.

Warning: This class should not be used directly. Use derived classes instead.

estimate_competence(competence_region, distances, predictions=None)

Estimate the competence of each base classifier \(c_{i}\) using the source of competence \(C_{src}\) and the potential function model. The source of competence \(C_{src}\) for all data points in DSEL is pre-computed in the fit() step.

\[\delta_{i,j} = \frac{\sum_{k=1}^{N} C_{src} \: exp( -d (\mathbf{x}_{k}, \mathbf{x}_{q})^{2} )} {\sum_{k=1}^{N} exp( -d (\mathbf{x}_{k}, \mathbf{x}_{q})^{2} )}\]
Parameters:
competence_region : array of shape (n_samples, n_neighbors)

Indices of the k nearest neighbors for each test sample.

distances : array of shape (n_samples, n_neighbors)

Distances from the k nearest neighbors to the query.

predictions : array of shape (n_samples, n_classifiers)

Predictions of the base classifiers for all test examples.

Returns:
competences : array of shape (n_samples, n_classifiers)

Competence level estimated for each base classifier and test example.
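
The computation described above can be sketched in NumPy. This is a minimal illustration, not the library's implementation: it assumes the competence is the potential-weighted average of \(C_{src}\) over each query's neighbors (that is, the denominator sums the potentials over the same neighbors), and the function name `estimate_competence_sketch` and the toy arrays are hypothetical.

```python
import numpy as np


def estimate_competence_sketch(C_src, competence_region, distances):
    """Potential-weighted average of C_src over each query's neighbors.

    C_src             : (n_dsel_samples, n_classifiers), assumed pre-computed
    competence_region : (n_samples, n_neighbors), neighbor indices into DSEL
    distances         : (n_samples, n_neighbors), distances to those neighbors
    """
    # Gaussian potential of each neighbor: exp(-d^2)
    potential = np.exp(-(distances ** 2))
    # Gather the source of competence of every neighbor:
    # shape (n_samples, n_neighbors, n_classifiers)
    neighbor_src = C_src[competence_region]
    # Weight each neighbor's source by its potential and sum over neighbors
    weighted = np.einsum("ijk,ij->ik", neighbor_src, potential)
    # Normalize by the total potential of each query's neighborhood
    return weighted / potential.sum(axis=1, keepdims=True)


# Toy data: 3 DSEL points, 2 classifiers, 2 queries with 2 neighbors each
C_src = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
competence_region = np.array([[0, 1], [2, 0]])
distances = np.array([[0.0, 0.0], [0.0, 1.0]])
competences = estimate_competence_sketch(C_src, competence_region, distances)
```

With equal distances (first query) the result is a plain average of the neighbors' sources; with unequal distances (second query) the closer neighbor dominates.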

fit(X, y)

Train the DS model by setting the KNN algorithm and pre-processing the information required to apply the DS methods. In the case of probabilistic techniques, the source of competence (C_src) is calculated for each data point in DSEL in order to speed up the testing phase.

C_src is estimated with the source_competence() function that is overridden by each DS method based on this paradigm.

Parameters:
X : array of shape (n_samples, n_features)

Data used to fit the model.

y : array of shape (n_samples)

Class labels of each example in X.

Returns:
self : object

Returns self.

static potential_func(dist)

Gaussian potential function to decrease the influence of the source of competence as the distance between \(\mathbf{x}_{k}\) and the query \(\mathbf{x}_{q}\) increases. The function is computed using the following equation:

\[potential = exp( -dist (\mathbf{x}_{k}, \mathbf{x}_{q})^{2} )\]

where dist represents the Euclidean distance between \(\mathbf{x}_{k}\) and \(\mathbf{x}_{q}\).

Parameters:
dist : array of shape (n_samples)

Distance between each sample and the query.

Returns:
The result of the potential function for each value in dist.

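
As a quick illustration of the Gaussian potential, the following sketch applies the formula elementwise with NumPy. The name `potential_func_sketch` is hypothetical and the code only mirrors the equation given in this section.

```python
import numpy as np


def potential_func_sketch(dist):
    """Gaussian potential exp(-dist**2), applied to each distance."""
    dist = np.asarray(dist, dtype=float)
    return np.exp(-(dist ** 2))


# The potential is 1 at distance 0 and decays quickly as distance grows
potentials = potential_func_sketch([0.0, 1.0, 2.0])
```

A data point at distance 2 from the query contributes roughly 2% of the weight of a point located exactly at the query.
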
select(competences)

Selects the base classifiers that obtained a competence level higher than the predefined threshold. In this case, the threshold indicates the competence of the random classifier.

Parameters:
competences : array of shape (n_samples, n_classifiers)

Competence level estimated for each base classifier and test example.

Returns:
selected_classifiers : array of shape (n_samples, n_classifiers)

Boolean matrix containing True if the base classifier is selected, False otherwise.
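
The selection rule can be sketched as a simple comparison against the threshold. This is a hedged illustration rather than the library's code: `select_sketch` is a hypothetical name, the threshold is passed explicitly, and the fallback of selecting every classifier when none exceeds the threshold is an assumption added so that each query always has at least one classifier.

```python
import numpy as np


def select_sketch(competences, selection_threshold):
    """Boolean mask of classifiers whose competence exceeds the threshold."""
    competences = np.atleast_2d(competences)
    selected = competences > selection_threshold
    # Assumption: if no classifier passes for a query, keep the whole pool
    selected[~selected.any(axis=1), :] = True
    return selected


competences = np.array([[0.7, 0.4, 0.55],
                        [0.2, 0.1, 0.3]])
selected = select_sketch(competences, selection_threshold=0.5)
```

In the toy data, the first query keeps the first and third classifiers, while the second query has no classifier above 0.5 and therefore falls back to the full pool.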

source_competence()

Method used to estimate the source of competence at each data point.

Each DS technique based on this paradigm should define its own computation of C_src.

Returns:
C_src : array of shape (n_samples, n_classifiers)

The competence source for each base classifier at each data point.
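
The override pattern can be sketched with a toy class hierarchy. Everything here is hypothetical and independent of deslib: `ToyProbabilistic` stands in for the base class, its `fit` takes pre-computed base-classifier predictions instead of `X`, and `ToyCorrectness` uses an invented source of competence (+1 where a classifier is correct, -1 otherwise) purely for illustration.

```python
import numpy as np


class ToyProbabilistic:
    """Hypothetical stand-in for the base class (not the deslib class)."""

    def fit(self, predictions, y):
        # Assumption: predictions of each base classifier on DSEL are given,
        # shape (n_samples, n_classifiers)
        self.predictions_ = np.asarray(predictions)
        self.y_ = np.asarray(y)
        # Pre-compute the source of competence once, at fit time
        self.C_src_ = self.source_competence()
        return self

    def source_competence(self):
        raise NotImplementedError("Derived classes define C_src.")


class ToyCorrectness(ToyProbabilistic):
    """Illustrative source: +1 if the classifier is correct, -1 otherwise."""

    def source_competence(self):
        correct = self.predictions_ == self.y_[:, None]
        return np.where(correct, 1.0, -1.0)


predictions = np.array([[0, 1], [1, 1], [0, 0]])
y = np.array([0, 1, 1])
model = ToyCorrectness().fit(predictions, y)
C_src = model.C_src_
```

The resulting `C_src` has one row per data point and one column per base classifier, matching the shape documented above.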