Probabilistic

class deslib.des.probabilistic.Probabilistic(pool_classifiers, k=None, DFP=False, with_IH=False, safe_k=None, IH_rate=0.3, mode='selection', selection_threshold=None)

Base class for DS methods based on the potential function model. All DS methods based on the potential function should inherit from this class.

Warning: This class should not be used directly. Use derived classes instead.

Parameters:

pool_classifiers : list of classifiers: The generated_pool of classifiers trained for the corresponding classification problem. The classifiers should support methods “predict” and “predict_proba”.
k : int (Default = None): Number of neighbors used to estimate the competence of the base classifiers. If k = None, the whole dynamic selection dataset is used, and the influence of each sample is based on its distance to the query.
DFP : Boolean (Default = False): Determines if the dynamic frienemy pruning is applied.
with_IH : Boolean (Default = False): Whether the hardness level of the region of competence is used to decide between using the DS algorithm or the KNN for classification of a given query sample.
safe_k : int (default = None): The size of the indecision region.
IH_rate : float (default = 0.3): Hardness threshold. If the hardness level of the competence region is lower than the IH_rate, the KNN classifier is used. Otherwise, the DS algorithm is used for classification.
mode : String (Default = “selection”): Whether the technique will perform dynamic selection, dynamic weighting or a hybrid approach for classification.

References

T. Woloszynski, M. Kurzynski, A probabilistic model of classifier competence for dynamic ensemble selection, Pattern Recognition 44 (2011) 2656–2668.

Rastrigin, R. Erenstein, Method of collective recognition, Vol. 595, 1981, (in Russian).

Britto, Alceu S., Robert Sabourin, and Luiz ES Oliveira. “Dynamic selection of classifiers—a comprehensive review.” Pattern Recognition 47.11 (2014): 3665-3680.

R. M. O. Cruz, R. Sabourin, and G. D. Cavalcanti, “Dynamic classifier selection: Recent advances and perspectives,” Information Fusion, vol. 41, pp. 195 – 216, 2018.

estimate_competence(query, predictions=None)

Estimate the competence of each base classifier \(c_{i}\) using the source of competence \(C_{src}\) and the potential function model. The source of competence \(C_{src}\) for all data points in DSEL is pre-computed in the fit() step.

\[\delta_{i,j} = \frac{\sum_{k=1}^{N}C_{src} \: exp( -d (\mathbf{x}_{k}, \mathbf{x}_{q})^{2} )} {\sum_{k=1}^{N} exp( -d (\mathbf{x}_{k}, \mathbf{x}_{q})^{2} )}\]

Parameters:	query : array of shape = [n_samples, n_features] The test examples.
predictions : array of shape = [n_samples, n_classifiers] Predictions of the base classifiers for all test examples.
Returns:	competences : array of shape = [n_samples, n_classifiers] Competence level estimated for each base classifier and test example.
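The competence estimate described above can be read as a weighted average of the pre-computed source of competence, with Gaussian weights that shrink as the distance to the query grows. The following is a minimal sketch under that reading, not the library's implementation: the function name `estimate_competence_sketch` and the plain-list inputs are assumptions made for illustration, and it handles a single query.

```python
import math


def estimate_competence_sketch(c_src, distances):
    """Illustrative only; names and input layout are hypothetical.

    c_src     : list of N rows, one per DSEL sample, each holding one
                source-of-competence value per base classifier.
    distances : list of N distances from each DSEL sample to the query.
    Returns one competence value per base classifier.
    """
    # Gaussian potential: samples close to the query get weights near 1.
    weights = [math.exp(-d ** 2) for d in distances]
    total = sum(weights)
    n_classifiers = len(c_src[0])
    # Normalized weighted sum of C_src for each classifier.
    return [
        sum(w * row[i] for w, row in zip(weights, c_src)) / total
        for i in range(n_classifiers)
    ]


# Two DSEL samples at the same distance contribute equally.
print(estimate_competence_sketch([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))
# → [0.5, 0.5]
```

When one sample is much farther from the query than the other, its weight becomes negligible and the estimate is dominated by the nearby sample.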

fit(X, y)

Train the DS model by setting the KNN algorithm and pre-processing the information required to apply the DS methods. In the case of probabilistic techniques, the source of competence (C_src) is calculated for each data point in DSEL in order to speed up the testing phase.

C_src is estimated with the source_competence() function, which is overridden by each DS method based on this paradigm.

Parameters:	X : array of shape = [n_samples, n_features] Data used to fit the model.
y : array of shape = [n_samples] Class labels of each example in X.
Returns:	self
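The relationship between fit() and source_competence() follows a template-method pattern: the base class decides when C_src is computed, and each derived technique decides how. The toy classes below sketch that pattern only. Every name in them (`PotentialFunctionSketch`, `CorrectnessSource`, `dsel_data`, `dsel_target`) is hypothetical, base classifiers are assumed to be plain callables, and the 0/1 correctness score is an illustrative stand-in rather than any technique from the library.

```python
class PotentialFunctionSketch:
    """Toy stand-in for a potential-function base class (hypothetical)."""

    def __init__(self, pool_classifiers):
        # Assumption: each classifier is a callable mapping a sample to a label.
        self.pool_classifiers = pool_classifiers

    def fit(self, X, y):
        # Store DSEL, then pre-compute C_src once so that it does not
        # have to be recomputed for every query at test time.
        self.dsel_data = X
        self.dsel_target = y
        self.C_src = self.source_competence()
        return self

    def source_competence(self):
        # Derived classes must supply their own definition of C_src.
        raise NotImplementedError


class CorrectnessSource(PotentialFunctionSketch):
    """Illustrative subclass: C_src is 1.0 for a correct prediction, else 0.0."""

    def source_competence(self):
        return [
            [1.0 if clf(x) == label else 0.0 for clf in self.pool_classifiers]
            for x, label in zip(self.dsel_data, self.dsel_target)
        ]


# Two constant "classifiers": one always predicts 0, the other always 1.
pool = [lambda x: 0, lambda x: 1]
model = CorrectnessSource(pool).fit([[0.0], [1.0]], [0, 1])
print(model.C_src)  # → [[1.0, 0.0], [0.0, 1.0]]
```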

static potential_func(dist)

Gaussian potential function to decrease the influence of the source of competence as the distance between \(\mathbf{x}_{k}\) and the query \(\mathbf{x}_{q}\) increases. The function is computed using the following equation:

\[potential = exp( -dist (\mathbf{x}_{k}, \mathbf{x}_{q})^{2} )\]

where dist represents the Euclidean distance between \(\mathbf{x}_{k}\) and \(\mathbf{x}_{q}\).

Parameters:	dist : array of shape = [self.n_samples] Distance between the corresponding sample and the query.
Returns:	The result of the potential function for each value in (dist)
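As a quick numerical illustration of the potential function, the sketch below evaluates exp(-dist^2) for a few distances. The name `potential_func_sketch` and the plain-list input are assumptions made for this example; it is not the library's code.

```python
import math


def potential_func_sketch(dist):
    """Apply the Gaussian potential exp(-d^2) to each distance (illustrative)."""
    return [math.exp(-d ** 2) for d in dist]


# The potential is 1 at distance 0 and decays quickly as distance grows.
for d, p in zip([0.0, 1.0, 2.0], potential_func_sketch([0.0, 1.0, 2.0])):
    print(d, round(p, 4))
```

At distance 2 the potential is already below 0.02, so distant DSEL samples contribute very little to the competence estimate.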

select(competences)

Selects the base classifiers that obtained a competence level higher than the predefined threshold. In this case, the threshold indicates the competence of the random classifier.

Parameters:	competences : array of shape = [n_samples, n_classifiers] Competence level estimated for each base classifier and test example.
Returns:	selected_classifiers : array of shape = [n_samples, n_classifiers] Boolean matrix containing True if the base classifier is selected, False otherwise.
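The selection rule amounts to an element-wise comparison against a threshold. The sketch below assumes the competence of a random classifier is 1 / n_classes; that value, the function name `select_sketch`, and the list-of-lists input are all assumptions for illustration. It covers only the comparison described above, not any other behavior of the library's method.

```python
def select_sketch(competences, threshold):
    """Return a Boolean matrix: True where competence exceeds the threshold."""
    return [[c > threshold for c in row] for row in competences]


n_classes = 2                # hypothetical two-class problem
threshold = 1.0 / n_classes  # assumed competence of a random classifier

competences = [[0.7, 0.4],
               [0.5, 0.9]]
print(select_sketch(competences, threshold))
# → [[True, False], [False, True]]
```

Because the comparison is strict, a classifier whose competence equals the threshold (0.5 in the second row) is not selected.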

source_competence()

Method used to estimate the source of competence at each data point.

Each DS technique based on this paradigm should define its own computation of C_src.

Returns:	C_src : array of shape = [n_samples, n_classifiers] The competence source for each base classifier at each data point.