This file contains the implementation of different measures of instance hardness.
hardness_region_competence(neighbors_idx, labels, safe_k)
Calculate the instance hardness of each sample based on its neighborhood. A sample is deemed hard to classify when different classes overlap in its region of competence. This method does not take into account the target label of the test sample.
This hardness measure is used to decide whether to use dynamic selection (DS) or KNN to classify a given query sample.
- neighbors_idx : array of shape = [n_samples_test, k]
Indices of the nearest neighbors for each considered sample
- labels : array of shape = [n_samples_train]
Labels associated with each training sample.
- safe_k : int
Number of neighbors used to estimate the hardness of the corresponding region
- hardness : array of shape = [n_samples_test]
The hardness level associated with each example.
Smith, M.R., Martinez, T. and Giraud-Carrier, C., 2014. An instance level analysis of data complexity. Machine learning, 95(2), pp.225-256
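The description above can be illustrated with a minimal, dependency-free sketch. It assumes one plausible reading of "overlap between classes": the hardness of a query is the fraction of its `safe_k` nearest neighbors that fall outside the majority class of that neighborhood. The name `hardness_region_sketch` is hypothetical and the function returns a plain list, so it should not be taken as the library's implementation.

```python
from collections import Counter


def hardness_region_sketch(neighbors_idx, labels, safe_k):
    """Hypothetical sketch: share of the safe_k nearest neighbors
    that do not belong to the neighborhood's majority class."""
    hardness = []
    for row in neighbors_idx:
        # Labels of the safe_k closest training samples for this query.
        region_labels = [labels[i] for i in row[:safe_k]]
        # Size of the most frequent class inside the region.
        majority_count = Counter(region_labels).most_common(1)[0][1]
        hardness.append((safe_k - majority_count) / safe_k)
    return hardness


# Toy data (assumed for illustration): six training labels, three queries,
# each with four precomputed neighbor indices sorted by distance.
labels = [0, 0, 0, 1, 1, 1]
neighbors_idx = [[0, 1, 2, 3], [0, 3, 1, 4], [3, 4, 5, 0]]
hardness = hardness_region_sketch(neighbors_idx, labels, safe_k=3)
```

Under this reading, a value of 0 means the region is homogeneous, while larger values indicate more class overlap. A caller could compare the value against a threshold of its own choosing to route the query to KNN or to DS.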
kdn_score(X, y, k)
Calculates the K-Disagreeing Neighbors score (KDN) of each sample in the input dataset.
- X : array of shape (n_samples, n_features)
The input data.
- y : array of shape (n_samples)
Class labels of each example in X.
- k : int
Neighborhood size for calculating the KDN score.
- score : array of shape = [n_samples,1]
KDN score of each sample in X.
- neighbors : array of shape = [n_samples,k]
Indices of the k neighbors of each sample in X.
Smith, M.R., Martinez, T. and Giraud-Carrier, C., 2014. An instance level analysis of data complexity. Machine learning, 95(2), pp.225-256
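As a rough illustration of the KDN measure, the sketch below computes, for each sample, the fraction of its k nearest neighbors (excluding the sample itself) whose label differs from its own. It is a brute-force, standard-library version written under stated assumptions: Euclidean distance, plain Python lists instead of arrays, and a hypothetical function name `kdn_score_sketch`. It returns flat lists rather than arrays of the shapes listed above.

```python
import math


def kdn_score_sketch(X, y, k):
    """Hypothetical brute-force KDN: share of the k nearest neighbors
    of each sample that carry a different class label."""
    scores, neighbors = [], []
    for i, x in enumerate(X):
        # Rank every other sample by Euclidean distance to x.
        others = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(x, X[j]),
        )
        nearest = others[:k]
        # Count the neighbors that disagree with the label of sample i.
        disagree = sum(1 for j in nearest if y[j] != y[i])
        scores.append(disagree / k)
        neighbors.append(nearest)
    return scores, neighbors


# Toy one-dimensional data (assumed for illustration).
X = [[0.0], [0.1], [0.2], [0.3], [5.0], [5.1]]
y = [0, 0, 0, 1, 1, 1]
scores, neighbors = kdn_score_sketch(X, y, k=2)
```

In this toy set, the sample at 0.3 is labeled 1 but both of its nearest neighbors are labeled 0, so it receives the maximum score of 1.0. The sample at 0.2 has one disagreeing neighbor out of two and scores 0.5. For larger datasets, a tree-based neighbor search would likely replace the quadratic loop used here.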