FAISS Wrapper
class deslib.util.faiss_knn_wrapper.FaissKNNClassifier(n_neighbors=5, n_jobs=None, algorithm='brute', n_cells=100, n_probes=1)
Scikit-learn wrapper interface for Faiss KNN.
Parameters: - n_neighbors : int (Default = 5)
Number of neighbors used in the nearest neighbor search.
- n_jobs : int (Default = None)
The number of jobs to run in parallel for both fit and predict. If -1, the number of jobs is set to the number of cores.
- algorithm : {‘brute’, ‘voronoi’, ‘hierarchical’} (Default = ‘brute’)
Algorithm used to compute the nearest neighbors:
- ‘brute’ will use the IndexFlatL2 class from faiss.
- ‘voronoi’ will use the IndexIVFFlat class from faiss.
- ‘hierarchical’ will use the IndexHNSWFlat class from faiss.
Note that ‘voronoi’ takes more time during training, but it can significantly reduce the search time at inference. ‘hierarchical’ produces very fast and accurate indexes, but it has a higher memory requirement. It is recommended when you have a lot of RAM or the dataset is small.
For more information see: https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index
- n_cells : int (Default = 100)
Number of Voronoi cells. Only used when algorithm == ‘voronoi’.
- n_probes : int (Default = 1)
Number of cells that are visited to perform the search. Note that the search time increases roughly linearly with the number of probes. Only used when algorithm == ‘voronoi’.
References
Johnson, Jeff, Matthijs Douze, and Hervé Jégou. “Billion-scale similarity search with GPUs.” arXiv preprint arXiv:1702.08734 (2017).
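Example
The interplay between n_cells and n_probes can be illustrated with a small plain-Python sketch. It is not the faiss implementation: the helper names build_cells and search are hypothetical, and the first n_cells points are assumed to serve as cell centroids, whereas faiss trains the centroids of an IndexIVFFlat index on the data. The sketch only shows why probing more cells costs more time and recovers more of the exact neighbors.

```python
import math


def build_cells(points, n_cells):
    """Assign every point to its nearest centroid (one index list per cell)."""
    # Assumption for illustration: the first n_cells points act as centroids.
    centroids = points[:n_cells]
    cells = [[] for _ in centroids]
    for i, p in enumerate(points):
        nearest = min(range(len(centroids)),
                      key=lambda c: math.dist(p, centroids[c]))
        cells[nearest].append(i)
    return centroids, cells


def search(points, centroids, cells, query, k, n_probes):
    """Scan only the n_probes cells whose centroids are closest to the query."""
    order = sorted(range(len(centroids)),
                   key=lambda c: math.dist(query, centroids[c]))
    candidates = [i for c in order[:n_probes] for i in cells[c]]
    candidates.sort(key=lambda i: math.dist(query, points[i]))
    return candidates[:k]


points = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0),
          (1.0, 1.0), (9.0, 1.0), (1.0, 9.0), (9.0, 9.0),
          (4.0, 4.0), (6.0, 4.0)]
centroids, cells = build_cells(points, n_cells=4)
query = (5.5, 4.0)

# Probing a single cell is cheap but can miss a true neighbor in another cell.
approx = search(points, centroids, cells, query, k=2, n_probes=1)
# Probing every cell scans all points, so it matches an exhaustive search.
exact = search(points, centroids, cells, query, k=2, n_probes=4)
print(approx, exact)
```

In this toy data set the single-probe search misses point 8, which lies just across a cell boundary from the query. Raising n_probes recovers it, at the cost of scanning more candidates.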
fit(X, y)
Fit the model according to the given training data.
Parameters: - X : array of shape (n_samples, n_features)
Data used to fit the model.
- y : array of shape (n_samples)
Class labels of each example in X.
kneighbors(X, n_neighbors=None, return_distance=True)
Finds the K-neighbors of a point.
Parameters: - X : array of shape (n_samples, n_features)
The input data.
- n_neighbors : int
Number of neighbors to get (default is the value passed to the constructor).
- return_distance : boolean, optional. Defaults to True.
If False, distances will not be returned.
Returns: - dists : list of shape = [n_samples, k]
The distances between the query and each sample in the region of competence. The vector is ordered in an ascending fashion.
- idx : list of shape = [n_samples, k]
Indices of the instances belonging to the region of competence of the given query sample.
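Example
The shape and ordering of the returned values can be seen in a minimal brute-force sketch written in plain Python. It is a stand-in for illustration, not the wrapper's code: the function below is a hypothetical re-implementation, and the use of plain Euclidean distance is an assumption, since the distance convention of the wrapper's output is not stated here.

```python
import math


def kneighbors(train, queries, n_neighbors, return_distance=True):
    """Brute-force neighbor search returning one row per query sample."""
    dists, idx = [], []
    for q in queries:
        # Rank all training points by distance to the query, keep the closest k.
        ranked = sorted((math.dist(q, p), i) for i, p in enumerate(train))
        ranked = ranked[:n_neighbors]
        dists.append([d for d, _ in ranked])
        idx.append([i for _, i in ranked])
    return (dists, idx) if return_distance else idx


train = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (1.0, 0.0)]
queries = [(0.0, 0.0), (6.0, 7.0)]

dists, idx = kneighbors(train, queries, n_neighbors=2)
print(idx)    # one row of neighbor indices per query
print(dists)  # matching distances, ascending within each row
```

Each row of idx lines up with the same row of dists, so idx[i][j] is the training index of the j-th closest neighbor of query i.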