Datasets
This file contains routines to generate 2D classification datasets that can be used to test the performance of different machine learning algorithms.
- P2 Dataset
- Circle and Square
- Banana
- Banana 2
deslib.util.datasets.make_P2(size_classes, random_state=None)
Generate the P2 Dataset:
The P2 is a two-class problem, presented by Valentini [1], in which each class is defined in multiple decision regions delimited by polynomial and trigonometric functions (E1, E2, E3 and E4):
\[
\begin{aligned}
E1(x) &= \sin(x) + 5 \\
E2(x) &= (x - 2)^{2} + 1 \\
E3(x) &= -0.1 \cdot x^{2} + 0.6\sin(4x) + 8 \\
E4(x) &= \frac{(x - 10)^{2}}{2} + 7.902
\end{aligned}
\]
Parameters: - size_classes : list with the number of samples for each class.
- random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
Returns: - X : array of shape = [sum(size_classes), 2]
The generated data points.
- y : array of shape = [sum(size_classes)]
Class label of each generated data point.
References
[1] G. Valentini, An experimental bias-variance analysis of SVM ensembles based on resampling techniques, IEEE Transactions on Systems, Man, and Cybernetics, Part B 35 (2005) 1252–1271.
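The four boundary functions of the P2 problem can be evaluated directly. The following is a minimal sketch that transcribes E1 to E4 from the equations above using only the Python standard library. The function names (`e1` to `e4`, `boundary_values`, `position_relative_to_boundaries`) are hypothetical and not part of DESlib. How the resulting regions map to the two classes is not stated here, so the sketch does not assign labels.

```python
import math


# Boundary functions E1..E4, transcribed from the P2 equations.
def e1(x):
    return math.sin(x) + 5


def e2(x):
    return (x - 2) ** 2 + 1


def e3(x):
    return -0.1 * x ** 2 + 0.6 * math.sin(4 * x) + 8


def e4(x):
    return (x - 10) ** 2 / 2 + 7.902


def boundary_values(x):
    """Return (E1(x), E2(x), E3(x), E4(x)) for a single x value."""
    return e1(x), e2(x), e3(x), e4(x)


def position_relative_to_boundaries(x, y):
    """Hypothetical helper: for each curve, report whether (x, y) lies above it."""
    return tuple(y > value for value in boundary_values(x))
```

For example, `position_relative_to_boundaries(0, 6)` reports that the point (0, 6) lies above E1 and E2 but below E3 and E4.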
deslib.util.datasets.make_banana(size_classes, na=0.1, random_state=None)
Generate the Banana dataset.
Parameters: - size_classes : list with the number of samples for each class.
- na : float, optional (default=0.1)
Noise amplitude. It must be less than 1.0.
- random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
Returns: - X : array of shape = [sum(size_classes), 2]
The generated data points.
- y : array of shape = [sum(size_classes)]
Class label of each generated data point.
References
Kuncheva, Ludmila I. Combining pattern classifiers: methods and algorithms. John Wiley & Sons, 2004.
deslib.util.datasets.make_banana2(size_classes, sigma=1, random_state=None)
Generate the Banana dataset similar to the Matlab PRTools toolbox.
Parameters: - size_classes : list with the number of samples for each class.
- sigma : float, optional (default=1)
Variance of the normal distribution.
- random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
Returns: - X : array of shape = [sum(size_classes), 2]
The generated data points.
- y : array of shape = [sum(size_classes)]
Class label of each generated data point.
References
R.P.W. Duin, P. Juszczak, D. de Ridder, P. Paclik, E. Pekalska, D.M. Tax, PRTools, a Matlab toolbox for pattern recognition, 2004. URL: http://www.prtools.org
deslib.util.datasets.make_circle_square(size_classes, random_state=None)
Generate the circle square dataset.
Parameters: - size_classes : list with the number of samples for each class.
- random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
Returns: - X : array of shape = [sum(size_classes), 2]
The generated data points.
- y : array of shape = [sum(size_classes)]
Class label of each generated data point.
References
P. Henniges, E. Granger, R. Sabourin, Factors of overtraining with fuzzy ARTMAP neural networks, International Joint Conference on Neural Networks (2005) 1075–1080.
deslib.util.datasets.make_xor(n_samples, random_state=None)
Generate the exclusive-or (XOR) dataset.
Parameters: - n_samples : int
Number of generated data points.
- random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
Returns: - X : array of shape = [n_samples, 2]
The generated data points.
- y : array of shape = [n_samples]
Class label of each generated data point.
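To illustrate what an XOR dataset looks like and how an integer `random_state` makes generation repeatable, here is a minimal sketch written with the Python standard library only. It is not DESlib's implementation. The name `make_xor_sketch`, the unit-square sampling range, and the 0.5 thresholds are assumptions made for illustration, and the sketch returns plain lists rather than arrays. It also accepts only an int or None as the seed, not a RandomState instance.

```python
import random


def make_xor_sketch(n_samples, random_state=None):
    """Illustrative XOR generator (hypothetical, not DESlib's make_xor)."""
    # An int seed makes the output repeatable; None seeds from system entropy.
    rng = random.Random(random_state)
    X, y = [], []
    for _ in range(n_samples):
        # Assumption: points are drawn uniformly from the unit square.
        x1, x2 = rng.random(), rng.random()
        X.append((x1, x2))
        # XOR of two threshold tests: label 1 when exactly one coordinate exceeds 0.5.
        y.append(int((x1 > 0.5) != (x2 > 0.5)))
    return X, y


# Same seed, same data: both calls produce identical points and labels.
X, y = make_xor_sketch(200, random_state=42)
X_again, y_again = make_xor_sketch(200, random_state=42)
```

With this labelling, opposite quadrants of the square share a class, so no single straight line separates the two classes.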