Datasets

This file contains routines to generate 2D classification datasets that can be used to test the performance of different machine learning algorithms.

P2 Dataset
Circle and Square
Banana
Banana 2

deslib.util.datasets.make_P2(size_classes, random_state=None)

Generate the P2 dataset.

P2 is a two-class problem, presented by Valentini [1], in which each class is defined over multiple decision regions delimited by polynomial and trigonometric functions (E1, E2, E3 and E4):

\[\begin{aligned}
E1(x) &= \sin(x) + 5 \\
E2(x) &= (x - 2)^{2} + 1 \\
E3(x) &= -0.1 \cdot x^{2} + 0.6\sin(4x) + 8 \\
E4(x) &= \frac{(x - 10)^{2}}{2} + 7.902
\end{aligned}\]
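The four boundary functions above can be evaluated directly. The following is a minimal sketch that only transcribes the formulas into plain Python. The function names e1 to e4 and the sample x values are illustrative choices, not part of DESlib, and the sketch does not reproduce how make_P2 assigns points to classes.

```python
import math


def e1(x):
    """E1(x) = sin(x) + 5"""
    return math.sin(x) + 5


def e2(x):
    """E2(x) = (x - 2)^2 + 1"""
    return (x - 2) ** 2 + 1


def e3(x):
    """E3(x) = -0.1 * x^2 + 0.6 * sin(4x) + 8"""
    return -0.1 * x ** 2 + 0.6 * math.sin(4 * x) + 8


def e4(x):
    """E4(x) = (x - 10)^2 / 2 + 7.902"""
    return (x - 10) ** 2 / 2 + 7.902


# Hypothetical helper mapping: boundary name -> function.
boundaries = {"E1": e1, "E2": e2, "E3": e3, "E4": e4}

# Print the height of every boundary curve at a few arbitrary x positions.
for x in (0.0, 2.0, 10.0):
    values = ", ".join(f"{name}={f(x):.3f}" for name, f in boundaries.items())
    print(f"x={x}: {values}")
```

Plotting these four curves over the feature range is a quick way to see why the class regions are disjoint and non-convex.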

Parameters:

size_classes : list with the number of samples for each class.
random_state : int, RandomState instance or None, optional (default=None): If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

Returns:

X : array of shape = [sum(size_classes), 2]: The generated data points.
y : array of shape = [sum(size_classes)]: Class labels associated with each sample.

References

[1] G. Valentini, An experimental bias-variance analysis of SVM ensembles based on resampling techniques, IEEE Transactions on Systems, Man, and Cybernetics, Part B 35 (2005) 1252–1271.

deslib.util.datasets.make_banana(size_classes, na=0.1, random_state=None)

Generate the Banana dataset.

Parameters:

size_classes : list with the number of samples for each class.
na : float (default=0.1): Noise amplitude. It must be < 1.0.
random_state : int, RandomState instance or None, optional (default=None): If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

Returns:

X : array of shape = [sum(size_classes), 2]: The generated data points.
y : array of shape = [sum(size_classes)]: Class labels associated with each sample.

References

Kuncheva, Ludmila I. Combining pattern classifiers: methods and algorithms. John Wiley & Sons, 2004.

deslib.util.datasets.make_banana2(size_classes, sigma=1, random_state=None)

Generate a Banana dataset similar to the one provided by the Matlab PRTools toolbox.

Parameters:

size_classes : list with the number of samples for each class.
sigma : float (default=1): Variance of the normal distribution.
random_state : int, RandomState instance or None, optional (default=None): If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

Returns:

X : array of shape = [sum(size_classes), 2]: The generated data points.
y : array of shape = [sum(size_classes)]: Class labels associated with each sample.

References

R.P.W. Duin, P. Juszczak, D. de Ridder, P. Paclik, E. Pekalska, D.M. Tax, PRTools, a Matlab toolbox for pattern recognition, 2004. URL: http://www.prtools.org

deslib.util.datasets.make_circle_square(size_classes, random_state=None)

Generate the Circle and Square dataset.

Parameters:

size_classes : list with the number of samples for each class.
random_state : int, RandomState instance or None, optional (default=None): If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

Returns:

X : array of shape = [sum(size_classes), 2]: The generated data points.
y : array of shape = [sum(size_classes)]: Class labels associated with each sample.

References

P. Henniges, E. Granger, R. Sabourin, Factors of overtraining with fuzzy artmap neural networks, International Joint Conference on Neural Networks (2005) 1075–1080.

deslib.util.datasets.make_xor(n_samples, random_state=None)

Generate the exclusive-or (XOR) dataset.

Parameters:

n_samples : int: Number of generated data points.
random_state : int, RandomState instance or None, optional (default=None): If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

Returns:

X : array of shape = [n_samples, 2]: The generated data points.
y : array of shape = [n_samples]: Class labels associated with each sample.
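To illustrate what an XOR-style dataset looks like and how a fixed seed makes generation reproducible, here is a library-free sketch. It is not DESlib's implementation: the function name make_xor_sketch, the unit-square sampling range, and the 0.5 threshold are assumptions made for the example, and it uses Python's random.Random instead of a NumPy RandomState.

```python
import random


def make_xor_sketch(n_samples, seed=None):
    """Hypothetical stand-in for an XOR generator (not the DESlib code).

    Points are drawn uniformly from the unit square (an assumption).
    A point gets label 1 when exactly one coordinate exceeds 0.5.
    """
    rng = random.Random(seed)  # seed=None falls back to system entropy
    X = [(rng.random(), rng.random()) for _ in range(n_samples)]
    # XOR of the two threshold tests: True when they disagree.
    y = [int((x1 > 0.5) != (x2 > 0.5)) for x1, x2 in X]
    return X, y


# The same seed yields the same points and labels.
X, y = make_xor_sketch(200, seed=42)
X_again, y_again = make_xor_sketch(200, seed=42)
print(X == X_again and y == y_again)
```

Because opposite quadrants share a label, no single straight line separates the two classes, which is what makes this kind of dataset a common test for non-linear classifiers.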