dualing.datasets
Because we need data, right? Datasets are composed of classes and methods that prepare data for dual-based learning.
A dataset package that turns raw arrays of samples and labels into batched datasets.
- class dualing.datasets.BalancedPairDataset(data: numpy.array, labels: numpy.array, n_pairs: Optional[int] = 2, batch_size: Optional[int] = 1, input_shape: Optional[Tuple[int, ...]] = None, normalize: Optional[Tuple[int, int]] = (0, 1), shuffle: Optional[bool] = True, seed: Optional[int] = 0)
Bases: dualing.core.Dataset
A BalancedPairDataset class implements a dataset that creates balanced pairs of samples, each labeled as similar (1) or dissimilar (0).
- __init__(self, data: numpy.array, labels: numpy.array, n_pairs: Optional[int] = 2, batch_size: Optional[int] = 1, input_shape: Optional[Tuple[int, ...]] = None, normalize: Optional[Tuple[int, int]] = (0, 1), shuffle: Optional[bool] = True, seed: Optional[int] = 0)
Initialization method.
- Parameters
data – Array of samples.
labels – Array of labels.
n_pairs – Number of pairs.
batch_size – Batch size.
input_shape – Shape of the reshaped array.
normalize – Lower and upper bounds of the normalized data.
shuffle – Whether the data should be shuffled.
seed – Random seed that makes the random operations deterministic.
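The `input_shape` and `normalize` parameters describe how the data is preprocessed. A minimal pure-Python sketch of that step, assuming the whole data array is reshaped and then min-max scaled into the `normalize` bounds; the helpers `reshape_rows` and `normalize_to_bounds` are hypothetical and not part of dualing:

```python
from typing import List, Sequence, Tuple


def reshape_rows(values: Sequence[float], row_length: int) -> List[List[float]]:
    """Hypothetical helper: splits a flat sequence into rows, like a 2-D reshape."""
    if row_length < 1 or len(values) % row_length != 0:
        raise ValueError("length of values must be divisible by row_length")
    return [list(values[i:i + row_length]) for i in range(0, len(values), row_length)]


def normalize_to_bounds(values: Sequence[float],
                        bounds: Tuple[float, float] = (0.0, 1.0)) -> List[float]:
    """Hypothetical helper: min-max scales values into [lower, upper].

    Assumption: scaling uses the global minimum and maximum of the data.
    """
    lower, upper = bounds
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        # Constant input: map every value to the lower bound to avoid dividing by zero
        return [float(lower)] * len(values)
    return [lower + (v - v_min) / span * (upper - lower) for v in values]
```

For example, `normalize_to_bounds([0, 5, 10], (-1, 1))` returns `[-1.0, 0.0, 1.0]`, and `reshape_rows([1, 2, 3, 4], 2)` returns `[[1, 2], [3, 4]]`.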
- _build(self, pairs: Tuple[tensorflow.Tensor, tensorflow.Tensor])
Builds the class by turning the pairs into batches.
- Parameters
pairs – Pairs of samples along with their labels.
- property batches(self)
Batches of data (samples, labels).
- create_pairs(self, data: numpy.array, labels: numpy.array)
Creates balanced pairs from data and labels.
- Parameters
data (np.array) – Array of samples.
labels (np.array) – Array of labels.
- Returns
Tuple containing pairs of samples along with their labels.
- Return type
(Tuple[tf.Tensor, tf.Tensor, tf.Tensor])
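A minimal pure-Python sketch of balanced pair creation, assuming "balanced" means that half of the pairs are similar (both samples from one class, label 1) and half are dissimilar (samples from two different classes, label 0). The function `create_balanced_pairs` is hypothetical; dualing's `create_pairs` returns TensorFlow tensors rather than lists:

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def create_balanced_pairs(data: Sequence, labels: Sequence[int], n_pairs: int = 2,
                          seed: int = 0) -> Tuple[list, list, List[int]]:
    """Hypothetical sketch: alternates similar (1) and dissimilar (0) pairs.

    Assumption: an even split between the two kinds of pairs, with one extra
    similar pair when n_pairs is odd.
    """
    rng = random.Random(seed)
    # Group sample indices by class so similar pairs can be drawn directly
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    classes = list(by_class)
    if len(classes) < 2:
        raise ValueError("dissimilar pairs need at least two classes")

    x1, x2, y = [], [], []
    for k in range(n_pairs):
        if k % 2 == 0:
            # Similar pair: two samples of the same class (distinct when possible)
            members = by_class[rng.choice(classes)]
            i, j = rng.sample(members, 2) if len(members) > 1 else (members[0], members[0])
            y.append(1)
        else:
            # Dissimilar pair: one sample from each of two different classes
            c1, c2 = rng.sample(classes, 2)
            i, j = rng.choice(by_class[c1]), rng.choice(by_class[c2])
            y.append(0)
        x1.append(data[i])
        x2.append(data[j])
    return x1, x2, y
```

With `n_pairs=4` the labels come out as `[1, 0, 1, 0]`. Drawing similar pairs from per-class index lists avoids retrying random draws until two samples happen to share a class.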
- property n_pairs(self)
Number of pairs.
- class dualing.datasets.BatchDataset(data: numpy.array, labels: numpy.array, batch_size: Optional[int] = 1, input_shape: Optional[Tuple[int, ...]] = None, normalize: Optional[Tuple[int, int]] = (0, 1), shuffle: Optional[bool] = True, seed: Optional[int] = 0)
Bases: dualing.core.Dataset
A BatchDataset class implements a standard dataset that groups input samples and their labels into batches.
- __init__(self, data: numpy.array, labels: numpy.array, batch_size: Optional[int] = 1, input_shape: Optional[Tuple[int, ...]] = None, normalize: Optional[Tuple[int, int]] = (0, 1), shuffle: Optional[bool] = True, seed: Optional[int] = 0)
Initialization method.
- Parameters
data – Array of samples.
labels – Array of labels.
batch_size – Batch size.
input_shape – Shape of the reshaped array.
normalize – Lower and upper bounds of the normalized data.
shuffle – Whether the data should be shuffled.
seed – Random seed that makes the random operations deterministic.
- _build(self, data: numpy.array, labels: numpy.array)
Builds the class by turning the samples and labels into batches.
- Parameters
data – Array of samples.
labels – Array of labels.
- property batches(self)
Batches of data (samples, labels).
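A minimal pure-Python sketch of how samples and labels can be shuffled with a fixed seed and grouped into batches. The helper `make_batches` is hypothetical, and dualing itself works with TensorFlow tensors rather than lists:

```python
import random
from typing import List, Sequence, Tuple


def make_batches(data: Sequence, labels: Sequence, batch_size: int = 1,
                 shuffle: bool = True, seed: int = 0) -> List[Tuple[list, list]]:
    """Hypothetical helper: groups (sample, label) items into batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    indices = list(range(len(data)))
    if shuffle:
        # A seeded generator makes the shuffling order reproducible
        random.Random(seed).shuffle(indices)
    batches = []
    for start in range(0, len(indices), batch_size):
        chunk = indices[start:start + batch_size]
        # Samples and labels are gathered with the same indices, so they stay aligned
        batches.append(([data[i] for i in chunk], [labels[i] for i in chunk]))
    return batches
```

For example, `make_batches([1, 2, 3, 4, 5], ["a", "b", "c", "d", "e"], batch_size=2, shuffle=False)` returns `[([1, 2], ["a", "b"]), ([3, 4], ["c", "d"]), ([5], ["e"])]`; the last batch keeps the leftover sample instead of dropping it.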
- class dualing.datasets.RandomPairDataset(data: numpy.array, labels: numpy.array, batch_size: Optional[int] = 1, input_shape: Optional[Tuple[int, ...]] = None, normalize: Optional[Tuple[int, int]] = (0, 1), seed: Optional[int] = 0)
Bases: dualing.core.Dataset
A RandomPairDataset class implements a dataset that creates random pairs of samples, each labeled as similar (1) or dissimilar (0).
- __init__(self, data: numpy.array, labels: numpy.array, batch_size: Optional[int] = 1, input_shape: Optional[Tuple[int, ...]] = None, normalize: Optional[Tuple[int, int]] = (0, 1), seed: Optional[int] = 0)
Initialization method.
- Parameters
data – Array of samples.
labels – Array of labels.
batch_size – Batch size.
input_shape – Shape of the reshaped array.
normalize – Lower and upper bounds of the normalized data.
seed – Random seed that makes the random operations deterministic.
- _build(self, pairs: Tuple[tensorflow.Tensor, tensorflow.Tensor])
Builds the class by turning the pairs into batches.
- Parameters
pairs – Pairs of samples along with their labels.
- property batches(self)
Batches of data (samples, labels).
- create_pairs(self, data: numpy.array, labels: numpy.array)
Creates random pairs from data and labels.
- Parameters
data (np.array) – Array of samples.
labels (np.array) – Array of labels.
- Returns
Tuple containing pairs of samples along with their labels.
- Return type
(Tuple[tf.Tensor, tf.Tensor, tf.Tensor])
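A minimal pure-Python sketch of random pair creation. Because `RandomPairDataset` takes no `n_pairs` argument, the sketch assumes each sample is paired with a partner chosen by a seeded random permutation; the function `create_random_pairs` is hypothetical:

```python
import random
from typing import List, Sequence, Tuple


def create_random_pairs(data: Sequence, labels: Sequence[int],
                        seed: int = 0) -> Tuple[list, list, List[int]]:
    """Hypothetical sketch: pairs every sample with a randomly permuted partner.

    Assumption: one pair per input sample. The label is 1 when both samples
    share a class and 0 otherwise, so the similar/dissimilar ratio is left to chance.
    """
    rng = random.Random(seed)
    partners = list(range(len(data)))
    # A seeded permutation decides which sample each sample is paired with
    rng.shuffle(partners)
    x1, x2, y = [], [], []
    for i, j in enumerate(partners):
        x1.append(data[i])
        x2.append(data[j])
        y.append(int(labels[i] == labels[j]))
    return x1, x2, y
```

With random pairing, the share of similar pairs follows the class distribution: with ten equally frequent classes, roughly nine in ten pairs are dissimilar, which is the imbalance that `BalancedPairDataset` is meant to avoid.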