dualing.datasets

Because we need data, right? The datasets package provides classes and methods that prepare data for dual-based learning.

A package that transforms arrays of samples and labels into ready-to-use datasets.

class dualing.datasets.BalancedPairDataset(data: numpy.array, labels: numpy.array, n_pairs: Optional[int] = 2, batch_size: Optional[int] = 1, input_shape: Optional[Tuple[int, Ellipsis]] = None, normalize: Optional[Tuple[int, int]] = (0, 1), shuffle: Optional[bool] = True, seed: Optional[int] = 0)

Bases: dualing.core.Dataset

BalancedPairDataset implements a dataset that creates balanced pairs of samples, each labeled as similar (1) or dissimilar (0).

__init__(self, data: numpy.array, labels: numpy.array, n_pairs: Optional[int] = 2, batch_size: Optional[int] = 1, input_shape: Optional[Tuple[int, Ellipsis]] = None, normalize: Optional[Tuple[int, int]] = (0, 1), shuffle: Optional[bool] = True, seed: Optional[int] = 0)

Initialization method.

Parameters
  • data – Array of samples.

  • labels – Array of labels.

  • n_pairs – Number of pairs.

  • batch_size – Batch size.

  • input_shape – Shape of the reshaped array.

  • normalize – Normalization bounds.

  • shuffle – Whether data should be shuffled or not.

  • seed – Seed for the random module, making results reproducible.
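
The input_shape and normalize parameters are described only briefly above, so the following is a minimal NumPy sketch of what such preprocessing could look like. The helper name preprocess, the min-max scaling formula, and the order of the two steps are assumptions made for illustration; the library's own preprocessing may differ.

```python
import numpy as np


def preprocess(data, input_shape=None, normalize=(0, 1)):
    """Hypothetical helper: reshape an array, then min-max scale it."""
    data = np.asarray(data, dtype=np.float32)

    # Optionally reshape the samples to the requested shape.
    if input_shape is not None:
        data = data.reshape(input_shape)

    # Optionally rescale all values linearly into [low, high].
    if normalize is not None:
        low, high = normalize
        d_min, d_max = data.min(), data.max()
        span = d_max - d_min
        if span == 0:
            # Constant input: map every value to the lower bound.
            data = np.full_like(data, low)
        else:
            data = low + (data - d_min) * (high - low) / span

    return data


scaled = preprocess(np.array([[0.0, 127.5, 255.0]]), input_shape=(3, 1), normalize=(-1, 1))
```

With bounds of (-1, 1), the smallest input value maps to -1 and the largest to 1, which is one common reading of "normalization bounds".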

_build(self, pairs: Tuple[tensorflow.Tensor, tensorflow.Tensor])

Builds the class.

Parameters

pairs – Pairs of samples along with their labels.

property batches(self)

Batches of data (samples, labels).

create_pairs(self, data: numpy.array, labels: numpy.array)

Creates balanced pairs from data and labels.

Parameters
  • data (np.array) – Array of samples.

  • labels (np.array) – Array of labels.

Returns

Tuple containing pairs of samples along with their labels.

Return type

(Tuple[tf.Tensor, tf.Tensor, tf.Tensor])
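
To make the idea of balanced pairs concrete, here is a minimal NumPy sketch that builds an equal number of similar and dissimilar pairs. The function name make_balanced_pairs and the sampling strategy (alternating between same-class and different-class draws) are assumptions for illustration, not the library's implementation, and it returns NumPy arrays rather than tensors.

```python
import numpy as np


def make_balanced_pairs(data, labels, n_pairs=2, seed=0):
    """Hypothetical helper: return (x1, x2, y) with half similar, half dissimilar pairs."""
    data, labels = np.asarray(data), np.asarray(labels)
    rng = np.random.default_rng(seed)

    classes, counts = np.unique(labels, return_counts=True)
    repeatable = classes[counts >= 2]  # classes able to form a similar pair
    if len(classes) < 2 or len(repeatable) == 0:
        raise ValueError("Need two classes and one class with two or more samples.")

    x1, x2, y = [], [], []
    for k in range(n_pairs):
        if k % 2 == 0:
            # Similar pair: two distinct samples from the same class.
            c = rng.choice(repeatable)
            i, j = rng.choice(np.flatnonzero(labels == c), size=2, replace=False)
            same = 1
        else:
            # Dissimilar pair: one sample from each of two distinct classes.
            c1, c2 = rng.choice(classes, size=2, replace=False)
            i = rng.choice(np.flatnonzero(labels == c1))
            j = rng.choice(np.flatnonzero(labels == c2))
            same = 0
        x1.append(data[i])
        x2.append(data[j])
        y.append(same)

    return np.stack(x1), np.stack(x2), np.array(y)
```

With an even n_pairs, exactly half of the returned labels are 1 and half are 0, which keeps a similarity-based loss from being dominated by the far more frequent dissimilar combinations.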

property n_pairs(self)

Number of pairs.

class dualing.datasets.BatchDataset(data: numpy.array, labels: numpy.array, batch_size: Optional[int] = 1, input_shape: Optional[Tuple[int, Ellipsis]] = None, normalize: Optional[Tuple[int, int]] = (0, 1), shuffle: Optional[bool] = True, seed: Optional[int] = 0)

Bases: dualing.core.Dataset

BatchDataset implements a standard dataset that uses input data and labels to provide batches.

__init__(self, data: numpy.array, labels: numpy.array, batch_size: Optional[int] = 1, input_shape: Optional[Tuple[int, Ellipsis]] = None, normalize: Optional[Tuple[int, int]] = (0, 1), shuffle: Optional[bool] = True, seed: Optional[int] = 0)

Initialization method.

Parameters
  • data – Array of samples.

  • labels – Array of labels.

  • batch_size – Batch size.

  • input_shape – Shape of the reshaped array.

  • normalize – Normalization bounds.

  • shuffle – Whether data should be shuffled or not.

  • seed – Seed for the random module, making results reproducible.

_build(self, data: numpy.array, labels: numpy.array)

Builds the class.

Parameters
  • data – Array of samples.

  • labels – Array of labels.

property batches(self)

Batches of data (samples, labels).
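
As a rough illustration of how batch_size, shuffle and seed interact, the sketch below yields (samples, labels) batches from NumPy arrays. The generator name iter_batches is hypothetical and the code does not use the library's internals; it only assumes that shuffling is seeded and that the last batch may be smaller than the others.

```python
import numpy as np


def iter_batches(data, labels, batch_size=1, shuffle=True, seed=0):
    """Hypothetical helper: yield (samples, labels) batches, optionally shuffled."""
    data, labels = np.asarray(data), np.asarray(labels)
    idx = np.arange(len(data))

    # A seeded generator makes the shuffled order reproducible.
    if shuffle:
        np.random.default_rng(seed).shuffle(idx)

    # Slice the (possibly shuffled) indices into consecutive batches.
    for start in range(0, len(idx), batch_size):
        sel = idx[start:start + batch_size]
        yield data[sel], labels[sel]


batches = list(iter_batches(np.arange(10).reshape(5, 2), np.arange(5), batch_size=2))
```

Indexing samples and labels with the same index array keeps every sample aligned with its label after shuffling.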

class dualing.datasets.RandomPairDataset(data: numpy.array, labels: numpy.array, batch_size: Optional[int] = 1, input_shape: Optional[Tuple[int, Ellipsis]] = None, normalize: Optional[Tuple[int, int]] = (0, 1), seed: Optional[int] = 0)

Bases: dualing.core.Dataset

RandomPairDataset implements a dataset that randomly creates pairs of samples, each labeled as similar (1) or dissimilar (0).

__init__(self, data: numpy.array, labels: numpy.array, batch_size: Optional[int] = 1, input_shape: Optional[Tuple[int, Ellipsis]] = None, normalize: Optional[Tuple[int, int]] = (0, 1), seed: Optional[int] = 0)

Initialization method.

Parameters
  • data – Array of samples.

  • labels – Array of labels.

  • batch_size – Batch size.

  • input_shape – Shape of the reshaped array.

  • normalize – Normalization bounds.

  • seed – Seed for the random module, making results reproducible.

_build(self, pairs: Tuple[tensorflow.Tensor, tensorflow.Tensor])

Builds the class.

Parameters

pairs – Pairs of samples along with their labels.

property batches(self)

Batches of data (samples, labels).

create_pairs(self, data: numpy.array, labels: numpy.array)

Creates random pairs from data and labels.

Parameters
  • data (np.array) – Array of samples.

  • labels (np.array) – Array of labels.

Returns

Tuple containing pairs of samples along with their labels.

Return type

(Tuple[tf.Tensor, tf.Tensor, tf.Tensor])
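
For comparison with a balanced scheme, here is a minimal NumPy sketch of random pairing. It assumes one simple strategy (shuffle the indices, split them into two halves, and pair them position by position); the function name make_random_pairs and this strategy are illustrative assumptions, and the library may pair samples differently.

```python
import numpy as np


def make_random_pairs(data, labels, seed=0):
    """Hypothetical helper: return (x1, x2, y) from a random pairing of the samples."""
    data, labels = np.asarray(data), np.asarray(labels)
    rng = np.random.default_rng(seed)

    # Shuffle the indices and split them into two equally sized halves.
    idx = rng.permutation(len(data))
    half = len(idx) // 2
    left, right = idx[:half], idx[half:2 * half]

    # A pair is similar (1) when both labels match, dissimilar (0) otherwise.
    y = (labels[left] == labels[right]).astype(np.int32)
    return data[left], data[right], y
```

Because nothing constrains the draw, the ratio of similar to dissimilar pairs follows the class distribution of the data, so datasets with many classes will tend to produce mostly dissimilar pairs.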