lightonml.encoding
lightonml.encoding.base
Encoders
This module contains implementations of Encoders that transform data into the binary uint8 format required by the OPU. The encoders are compatible with numpy.ndarray and torch.Tensor.
class BinaryThresholdEncoder(threshold_enc=25, greater_is_one=True)
Bases: object
Implements binary encoding using a threshold function.
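The thresholding idea can be sketched in plain NumPy. This is a minimal illustration, not lightonml's implementation: the helper name binary_threshold is hypothetical, and it assumes a strict comparison, so a value equal to threshold_enc maps to 0 when greater_is_one=True.

```python
import numpy as np


def binary_threshold(X, threshold_enc=25, greater_is_one=True):
    """Hypothetical sketch: map each entry of X to a single uint8 bit."""
    X = np.asarray(X)
    above = X > threshold_enc  # assumed strict comparison
    bits = above if greater_is_one else ~above  # ~ is logical NOT on booleans
    return bits.astype(np.uint8)


print(binary_threshold(np.array([[10, 25, 30]])))  # [[0 0 1]]
```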
- Parameters
class ConcatenatedBitPlanEncoder(n_bits=8, starting_bit=0)
Bases: object
Implements an encoding that works by concatenating bitplanes along the feature dimension.
n_bits + starting_bit must not exceed the bitwidth of the data fed to the encoder. E.g. if X.dtype is uint8, then n_bits + starting_bit must be at most 8; if instead X.dtype is uint32, it must be at most 32. Read more in the Examples section.
- Parameters
fit(X, y=None)
No-op. This method doesn’t do anything. It exists purely for compatibility with the scikit-learn transformer API.
- Parameters
X (2D np.ndarray) –
y (1D np.ndarray) –
- Returns
self
- Return type
ConcatenatedBitPlanEncoder
transform(X)
Performs the encoding.
- Parameters
X (2D np.ndarray of uint8, uint16, uint32 or uint64 [n_samples, n_features]) – input data to encode.
- Returns
X_enc – encoded input data. Each row is arranged as [bits_for_first_feature, …, bits_for_last_feature].
- Return type
2D np.ndarray of uint8 [n_samples, n_features*n_bits]
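A minimal NumPy sketch of the concatenated layout follows. The function name is hypothetical, and the bit order inside each feature's block (least significant selected bit first) is an assumption; lightonml may order the bits differently.

```python
import numpy as np


def concatenated_bitplanes(X, n_bits=8, starting_bit=0):
    """Hypothetical sketch: [n_samples, n_features] -> [n_samples, n_features * n_bits]."""
    X = np.asarray(X)
    if not np.issubdtype(X.dtype, np.unsignedinteger):
        raise TypeError("expected an unsigned integer array")
    if n_bits + starting_bit > 8 * X.dtype.itemsize:
        raise ValueError("n_bits + starting_bit exceeds the bitwidth of X.dtype")
    # Bits starting_bit .. starting_bit + n_bits - 1, lowest first (assumed order)
    shifts = np.arange(starting_bit, starting_bit + n_bits, dtype=X.dtype)
    bits = (X[:, :, None] >> shifts) & 1  # [n_samples, n_features, n_bits]
    # Row-major reshape keeps each feature's bits contiguous
    return bits.reshape(X.shape[0], -1).astype(np.uint8)


print(concatenated_bitplanes(np.array([[5, 2]], dtype=np.uint8), n_bits=3))  # [[1 0 1 0 1 0]]
```

For a uint8 input with n_bits=3, each feature contributes 3 columns, so the output has n_features * 3 columns.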
class ConcatenatedFloat32Encoder(sign_bit=True, exp_bits=8, mantissa_bits=23)
Bases: object
Implements an encoding that works by concatenating bitplanes and selecting how many bits to keep for the sign, exponent and mantissa of each float32 value.
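The bit selection can be illustrated by reinterpreting each float32 as its IEEE 754 bit pattern (bit 31 sign, bits 30 to 23 exponent, bits 22 to 0 mantissa). This sketch assumes that the most significant exponent and mantissa bits are the ones kept and that each feature's bits are laid out contiguously along axis 1; the function name is hypothetical.

```python
import numpy as np


def concatenated_float32(X, sign_bit=True, exp_bits=8, mantissa_bits=23):
    """Hypothetical sketch: keep selected float32 bits, concatenated per feature."""
    if not (0 <= exp_bits <= 8 and 0 <= mantissa_bits <= 23):
        raise ValueError("exp_bits must be in [0, 8] and mantissa_bits in [0, 23]")
    X = np.ascontiguousarray(X, dtype=np.float32)
    u = X.view(np.uint32)  # reinterpret the IEEE 754 bit pattern
    # Bit positions to keep, most significant first:
    # bit 31 = sign, bits 30..23 = exponent, bits 22..0 = mantissa
    positions = [31] if sign_bit else []
    positions += list(range(30, 30 - exp_bits, -1))
    positions += list(range(22, 22 - mantissa_bits, -1))
    shifts = np.array(positions, dtype=np.uint32)
    bits = (u[:, :, None] >> shifts) & np.uint32(1)  # [n_samples, n_features, n_kept]
    return bits.reshape(X.shape[0], -1).astype(np.uint8)


# sign + 8 exponent bits per value: 1.0 -> 0 01111111, -2.0 -> 1 10000000
enc = concatenated_float32(np.array([[1.0, -2.0]]), mantissa_bits=0)
print(enc.shape)  # (1, 18)
```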
- Parameters
class ConcatenatingBitPlanDecoder(n_bits=8, decoding_decay=0.5)
Bases: object
Implements a decoding that works by concatenating bitplanes.
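A sketch of the decoding step, assuming (hypothetically) that the OPU output for the separated bitplanes arrives as a [n_bits * n_samples, n_out] array with the planes stacked along axis 0, and that plane k is scaled by decoding_decay ** k before the planes are placed side by side along the feature axis. Both the layout and the weighting rule are assumptions, not lightonml's documented behavior; the function name is hypothetical.

```python
import numpy as np


def concatenate_bitplanes(Y, n_bits=8, decoding_decay=0.5):
    """Hypothetical sketch: [n_bits * n_samples, n_out] -> [n_samples, n_bits * n_out]."""
    Y = np.asarray(Y, dtype=np.float64)
    n_rows, n_out = Y.shape
    if n_rows % n_bits:
        raise ValueError("the number of rows must be a multiple of n_bits")
    planes = Y.reshape(n_bits, n_rows // n_bits, n_out)  # one block per bitplane
    weights = decoding_decay ** np.arange(n_bits)  # assumed: plane k scaled by decay**k
    weighted = planes * weights[:, None, None]
    # Put the planes of each sample side by side along the feature axis
    return weighted.transpose(1, 0, 2).reshape(n_rows // n_bits, n_bits * n_out)
```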
n_bits MUST be the same value as the one used in SeparatedBitPlanEncoder. Read more in the Examples section.
- Parameters
class Float32Encoder(sign_bit=True, exp_bits=8, mantissa_bits=23)
Bases: object
Implements an encoding that works by separating bitplanes and selecting how many bits to keep for the sign, exponent and mantissa of each float32 value.
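As a sketch of the separated variant, the float32 bit pattern can be unpacked with np.unpackbits on the big-endian bytes, and one plane per kept bit can be stacked along the sample axis. The output layout [n_kept_bits * n_samples, n_features] and the choice of the most significant exponent and mantissa bits are assumptions; the function name is hypothetical.

```python
import numpy as np


def separated_float32_planes(X, sign_bit=True, exp_bits=8, mantissa_bits=23):
    """Hypothetical sketch: one bitplane per kept float32 bit, stacked along axis 0."""
    X = np.asarray(X, dtype=np.float32)
    n_samples, n_features = X.shape
    # Big-endian bytes, so unpackbits yields bit 31 (the sign) first
    as_bytes = X.astype(">f4").view(np.uint8).reshape(n_samples, n_features, 4)
    bits = np.unpackbits(as_bytes, axis=2)  # [n_samples, n_features, 32], MSB first
    # Index 0 = sign, 1..8 = exponent, 9..31 = mantissa
    keep = ([0] if sign_bit else []) + list(range(1, 1 + exp_bits)) + list(range(9, 9 + mantissa_bits))
    planes = bits[:, :, keep]  # [n_samples, n_features, n_kept]
    return planes.transpose(2, 0, 1).reshape(-1, n_features)
```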
- Parameters
class MixingBitPlanDecoder(n_bits=8, decoding_decay=0.5)
Bases: object
Implements a decoding that works by mixing bitplanes.
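Mixing can be sketched as a weighted sum over bitplanes. This sketch assumes the OPU output is a [n_bits * n_samples, n_out] array with the planes stacked along axis 0 and that plane k is weighted by decoding_decay ** k; both are assumptions, and the function name is hypothetical.

```python
import numpy as np


def mix_bitplanes(Y, n_bits=8, decoding_decay=0.5):
    """Hypothetical sketch: [n_bits * n_samples, n_out] -> [n_samples, n_out]."""
    Y = np.asarray(Y, dtype=np.float64)
    n_rows, n_out = Y.shape
    if n_rows % n_bits:
        raise ValueError("the number of rows must be a multiple of n_bits")
    planes = Y.reshape(n_bits, n_rows // n_bits, n_out)
    weights = decoding_decay ** np.arange(n_bits)  # assumed: plane k weighted by decay**k
    # Contract the bitplane axis: sum_k weights[k] * planes[k]
    return np.tensordot(weights, planes, axes=1)
```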
n_bits MUST be the same value as the one used in SeparatedBitPlanEncoder. Read more in the Examples section.
- Parameters
class MultiThresholdEncoder(thresholds=None, n_bins=8, columnwise=False)
Bases: object
Implements binary encoding using multiple thresholds.
- Parameters
thresholds – thresholds for the binary encoder.
- Type
np.ndarray
columnwise – whether to use different thresholds for each column or a common set of thresholds for all columns.
- Type
bool
n_bins – number of different values the encoding can take. A value is encoded into n_bins-1 bits.
- Type
int
fit(X, y=None)
If thresholds is not None, this method doesn’t do anything. If thresholds is None, computes n_bins thresholds equally spaced on the range of X. The range of X is determined column-wise but the number of bins is the same for all features.
- Parameters
X (2D np.ndarray) –
y (1D np.ndarray) –
- Returns
self
- Return type
MultiThresholdEncoder
transform(X)
Transforms an array into a binary uint8 array of 0s and 1s.
The bins defined by the thresholds are not mutually exclusive, i.e. a value x activates all the bins whose thresholds are less than x.
- Parameters
X (np.ndarray of size n_samples x n_features) – The input data to encode.
- Returns
X_enc – The encoded data.
- Return type
np.ndarray of uint8, of size n_samples x (n_features x n_bins)
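A NumPy sketch of fit and transform follows. It computes n_bins - 1 interior thresholds from equal-width bins, following the n_bins description above (a value is encoded into n_bins-1 bits); lightonml's own fit may compute them differently. Both function names are hypothetical.

```python
import numpy as np


def equally_spaced_thresholds(X, n_bins=8, columnwise=False):
    """Hypothetical sketch: interior edges of n_bins equal-width bins."""
    X = np.asarray(X, dtype=np.float64)
    if columnwise:
        lo, hi = X.min(axis=0), X.max(axis=0)  # one range per feature
        # [n_features, n_bins + 1] edges -> keep interior ones: [n_features, n_bins - 1]
        return np.linspace(lo, hi, n_bins + 1, axis=1)[:, 1:-1]
    return np.linspace(X.min(), X.max(), n_bins + 1)[1:-1]  # shared: [n_bins - 1]


def multi_threshold_encode(X, thresholds):
    """Thermometer code: bit j is 1 when the value exceeds threshold j."""
    X = np.asarray(X)
    thresholds = np.asarray(thresholds)
    # [n_thresholds] (shared) or [n_features, n_thresholds] (per column) both
    # broadcast against X[:, :, None] of shape [n_samples, n_features, 1]
    bits = X[:, :, None] > thresholds
    return bits.reshape(X.shape[0], -1).astype(np.uint8)
```

Because the bins are not mutually exclusive, a larger value always switches on a superset of the bits switched on by a smaller one.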
class SeparatedBitPlanEncoder(n_bits=8, starting_bit=0)
Bases: object
Implements an encoding that works by separating bitplanes.
n_bits + starting_bit must not exceed the bitwidth of the data fed to the encoder. E.g. if X.dtype is uint8, then n_bits + starting_bit must be at most 8; if instead X.dtype is uint32, it must be at most 32. Read more in the Examples section.
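A minimal sketch of the separated layout, assuming the planes are stacked along the sample axis to give an array of shape [n_bits * n_samples, n_features], with plane k holding bit starting_bit + k of every entry. The layout and the function name are assumptions.

```python
import numpy as np


def separated_bitplanes(X, n_bits=8, starting_bit=0):
    """Hypothetical sketch: [n_samples, n_features] -> [n_bits * n_samples, n_features]."""
    X = np.asarray(X)
    if not np.issubdtype(X.dtype, np.unsignedinteger):
        raise TypeError("expected an unsigned integer array")
    if n_bits + starting_bit > 8 * X.dtype.itemsize:
        raise ValueError("n_bits + starting_bit exceeds the bitwidth of X.dtype")
    n_samples, n_features = X.shape
    shifts = np.arange(starting_bit, starting_bit + n_bits, dtype=X.dtype)
    # planes[k] holds bit (starting_bit + k): [n_bits, n_samples, n_features]
    planes = (X[None, :, :] >> shifts[:, None, None]) & 1
    return planes.reshape(n_bits * n_samples, n_features).astype(np.uint8)
```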
- Parameters
class SequentialBaseTwoEncoder(n_gray_levels=16)
Bases: object
Implements a base 2 encoding.
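E.g. \(5\) is written \(101\) in base 2, since \(5 = 1 \cdot 2^2 + 0 \cdot 2^1 + 1 \cdot 2^0 = 1 \cdot 4 + 0 \cdot 2 + 1 \cdot 1\). Each bit is repeated as many times as its place value (the \(2^2\) bit four times, the \(2^1\) bit twice, the \(2^0\) bit once), so the encoder gives 1111001.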
- Parameters
n_gray_levels (int,) – number of values that can be encoded. Must be a power of 2.
fit(X, y=None)
Computes parameters for the normalization.
Must be run only on the training set to avoid leaking information to the dev/test set.
- Parameters
X (np.ndarray of uint [n_samples, n_features],) – the input data to encode.
y (np.ndarray,) – the target data.
- Returns
self
- Return type
SequentialBaseTwoEncoder.
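The repeated-bit scheme from the example above can be sketched as follows, assuming the input is already quantized to integers in [0, n_gray_levels); the normalization computed by fit is not reproduced. Bits are written most significant first and bit k is repeated 2**k times, so each value uses n_gray_levels - 1 output bits and its number of ones equals the value. The function name is hypothetical.

```python
import numpy as np


def sequential_base_two(X, n_gray_levels=16):
    """Hypothetical sketch: [n_samples, n_features] ints -> repeated-bit code."""
    X = np.asarray(X)
    n_bits = int(n_gray_levels).bit_length() - 1
    if n_gray_levels < 2 or 2 ** n_bits != n_gray_levels:
        raise ValueError("n_gray_levels must be a power of 2 (at least 2)")
    if X.min() < 0 or X.max() >= n_gray_levels:
        raise ValueError("values must lie in [0, n_gray_levels)")
    chunks = []
    for k in range(n_bits - 1, -1, -1):  # most significant bit first
        bit = (X[..., None] >> k) & 1
        chunks.append(np.repeat(bit, 2 ** k, axis=-1))  # bit k repeated 2**k times
    out = np.concatenate(chunks, axis=-1)  # [n_samples, n_features, n_gray_levels - 1]
    return out.reshape(X.shape[0], -1).astype(np.uint8)


code = sequential_base_two(np.array([[5]]), n_gray_levels=8)
print("".join(str(b) for b in code[0]))  # 1111001
```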
lightonml.encoding.models
class ConvAE(in_ch, out_ch, kernel_size, beta, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', flatten=False)
Bases: lightonml.encoding.models.AE
Autoencoder consisting of two convolutional layers (encoder, decoder).
- Parameters
in_ch (int,) – number of input channels
out_ch (int,) – number of output channels
kernel_size (int or tuple,) – size of the convolutional filters
beta (float, default 1.,) – inverse temperature for tanh(beta x)
stride (int or tuple, optional, default 1,) – stride of the convolution
padding (int or tuple, optional, default 0,) – zero-padding added to both sides of the input
padding_mode (str, optional, default 'zeros',) – ‘zeros’ or ‘circular’
dilation (int or tuple, optional, default 1,) – spacing between kernel elements
groups (int, optional, default 1,) – number of blocked connections from input to output channels
bias (bool, optional, default True,) – adds a learnable bias to the output.
flatten (bool, default False,) – whether to return a 2D flattened array (batch_size, x) or a 4D (batch_size, out_ch, out_h, out_w) when encoding
encoder – encoding layer
- Type
nn.Conv2d
decoder – decoding layer
- Type
nn.ConvTranspose2d
flatten – whether to return a 2D flattened array (batch_size, x) or a 4D (batch_size, out_ch, out_h, out_w) when encoding
- Type
bool, default False
forward(input)
Returns the reconstructed input or the binary code, depending on self.training. Call .eval() on the module for the binary code, .train() for the reconstruction.
- Parameters
input (torch.Tensor,) – tensor holding the input data
- Returns
rec (torch.Tensor float,) – if self.training=True returns the reconstruction of the input
binary_code (torch.Tensor uint8,) – tensor holding the binary code if self.training=False.
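With flatten=True the code has shape (batch_size, out_ch * out_h * out_w). The spatial size follows the output-size formula documented for torch.nn.Conv2d; the helper below is a hypothetical pure-Python sketch of that formula, not part of lightonml, and the out_ch value in the usage line is only an example.

```python
def conv2d_output_hw(h, w, kernel_size, stride=1, padding=0, dilation=1):
    """Output (out_h, out_w) of a 2D convolution, per the torch.nn.Conv2d formula."""

    def pair(v):
        # int or tuple arguments, as accepted by nn.Conv2d
        return (v, v) if isinstance(v, int) else tuple(v)

    (kh, kw), (sh, sw) = pair(kernel_size), pair(stride)
    (ph, pw), (dh, dw) = pair(padding), pair(dilation)
    out_h = (h + 2 * ph - dh * (kh - 1) - 1) // sh + 1
    out_w = (w + 2 * pw - dw * (kw - 1) - 1) // sw + 1
    return out_h, out_w


out_h, out_w = conv2d_output_hw(28, 28, kernel_size=3, stride=2, padding=1)
out_ch = 16  # example value
print(out_h, out_w, out_ch * out_h * out_w)  # 14 14 3136
```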
class EncoderDecoder(input_size, hidden_size)
Bases: lightonml.encoding.models.AE
Autoencoder consisting of two dense layers (encoder, decoder). The decoder weights are the transpose of the encoder weights, so backpropagation through the decoder also updates the encoder, despite the non-differentiable non-linearity. Architecture from Tissier et al. (https://arxiv.org/abs/1803.09065).
proj – encoding-decoding layer
- Type
nn.Linear
forward(input)
Returns the reconstructed input or the binary code, depending on self.training. Call .eval() on the module for the binary code, .train() for the reconstruction.
- Parameters
input (torch.Tensor,) – tensor holding the input data
- Returns
rec (torch.Tensor float,) – if self.training=True returns the reconstruction of the input
binary_code (torch.Tensor uint8,) – tensor holding the binary code if self.training=False.
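A NumPy sketch of a tied-weight binary autoencoder, to show how a single weight matrix serves both directions. The class name and its initialization are hypothetical, biases are omitted, and a Heaviside step stands in for the binarization; the model in lightonml and in Tissier et al. may differ in these details. Only the forward pass is shown: since W is shared, a gradient step computed through decode alone would also change encode.

```python
import numpy as np


class TiedBinaryAE:
    """Hypothetical sketch of a tied-weight binary autoencoder (no biases)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        # Same shape convention as nn.Linear(input_size, hidden_size).weight
        self.W = rng.standard_normal((hidden_size, input_size)) / np.sqrt(input_size)
        self.training = True

    def encode(self, x):
        # Heaviside step: its gradient is zero almost everywhere
        return (x @ self.W.T > 0).astype(np.uint8)

    def decode(self, code):
        # The decoder reuses the encoder weights, transposed
        return code.astype(np.float64) @ self.W

    def forward(self, x):
        code = self.encode(x)
        return self.decode(code) if self.training else code
```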
class LinearAE(input_size, hidden_size, beta=1.0)
Bases: lightonml.encoding.models.AE
Autoencoder consisting of two dense layers (encoder, decoder). The autoencoder learns to produce a binary output: training starts from tanh(beta x)/beta with beta=1, and beta is gradually increased so that the activation resembles a step function.
- Parameters
encoder – encoding layer
- Type
nn.Linear
decoder – decoding layer
- Type
nn.Linear
forward(input)
Returns the reconstructed input or the binary code, depending on self.training. Call .eval() on the module for the binary code, .train() for the reconstruction.
- Parameters
input (torch.Tensor,) – tensor holding the input data
- Returns
rec (torch.Tensor float,) – if self.training=True returns the reconstruction of the input
binary_code (torch.Tensor uint8,) – tensor holding the binary code if self.training=False.
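The effect of increasing beta can be checked numerically. This sketch looks only at the tanh(beta x) factor (dividing by a positive beta does not change the sign of the output) and measures how far it is from the step function sign(x) at a few points away from 0; the helper name is hypothetical.

```python
import numpy as np


def relaxation_gap(beta, x):
    """Largest |tanh(beta * x) - sign(x)| over the sample points x."""
    return float(np.max(np.abs(np.tanh(beta * x) - np.sign(x))))


x = np.array([-1.0, -0.5, -0.1, 0.1, 0.5, 1.0])  # points away from 0
for beta in (1.0, 10.0, 100.0):
    # The gap shrinks as beta grows: tanh(beta * x) approaches a step
    print(f"beta={beta:g}: gap={relaxation_gap(beta, x):.2e}")
```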
train(model, dataloader, optimizer, criterion=torch.nn.functional.mse_loss, epochs=10, beta_interval=5, device=None, verbose=True)
Utility function to train autoencoders quickly.
- Parameters
model (nn.Module,) – autoencoder to train
dataloader (torch.utils.data.DataLoader,) – loader for the training dataset of the autoencoder
optimizer (torch.optim.Optimizer,) – optimizer used to perform the training
criterion (callable, default torch.nn.functional.mse_loss) – loss function for training
epochs (int, default 10,) – number of epochs of training
beta_interval (int, default 5,) – interval, in epochs, at which beta is multiplied by 10
device (str, 'cpu' or 'cuda:{idx}') – device used to perform the training.
verbose (bool, default True,) – whether to print info on the training
- Returns
model – trained model
- Return type
nn.Module,
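The beta schedule described by beta_interval can be sketched as a pure function of the epoch index, assuming beta starts at 1 and is multiplied by 10 at the start of every beta_interval-th epoch (epochs counted from 0). The function name is hypothetical, and how train applies the value to the model is not shown.

```python
def beta_at_epoch(epoch, beta_interval=5, beta0=1.0):
    """Hypothetical schedule: beta is multiplied by 10 every beta_interval epochs."""
    if beta_interval <= 0:
        raise ValueError("beta_interval must be positive")
    return beta0 * 10 ** (epoch // beta_interval)


# With the defaults epochs=10 and beta_interval=5: beta is 1 for epochs 0-4, 10 for 5-9
print([beta_at_epoch(e) for e in range(10)])
```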