lightonml.encoding

lightonml.encoding.base

Encoders

These modules contain implementations of encoders that transform data into the binary uint8 format required by the OPU. Compatible with numpy.ndarray and torch.Tensor.

class BinaryThresholdEncoder(threshold_enc=25, greater_is_one=True)[source]

Bases: object

Implements binary encoding using a threshold function.

Parameters
  • threshold_enc (int) – Threshold for the binary encoder. Must be in the interval [0, 255]

  • greater_is_one (bool) – If True, above threshold is 1 and below 0. Vice versa if False.

threshold_enc

Threshold for the binary encoder. Must be in the interval [0, 255]

Type

int

greater_is_one

If True, above threshold is 1 and below 0. Vice versa if False.

Type

bool

fit(X, y=None)[source]

No-op. This method doesn’t do anything. It exists purely for compatibility with the scikit-learn transformer API.

Parameters
  • X (np.ndarray,) – the input data to encode.

  • y (np.ndarray,) – the targets data.

Returns

self

Return type

BinaryThresholdEncoder

transform(X, y=None)[source]

Transforms a uint8 array with values in [0, 255] into a binary uint8 array with values in {0, 1}.

Parameters

X (np.ndarray of uint8,) – the input data to encode.

Returns

X_enc – the encoded data.

Return type

np.ndarray of uint8,
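
A minimal usage sketch (hypothetical values; with the default greater_is_one=True, values above the threshold map to 1):

>>> import numpy as np
>>> from lightonml.encoding.base import BinaryThresholdEncoder
>>> X = np.array([[10, 30, 200]], dtype=np.uint8)
>>> encoder = BinaryThresholdEncoder(threshold_enc=25)
>>> encoder.transform(X)
array([[0, 1, 1]], dtype=uint8)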

class ConcatenatedBitPlanEncoder(n_bits=8, starting_bit=0)[source]

Bases: object

Implements an encoding that works by concatenating bitplanes along the feature dimension.

n_bits + starting_bit must not exceed the bit width of the data fed to the encoder. E.g. if X.dtype is uint8, then n_bits + starting_bit must not exceed 8. If instead X.dtype is uint32, then n_bits + starting_bit must not exceed 32.

Read more in the Examples section.

Parameters
  • n_bits (int, defaults to 8,) – number of bits to keep during the encoding. Must be positive.

  • starting_bit (int, defaults to 0,) – bit from which the encoding starts; earlier bits are discarded. Must be non-negative.

n_bits

number of bits to keep during the encoding.

Type

int,

starting_bit

bit from which the encoding starts; earlier bits are discarded.

Type

int,

fit(X, y=None)[source]

No-op. This method doesn’t do anything. It exists purely for compatibility with the scikit-learn transformer API.

Parameters
  • X (2D np.ndarray) –

  • y (1D np.ndarray) –

Returns

self

Return type

ConcatenatedBitPlanEncoder

transform(X)[source]

Performs the encoding.

Parameters

X (2D np.ndarray of uint8, uint16, uint32 or uint64 [n_samples, n_features],) – input data to encode.

Returns

X_enc – encoded input data. Each row is arranged as [bits_for_first_feature, …, bits_for_last_feature].

Return type

2D np.ndarray of uint8 [n_samples, n_features*n_bits]
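
A shape-only sketch (illustrative data; with n_bits=8 each of the 4 features expands to 8 bits, concatenated along the feature axis per the return type above):

>>> import numpy as np
>>> from lightonml.encoding.base import ConcatenatedBitPlanEncoder
>>> X = np.random.randint(0, 256, size=(10, 4), dtype=np.uint8)
>>> encoder = ConcatenatedBitPlanEncoder(n_bits=8, starting_bit=0)
>>> encoder.transform(X).shape
(10, 32)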

class ConcatenatedFloat32Encoder(sign_bit=True, exp_bits=8, mantissa_bits=23)[source]

Bases: object

Implements an encoding that works by concatenating bitplanes and selecting how many bits to keep for sign, mantissa and exponent of the float32.

Parameters
  • sign_bit (bool, defaults to True,) – if True keeps the bit for the sign.

  • exp_bits (int, defaults to 8,) – number of bits of the exponent to keep.

  • mantissa_bits (int, defaults to 23,) – number of bits of the mantissa to keep.

sign_bit

if True keeps the bit for the sign.

Type

bool, defaults to True,

exp_bits

number of bits of the exponent to keep.

Type

int, defaults to 8,

mantissa_bits

number of bits of the mantissa to keep.

Type

int, defaults to 23,

n_bits

total number of bits to keep.

Type

int,

indices

list of the indices of the bits to keep.

Type

list,

fit(X, y=None)[source]

No-op. This method doesn’t do anything. It exists purely for compatibility with the scikit-learn transformer API.

Parameters
  • X (2D np.ndarray) –

  • y (1D np.ndarray) –

Returns

self

Return type

ConcatenatedFloat32Encoder

transform(X)[source]

Performs the encoding.

Parameters

X (2D np.ndarray of float32 [n_samples, n_features],) – input data to encode.

Returns

X_enc – encoded input data.

Return type

2D np.ndarray of uint8 [n_samples, n_features*n_bits]
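
A shape-only sketch (keeping the sign bit, all 8 exponent bits and all 23 mantissa bits gives 32 bits per feature; the concatenated layout documented above is assumed):

>>> import numpy as np
>>> from lightonml.encoding.base import ConcatenatedFloat32Encoder
>>> X = np.random.randn(10, 4).astype(np.float32)
>>> encoder = ConcatenatedFloat32Encoder(sign_bit=True, exp_bits=8, mantissa_bits=23)
>>> encoder.transform(X).shape
(10, 128)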

class ConcatenatingBitPlanDecoder(n_bits=8, decoding_decay=0.5)[source]

Bases: object

Implements a decoding that works by concatenating bitplanes.

n_bits MUST be the same value used in SeparatedBitPlanEncoder. Read more in the Examples section.

Parameters
  • n_bits (int, defaults to 8,) – number of bits used during the encoding.

  • decoding_decay (float, defaults to 0.5,) – decay to apply to the bits during the decoding.

n_bits

number of bits used during the encoding.

Type

int,

decoding_decay

decay to apply to the bits during the decoding.

Type

float, defaults to 0.5,

fit(X, y=None)[source]

No-op. This method doesn’t do anything. It exists purely for compatibility with the scikit-learn transformer API.

Parameters
  • X (np.ndarray) –

  • y (np.ndarray, optional, defaults to None.) –

Returns

self

Return type

ConcatenatingBitPlanDecoder

transform(X, y=None)[source]

Performs the decoding.

Parameters

X (2D np.ndarray of uint8 or uint16,) – input data to decode.

Returns

X_dec – decoded data.

Return type

2D np.ndarray of floats
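
A pipeline sketch (illustrative; in a real pipeline the OPU random projection sits between encoder and decoder, omitted here so only the encode/decode pairing is shown):

>>> import numpy as np
>>> from lightonml.encoding.base import SeparatedBitPlanEncoder, ConcatenatingBitPlanDecoder
>>> X = np.random.randint(0, 256, size=(10, 4), dtype=np.uint8)
>>> X_enc = SeparatedBitPlanEncoder(n_bits=8).transform(X)  # bitplanes stacked along the sample axis
>>> X_dec = ConcatenatingBitPlanDecoder(n_bits=8).transform(X_enc)  # bitplanes folded back along the feature axis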

class Float32Encoder(sign_bit=True, exp_bits=8, mantissa_bits=23)[source]

Bases: object

Implements an encoding that works by separating bitplanes and selecting how many bits to keep for the sign, mantissa and exponent of the float32.

Parameters
  • sign_bit (bool, defaults to True,) – if True keeps the bit for the sign.

  • exp_bits (int, defaults to 8,) – number of bits of the exponent to keep.

  • mantissa_bits (int, defaults to 23,) – number of bits of the mantissa to keep.

sign_bit

if True keeps the bit for the sign.

Type

bool, defaults to True,

exp_bits

number of bits of the exponent to keep.

Type

int, defaults to 8,

mantissa_bits

number of bits of the mantissa to keep.

Type

int, defaults to 23,

n_bits

total number of bits to keep.

Type

int,

indices

list of the indices of the bits to keep.

Type

list,

fit(X, y=None)[source]

No-op. This method doesn’t do anything. It exists purely for compatibility with the scikit-learn transformer API.

Parameters
  • X (2D np.ndarray) –

  • y (1D np.ndarray) –

Returns

self

Return type

Float32Encoder

transform(X)[source]

Performs the encoding.

Parameters

X (2D np.ndarray of float32 [n_samples, n_features],) – input data to encode.

Returns

X_enc – encoded input data.

Return type

2D np.ndarray of uint8 [n_samples*n_bits, n_features],
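
A shape-only sketch (sign + 8 exponent + 23 mantissa bits = 32 bits per value, stacked along the sample axis per the return type above):

>>> import numpy as np
>>> from lightonml.encoding.base import Float32Encoder
>>> X = np.random.randn(10, 4).astype(np.float32)
>>> encoder = Float32Encoder(sign_bit=True, exp_bits=8, mantissa_bits=23)
>>> encoder.transform(X).shape
(320, 4)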

class MixingBitPlanDecoder(n_bits=8, decoding_decay=0.5)[source]

Bases: object

Implements a decoding that works by mixing bitplanes.

n_bits MUST be the same value used in SeparatedBitPlanEncoder. Read more in the Examples section.

Parameters
  • n_bits (int, defaults to 8,) – number of bits used during the encoding.

  • decoding_decay (float, defaults to 0.5,) – decay to apply to the bits during the decoding.

n_bits

number of bits used during the encoding.

Type

int,

decoding_decay

decay to apply to the bits during the decoding.

Type

float, defaults to 0.5,

fit(X, y=None)[source]

No-op. This method doesn’t do anything. It exists purely for compatibility with the scikit-learn transformer API.

Parameters
  • X (np.ndarray) –

  • y (np.ndarray, optional, defaults to None.) –

Returns

self

Return type

MixingBitPlanDecoder

transform(X, y=None)[source]

Performs the decoding.

Parameters

X (2D np.ndarray of uint8 or uint16,) – input data to decode.

Returns

X_dec – decoded data.

Return type

2D np.ndarray of floats
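
A pipeline sketch (illustrative; in practice the OPU transform sits between the two steps, and the decoder mixes the n_bits bitplanes of each sample back into a single float row):

>>> import numpy as np
>>> from lightonml.encoding.base import SeparatedBitPlanEncoder, MixingBitPlanDecoder
>>> X = np.random.randint(0, 256, size=(10, 4), dtype=np.uint8)
>>> X_enc = SeparatedBitPlanEncoder(n_bits=8).transform(X)   # (80, 4)
>>> X_dec = MixingBitPlanDecoder(n_bits=8).transform(X_enc)  # (10, 4), float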

class MultiThresholdEncoder(thresholds=None, n_bins=8, columnwise=False)[source]

Bases: object

Implements binary encoding using multiple thresholds.

Parameters
  • thresholds (np.ndarray,) – thresholds for the binary encoder

  • columnwise (bool,) – whether to use different thresholds for each column or a common set of thresholds for everything.

  • n_bins (int,) – if thresholds is not specified, n_bins - 1 thresholds will be created equally spaced on the input range

thresholds

thresholds for the binary encoder.

Type

np.ndarray,

columnwise

whether to use different thresholds for each column or a common set of thresholds for everything.

Type

bool,

n_bins

number of different values the encoding can take. A value is encoded into n_bins-1 bits.

Type

int,

fit(X, y=None)[source]

If thresholds was passed at construction, this method doesn’t do anything. If thresholds is None, computes n_bins - 1 thresholds equally spaced on the range of X. The range of X is determined column-wise, but the number of bins is the same for all features.

Parameters
  • X (2D np.ndarray) –

  • y (1D np.ndarray) –

Returns

self

Return type

MultiThresholdEncoder

transform(X)[source]

Transforms an array into a binary uint8 array with values in {0, 1}.

The bins defined by the thresholds are not mutually exclusive, i.e. a value x activates all the bins whose thresholds are less than x.

Parameters

X (np.ndarray of size n_sample x n_features) – The input data to encode.

Returns

X_enc – The encoded data.

Return type

np.ndarray of uint8, of size n_samples x (n_features x (n_bins - 1))
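
A minimal sketch with explicit thresholds (the per-feature bit layout in threshold order is an assumption; a value activates every bin whose threshold lies below it):

>>> import numpy as np
>>> from lightonml.encoding.base import MultiThresholdEncoder
>>> encoder = MultiThresholdEncoder(thresholds=np.array([50, 100, 150]))
>>> X = np.array([[25, 125]], dtype=np.uint8)
>>> encoder.transform(X)  # 25 clears no threshold, 125 clears 50 and 100
array([[0, 0, 0, 1, 1, 0]], dtype=uint8)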

class NoDecoding[source]

Bases: object

Implements a No-Op Decoding class for API consistency.

class NoEncoding[source]

Bases: object

Implements a No-Op Encoding class for API consistency.

class SeparatedBitPlanEncoder(n_bits=8, starting_bit=0)[source]

Bases: object

Implements an encoding that works by separating bitplans.

n_bits + starting_bit must not exceed the bit width of the data fed to the encoder. E.g. if X.dtype is uint8, then n_bits + starting_bit must not exceed 8. If instead X.dtype is uint32, then n_bits + starting_bit must not exceed 32.

Read more in the Examples section.

Parameters
  • n_bits (int, defaults to 8,) – number of bits to keep during the encoding. Must be positive.

  • starting_bit (int, defaults to 0,) – bit from which the encoding starts; earlier bits are discarded. Must be non-negative.

n_bits

number of bits to keep during the encoding.

Type

int,

starting_bit

bit from which the encoding starts; earlier bits are discarded.

Type

int,

fit(X, y=None)[source]

No-op. This method doesn’t do anything. It exists purely for compatibility with the scikit-learn transformer API.

Parameters
  • X (2D np.ndarray or torch.Tensor) –

  • y (1D np.ndarray or torch.Tensor) –

Returns

self

Return type

SeparatedBitPlanEncoder

transform(X)[source]

Performs the encoding.

Parameters

X (2D np.ndarray or torch.Tensor of uint8, uint16, uint32 or uint64 [n_samples, n_features],) – input data to encode.

Returns

X_enc – encoded input data.

Return type

2D np.ndarray or torch.Tensor of uint8 [n_samples*n_bits, n_features]
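
A shape-only sketch (with n_bits=8 the bitplanes are stacked along the sample axis, per the return type above):

>>> import numpy as np
>>> from lightonml.encoding.base import SeparatedBitPlanEncoder
>>> X = np.random.randint(0, 256, size=(10, 4), dtype=np.uint8)
>>> encoder = SeparatedBitPlanEncoder(n_bits=8, starting_bit=0)
>>> encoder.transform(X).shape
(80, 4)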

class SequentialBaseTwoEncoder(n_gray_levels=16)[source]

Bases: object

Implements a base 2 encoding.

E.g. \(5\) is written \(101\) in base 2, since \(5 = 1 \cdot 2^2 + 0 \cdot 2^1 + 1 \cdot 2^0\). Each bit is then repeated as many times as its weight (4, 2 and 1 times respectively), so the encoder outputs 1111001.

Parameters

n_gray_levels (int,) – number of values that can be encoded. Must be a power of 2.

n_gray_levels

number of values that can be encoded. Must be a power of 2.

Type

int,

n_bits

number of bits needed to encode n_gray_levels values.

Type

int,

offset

value subtracted from the data to shift the minimum to 0.

Type

float,

scale

scaling factor to normalize the data.

Type

float,

fit(X, y=None)[source]

Computes parameters for the normalization.

Must be run only on the training set to avoid leaking information from the dev/test set.

Parameters
  • X (np.ndarray of uint [n_samples, n_features],) – the input data to encode.

  • y (np.ndarray,) – the targets data.

Returns

self

Return type

SequentialBaseTwoEncoder

normalize(X)[source]

Normalizes the data into the right range before casting to integers.

Parameters

X (np.ndarray of uint [n_samples, n_features],) – the input data to normalize.

Returns

X_norm – normalized data.

Return type

np.ndarray of uint8 [n_samples, n_features],

transform(X, y=None)[source]

Performs the encoding.

Parameters

X (2D np.ndarray of uint [n_samples, n_features],) – input data to encode.

Returns

X_enc – encoded input data.

Return type

2D np.ndarray of uint8 [n_samples, n_features*(n_gray_levels-1)]
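
A shape-only sketch (n_gray_levels=16 gives n_bits=4 and each feature expands to n_gray_levels - 1 = 15 bits; fit computes the normalization on the training set and returns self, so it can be chained):

>>> import numpy as np
>>> from lightonml.encoding.base import SequentialBaseTwoEncoder
>>> X_train = np.random.randint(0, 256, size=(10, 4), dtype=np.uint8)
>>> encoder = SequentialBaseTwoEncoder(n_gray_levels=16).fit(X_train)
>>> encoder.transform(X_train).shape
(10, 60)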

lightonml.encoding.models

class AE[source]

Bases: torch.nn.Module

class ConvAE(in_ch, out_ch, kernel_size, beta, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', flatten=False)[source]

Bases: lightonml.encoding.models.AE

Autoencoder consisting of two convolutional layers (encoder, decoder).

Parameters
  • in_ch (int,) – number of input channels

  • out_ch (int,) – number of output channels

  • kernel_size (int or tuple,) – size of the convolutional filters

  • beta (float, default 1.,) – inverse temperature for tanh(beta x)

  • stride (int or tuple, optional, default 1,) – stride of the convolution

  • padding (int or tuple, optional, default 0,) – zero-padding added to both sides of the input

  • padding_mode (str, optional, default 'zeros',) – ‘zeros’ or ‘circular’

  • dilation (int or tuple, optional, default 1,) – spacing between kernel elements

  • groups (int, optional, default 1,) – number of blocked connections from input to output channels

  • bias (bool, optional, default True,) – adds a learnable bias to the output.

  • flatten (bool, default False,) – whether to return a 2D flattened array (batch_size, x) or a 4D (batch_size, out_ch, out_h, out_w) when encoding

encoder

encoding layer

Type

nn.Conv2d,

decoder

decoding layer

Type

nn.ConvTranspose2d,

beta

inverse temperature for tanh(beta x)

Type

float,

flatten

whether to return a 2D flattened array (batch_size, x) or a 4D (batch_size, out_ch, out_h, out_w) when encoding

Type

bool, default False,

forward(input)[source]

Returns the reconstructed input or the binary code, depending on self.training. Call .eval() on the module for the binary code, .train() for the reconstruction.

Parameters

input (torch.Tensor,) – tensor holding the input data

Returns

  • rec (torch.Tensor float,) – if self.training=True returns the reconstruction of the input

  • binary_code (torch.Tensor uint8,) – tensor holding the binary code if self.training=False.
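
A usage sketch (hypothetical sizes; assuming the documented nn.Conv2d encoder, a 28x28 input with kernel_size=3, stride 1 and no padding yields 26x26 maps, hence a flattened code of 8*26*26 = 5408 bits):

>>> import torch
>>> from lightonml.encoding.models import ConvAE
>>> model = ConvAE(in_ch=1, out_ch=8, kernel_size=3, beta=1.0, flatten=True)
>>> x = torch.randn(16, 1, 28, 28)
>>> _ = model.eval()  # eval mode: forward returns the binary code
>>> model(x).shape
torch.Size([16, 5408])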

class EncoderDecoder(input_size, hidden_size)[source]

Bases: lightonml.encoding.models.AE

Autoencoder consisting of two dense layers (encoder, decoder). The decoder weights are the transpose of the encoder ones; backpropagation updates the decoder, and through the shared weights the encoder is updated as well, despite the non-differentiable non-linearity. Architecture from Tissier et al. (https://arxiv.org/abs/1803.09065).

Parameters
  • input_size (int,) – size of the input

  • hidden_size (int,) – size of the hidden layer

proj

encoding-decoding layer

Type

nn.Linear,

forward(input)[source]

Returns the reconstructed input or the binary code, depending on self.training. Call .eval() on the module for the binary code, .train() for the reconstruction.

Parameters

input (torch.Tensor,) – tensor holding the input data

Returns

  • rec (torch.Tensor float,) – if self.training=True returns the reconstruction of the input

  • binary_code (torch.Tensor uint8,) – tensor holding the binary code if self.training=False.

class LinearAE(input_size, hidden_size, beta=1.0)[source]

Bases: lightonml.encoding.models.AE

Autoencoder consisting of two dense layers (encoder, decoder). The autoencoder learns to produce a binary output starting from tanh(beta x)/beta with beta=1 and gradually increasing beta so that the non-linearity approaches a step function.

Parameters
  • input_size (int,) – size of the input

  • hidden_size (int,) – size of the hidden layer

  • beta (float, default 1.,) – inverse temperature for tanh(beta x)

encoder

encoding layer

Type

nn.Linear,

decoder

decoding layer

Type

nn.Linear,

beta

inverse temperature for tanh(beta x)

Type

float,

forward(input)[source]

Returns the reconstructed input or the binary code, depending on self.training. Call .eval() on the module for the binary code, .train() for the reconstruction.

Parameters

input (torch.Tensor,) – tensor holding the input data

Returns

  • rec (torch.Tensor float,) – if self.training=True returns the reconstruction of the input

  • binary_code (torch.Tensor uint8,) – tensor holding the binary code if self.training=False.
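
The train/eval switch in practice (a minimal sketch with illustrative sizes):

>>> import torch
>>> from lightonml.encoding.models import LinearAE
>>> model = LinearAE(input_size=784, hidden_size=256)
>>> x = torch.randn(32, 784)
>>> _ = model.train()
>>> rec = model(x)   # float reconstruction, used for the training loss
>>> _ = model.eval()
>>> code = model(x)  # uint8 binary code, ready for the OPU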

train(model, dataloader, optimizer, criterion=torch.nn.functional.mse_loss, epochs=10, beta_interval=5, device=None, verbose=True)[source]

Utility function to train autoencoders quickly.

Parameters
  • model (nn.Module,) – autoencoder to be trained

  • dataloader (torch.utils.data.DataLoader,) – loader for the training dataset of the autoencoder

  • optimizer (torch.optim.Optimizer,) – optimizer used to perform the training

  • criterion (callable, default torch.nn.functional.mse_loss) – loss function for training

  • epochs (int, default 10,) – number of epochs of training

  • beta_interval (int, default 5,) – interval (in epochs) at which beta is increased by a factor of 10

  • device (str, 'cpu' or 'cuda:{idx}') – device used to perform the training.

  • verbose (bool, default True,) – whether to print info on the training

Returns

model – trained model

Return type

nn.Module,
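
A hedged end-to-end sketch (the dataset, batch size, optimizer and epoch count are illustrative, and the exact batch format expected by train follows the library's conventions):

>>> import torch
>>> from torch.utils.data import DataLoader, TensorDataset
>>> from lightonml.encoding.models import LinearAE, train
>>> data = torch.randn(256, 784)
>>> loader = DataLoader(TensorDataset(data), batch_size=32)
>>> model = LinearAE(input_size=784, hidden_size=256)
>>> optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
>>> model = train(model, loader, optimizer, epochs=2, verbose=False)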