lightonml.encoding

lightonml.encoding.base

Encoders

These modules contain implementations of encoders that can transform data into the binary uint8 format required by the OPU. They are compatible with numpy.ndarray and torch.Tensor.

class BaseTransformer[source]

Bases: object

Base class for all basic encoders and decoders. It mainly exists to avoid empty fit methods and to provide an automatic fit_transform method.

fit(X, y=None)[source]

No-op, exists for compatibility with the scikit-learn API.

Parameters
  • X – 2D np.ndarray or torch.Tensor

  • y – 1D np.ndarray or torch.Tensor

Returns

Encoder object

transform(X)[source]

Function to encode or decode an array X.

Parameters

X – 2D np.ndarray or torch.Tensor

Returns

2D np.ndarray or torch.Tensor of uint8
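
A subclass only needs to implement transform to get fit_transform for free. A minimal sketch, assuming only the documented fit/transform contract (the encoder below is hypothetical):

    import numpy as np
    from lightonml.encoding.base import BaseTransformer

    class ParityEncoder(BaseTransformer):
        """Hypothetical encoder: even values map to 1, odd values to 0."""
        def transform(self, X):
            return (X % 2 == 0).astype(np.uint8)

    X = np.arange(6).reshape(2, 3)
    # fit is a no-op, so fit_transform(X) is equivalent to transform(X)
    X_enc = ParityEncoder().fit_transform(X)  # uint8 array of 0s and 1s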

class BinaryThresholdEncoder(threshold_enc='auto', greater_is_one=True)[source]

Bases: lightonml.encoding.base.BaseTransformer

Implements binary encoding using a threshold function.

Parameters
  • threshold_enc (int or str) – Threshold for the binary encoder. Defaults to ‘auto’, which sets threshold_enc to the feature-wise median of the data passed to the fit function.

  • greater_is_one (bool) – If True, values above the threshold are encoded as 1 and values below as 0; vice versa if False.

threshold_enc

Threshold for the binary encoder.

Type

int or str

greater_is_one

If True, values above the threshold are encoded as 1 and values below as 0; vice versa if False.

Type

bool

fit(X, y=None)[source]

When threshold_enc is ‘auto’, this method sets it to a vector containing the median of each column of X. Otherwise it does nothing, except print a warning if threshold_enc is outside the range covered by X.

Parameters
  • X (np.ndarray,) – the input data to encode.

  • y (np.ndarray,) – the targets data.

Returns

self

Return type

BinaryThresholdEncoder

transform(X)[source]

Transforms any numpy array into a uint8 binary array of 0s and 1s.

Parameters

X (np.ndarray or torch.Tensor) – the input data to encode.

Returns

X_enc – the encoded data, a uint8 array containing only zeros and ones.

Return type

np.ndarray or torch.Tensor
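
A minimal usage sketch (shapes and values are illustrative):

    import numpy as np
    from lightonml.encoding.base import BinaryThresholdEncoder

    X_train = np.random.randn(100, 10).astype("float32")
    X_new = np.random.randn(20, 10).astype("float32")

    encoder = BinaryThresholdEncoder(threshold_enc="auto")
    encoder.fit(X_train)               # sets threshold_enc to the column-wise medians
    X_enc = encoder.transform(X_new)   # uint8 array of 0s and 1s, same shape as X_new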

class ConcatenatedBitPlanEncoder(n_bits=8, starting_bit=0)[source]

Bases: lightonml.encoding.base.BaseTransformer

Implements an encoding that works by concatenating bitplanes along the feature dimension.

n_bits + starting_bit must not exceed the bit width of the data fed to the encoder. E.g. if X.dtype is uint8, then n_bits + starting_bit must not exceed 8; if X.dtype is uint32, it must not exceed 32.

Read more in the Examples section.

Parameters
  • n_bits (int, defaults to 8,) – number of bits to keep during the encoding. Must be positive.

  • starting_bit (int, defaults to 0,) – bit at which the encoding starts; lower-order bits are thrown away. Must be non-negative.

n_bits

number of bits to keep during the encoding.

Type

int,

starting_bit

bit used to start the encoding, previous bits will be thrown away.

Type

int,

transform(X)[source]

Performs the encoding.

Parameters

X (2D np.ndarray of uint8, 16, 32 or 64 [n_samples, n_features],) – input data to encode.

Returns

X_enc – encoded input data. A row is arranged as [bits_for_first_feature, …, bits_for_last_feature].

Return type

2D np.ndarray of uint8 [n_samples, n_features*n_bits]
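
A short sketch, with shapes following the documented return type:

    import numpy as np
    from lightonml.encoding.base import ConcatenatedBitPlanEncoder

    # 10 samples of 4 uint8 features
    X = np.random.randint(0, 256, size=(10, 4), dtype=np.uint8)

    # keep 4 bitplanes starting at bit 2; bits 0 and 1 are thrown away
    # (n_bits + starting_bit = 6 fits in the 8-bit width of uint8)
    encoder = ConcatenatedBitPlanEncoder(n_bits=4, starting_bit=2)
    X_enc = encoder.transform(X)   # uint8, shape (10, 4 * 4) = (10, 16)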

class ConcatenatedFloat32Encoder(sign_bit=True, exp_bits=8, mantissa_bits=23)[source]

Bases: lightonml.encoding.base.BaseTransformer

Implements an encoding that works by concatenating bitplanes and selecting how many bits to keep for sign, mantissa and exponent of the float32.

Parameters
  • sign_bit (bool, defaults to True,) – if True keeps the bit for the sign.

  • exp_bits (int, defaults to 8,) – number of bits of the exponent to keep.

  • mantissa_bits (int, defaults to 23,) – number of bits of the mantissa to keep.

sign_bit

if True keeps the bit for the sign.

Type

bool, defaults to True,

exp_bits

number of bits of the exponent to keep.

Type

int, defaults to 8,

mantissa_bits

number of bits of the mantissa to keep.

Type

int, defaults to 23,

n_bits

total number of bits to keep.

Type

int,

indices

list of the indices of the bits to keep.

Type

list,

transform(X)[source]

Performs the encoding.

Parameters

X (2D np.ndarray of float32 [n_samples, n_features],) – input data to encode.

Returns

X_enc – encoded input data.

Return type

2D np.ndarray of uint8 [n_samples, n_features*n_bits],
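
A sketch, assuming the concatenated layout [n_samples, n_features*n_bits] given above:

    import numpy as np
    from lightonml.encoding.base import ConcatenatedFloat32Encoder

    X = np.random.randn(10, 4).astype(np.float32)

    # keep the sign bit, the full exponent and 7 mantissa bits: 1 + 8 + 7 = 16 bits
    encoder = ConcatenatedFloat32Encoder(sign_bit=True, exp_bits=8, mantissa_bits=7)
    X_enc = encoder.transform(X)   # uint8, shape (10, 4 * 16) = (10, 64)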

class ConcatenatingBitPlanDecoder(n_bits=8, decoding_decay=0.5)[source]

Bases: lightonml.encoding.base.BaseTransformer

Implements a decoding that works by concatenating bitplanes.

n_bits MUST be the same value used in SeparatedBitPlanEncoder. Read more in the Examples section.

Parameters
  • n_bits (int, defaults to 8,) – number of bits used during the encoding.

  • decoding_decay (float, defaults to 0.5,) – decay to apply to the bits during the decoding.

n_bits

number of bits used during the encoding.

Type

int,

decoding_decay

decay to apply to the bits during the decoding.

Type

float, defaults to 0.5,

transform(X)[source]

Performs the decoding.

Parameters

X (2D np.ndarray of uint8 or uint16,) – input data to decode.

Returns

X_dec – decoded data.

Return type

2D np.ndarray of floats
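
The role of decoding_decay can be illustrated with plain numpy: each successive bitplane is weighted by a power of the decay before the planes are recombined. This is a sketch of the idea only; the library's exact recombination and bit ordering may differ.

    import numpy as np

    n_bits, decoding_decay = 8, 0.5
    # one bit per plane for a single value, most significant plane first (assumption)
    bitplanes = np.random.randint(0, 2, size=n_bits)

    # weight plane k by decoding_decay**k and sum; with a decay of 0.5 this is
    # a base-2 fractional expansion of the encoded value
    weights = decoding_decay ** np.arange(n_bits)
    decoded = (bitplanes * weights).sum()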

class Float32Encoder(sign_bit=True, exp_bits=8, mantissa_bits=23)[source]

Bases: lightonml.encoding.base.BaseTransformer

Implements an encoding that works by separating bitplanes and selecting how many bits to keep for the sign, mantissa and exponent of the float32.

Parameters
  • sign_bit (bool, defaults to True,) – if True keeps the bit for the sign.

  • exp_bits (int, defaults to 8,) – number of bits of the exponent to keep.

  • mantissa_bits (int, defaults to 23,) – number of bits of the mantissa to keep.

sign_bit

if True keeps the bit for the sign.

Type

bool, defaults to True,

exp_bits

number of bits of the exponent to keep.

Type

int, defaults to 8,

mantissa_bits

number of bits of the mantissa to keep.

Type

int, defaults to 23,

n_bits

total number of bits to keep.

Type

int,

indices

list of the indices of the bits to keep.

Type

list,

transform(X)[source]

Performs the encoding.

Parameters

X (2D np.ndarray of float32 [n_samples, n_features],) – input data to encode.

Returns

X_enc – encoded input data.

Return type

2D np.ndarray of uint8 [n_samples*n_bits, n_features],
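
A sketch; note that the separated layout stacks bitplanes along the sample axis, unlike ConcatenatedFloat32Encoder:

    import numpy as np
    from lightonml.encoding.base import Float32Encoder

    X = np.random.randn(10, 4).astype(np.float32)

    # sign + 8 exponent bits + 7 mantissa bits = 16 bits per value
    encoder = Float32Encoder(sign_bit=True, exp_bits=8, mantissa_bits=7)
    X_enc = encoder.transform(X)   # uint8, shape (10 * 16, 4) = (160, 4)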

class MultiThresholdEncoder(thresholds='linspace', n_bins=8, columnwise=False)[source]

Bases: lightonml.encoding.base.BaseTransformer

Implements binary encoding using multiple thresholds.

Parameters
  • thresholds (list, np.ndarray or str) – thresholds for the binary encoder. If a list or an array is passed, the thresholds will be used unmodified. If thresholds=’linspace’, the values will be evenly distributed along the data range. If thresholds=’quantile’, the values will be set to the quantiles corresponding to n_bins. If n_bins=4, the thresholds will be the 1st, 2nd and 3rd quartiles.

  • columnwise (bool,) – whether to use different thresholds for each column or a common set of thresholds for everything.

  • n_bins (int,) – if thresholds is ‘linspace’ or ‘quantile’, n_bins - 1 thresholds will be created.

thresholds

thresholds for the binary encoder.

Type

np.ndarray,

columnwise

whether to use different thresholds for each column or a common set of thresholds for everything.

Type

bool,

n_bins

number of different values the encoding can take. A value is encoded into n_bins-1 bits.

Type

int,

fit(X, y=None)[source]

If thresholds was passed explicitly as a list or an array, this method does nothing. Otherwise it computes n_bins - 1 thresholds over the range of X, evenly spaced for ‘linspace’ or at the corresponding quantiles for ‘quantile’. The range of X is determined column-wise, but the number of bins is the same for all features.

Parameters
  • X (2D np.ndarray) –

  • y (1D np.ndarray) –

Returns

self

Return type

MultiThresholdEncoder

transform(X)[source]

Transforms an array into a uint8 binary array of 0s and 1s.

The bins defined by the thresholds are not mutually exclusive, i.e. a value x will activate all the bins corresponding to thresholds less than x.

Parameters

X (np.ndarray of size n_sample x n_features) – The input data to encode.

Returns

X_enc – The encoded data.

Return type

np.ndarray of uint8, of size n_samples x (n_features x (n_bins - 1))
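
A small sketch with explicit thresholds, showing the cumulative (thermometer-style) activation described above:

    import numpy as np
    from lightonml.encoding.base import MultiThresholdEncoder

    X = np.array([[0.1, 0.7],
                  [0.3, 0.9]])

    # three shared thresholds -> each value is encoded into 3 bits
    encoder = MultiThresholdEncoder(thresholds=[0.25, 0.5, 0.75])
    X_enc = encoder.fit_transform(X)   # uint8, shape (2, 2 * 3)
    # 0.7 activates the bits for the thresholds 0.25 and 0.5, but not 0.75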

class NoDecoding[source]

Bases: lightonml.encoding.base.BaseTransformer

Implements a No-Op Decoding class for API consistency.

transform(X)[source]

Function to encode or decode an array X.

Parameters

X – 2D np.ndarray or torch.Tensor

Returns

2D np.ndarray or torch.Tensor of uint8

class NoEncoding[source]

Bases: lightonml.encoding.base.BaseTransformer

Implements a No-Op Encoding class for API consistency.

transform(X)[source]

Function to encode or decode an array X.

Parameters

X – 2D np.ndarray or torch.Tensor

Returns

2D np.ndarray or torch.Tensor of uint8

class SeparatedBitPlanDecoder(precision, magnitude_p=1, magnitude_n=0, decoding_decay=0.5)[source]

Bases: lightonml.encoding.base.BaseTransformer

Decoder matching SeparatedBitPlanEncoder. Its constructor arguments must come from the get_params() method of the encoder that produced the data.

transform(X)[source]

Function to encode or decode an array X.

Parameters

X – 2D np.ndarray or torch.Tensor

Returns

2D np.ndarray or torch.Tensor of uint8

class SeparatedBitPlanEncoder(precision=6, **kwargs)[source]

Bases: lightonml.encoding.base.BaseTransformer

Implements an encoder for floating-point input.

Parameters

precision (int, optional) – The number of binary projections that are performed to reconstruct an unsigned floating-point projection. If the input contains both positive and negative values, the total number of projections is 2*precision.

Returns

X_bits

Return type

np.ndarray of np.uint8

get_params()[source]

Returns the internal information necessary to undo the transformation; it must be passed to the SeparatedBitPlanDecoder constructor.

transform(X)[source]

Function to encode or decode an array X.

Parameters

X – 2D np.ndarray or torch.Tensor

Returns

2D np.ndarray or torch.Tensor of uint8
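
An encode/decode roundtrip sketch, assuming get_params() returns a dict of the decoder's constructor arguments, as described above:

    import numpy as np
    from lightonml.encoding.base import (SeparatedBitPlanEncoder,
                                         SeparatedBitPlanDecoder)

    X = np.random.randn(10, 4).astype(np.float32)

    encoder = SeparatedBitPlanEncoder(precision=6)
    X_bits = encoder.transform(X)        # uint8 bitplanes

    # ... the binary projection through the OPU would happen here ...

    decoder = SeparatedBitPlanDecoder(**encoder.get_params())
    X_dec = decoder.transform(X_bits)    # approximate reconstruction of X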

class SequentialBaseTwoEncoder(n_gray_levels=16)[source]

Bases: lightonml.encoding.base.BaseTransformer

Implements a base 2 encoding.

E.g. \(5\) is written \(101\) in base 2: \(1 \cdot 2^2 + 0 \cdot 2^1 + 1 \cdot 2^0 = 4 + 0 + 1\). The encoder repeats each bit as many times as its weight, giving 1111001.

Parameters

n_gray_levels (int,) – number of values that can be encoded. Must be a power of 2.

n_gray_levels

number of values that can be encoded. Must be a power of 2.

Type

int,

n_bits

number of bits needed to encode n_gray_levels values.

Type

int,

offset

value subtracted to bring the minimum to 0.

Type

float,

scale

scaling factor to normalize the data.

Type

float,

fit(X, y=None)[source]

Computes parameters for the normalization.

Must be run only on the training set to avoid leaking information to the dev/test set.

Parameters
  • X (np.ndarray of uint [n_samples, n_features],) – the input data to encode.

  • y (np.ndarray,) – the targets data.

Returns

self

Return type

SequentialBaseTwoEncoder

normalize(X)[source]

Normalize the data in the right range before the integer casting.

Parameters

X (np.ndarray of uint [n_samples, n_features],) – the input data to normalize.

Returns

X_norm – normalized data.

Return type

np.ndarray of uint8 [n_samples, n_features],

transform(X)[source]

Performs the encoding.

Parameters

X (2D np.ndarray of uint [n_samples, n_features],) – input data to encode.

Returns

X_enc – encoded input data.

Return type

2D np.ndarray of uint8 [n_samples, n_features*(n_gray_levels-1)]
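
A sketch of the base-2 example above; the exact output depends on the fitted normalization, so the values here are illustrative:

    import numpy as np
    from lightonml.encoding.base import SequentialBaseTwoEncoder

    encoder = SequentialBaseTwoEncoder(n_gray_levels=8)   # 3 bits, 7 output bits per value

    # fit the normalization on data covering the full range 0..7
    encoder.fit(np.array([[0], [7]], dtype=np.uint8))

    X_enc = encoder.transform(np.array([[5]], dtype=np.uint8))
    # expected, under the scheme above: [[1 1 1 1 0 0 1]]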

lightonml.encoding.models

class AE(*args: Any, **kwargs: Any)[source]

Bases: torch.nn.Module

class ConvAE(*args: Any, **kwargs: Any)[source]

Bases: torch.nn.Module

Autoencoder consisting of two convolutional layers (encoder, decoder).

Parameters
  • in_ch (int,) – number of input channels

  • out_ch (int,) – number of output channels

  • kernel_size (int or tuple,) – size of the convolutional filters

  • beta (float, default 1.,) – inverse temperature for tanh(beta x)

  • stride (int or tuple, optional, default 1,) – stride of the convolution

  • padding (int or tuple, optional, default 0,) – zero-padding added to both sides of the input

  • padding_mode (str, optional, default 'zeros',) – ‘zeros’ or ‘circular’

  • dilation (int or tuple, optional, default 1,) – spacing between kernel elements

  • groups (int, optional, default 1,) – number of blocked connections from input to output channels

  • bias (bool, optional, default True,) – adds a learnable bias to the output.

  • flatten (bool, default False,) – whether to return a 2D flattened array (batch_size, x) or a 4D (batch_size, out_ch, out_h, out_w) when encoding

encoder

encoding layer

Type

nn.Conv2d,

decoder

decoding layer

Type

nn.ConvTranspose2d,

beta

inverse temperature for tanh(beta x)

Type

float,

flatten

whether to return a 2D flattened array (batch_size, x) or a 4D (batch_size, out_ch, out_h, out_w) when encoding

Type

bool, default False,

forward(input)[source]

Returns the reconstructed input or the binary code, depending on self.training. Call eval() on the module for the binary code, train() for the reconstruction.

Parameters

input (torch.Tensor,) – tensor holding the input data

Returns

  • rec (torch.Tensor float,) – if self.training=True returns the reconstruction of the input

  • binary_code (torch.Tensor uint8,) – tensor holding the binary code if self.training=False.
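
A usage sketch of the two forward modes (the input sizes are illustrative):

    import torch
    from lightonml.encoding.models import ConvAE

    model = ConvAE(in_ch=1, out_ch=8, kernel_size=3, beta=1.0, flatten=True)
    x = torch.randn(16, 1, 28, 28)   # e.g. a batch of MNIST-like images

    model.train()
    rec = model(x)    # float reconstruction, used for the training loss

    model.eval()
    code = model(x)   # uint8 binary code, 2D because flatten=True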

class EncoderDecoder(*args: Any, **kwargs: Any)[source]

Bases: torch.nn.Module

Autoencoder consisting of two dense layers (encoder, decoder). The decoder weights are the transpose of the encoder ones. Backpropagation updates the decoder weights; because they are shared with the encoder, the encoder is updated as well despite the non-differentiable non-linearity. Architecture from Tissier et al. (https://arxiv.org/abs/1803.09065).

Parameters
  • input_size (int,) – size of the input

  • hidden_size (int,) – size of the hidden layer

proj

encoding-decoding layer

Type

nn.Linear,

forward(input)[source]

Returns the reconstructed input or the binary code, depending on self.training. Call eval() on the module for the binary code, train() for the reconstruction.

Parameters

input (torch.Tensor,) – tensor holding the input data

Returns

  • rec (torch.Tensor float,) – if self.training=True returns the reconstruction of the input

  • binary_code (torch.Tensor uint8,) – tensor holding the binary code if self.training=False.
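
A usage sketch, analogous to the one for ConvAE:

    import torch
    from lightonml.encoding.models import EncoderDecoder

    model = EncoderDecoder(input_size=784, hidden_size=256)
    x = torch.randn(16, 784)

    model.train()
    rec = model(x)    # float reconstruction for the autoencoder loss

    model.eval()
    code = model(x)   # uint8 binary code of width hidden_size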

class LinearAE(*args: Any, **kwargs: Any)[source]

Bases: torch.nn.Module

Autoencoder consisting of two dense layers (encoder, decoder). The autoencoder learns to produce a binary output: it starts from tanh(beta x) with beta=1 and gradually increases beta until the activation resembles a step function.

Parameters
  • input_size (int,) – size of the input

  • hidden_size (int,) – size of the hidden layer

  • beta (float, default 1.,) – inverse temperature for tanh(beta x)

encoder

encoding layer

Type

nn.Linear,

decoder

decoding layer

Type

nn.Linear,

beta

inverse temperature for tanh(beta x)

Type

float,

forward(input)[source]

Returns the reconstructed input or the binary code, depending on self.training. Call eval() on the module for the binary code, train() for the reconstruction.

Parameters

input (torch.Tensor,) – tensor holding the input data

Returns

  • rec (torch.Tensor float,) – if self.training=True returns the reconstruction of the input

  • binary_code (torch.Tensor uint8,) – tensor holding the binary code if self.training=False.

train(model, dataloader, optimizer, criterion=torch.nn.functional.mse_loss, epochs=10, beta_interval=5, device=None, verbose=True)[source]

Utility function to train autoencoders quickly.

Parameters
  • model (nn.Module,) – autoencoder to be trained

  • dataloader (torch.utils.data.DataLoader,) – loader for the training dataset of the autoencoder

  • optimizer (torch.optim.Optimizer,) – optimizer used to perform the training

  • criterion (callable, default torch.nn.functional.mse_loss) – loss function for training

  • epochs (int, default 10,) – number of epochs of training

  • beta_interval (int, default 5,) – interval in epochs at which beta is increased by a factor of 10

  • device (str, 'cpu' or 'cuda:{idx}') – device used to perform the training.

  • verbose (bool, default True,) – whether to print info on the training

Returns

model – trained model

Return type

nn.Module,
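
A training sketch with LinearAE; the dataset wrapping is an assumption, so adapt it to whatever batch format your version of train expects:

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from lightonml.encoding.models import LinearAE, train

    X = torch.randn(512, 784)
    loader = DataLoader(TensorDataset(X), batch_size=64, shuffle=True)

    model = LinearAE(input_size=784, hidden_size=256)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    model = train(model, loader, optimizer, epochs=10, device="cpu")

    model.eval()               # switch to eval mode to obtain binary codes
    codes = model(X)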