# Extending LightOnML

To extend the LightOnML API, follow the guide to extend scikit-learn. LightOnML uses the same API for all objects, with the methods fit, transform, predict and score.

## Writing custom Encoders and Decoders for sklearn

The OPU accepts data in binary format, i.e. as arrays of zeros and ones, so the data must first be converted into a format compatible with the OPU. This operation is called encoding. A selection of encoders is provided in lightonml.encoding.base, but it’s possible to write and use new ones.

Following the guide to extend sklearn, an encoder implements the methods fit and transform so that it follows the scikit-learn Transformer interface.

It should accept an np.ndarray of any shape and any type and return a 2D np.ndarray of zeros and ones with dtype=uint8.

For example, we can write an encoder that separates the bit planes of uint8 elements and passes each bit plane to the OPU. Remark: the following implementation shouldn’t be used in your code, because error handling has been removed for clarity.

```python
import numpy as np


class SeparatedBitPlanEncoder:
    def __init__(self, n_bits=8, starting_bit=0):
        self.n_bits = n_bits
        self.starting_bit = starting_bit

    def fit(self, X, y=None):
        # no-op: we don't need to fit anything for this encoder
        return self

    def transform(self, X):
        n_samples, n_features = X.shape

        # Add a trailing axis, giving shape [n_samples, n_features, 1],
        # and view the data as uint8
        X_uint8 = np.expand_dims(X, axis=2).view(np.uint8)

        # Unpack the bits along the auxiliary axis (most significant bit first)
        X_uint8_unpacked = np.unpackbits(X_uint8, axis=2)

        # Reverse the order of bits: from LSB to MSB
        X_uint8_reversed = np.flip(X_uint8_unpacked, axis=2)

        # Move the bit axis before the feature axis, keep the requested
        # bit planes and reshape to 2D
        X_enc = np.transpose(X_uint8_reversed, [0, 2, 1])
        X_enc = X_enc[:, self.starting_bit:self.n_bits + self.starting_bit, :]
        X_enc = X_enc.reshape((n_samples * self.n_bits, n_features))
        return X_enc
```


The instance attributes are assigned in the __init__ method, and transform performs a series of transformations on the input array until it returns a 2D np.ndarray of uint8, of shape (n_samples * n_bits, n_features), containing only zeros and ones.
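To make the output layout concrete, here is a minimal sketch that applies the same NumPy steps to a single sample. The helper name separate_bit_planes is hypothetical and only used for illustration; it assumes a 2D uint8 input.

```python
import numpy as np


def separate_bit_planes(X, n_bits=8, starting_bit=0):
    """Hypothetical helper: same steps as the encoder's transform, uint8 input only."""
    n_samples, n_features = X.shape
    # Unpack each byte into 8 bits, most significant bit first
    bits = np.unpackbits(np.expand_dims(X, axis=2), axis=2)
    # Reorder so that index 0 is the least significant bit
    bits = np.flip(bits, axis=2)
    # Shape (n_samples, 8, n_features), then keep the requested planes
    planes = np.transpose(bits, [0, 2, 1])
    planes = planes[:, starting_bit:starting_bit + n_bits, :]
    return planes.reshape(n_samples * n_bits, n_features)


# One sample with two features: 5 = 0b101 and 2 = 0b010
X = np.array([[5, 2]], dtype=np.uint8)
planes = separate_bit_planes(X, n_bits=3)
print(planes)
```

Each row of the result is one bit plane of the sample, starting from the least significant bit: the rows are [1, 0], [0, 1] and [1, 0], which read column-wise give the binary digits of 5 and 2.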

When designing encoders, keep in mind that there is a trade-off between fine-grained resolution and performance. Models generally don’t need high resolution: a coarse representation can be sufficient and can even act as a regularizer. For example, the least significant bit plane of RGB images is often just noise.
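The trade-off can be quantified with a small sketch. It uses random uint8 data as a stand-in for real inputs (an assumption for illustration) and measures how much precision is lost when only the most significant bit planes are kept. With the encoder above, keeping the n_kept top planes corresponds to n_bits=n_kept and starting_bit=8 - n_kept, and produces n_kept rows per sample instead of 8.

```python
import numpy as np

rng = np.random.default_rng(0)
# Random stand-in data; real inputs would be e.g. image pixels
X = rng.integers(0, 256, size=(4, 6), dtype=np.uint8)

max_errors = {}
for n_kept in (8, 4, 2):
    # Mask that keeps the n_kept most significant bits of each byte
    mask = np.uint8((0xFF << (8 - n_kept)) & 0xFF)
    X_coarse = X & mask
    # Widen before subtracting to avoid uint8 wrap-around
    err = X.astype(np.int16) - X_coarse.astype(np.int16)
    max_errors[n_kept] = int(err.max())
    print(f"{n_kept} bit planes kept: max error {max_errors[n_kept]}")
```

The error per element is bounded by 2 ** (8 - n_kept) - 1, so dropping the four lowest planes halves the amount of encoded data while changing each value by at most 15 out of 255.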

Some encoders just transform the input data to a binary format (e.g. BinaryThresholdEncoder). Others, like SeparatedBitPlanEncoder, need a decoding step after the data have been transformed by the OPU.
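As an illustration of the first kind, the following is a minimal sketch of a threshold-based encoder that needs no decoder. The class ThresholdEncoderSketch and its threshold parameter are hypothetical; this is not the implementation of BinaryThresholdEncoder in lightonml.

```python
import numpy as np


class ThresholdEncoderSketch:
    """Hypothetical encoder: 1 where a value exceeds `threshold`, 0 elsewhere."""

    def __init__(self, threshold=0.0):
        self.threshold = threshold

    def fit(self, X, y=None):
        # Nothing to learn; return self so that calls can be chained
        return self

    def transform(self, X):
        X = np.asarray(X)
        # Flatten everything but the first axis so the output is always 2D
        X_2d = X.reshape(X.shape[0], -1)
        return (X_2d > self.threshold).astype(np.uint8)


X = np.array([[-1.5, 0.2, 3.0], [0.0, -0.1, 7.5]])
X_enc = ThresholdEncoderSketch(threshold=0.0).fit(X).transform(X)
print(X_enc)
```

The output has one row per sample, contains only zeros and ones, and has dtype uint8, which is the contract described above for encoders.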

Custom decoders can be created following the same steps: implement the fit and transform methods.

As an example, we write the code for the MixingBitPlanDecoder:

```python
import numpy as np


class MixingBitPlanDecoder:
    def __init__(self, n_bits=8, decoding_decay=0.5):
        self.n_bits = n_bits
        self.decoding_decay = decoding_decay

    def fit(self, X, y=None):
        # no-op: we don't need to fit anything for this decoder
        return self

    def transform(self, X, y=None):
        n_out, n_features = X.shape
        # Each group of n_bits consecutive rows belongs to one sample
        n_dim_0 = n_out // self.n_bits
        X = np.reshape(X, (n_dim_0, self.n_bits, n_features))

        # compute factors for each bit to weight their significance
        decay_factors = self.decoding_decay ** np.arange(self.n_bits)
        # weighted sum over the bit axis
        X_dec = np.einsum('ijk,j->ik', X, decay_factors).astype('float32')

        return X_dec
```


Again, the instance attributes are defined in the __init__ call, and transform performs a series of operations on the input array until it returns an np.ndarray of dtype float32 with one row per original sample.
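The weighted sum can be checked by hand on a tiny input. The sketch below reproduces the same reshape and einsum in a standalone function (mix_bit_planes is a hypothetical name used only here) and applies it directly to three bit planes, without any OPU transform in between.

```python
import numpy as np


def mix_bit_planes(X, n_bits, decoding_decay):
    """Hypothetical helper mirroring the weighted sum of the decoder above."""
    n_out, n_features = X.shape
    X = X.reshape(n_out // n_bits, n_bits, n_features)
    weights = decoding_decay ** np.arange(n_bits)
    return np.einsum('ijk,j->ik', X, weights).astype('float32')


# Three bit planes (least significant first) of the single sample [5, 2]
planes = np.array([[1, 0], [0, 1], [1, 0]], dtype=np.uint8)

# Weights 1, 0.5, 0.25: first feature 1 + 0.25, second feature 0.5
default_mix = mix_bit_planes(planes, n_bits=3, decoding_decay=0.5)
# Weights 1, 2, 4: the planes recombine into the original values 5 and 2
binary_mix = mix_bit_planes(planes, n_bits=3, decoding_decay=2.0)
print(default_mix)
print(binary_mix)
```

With decoding_decay=2.0 the weights are the powers of two of each bit position, so mixing the untouched planes gives back the original integers. Other values of decoding_decay change how much each plane contributes to the decoded features.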