# Extending LightOnML

To extend the LightOnML API, follow the guide to extend scikit-learn.
LightOnML uses the same API for all objects, with the methods `fit`, `transform`, `predict` and `score`.

## Writing custom Encoders and Decoders for sklearn

The OPU accepts data in binary format, i.e. as arrays of zeros and ones, so the data we want to process must first be converted
into a format compatible with the OPU. This operation is called *encoding*.
A selection of encoders is provided in `lightonml.encoding.base`, but it’s possible to write and use new ones.

Following the guide to extend `sklearn`, an encoder has the methods `fit` and `transform` in order to follow the scikit-learn Transformer interface.

It should accept an `np.ndarray` of any shape and any type and return a 2D `np.ndarray` of zeros and ones with `dtype=uint8`.
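As an illustration of this contract, here is a minimal sketch of an encoder that thresholds its input. The class name `ThresholdEncoder` and its `threshold` parameter are hypothetical, chosen for this example only; this is not one of the encoders shipped in `lightonml.encoding.base`.

```python
import numpy as np


class ThresholdEncoder:
    """Hypothetical encoder: 1 where a value exceeds the threshold, else 0."""

    def __init__(self, threshold=0.0):
        self.threshold = threshold

    def fit(self, X, y=None):
        # Nothing to learn: the threshold is fixed by the user.
        return self

    def transform(self, X):
        X = np.asarray(X)
        # Flatten everything but the first axis so the output is always 2D.
        X_2d = X.reshape(X.shape[0], -1)
        return (X_2d > self.threshold).astype(np.uint8)


X = np.array([[[-1.5, 0.2], [3.0, -0.1]]])  # shape (1, 2, 2), float64
X_enc = ThresholdEncoder().fit(X).transform(X)
print(X_enc, X_enc.dtype, X_enc.shape)
```

Whatever the input shape and type, the output is a 2D array of zeros and ones with `dtype=uint8`, which is what the OPU expects.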

For example, we can write an encoder that separates the bit planes of `uint8` elements and passes each bit plane to the OPU.
*Remark*: the following implementation shouldn’t be used as is in your code, because error handling has been removed for clarity.

```
import numpy as np


class SeparatedBitPlanEncoder:
    def __init__(self, n_bits=8, starting_bit=0):
        self.n_bits = n_bits
        self.starting_bit = starting_bit

    def fit(self, X, y=None):
        # no-op: we don't need to fit anything for this encoder
        return self

    def transform(self, X):
        n_samples, n_features = X.shape
        # add a trailing axis, [n_samples, n_features, 1], and view the data as uint8
        X_uint8 = np.expand_dims(X, axis=2).view(np.uint8)
        # unpack the bits along the auxiliary axis (most significant bit first)
        X_uint8_unpacked = np.unpackbits(X_uint8, axis=2)
        # reverse the order of the bits: from LSB to MSB
        X_uint8_reversed = np.flip(X_uint8_unpacked, axis=2)
        # move the bit axis before the feature axis: [n_samples, n_bits_total, n_features]
        X_enc = np.transpose(X_uint8_reversed, [0, 2, 1])
        # keep n_bits bit planes, starting from starting_bit
        X_enc = X_enc[:, self.starting_bit:self.n_bits + self.starting_bit, :]
        # stack the bit planes of every sample as rows of a 2D array
        X_enc = X_enc.reshape((n_samples * self.n_bits, n_features))
        return X_enc
```

The instance attributes are assigned in the `__init__` method, and `transform` performs a series of transformations on the input array
until it returns a 2D `np.ndarray` of `uint8` containing only zeros and ones.
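To make the effect of these steps concrete, the sketch below applies the same NumPy calls to a tiny `uint8` array, assuming `n_bits=8` and `starting_bit=0`. The input values are arbitrary and chosen only for illustration.

```python
import numpy as np

X = np.array([[5, 255]], dtype=np.uint8)  # 5 is 0b00000101

# Same steps as in `transform`, with n_bits=8 and starting_bit=0.
bits = np.unpackbits(np.expand_dims(X, axis=2), axis=2)  # MSB first, shape (1, 2, 8)
bits = np.flip(bits, axis=2)                             # now LSB first
planes = np.transpose(bits, [0, 2, 1]).reshape(8, 2)     # one row per bit plane

print(planes[:, 0])  # bits of 5, LSB first
print(planes[:, 1])  # bits of 255, all ones
```

Each row of `planes` is one bit plane of the single input sample, so one sample with 2 features becomes 8 binary rows of 2 features.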

When designing encoders, keep in mind that there is a trade-off between fine-grained resolution and performance. Models generally don’t need high resolution: a coarse representation can be sufficient and can even act as a regularizer. For example, the least significant bit plane of RGB images is often just noise.
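The cost of a coarser representation can be estimated before touching the OPU. The sketch below is an illustration on random data, not a benchmark: it keeps only the 4 most significant bit planes of `uint8` values (a hypothetical choice, corresponding to `n_bits=4, starting_bit=4` in the encoder above) and measures the largest value lost.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(4, 6), dtype=np.uint8)

kept_bits = 4            # hypothetical choice: keep the 4 most significant bit planes
dropped = 8 - kept_bits
X_coarse = (X >> dropped) << dropped  # zero out the least significant bits

# Compare in a wider signed type to avoid uint8 wrap-around.
max_error = int(np.max(X.astype(np.int16) - X_coarse.astype(np.int16)))
print(max_error)  # at most 2**dropped - 1 = 15
```

Halving the number of bit planes halves the number of rows sent to the OPU, while the error per value stays below 16 out of 255.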

Some encoders just transform the input data to a binary format (e.g. `BinaryThresholdEncoder`); others, like
`SeparatedBitPlanEncoder`, need a decoding step after the data have been transformed by the OPU.

Custom decoders can be created following the same steps: implement the `fit` and `transform` methods. As an example, here is the code for the `MixingBitPlanDecoder`:

```
import numpy as np


class MixingBitPlanDecoder:
    def __init__(self, n_bits=8, decoding_decay=0.5):
        self.n_bits = n_bits
        self.decoding_decay = decoding_decay

    def fit(self, X, y=None):
        # no-op: we don't need to fit anything for this decoder
        return self

    def transform(self, X, y=None):
        n_out, n_features = X.shape
        # number of original samples: each one contributed n_bits rows
        n_dim_0 = n_out // self.n_bits
        # regroup the rows: [n_samples, n_bits, n_features]
        X = np.reshape(X, (n_dim_0, self.n_bits, n_features))
        # compute factors for each bit to weight their significance
        decay_factors = self.decoding_decay ** np.arange(self.n_bits)
        # weighted sum over the bit axis
        X_dec = np.einsum('ijk,j->ik', X, decay_factors).astype('float32')
        return X_dec
```

Again, the instance attributes are defined in the `__init__` call, and `transform` performs a series of operations on the input
array until it returns an `np.ndarray`.
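A quick sanity check of how the encoder and decoder fit together is to chain their steps with no OPU transform in between. The sketch below reproduces both steps inline so that it runs on its own. It uses a factor of 2.0 instead of the default `decoding_decay=0.5`; this is an assumption made only for the check, since it gives bit plane `j` the weight `2**j` and the weighted sum should then reconstruct the original values.

```python
import numpy as np

X = np.array([[5, 255], [16, 0]], dtype=np.uint8)
n_samples, n_features = X.shape
n_bits = 8

# Encode: one row per (sample, bit plane), LSB first, as in the encoder above.
bits = np.flip(np.unpackbits(np.expand_dims(X, axis=2), axis=2), axis=2)
X_enc = np.transpose(bits, [0, 2, 1]).reshape(n_samples * n_bits, n_features)

# Decode: regroup the bit planes of each sample and take a weighted sum.
# With a factor of 2.0, plane j gets the weight 2**j, its positional value.
weights = 2.0 ** np.arange(n_bits)
grouped = X_enc.reshape(n_samples, n_bits, n_features).astype(np.float64)
X_dec = np.einsum('ijk,j->ik', grouped, weights)

print(X_dec)
```

With the OPU in the pipeline, the decoder receives the OPU output for each bit plane rather than the bit planes themselves, and the weighted sum mixes those outputs into one feature vector per original sample.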