# Extending LightOnML

To extend the LightOnML API, follow the guide to extend scikit-learn.
LightOnML uses the same API for all objects, with the methods `fit`, `transform`, `predict` and `score`.

## Writing custom Encoders and Decoders

The OPU accepts data in binary format, i.e. as arrays of zeros and ones, so the data we want to process must first be converted into a format compatible with the OPU. This operation is called *encoding*.
A selection of encoders is provided in `lightonml.encoding.base`, but it’s possible to write and use new ones.

Following the guide to extend `sklearn`, an encoder inherits from `BaseEstimator` and `TransformerMixin` and has the methods `fit` and `transform`.
It should accept an `np.ndarray` of any shape and any type and return a 2D `np.ndarray` of zeros and ones with `dtype=uint8`.
For example, we can write an encoder that separates the bit planes of `uint8` elements and passes each bit plane to the OPU.
*Remark*: the following implementation shouldn’t be used in your code, because error handling has been removed for clarity.

```
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class SeparatedBitPlanEncoder(BaseEstimator, TransformerMixin):
    # multiple inheritance from BaseEstimator and TransformerMixin
    def __init__(self, n_bits=8, starting_bit=0):
        super(SeparatedBitPlanEncoder, self).__init__()
        self.n_bits = n_bits
        self.starting_bit = starting_bit

    def fit(self, X, y=None):
        # no-op: we don't need to fit anything for this encoder
        return self

    def transform(self, X):
        n_samples, n_features = X.shape
        # add a dimension [n_samples, n_features, 1] and return a view of the data as uint8
        X_uint8 = np.expand_dims(X, axis=2).view(np.uint8)
        # unpack the bits along the auxiliary axis
        X_uint8_unpacked = np.unpackbits(X_uint8, axis=2)
        # reverse the order of bits: from LSB to MSB
        X_uint8_reversed = np.flip(X_uint8_unpacked, axis=2)
        # transpose to [n_samples, n_bits, n_features], keep the selected bit planes
        X_enc = np.transpose(X_uint8_reversed, [0, 2, 1])
        X_enc = X_enc[:, self.starting_bit:self.n_bits + self.starting_bit, :]
        # reshape to 2D: the bit planes of each sample become consecutive rows
        X_enc = X_enc.reshape((n_samples * self.n_bits, n_features))
        return X_enc
```

The instance attributes are assigned in the `__init__` method, and `transform` performs a series of transformations on the input array until it returns a 2D `np.ndarray` of `uint8` containing only zeros and ones.
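
To see what the resulting layout looks like, the same NumPy steps can be run on a tiny array. This is a standalone sketch that needs neither the OPU nor the class above; the input values are arbitrary and all 8 bit planes are kept.

```python
import numpy as np

# Two samples with three uint8 features each (arbitrary values).
X = np.array([[1, 2, 255],
              [0, 128, 3]], dtype=np.uint8)
n_samples, n_features = X.shape

# Unpack every value into 8 bits (MSB first), then flip to LSB-first order.
bits = np.flip(np.unpackbits(np.expand_dims(X, axis=2), axis=2), axis=2)

# Put the bit axis second, then stack the 8 bit planes of each sample as rows.
X_enc = np.transpose(bits, [0, 2, 1]).reshape(n_samples * 8, n_features)

print(X_enc.shape)  # (16, 3)
print(X_enc[0])     # bit 0 of sample 0: [1 0 1]
print(X_enc[7])     # bit 7 of sample 0: [0 0 1]
```

Each input row becomes eight consecutive binary rows, so the OPU sees `n_samples * n_bits` samples.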

When designing encoders, keep in mind that there is a trade-off between fine-grained resolution and performance. Models generally don’t need high resolution: a coarse representation can be sufficient and can even act as a regularizer. For example, the least significant bit plane of RGB images is often just noise.
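
As a rough illustration of this trade-off, the sketch below keeps only the two most significant bit planes of random `uint8` data, which corresponds to choosing `n_bits=2` and `starting_bit=6` in the encoder above. The random data is a stand-in for real images, and the bit extraction is written with shifts rather than with the encoder itself.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(4, 6), dtype=np.uint8)  # stand-in for image data

n_bits, starting_bit = 2, 6  # keep only bit planes 6 and 7

# Extract the selected planes, least significant first.
planes = np.stack(
    [((X >> b) & 1).astype(np.uint8)
     for b in range(starting_bit, starting_bit + n_bits)],
    axis=1,
)  # shape: (n_samples, n_bits, n_features)

# Value still represented by the kept planes: the six low bits are dropped.
X_coarse = X & 0b11000000

print(planes.shape)                    # (4, 2, 6)
print(int(np.max(X - X_coarse)) < 64)  # True
```

The encoded data is four times smaller than with all eight planes, at the cost of an error of at most 63 per value.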

Some encoders just transform the input data to a binary format (e.g. `BinaryThresholdEncoder`); others, like `SeparatedBitPlanEncoder`, need a decoding step after the data have been transformed by the OPU.

Custom decoders can be created following the same steps: multiple inheritance from `BaseEstimator` and `TransformerMixin`, and implementation of the `fit` and `transform` methods. As an example, here is the code for the `MixingBitPlanDecoder`:

```
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class MixingBitPlanDecoder(BaseEstimator, TransformerMixin):
    # multiple inheritance from BaseEstimator and TransformerMixin
    def __init__(self, n_bits=8, decoding_decay=0.5):
        super(MixingBitPlanDecoder, self).__init__()
        self.n_bits = n_bits
        self.decoding_decay = decoding_decay

    def fit(self, X, y=None):
        # no-op: we don't need to fit anything for this decoder
        return self

    def transform(self, X, y=None):
        n_out, n_features = X.shape
        # every group of n_bits consecutive rows belongs to the same sample
        n_dim_0 = n_out // self.n_bits
        X = np.reshape(X, (n_dim_0, self.n_bits, n_features))
        # compute factors for each bit to weight their significance
        decay_factors = np.reshape(self.decoding_decay ** np.arange(self.n_bits), self.n_bits)
        # weighted sum over the bit axis
        X_dec = np.einsum('ijk,j->ik', X, decay_factors).astype('float32')
        return X_dec
```

Again, the instance attributes are defined in `__init__`, and `transform` performs a series of operations on the input array until it returns an `np.ndarray`.
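
The weighted sum in `transform` can be followed by hand on a small array. The sketch below repeats the same reshape and `einsum` outside the class, with `n_bits=2` and made-up values standing in for the OPU output.

```python
import numpy as np

n_bits, decoding_decay = 2, 0.5

# Stand-in for OPU output: 2 samples x 2 bit planes stacked as rows, 2 features.
X = np.array([[1.0, 0.0],   # sample 0, plane 0
              [1.0, 1.0],   # sample 0, plane 1
              [0.0, 1.0],   # sample 1, plane 0
              [1.0, 0.0]])  # sample 1, plane 1

# Group the rows that belong to the same sample.
X = X.reshape(-1, n_bits, X.shape[1])

# Plane j is weighted by decoding_decay ** j: here 1.0 and 0.5.
weights = decoding_decay ** np.arange(n_bits)
X_dec = np.einsum('ijk,j->ik', X, weights)

print(X_dec)
# [[1.5 0.5]
#  [0.5 1. ]]
```

The output has one row per original sample: the `n_bits` rows produced for each sample are mixed back into a single feature vector.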

## Formatting mechanics

`OPURandomMapping` accepts a parameter `position` that influences how the samples are displayed on the DMD (digital micromirror device). When the OPU receives a 2D array of shape `(n_samples, n_features)`, before each row gets displayed on the DMD, the low-level OPU interface transforms it into a 1D binary array of size `1140 * 912 = 1039680`.
Each value in the row is repeated a few times in a small region of the DMD to improve the signal-to-noise ratio (SNR).
These regions are called *macropixels*. If the region of interest (ROI) on the DMD is smaller than its total area, the macropixels are built in the ROI and the array is padded with zeros.
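
A toy illustration of macropixels and zero padding follows. It assumes square 2x2 macropixels laid out on a regular grid and a `(row, column)` ROI position; the actual layout depends on the chosen `position`, and all sizes are shrunk for readability.

```python
import numpy as np

dmd_shape = (8, 6)     # toy DMD; the real one is (1140, 912)
roi_shape = (4, 6)     # region of interest, smaller than the DMD
roi_position = (2, 0)  # assumed (row, column) of the ROI's top-left corner
macropixel = (2, 2)    # each value is repeated over a 2x2 block

row = np.array([1, 0, 1, 1, 0, 1], dtype=np.uint8)  # one sample, 6 features

# Arrange the features on a grid and expand each one into a macropixel.
grid = row.reshape(roi_shape[0] // macropixel[0], roi_shape[1] // macropixel[1])
roi = np.kron(grid, np.ones(macropixel, dtype=np.uint8))

# Place the ROI on the DMD; everything outside it stays zero.
dmd = np.zeros(dmd_shape, dtype=np.uint8)
r0, c0 = roi_position
dmd[r0:r0 + roi_shape[0], c0:c0 + roi_shape[1]] = roi

flat = dmd.ravel()  # 1D array of size 8 * 6
print(flat.size, int(flat.sum()))  # 48 16
```

Four of the six features are ones, and each occupies four pixels, hence 16 active pixels out of 48.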

The formatting function can be chosen by passing the name of the desired formatting as the parameter `position` when initializing `OPURandomMapping`.
The formatting happens in three steps in `lightonml.encoding.utils`:
- a selection of functions computes the indices that each value in the row will occupy in the ROI;
- the function `compute_new_indices_greater_rectangle` takes the indices for the ROI and computes them for the whole DMD area;
- a C++ function `to_opu_format_multiple`, wrapped in Python, takes care of the heavy lifting by building the array and placing the values at the right indices.

The function `get_formatting_function` in `lightonml.encoding.utils` returns the function that performs the chosen formatting. This is
used internally in the `transform` method of `OPURandomMapping`.
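
The second step, going from ROI indices to DMD indices, can be pictured as a change of coordinates. The helper below is hypothetical and is not the implementation of `compute_new_indices_greater_rectangle`; it assumes row-major flat indices and a `(row, column)` ROI position.

```python
import numpy as np

def roi_to_dmd_indices(indices_roi, roi_shape, roi_position, dmd_shape):
    """Hypothetical helper: map flat ROI indices to flat DMD indices."""
    # 2D position of each index inside the ROI (row-major order assumed)
    rows, cols = np.divmod(indices_roi, roi_shape[1])
    # shift by the offset of the ROI on the DMD
    rows = rows + roi_position[0]
    cols = cols + roi_position[1]
    # flatten again, this time in DMD coordinates
    return rows * dmd_shape[1] + cols

indices_roi = np.arange(6)  # all six pixels of a 2x3 ROI
indices_dmd = roi_to_dmd_indices(indices_roi, roi_shape=(2, 3),
                                 roi_position=(1, 2), dmd_shape=(4, 6))
print(indices_dmd)  # [ 8  9 10 14 15 16]
```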

The `OPURandomMapping` class also accepts a `callable` as the `position` parameter. To use a custom formatting, follow these steps:
- implement a function that computes the indices where each value will go in the ROI;
- use `compute_new_indices_greater_rectangle` to compute the indices in the whole DMD area from the ones in the ROI;
- use `to_opu_format_multiple` to perform the upsampling;
- wrap these operations in a single function that returns the formatted array and pass it to `OPURandomMapping` as `position`.

Here, for example, is the implementation of a formatting function that simply repeats each value a certain number of times and pads the resulting array if needed.

```
import numpy as np
from lightonml.encoding.opu_formatting import to_opu_format_multiple
from lightonml.encoding.utils import compute_new_indices_greater_rectangle


def compute_indices_lined(n_features, rectangle_shape):
    rectangle_size = rectangle_shape[0] * rectangle_shape[1]
    # compute how many times it is possible to repeat each value
    factor = int(np.floor(rectangle_size / n_features))
    indices = np.arange(n_features * factor, dtype=np.int32)
    return indices, factor


def formatting_function_lined(x, roi_shape=(1140, 912), roi_position=(0, 0),
                              dmd_shape=(1140, 912)):
    # number of features is always the last dimension (2D and 3D case)
    n_features = x.shape[-1]
    # compute indices in the ROI
    indices_roi, factor = compute_indices_lined(n_features, roi_shape)
    # compute indices in the whole DMD
    indices_dmd = compute_new_indices_greater_rectangle(indices_roi, roi_shape,
                                                        roi_position, dmd_shape)
    # format the array
    formatted_array = to_opu_format_multiple(indices_dmd, x, factor)
    return formatted_array
```

Now `formatting_function_lined` can be used as the `position` parameter:

```
import numpy as np
from lightonml.random_projections.opu import OPURandomMapping
from lightonopu.opu import OPU

x = np.ones((200, 10000), dtype='uint8')
opu = OPU()
# formatting_function_lined is the function defined in the previous block
mapping = OPURandomMapping(opu, n_components=50000, position=formatting_function_lined)
y = mapping.fit_transform(x)
```
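
Without access to an OPU, the effect of the lined formatting can be approximated in plain NumPy. The sketch below assumes the ROI covers the whole DMD and that formatting amounts to repeating each value `factor` times and zero-padding the tail. The function `format_lined_numpy` is a hypothetical stand-in that illustrates the idea; it does not reproduce the behaviour of `to_opu_format_multiple`.

```python
import numpy as np

def format_lined_numpy(x, dmd_shape=(1140, 912)):
    """Hypothetical stand-in: repeat each feature and zero-pad to the DMD size."""
    dmd_size = dmd_shape[0] * dmd_shape[1]
    n_samples, n_features = x.shape
    factor = dmd_size // n_features            # repetitions per value
    repeated = np.repeat(x, factor, axis=1)    # (n_samples, n_features * factor)
    padding = dmd_size - n_features * factor   # pixels left unused
    return np.pad(repeated, ((0, 0), (0, padding))), factor

x = np.ones((2, 10000), dtype=np.uint8)
formatted, factor = format_lined_numpy(x)
print(formatted.shape, factor)  # (2, 1039680) 103
```

With 10000 features, each value is repeated 103 times and the remaining 9680 pixels are left at zero.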