Training a binary autoencoder

This tutorial explains how to train a binary autoencoder to obtain a good binary encoding of your data for use as input to the OPU. The architecture and training procedure are adapted from https://arxiv.org/abs/1803.09065.

[1]:
import matplotlib.pyplot as plt
import numpy as np

from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
[2]:
# fake data: random inputs and random targets, just to exercise the API
n_samples = 1000
n_features = 100
X = torch.FloatTensor(n_samples, n_features).normal_()
y = torch.FloatTensor(n_samples, n_features).normal_()

In the next cell, we define the autoencoder. The encoder consists of a linear layer followed by a step function, yielding the binary representation of the data. The decoder is simply the transpose of the encoder. Because the two share weights, learning only the decoder via backprop updates the encoder at the same time, so the non-differentiable activation is not a problem.
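
To make the weight-tying idea concrete, here is a minimal sketch using the imports from the first cell. It is not the actual lightonml implementation, whose details may differ: the encoder binarizes a linear projection, and the decoder reuses the same weight matrix transposed, so gradients only ever flow through the decoder.

class TiedBinaryAutoencoder(nn.Module):
    """Minimal sketch: binary encoder with a tied (transposed) linear decoder."""
    def __init__(self, in_features, hidden_features):
        super().__init__()
        self.weight = nn.Parameter(0.01 * torch.randn(hidden_features, in_features))

    def encode(self, x):
        # step function: 1 where the projection is positive, 0 elsewhere (non-differentiable)
        return (F.linear(x, self.weight) > 0).to(x.dtype)

    def forward(self, x):
        code = self.encode(x)                    # binary code, gradients do not flow here
        return F.linear(code, self.weight.t())   # decoder = transpose of the encoder weights

Because the binarization blocks gradients, the loss only reaches the shared weight matrix through the decoder, yet every update changes the encoder as well.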

[3]:
from lightonml.encoding.models import EncoderDecoder
from lightonml.encoding.models import train
[4]:
batch_size = 64
# wrap the tensors in a TensorDataset so the loader yields (input, target) batches
loader = DataLoader(TensorDataset(X, y), batch_size=batch_size)
[5]:
bits_per_feature = 10
encoder = EncoderDecoder(n_features, n_features * bits_per_feature)
optimizer = optim.Adam(encoder.parameters(), lr=1e-3)

A newly created encoder is in training mode and will return the reconstructed input:

[6]:
encoder.training
[6]:
True
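
We can verify this on a small batch; the expected shape below follows from the statement above, since the reconstruction lives in the input space (100 features here):

out = encoder(X[:8])   # training mode: the forward pass returns the reconstruction
out.shape              # expected: torch.Size([8, 100]), not a binary code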

We now train it on our data; this is quite fast. The train function from lightonml.encoding.models will automatically move the encoder to the GPU if one is available.

[7]:
model = train(encoder, loader, optimizer, criterion=F.mse_loss, epochs=10)
Epoch: [1/10], Training Loss: 3.3905184268951416
Epoch: [2/10], Training Loss: 3.2169742584228516
Epoch: [3/10], Training Loss: 3.048205614089966
Epoch: [4/10], Training Loss: 2.8862180709838867
Epoch: [5/10], Training Loss: 2.730818748474121
Epoch: [6/10], Training Loss: 2.581012725830078
Epoch: [7/10], Training Loss: 2.438007116317749
Epoch: [8/10], Training Loss: 2.3007233142852783
Epoch: [9/10], Training Loss: 2.1698696613311768
Epoch: [10/10], Training Loss: 2.044557571411133
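
To check where the encoder ended up after training, a plain PyTorch query works; it reports cuda:0 when a GPU was available, cpu otherwise:

# check which device the train function moved the model to
next(model.parameters()).device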

We set the encoder to eval mode:

[8]:
model.eval()
model.training
[8]:
False

It is ready to encode:

[9]:
# we move the data to the GPU where the encoder lives
# and fetch the binary code from it
Xenc = encoder(X.to('cuda')).cpu()
Xenc.shape, Xenc.dtype, torch.unique(Xenc)
[9]:
(torch.Size([1000, 1000]), torch.uint8, tensor([0, 1], dtype=torch.uint8))

Of course, the encoder can also be used on validation and test data that were not used to train the autoencoder.
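
For instance (X_test below is a hypothetical held-out set, and the device query makes the snippet work with or without a GPU):

# encode held-out data with the trained encoder, wherever it lives
device = next(encoder.parameters()).device
X_test = torch.FloatTensor(200, n_features).normal_()   # stand-in for real test data
with torch.no_grad():
    Xtest_enc = encoder(X_test.to(device)).cpu()
Xtest_enc.shape   # torch.Size([200, 1000])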

Using a “real” toy dataset

[10]:
n_samples = 10000
n_features = 50
X, y = make_blobs(n_samples=n_samples, n_features=n_features, centers=5)

We visualize a PCA of the data:

[11]:
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
[12]:
fig, ax = plt.subplots(figsize=(6,6))
for i in np.unique(y):
    ax.scatter(X_pca[y==i,0], X_pca[y==i,1], s=2, label='y={}'.format(i))

ax.legend()
[12]:
<matplotlib.legend.Legend at 0x7fde68155240>
../_images/examples_autoencoder_tutorial_19_1.png

We see the 5 clusters created by make_blobs. Ideally, our encoder should preserve this structure in the binary encoding. Let us encode the data:

[13]:
X = torch.from_numpy(X).float()  # make_blobs returns float64, so we cast to float32 for PyTorch
loader = DataLoader(TensorDataset(X, X), batch_size=batch_size)  # `train` expects (input, target) pairs; for an autoencoder the target is the input itself
[14]:
encoder = EncoderDecoder(n_features, n_features * bits_per_feature)
optimizer = optim.Adam(encoder.parameters(), lr=1e-3)
[15]:
encoder.training
[15]:
True
[16]:
model = train(encoder, loader, optimizer, criterion=F.mse_loss, epochs=10)
Epoch: [1/10], Training Loss: 13.765605926513672
Epoch: [2/10], Training Loss: 13.812228202819824
Epoch: [3/10], Training Loss: 13.858051300048828
Epoch: [4/10], Training Loss: 13.903722763061523
Epoch: [5/10], Training Loss: 13.948320388793945
Epoch: [6/10], Training Loss: 13.99094295501709
Epoch: [7/10], Training Loss: 14.032180786132812
Epoch: [8/10], Training Loss: 14.070993423461914
Epoch: [9/10], Training Loss: 14.106682777404785
Epoch: [10/10], Training Loss: 14.139419555664062
[17]:
encoder.eval()
# we move the encoder to cpu, but we could also move the data to GPU
# for faster processing as we did before
encoder.to('cpu')
Xenc = encoder(X)
Xenc.shape, Xenc.dtype, torch.unique(Xenc)
[17]:
(torch.Size([10000, 500]), torch.uint8, tensor([0, 1], dtype=torch.uint8))

And we visualize it again:

[18]:
pca = PCA(n_components=2)
Xenc_pca = pca.fit_transform(Xenc.numpy())
[19]:
fig, ax = plt.subplots(figsize=(6,6))
for i in np.unique(y):
    ax.scatter(Xenc_pca[y==i,0], Xenc_pca[y==i,1], s=2, label='y={}'.format(i))

ax.legend()
[19]:
<matplotlib.legend.Legend at 0x7fde680c7b38>
../_images/examples_autoencoder_tutorial_28_1.png

The 5 original clusters are well preserved: the encoder does its job!
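
If you want a number rather than a picture, one optional check (not part of the original procedure) is to cluster the binary codes and compare the assignment with the true blob labels; an adjusted Rand index close to 1 means the cluster structure survived the encoding.

from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# cluster the binary codes into 5 groups and compare with the labels from make_blobs
labels_enc = KMeans(n_clusters=5, n_init=10).fit_predict(Xenc.numpy())
adjusted_rand_score(y, labels_enc)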