lightonml.datasets

This module contains functions to load some common datasets. The image dataset loaders return (X_train, y_train) and (X_test, y_test) tuples of examples and labels. Unless noted otherwise (SignMNIST returns flattened images), grayscale images have shape (height, width) and RGB images have shape (3, height, width). All functions look for a .lightonml_config file to read the data location. If it doesn’t exist, they create one, with your home directory as the default data directory. To change the location, edit the config file .lighton.json.
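The shape conventions above can be checked mechanically. The sketch below is illustrative only: `describe_batch` is a hypothetical helper, not part of lightonml, and it works on plain shape tuples so it runs without downloading any data.

```python
from math import prod


def describe_batch(shape):
    """Classify a batch shape according to the conventions described above.

    Hypothetical helper, not part of lightonml. `shape` is the full shape of
    an image array, with the number of examples as the first axis.
    """
    n_examples, *image_shape = shape
    if len(image_shape) == 1:
        layout = "flattened"
    elif len(image_shape) == 2:
        layout = "grayscale (height, width)"
    elif len(image_shape) == 3 and image_shape[0] == 3:
        layout = "RGB (3, height, width)"
    else:
        raise ValueError(f"unexpected image shape: {tuple(image_shape)}")
    # Number of features per example once the image is flattened.
    return layout, prod(image_shape)


print(describe_batch((60000, 28, 28)))    # grayscale, 784 features per image
print(describe_batch((5000, 3, 96, 96)))  # RGB, 27648 features per image
```

The second returned value is the length each example would have after flattening, which is often what a downstream model needs as its input dimension.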

CIFAR10()[source]

Data Loader for the CIFAR10 dataset.

Returns

  • (X_train, y_train) (tuple of np.ndarray of np.uint8, of shape (50000, 3, 32, 32) and (50000,)) – train CIFAR10 images and labels.

  • (X_test, y_test) (tuple of np.ndarray of np.uint8, of shape (10000, 3, 32, 32) and (10000,)) – test CIFAR10 images and labels.

CIFAR100()[source]

Data Loader for the CIFAR100 dataset.

Returns

  • (X_train, y_train) (tuple of np.ndarray of np.uint8, of shape (50000, 3, 32, 32) and (50000,)) – train CIFAR100 images and labels.

  • (X_test, y_test) (tuple of np.ndarray of np.uint8, of shape (10000, 3, 32, 32) and (10000,)) – test CIFAR100 images and labels.

FashionMNIST()[source]

Data Loader for the FashionMNIST dataset.

Returns

  • (X_train, y_train) (tuple of np.ndarray of np.uint8, of shape (60000, 28, 28) and (60000,)) – train FashionMNIST images and labels.

  • (X_test, y_test) (tuple of np.ndarray of np.uint8, of shape (10000, 28, 28) and (10000,)) – test FashionMNIST images and labels.

MNIST()[source]

Data loader for the MNIST dataset.

Returns

  • (X_train, y_train) (tuple of np.ndarray of np.uint8, of shape (60000, 28, 28) and (60000,)) – train MNIST images and labels.

  • (X_test, y_test) (tuple of np.ndarray of np.uint8, of shape (10000, 28, 28) and (10000,)) – test MNIST images and labels.

STL10(unlabeled=False)[source]

Data Loader for the STL10 dataset.

Parameters

unlabeled (bool, default False) – if True, also returns the unlabeled part of the dataset.

Returns

  • (X_train, y_train) (tuple of np.ndarray of np.uint8, of shape (5000, 3, 96, 96) and (5000,)) – train STL10 images and labels.

  • (X_test, y_test) (tuple of np.ndarray of np.uint8, of shape (8000, 3, 96, 96) and (8000,)) – test STL10 images and labels.

  • X_unlabeled (np.ndarray of np.uint8, of shape (100000, 3, 96, 96)) – unlabeled images from STL10. Returned only if unlabeled is True.
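Because the number of returned items depends on the unlabeled flag, calling code has to handle two or three values. A minimal sketch, assuming the result is a tuple ((X_train, y_train), (X_test, y_test)) with X_unlabeled appended as a third item when unlabeled=True; `unpack_stl10` is a hypothetical helper, not part of lightonml, and placeholder strings stand in for the real arrays.

```python
def unpack_stl10(result):
    """Return (train, test, unlabeled) from a 2- or 3-item STL10 result.

    Hypothetical helper, not part of lightonml. The third value is None when
    the result holds only the labeled train and test splits.
    """
    if len(result) == 2:
        train, test = result
        return train, test, None
    train, test, unlabeled = result
    return train, test, unlabeled


# Placeholder strings stand in for the real np.ndarray objects.
labeled_only = (("X_train", "y_train"), ("X_test", "y_test"))
with_unlabeled = labeled_only + ("X_unlabeled",)

print(unpack_stl10(labeled_only)[2])    # None
print(unpack_stl10(with_unlabeled)[2])  # X_unlabeled
```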

SignMNIST()[source]

Data Loader for the SignMNIST dataset. Each training and test case carries a label (0-25) that maps one-to-one to an alphabetic letter A-Z.

https://www.kaggle.com/datamunge/sign-language-mnist/home

Returns

  • (X_train, y_train) (tuple of np.ndarray of np.uint8, of shape (27455, 784) and (27455,)) – train flattened SignMNIST images and labels.

  • (X_test, y_test) (tuple of np.ndarray of np.uint8, of shape (7172, 784) and (7172,)) – test flattened SignMNIST images and labels.
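The label-to-letter map and the flattened layout can be sketched as follows. Both functions are hypothetical helpers, not part of lightonml. The 28 x 28 image size follows from the 784-element rows; the row-major ordering is an assumption.

```python
import string


def label_to_letter(label):
    """Map a SignMNIST label (0-25) to its letter (A-Z)."""
    label = int(label)  # also accepts NumPy integer labels
    if not 0 <= label <= 25:
        raise ValueError(f"label must be in 0-25, got {label}")
    return string.ascii_uppercase[label]


def unflatten(row, height=28, width=28):
    """Turn one flattened image into a list of `height` rows of `width` pixels.

    Assumes row-major pixel order, which the documentation does not state.
    """
    if len(row) != height * width:
        raise ValueError("row length does not match height * width")
    return [list(row[r * width:(r + 1) * width]) for r in range(height)]


print(label_to_letter(0), label_to_letter(25))  # A Z
image = unflatten(list(range(784)))
print(len(image), len(image[0]))                # 28 28
```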

movielens100k(processed=False)[source]

Data Loader for the Movielens-100k dataset. It consists of 100,000 ratings (1-5) from 943 users on 1682 movies.

F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4, Article 19 (December 2015), 19 pages. DOI=http://dx.doi.org/10.1145/2827872

Parameters

processed (bool, default False) – if False, returns the raw data as a list of lists. If True, returns the user-item ratings matrix of shape (943, 1682).

Returns

ratings

Return type

depending on the value of processed, a list of lists or user-item ratings matrix (np.ndarray)
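The relation between the raw list of lists and the user-item matrix can be sketched as below. This is not the library's implementation: it assumes each raw row starts with [user_id, item_id, rating] using 1-based ids, and that unrated entries are stored as 0. `to_user_item_matrix` and the toy rows are hypothetical.

```python
def to_user_item_matrix(raw_ratings, n_users, n_items):
    """Build a dense user-item matrix (nested lists) from raw rating rows.

    Assumption: each row starts with user_id, item_id, rating, and ids are
    1-based. Entries without a rating are left at 0.
    """
    matrix = [[0] * n_items for _ in range(n_users)]
    for user_id, item_id, rating, *_ in raw_ratings:
        matrix[int(user_id) - 1][int(item_id) - 1] = int(rating)
    return matrix


# Made-up toy rows: 2 users, 3 items; the last column is ignored.
toy = [[1, 1, 5, 0], [1, 3, 4, 0], [2, 2, 3, 0]]
matrix = to_user_item_matrix(toy, n_users=2, n_items=3)
print(matrix)  # [[5, 0, 4], [0, 3, 0]]
```

For the full dataset the same idea would give a matrix with 943 rows and 1682 columns, matching the documented shape.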

movielens20m(processed=False, id_to_movie=False)[source]

Data Loader for the Movielens-20m dataset. It consists of 20000263 ratings (1-5) from 138493 users on 27278 movies.

F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4, Article 19 (December 2015), 19 pages. DOI=http://dx.doi.org/10.1145/2827872

Parameters

  • processed (bool, default False) – if False, returns the raw data as a list of lists. If True, returns the user-item ratings matrix of shape (138493, 27278).

  • id_to_movie (bool, default False) – if True, also returns the mapping from movieId to movie name.

Returns

  • ratings (depending on the value of processed, a list of lists or user-item ratings matrix (np.ndarray))

  • id_to_movie_mapping (None or list of lists) – mapping between movieId and movie name.
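A list of lists is awkward for lookups by movieId, so it can help to convert the mapping to a dictionary. A minimal sketch, assuming each entry starts with [movieId, movie name]; `build_title_lookup` and the toy entries are hypothetical, and the actual entry layout may differ.

```python
def build_title_lookup(id_to_movie_mapping):
    """Convert a list of [movieId, name, ...] entries into a dict.

    Assumption: the first two fields of each entry are the movieId and the
    movie name. A dict avoids assuming that movieId values are contiguous.
    """
    if id_to_movie_mapping is None:
        return {}
    return {int(movie_id): name for movie_id, name, *_ in id_to_movie_mapping}


# Made-up entries for illustration only.
toy_mapping = [[1, "Movie A"], [7, "Movie B"]]
lookup = build_title_lookup(toy_mapping)
print(lookup.get(7))   # Movie B
print(lookup.get(99))  # None
```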