lightonml.utils

This module contains utility functions for handling data and for loading and saving models.

cast_01_to_uint8(X)

Casts binary data to uint8.

Parameters

X (np.ndarray) – input data.

Returns

X_uint8 – input data in uint8.

Return type

np.ndarray
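The cast described above can be illustrated with plain NumPy. This is a minimal sketch, not the library's implementation: `cast_01_to_uint8_sketch` is a hypothetical stand-in, and the check that the input contains only 0 and 1 is an assumption about what "binary data" means here.

```python
import numpy as np


def cast_01_to_uint8_sketch(X):
    """Hypothetical stand-in: check that X holds only 0/1 values, then cast."""
    X = np.asarray(X)
    # Assumption: "binary data" means every entry is exactly 0 or 1.
    if not np.isin(X, (0, 1)).all():
        raise ValueError("expected binary data containing only 0 and 1")
    return X.astype(np.uint8)


X = np.array([[0.0, 1.0], [1.0, 0.0]])  # binary data stored as float64
X_uint8 = cast_01_to_uint8_sketch(X)
print(X_uint8.dtype)
```

Storing binary data as uint8 uses one byte per value instead of the eight bytes that float64 requires.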

download(url, directory)

Downloads data from url into directory.
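A function of this kind could be written with the standard library alone. The sketch below is hypothetical (`download_sketch` is not the library function) and assumes that the local file name is taken from the last component of the URL path, which the documentation above does not state.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def download_sketch(url, directory):
    """Hypothetical stand-in: fetch `url` and store it inside `directory`."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    # Assumption: the local file name is the last component of the URL path.
    filename = Path(urlparse(url).path).name
    target = directory / filename
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target


# Demonstrated with a file:// URL so the example needs no network access.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source" / "data.bin"
    source.parent.mkdir()
    source.write_bytes(b"\x00\x01\x02")
    saved = download_sketch(source.as_uri(), Path(tmp) / "dest")
    downloaded = saved.read_bytes()
```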

get_ml_data_dir_path()

Gets the path of the data directory.

It can be defined in any of the following locations (listed in decreasing priority):

  • LIGHTONML_DATA_DIR environment variable

  • lightonml.set_ml_data_dir() function

  • ~/.lighton.json

  • /etc/lighton.json

  • /etc/lighton/host.json

For JSON files, the parameter is defined in the ml_data_path field.

Returns

Location of the data folder.

Return type

pathlib.Path
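The lookup order described above can be sketched as follows. This is a hypothetical re-creation, not the library code: `resolve_ml_data_dir`, its `explicit_dir` argument (standing in for a value set through lightonml.set_ml_data_dir()), and the `None` result when nothing is configured are all assumptions made for illustration.

```python
import json
import os
from pathlib import Path

# JSON configuration files from the documented list, highest priority first.
CONFIG_FILES = ("~/.lighton.json", "/etc/lighton.json", "/etc/lighton/host.json")


def resolve_ml_data_dir(explicit_dir=None, environ=None, config_files=CONFIG_FILES):
    """Hypothetical re-creation of the documented lookup order."""
    environ = os.environ if environ is None else environ
    # 1. The environment variable has the highest priority.
    if environ.get("LIGHTONML_DATA_DIR"):
        return Path(environ["LIGHTONML_DATA_DIR"])
    # 2. A value set programmatically (stands in for lightonml.set_ml_data_dir()).
    if explicit_dir is not None:
        return Path(explicit_dir)
    # 3. JSON files, read in order; the first one with an ml_data_path field wins.
    for name in config_files:
        path = Path(name).expanduser()
        if path.is_file():
            config = json.loads(path.read_text())
            if "ml_data_path" in config:
                return Path(config["ml_data_path"])
    # Assumption: nothing configured yields None; the library may behave differently.
    return None
```

A JSON configuration file for this sketch would contain a single field, for example {"ml_data_path": "/data/ml"}, where the path shown is a placeholder.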

load_data_from_numpy_archive(path_to_file)

Loads data from a NumPy archive.

Parameters

path_to_file (str) – path to the NumPy archive to load.

Returns

  • (X_train, y_train) (tuple of np.ndarray) – train set.

  • (X_test, y_test) (tuple of np.ndarray) – test set.

load_model(model_path)

Loads the model from a pickle file.

Parameters

model_path (str) – path to the pickle file of the model.

Returns

model – instance of the model.

Return type

BaseEstimator, RegressorMixin, TransformerMixin, or a subclass of these

save_model(model, model_name, model_path)

Saves a model to a pickle file.

Parameters
  • model (BaseEstimator, RegressorMixin, TransformerMixin, or a subclass of these) – instance of the model to save.

  • model_name (str) – name of the pickle file for the saved model.

  • model_path (str) – path to the directory that holds the pickle file.
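The save and load round trip can be sketched with the standard pickle module. The two functions below are hypothetical stand-ins that mirror the documented signatures; whether the library creates the directory or modifies the file name is not stated above, so those details are assumptions.

```python
import pickle
import tempfile
from pathlib import Path


def save_model_sketch(model, model_name, model_path):
    """Hypothetical stand-in: pickle `model` to model_path/model_name."""
    directory = Path(model_path)
    # Assumption: the target directory is created if it does not exist.
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / model_name
    with open(target, "wb") as f:
        pickle.dump(model, f)
    return target


def load_model_sketch(model_path):
    """Hypothetical stand-in: unpickle a model from the file at model_path."""
    with open(model_path, "rb") as f:
        return pickle.load(f)


# A plain dict stands in for a fitted estimator to keep the example dependency-free.
model = {"coef": [0.5, -1.25], "intercept": 0.1}
with tempfile.TemporaryDirectory() as tmp:
    saved_file = save_model_sketch(model, "model.pkl", tmp)
    restored = load_model_sketch(saved_file)
```

Unpickling can execute arbitrary code, so pickle files should only be loaded from trusted sources.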

select_subset(X, y, classes=range(0, 10), ratio=1, random_state=None)

Selects a subset of a dataset.

Parameters
  • X (2D np.ndarray) – input data.

  • y (np.ndarray) – targets.

  • classes (list or np.ndarray) – class labels to keep in the subset.

  • ratio (float) – controls the ratio between examples.

  • random_state (int, RandomState instance or None, optional, defaults to None) – controls the pseudo-random number generator used to subsample the dataset.

Returns

  • X (np.ndarray) – subsampled data.

  • y (np.ndarray) – subsampled targets.