numpy_datasets.utils

numpy_datasets.utils.patchify_1d(x, …) extract patches from a numpy array
numpy_datasets.utils.patchify_2d(x, …)
numpy_datasets.utils.train_test_split(*args) split given data into two non-overlapping sets
numpy_datasets.utils.batchify(*args, batch_size)
numpy_datasets.utils.resample_images(images, …)
numpy_datasets.utils.download_dataset(path, …) dataset downloading utility
numpy_datasets.utils.extract_file(filename, …)

Detailed description

numpy_datasets.utils.patchify_1d(x, window_length, stride)[source]

extract patches from a numpy array

Parameters:
  • x (array-like) – the input data to extract patches from; it can have any shape, and the last dimension is the one being patched
  • window_length (int) – the length of the patches
  • stride (int) – the stride (number of bins separating two consecutive patches)
Returns:

x_patches – the extracted patches; the number of patches is placed in the second-to-last dimension (-2)

Return type:

array-like
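
Example

A minimal usage sketch; the output shape shown follows the description above (patches stacked in the second-to-last dimension) and assumes that trailing bins which do not fill a complete window are dropped.

import numpy
from numpy_datasets.utils import patchify_1d

x = numpy.random.randn(2, 100)
# window of 10 bins with a stride of 10 -> 10 non-overlapping patches per row
patches = patchify_1d(x, window_length=10, stride=10)
print(patches.shape)
# (2, 10, 10)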

numpy_datasets.utils.patchify_2d(x, window_length, stride)[source]
numpy_datasets.utils.train_test_split(*args, train_size=0.8, stratify=None, seed=None)[source]

split given data into two non-overlapping sets

Parameters:
  • *args (inputs) – the arrays to be split by the function
  • train_size (scalar) – the amount of data to put in the train set, given either as an integer (the actual number of samples to keep) or as a ratio (a number between 0 and 1)
  • stratify (array (optional)) – an optional stratification guide; the arrays are split so that the proportions given by the stratify array are preserved in both sets
  • seed (integer (optional)) – the seed for the random number generator for reproducibility
Returns:

  • train_set (list) – the train data, with one entry per member of *args
  • test_set (list) – the test data, with one entry per member of *args

Example

import numpy
from numpy_datasets.utils import train_test_split

x = numpy.random.randn(100, 4)
y = numpy.random.randn(100)

train, test = train_test_split(x, y, train_size=0.5)
print(train[0].shape, train[1].shape)
# (50, 4) (50,)
print(test[0].shape, test[1].shape)
# (50, 4) (50,)
class numpy_datasets.utils.batchify(*args, batch_size, option='random', load_func=None, extra_process=0, n_batches=None)[source]
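
Example

A minimal usage sketch, not taken from the library documentation: it assumes a batchify instance is iterated over and yields one mini-batch per input array, and that option='random' shuffles the samples before batching.

import numpy
from numpy_datasets.utils import batchify

x = numpy.random.randn(100, 4)
y = numpy.random.randn(100)

for batch_x, batch_y in batchify(x, y, batch_size=10, option='random'):
    print(batch_x.shape, batch_y.shape)
    # (10, 4) (10,)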
numpy_datasets.utils.resample_images(images, target_shape, ratio='same', order=1, mode='nearest', data_format='channels_first')[source]
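
Example

A hypothetical call, not taken from the library documentation; the input layout (a batch of channels-first images) and the meaning of target_shape as the desired spatial size are assumptions based on the parameter names.

import numpy
from numpy_datasets.utils import resample_images

# assumed layout: (n_images, channels, height, width) with data_format='channels_first'
images = numpy.random.randn(8, 3, 64, 64)
resized = resample_images(images, target_shape=(32, 32))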
numpy_datasets.utils.download_dataset(path, dataset, urls_names, baseurl='', extract=False)[source]

dataset downloading utility

Parameters:
  • path (string) – the path where the dataset should be downloaded
  • dataset (string) – the name of the dataset, used as the folder name
  • urls_names (dict) – dictionary mapping urls to filenames. If the urls share a common root, it can be omitted from this variable and given in the baseurl argument
  • baseurl (string) – the common url to prepend onto each url in urls_names
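
Example

A hypothetical call; the URL, file names, and local path below are illustrative only.

from numpy_datasets.utils import download_dataset

download_dataset(
    path="/tmp/datasets",
    dataset="example_dataset",
    urls_names={"train.npz": "train.npz", "test.npz": "test.npz"},
    baseurl="https://example.com/data/",
)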
numpy_datasets.utils.extract_file(filename, target)[source]
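
Example

A hypothetical call, assuming filename is a previously downloaded archive and target is the directory to extract it into; the paths are illustrative only.

from numpy_datasets.utils import extract_file

extract_file("/tmp/datasets/example_dataset/train.tar.gz", "/tmp/datasets/example_dataset")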