numpy_datasets.utils
numpy_datasets.utils.patchify_1d(x, …) | extract patches from a numpy array |
numpy_datasets.utils.patchify_2d(x, …) | |
numpy_datasets.utils.train_test_split(*args) | split given data into two non-overlapping sets |
numpy_datasets.utils.batchify(*args, batch_size) | |
numpy_datasets.utils.resample_images(images, …) | |
numpy_datasets.utils.download_dataset(path, …) | dataset downloading utility |
numpy_datasets.utils.extract_file(filename, …) | |
Detailed description
numpy_datasets.utils.patchify_1d(x, window_length, stride)
extract patches from a numpy array
Parameters: - x (array-like) – the input data to extract patches from; it can have any shape, and the last dimension is the one being patched
- window_length (int) – the length of each patch
- stride (int) – the stride (the number of bins separating two consecutive patches)
Returns: x_patches – the extracted patches; the number of patches is placed in the second-to-last dimension (-2)
Return type: array-like
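The shape convention described above can be illustrated with a minimal sketch built on plain NumPy. This is not the library's implementation: `patchify_1d_sketch` is a hypothetical helper, and it assumes that the stride counts the bins between the starts of consecutive patches and that incomplete trailing patches are dropped.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def patchify_1d_sketch(x, window_length, stride):
    """Hypothetical stand-in for patchify_1d, for illustration only."""
    x = np.asarray(x)
    # All windows at every offset along the last axis:
    # shape (..., L - window_length + 1, window_length)
    windows = sliding_window_view(x, window_length, axis=-1)
    # Keep one window every `stride` bins; the patch count ends up in axis -2
    return windows[..., ::stride, :]


x = np.arange(10)
patches = patchify_1d_sketch(x, window_length=4, stride=2)
print(patches.shape)  # (4, 4)
print(patches[1])     # [2 3 4 5]
```

With a batched input of shape (3, 10), the same call returns an array of shape (3, 4, 4): leading dimensions are preserved and only the last one is patched.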
numpy_datasets.utils.train_test_split(*args, train_size=0.8, stratify=None, seed=None)
split given data into two non-overlapping sets
Parameters: - *args (inputs) – the sets to be split by the function
- train_size (scalar) – the amount of data to put in the first set, either an integer giving the actual number of samples to keep, or a ratio (a number between 0 and 1)
- stratify (array (optional)) – an optional array guiding the split so that the proportions of its values are kept the same in both sets
- seed (integer (optional)) – the seed for the random number generator, for reproducibility
Returns: train, test – the two sets; with several inputs, each set holds one array per input, as in the example
Example
x = numpy.random.randn(100, 4)
y = numpy.random.randn(100)
train, test = train_test_split(x, y, train_size=0.5)
print(train[0].shape, train[1].shape) # (50, 4) (50,)
print(test[0].shape, test[1].shape) # (50, 4) (50,)
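The idea behind the stratify argument can be sketched independently of the library. The helper below, `stratified_split_indices`, is hypothetical and only illustrates one common way to keep class proportions equal in both sets (splitting each class separately); the actual behaviour of train_test_split may differ in its details.

```python
import numpy as np


def stratified_split_indices(labels, train_ratio=0.8, seed=None):
    """Hypothetical helper: split indices so each label keeps its proportion."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx, test_idx = [], []
    for value in np.unique(labels):
        # Indices of the samples carrying this label, in random order
        idx = np.flatnonzero(labels == value)
        rng.shuffle(idx)
        # Send the same fraction of every label to the first set
        n_train = int(round(train_ratio * len(idx)))
        train_idx.append(idx[:n_train])
        test_idx.append(idx[n_train:])
    return np.concatenate(train_idx), np.concatenate(test_idx)


labels = np.array([0] * 60 + [1] * 40)
train, test = stratified_split_indices(labels, train_ratio=0.5, seed=0)
print(len(train), len(test))        # 50 50
print((labels[train] == 0).sum())   # 30
```

The two index arrays never overlap, so they can be used to index every input array consistently.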
class numpy_datasets.utils.batchify(*args, batch_size, option='random', load_func=None, extra_process=0, n_batches=None)
numpy_datasets.utils.resample_images(images, target_shape, ratio='same', order=1, mode='nearest', data_format='channels_first')
numpy_datasets.utils.download_dataset(path, dataset, urls_names, baseurl='', extract=False)
dataset downloading utility
Args:
- path (string): the path where the dataset should be downloaded
- dataset (string): the name of the dataset, used as the folder name
- urls_names (dict): dictionary mapping urls to filenames. If the urls share a common root, that root can be omitted from this variable and passed through the baseurl argument instead
- baseurl (string): the common url to prepend to each url in urls_names