Development¶
The numpy-datasets project was started by Randall Balestriero in early 2020. As an open-source project, we warmly welcome contributions (see the current contributors)!
Philosophy¶
numpy-datasets started from the need to combine the best functionalities of Theano, TensorFlow (v1) and Lasagne. While we propose various deep-learning-oriented methods, the core of numpy-datasets shall remain as general as possible. Methods should be grouped as much as possible into specialized submodules, and complete documentation should be provided, preferably along with a working example located in the Gallery.
How to contribute¶
If you would like to help, we recommend following the steps below before opening a pull request. Recall that:
- Coding conventions: we use the PEP 8 style guide for Python code and the black formatter.
- Docstrings: we use the numpydoc docstring guide to document functions directly in their docstrings and to generate the documentation automatically with Sphinx. Please provide code with up-to-date docstrings.
- Continuous Integration: to ensure that all the numpy-datasets functionalities are tested after each modification, run
pytest
from the main numpy-datasets directory. All tests should pass before a change is considered successful. If new functionality is added, it is highly preferable to also add a simple test in the tests/ directory to ensure that results are as expected. A GitHub Action will automatically test the code at each push (see Test the code).
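The docstring and testing conventions above can be illustrated with a small sketch. The function `rescale` and its test are hypothetical and are not part of numpy-datasets; they only show the numpydoc section layout (Parameters, Returns, Raises) next to a simple test of the kind that could live in the tests/ directory.

```python
def rescale(values, low=0.0, high=1.0):
    """Linearly rescale a sequence of numbers to the range [low, high].

    Hypothetical helper used only to illustrate the numpydoc layout;
    it is not part of numpy-datasets.

    Parameters
    ----------
    values : sequence of float
        The numbers to rescale. Must contain at least two distinct values.
    low : float, optional
        Lower bound of the output range. Default is 0.0.
    high : float, optional
        Upper bound of the output range. Default is 1.0.

    Returns
    -------
    list of float
        The rescaled numbers, in the same order as ``values``.

    Raises
    ------
    ValueError
        If all entries of ``values`` are equal (the input range is zero).
    """
    vmin, vmax = min(values), max(values)
    if vmin == vmax:
        raise ValueError("values must contain at least two distinct numbers")
    scale = (high - low) / (vmax - vmin)
    return [low + (v - vmin) * scale for v in values]


def test_rescale_bounds():
    # pytest collects functions whose names start with "test_".
    out = rescale([2.0, 4.0, 6.0])
    assert out == [0.0, 0.5, 1.0]
```

Keeping the test next to a small, deterministic input makes a failure easy to read in the continuous-integration logs.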
Build/Test the doc¶
To rebuild the documentation, first install the required packages:
pip install -r docs/requirements.txt
To generate the documentation, go to the docs directory and run:
make html
You can then view the generated documentation in docs/_build/html/index.html.
If examples or code blocks are added to the documentation, they have to be tested.
To do so, add the specific module/function to tests/doc.py and run:
python tests/doc.py
Once the documentation has been changed and all tests pass, the change is ready for review and should be put in a PR.
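As an illustration of how documentation examples can be checked automatically, here is a minimal sketch built on the standard-library doctest module. It assumes the examples are written in doctest format inside docstrings; the actual contents of tests/doc.py are not shown on this page and may work differently. The `demo` module and the `run_doctests` helper are hypothetical stand-ins so that the sketch runs on its own.

```python
import doctest
import types


def run_doctests(modules):
    """Run the doctests of each module and return the number of failed examples."""
    failures = 0
    for module in modules:
        result = doctest.testmod(module, verbose=False)
        failures += result.failed
    return failures


# Stand-in module (hypothetical) so the sketch is self-contained; in practice
# the list would hold the real modules whose docstrings contain examples.
demo = types.ModuleType("demo")
demo.__doc__ = """
>>> 1 + 1
2
"""

failures = run_doctests([demo])
print(f"{failures} doctest failure(s)")
```

Under this assumption, registering a new module or function for testing amounts to adding it to the list passed to the runner.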
Every time changes are pushed to the GitHub master branch, the numpy-datasets documentation (at symjax.readthedocs.io) is rebuilt based on the .readthedocs.yml and docs/conf.py configuration files.
For each automated documentation build, you can see the documentation build logs.
Test the code¶
To run all the numpy-datasets tests, we recommend using pytest or pytest-xdist. First, install pytest-xdist and pytest-benchmark by running:
pip install pytest-xdist pytest-benchmark
Then, from the repository root directory, run:
pytest
If all tests pass successfully, the code is ready for a PR.