Skip to content

Latest commit

 

History

History
325 lines (223 loc) · 17.1 KB

DEVELOPING.md

File metadata and controls

325 lines (223 loc) · 17.1 KB

Developing Shogun

This is a very basic list of things on how to get started hacking Shogun. Your first steps should be to

  1. Compile from source, see INSTALL.md.
  2. Run the API examples, see INTERFACES.md, or create your own, see EXAMPLES.md
  3. Run the tests.

As we would like to avoid spending a lot of our time explaining the same basic things many times, please excessively use the internet for any questions on the commands and tools needed. If you feel that this readme is missing something, please send a patch! :)

Quicklinks

Shogun git development cycle

We use the git flow workflow. The steps are

  1. Read the guide.

  2. Register on GitHub.

  3. Fork the shogun repository.

  4. Clone your fork, add the original shogun develop repository as a remote, and check out locally

     git clone https://github.com/YOUR_USERNAME/shogun
     cd shogun
     git remote add upstream https://github.com/shogun-toolbox/shogun
     git branch develop
     git checkout develop
     git pull --rebase upstream develop
    

    The steps until here only need to be executed once, with the exception being the last command: rebasing against the development branch. You will need to rebase every time the develop branch is updated.

  5. Create a feature branch (from develop)

     git branch feature/BRANCH_NAME
    
  6. Your code here: Fix a bug or add a feature. If you add something or fix something, mention it in the NEWS file.

  7. Make sure (!) that locally, your code compiles, it is tested, it complies with the code style described on the wiki.

     make && make test
    

    If something does not work, try to find out whether your change caused it, and why. Read error messages and use the internet to find solutions. Compile errors are the easiest to fix! If all that does not help, ask us.

  8. Commit locally, using neat and informative commit messages, grouping commits, potentially iterate over more changes to the code,

     git commit FILENAME(S) -m "Fix issue #1234"
     git commit FILENAME(S) -m "Add feature XYZ"
    

    The amend option is your friend if you are updating single commits (so that they appear as one)

     git commit --amend FILENAME(S)
    

    If you want to group say the last three commits as one, squash them, for example

     git reset --soft HEAD~3
     git commit -m 'Clear commit message'
    
  9. Check if the code was written in conformity with the Shogun Code Style Guidelines, or the Continuous Integration will fail. Shogun has a custom script called check_format.sh which can be used to verify the code formatting.

     ./scripts/check_format.sh "feature/BRANCH_NAME" "develop"
    

    The script will provide you with the necessary information to fix potential style errors. All of this is done by using clang-format. Make sure to have it installed on your local machine, or the above script won't work. Update the commit once you have fixed the errors.

  10. Rebase against shogun's develop branch. This might cause rebase errors, which you need to solve

     git pull --rebase upstream develop
    
  11. Push your commits to your fork

    git push origin feature/BRANCH_NAME
    

    If you squashed or amended commits after you had pushed already, you might be required to force push via using the git push -f option with care.

  12. Send a pull request (PR) via GitHub. As described above, you can always update a pull request using the git push -f option. Please do not close and send new ones instead, always update.

  13. Once the PR is merged, keep an eye on the buildfarm to see whether your patch broke something.

Requirements for merging your PR

  • Read some tips on how to write good pull requests. Make sure you don't waste your (and our) time by not respecting these basic rules.
  • All tests pass (your pull request causes automatic checks). We will not look at the patch otherwise.
  • The PR is small in terms of lines changes.
  • The PR is clean and addresses one issue.
  • The number of commits is minimal (i.e. one), the message is neat and clear.
  • For C++ code: it is covered by tests, it doesn't leak memory, its API is documented, code style.
  • For API examples: it has a clear scope, it is minimal, it looks polished, it has a passing test
  • For docs: clear, correct English language, spell-checked
  • For notebooks: the cell output is removed, the template is respected, the plots have axis labels.
  • Formatting notebooks/docs: Please every sentence in a single line.

Testing

Three types of tests can be executed locally: C++ unit tests, running the API examples, and integration testing the results of the API examples. To activate them locally, enable the -DENABLE_TESTING=ON cmake switch before running cmake. Which tests are activated depends on your configuration. Adding a test in most cases requires to re-run cmake. All activated tests can be executed with

make && make test

The first make is necessary as some tests need to be generated and/or compiled first.

Sometimes, it is useful to run a single test, which can be done via ctest, for example

ctest -R unit-LibSVR
ctest -R generated_cpp-binary_classifier-kernel_svm
ctest -R integration_meta_cpp-binary_classifier-kernel_svm -V

If a test name (or even the make test target) does not exist, this means that your configuration did not include it.

If you are interested in details on how the test is executed (command, variables, directory), add the -V option. Further details can be extracted from the CMakeLists.txt configuration files in the tests folder.

C++ Unit tests

These are based on the googletest framework and are located in tests/unit/. You can compile them with

make shogun-unit-test

You can execute single tests via ctest, or via directly executing the unit test binary and passing it a filter, which gives a more grained control over which sub-tests are executed

./bin/shogun-unit-test --gtest_filter=GaussianProcessRegression.apply_*

Note that wildcards are allowed. Running single sub-tests is sometimes useful (i.e. for bug hunting)

./bin/shogun-unit-test --gtest_filter=GaussianProcessRegression.apply_regression

Debugging and Memory leaks

All your C++ code and unit tests must be checked to not leak memory! You want to use a memory checker such as valgrind (or a debugger such as GDB). If you do that, you might want to compile with debugging symbols and without compiler optimizations, by using -DCMAKE_BUILD_TYPE=Debug

Then

valgrind ./bin/shogun-unit-test --gtest_filter=GaussianProcessRegression.apply_regression
gdb --args ./bin/shogun-unit-test --gtest_filter=GaussianProcessRegression.apply_regression

The option --leak-check=full for valgrind might be useful. In addition to manually running valgrind on your tests, you can use ctest to check multiple tests. This requires to be enabled in dashboard reports with -DBUILD_DASHBOARD_REPORTS=ON. For example

ctest -D ExperimentalMemCheck -R unit-GaussianProcessRegression

Adding tests

We aim to write clear, minimal, yet exhaustive tests of basic building blocks in Shogun. Whenever you send us a C++ code, we will ask you for a unit test for it. We do test numerical results as compared to reference implementations (e.g. in Python), as well as corner cases, consistency etc. Read on test-driven development (TDD), and search the web for tips on unit tests, e.g. googletest's tips.

Take inspiration from existing tests when writing new ones. Please structure them well.

API example tests

Make sure to read INTERFACES.md and EXAMPLES.md to understand how API examples are generated, you will need the cmake switch -DBUILD_META_EXAMPLES=ON. Every API example is used for two tests: simple execution and continuous integration testing of results. These two tests are executed for every enabled interface language.

Note that code for all interface examples needs to be generated as part of make, or using

make meta_examples

This needs to be done every time you add or modify an example. Examples for compiled interface languages (e.g. C++, Java) need to be compiled, either as part of make, or via more specific targets, e.g.

make build_cpp_meta_examples
make build_java_meta_examples

Check the CMakeLists.txt in examples/meta/* for all such make targets.

Simple execution.

These tests are to make sure the code is executable and to generate results for integration testing. These can be executed with ctest as described above, e.g.

ctest -R generated*
ctest -R generated_cpp-binary_classifier-kernel_svm -V

You can also execute the examples manually as described in INTERFACES.md. Note that the data git submodule is required to run the examples, see INSTALL.md.

Check the CMakeLists.txt in examples/meta/* for further details.

Adding tests

As every example is turned into a test when running cmake, all you need to do is to add an example as described in EXAMPLES.md.

Integration testing of results

You will note that each example produces an output file with the *.dat extension. This is a serialized version of all numerical results of the example. The purpose is to make sure all interface versions (say C++ and Python) of an example produce the same output, and that this output does not change over time.

The reference results are stored in the data git submodule, more precisely in data/testsuite/meta/*. There is a symbolic link for both generated and reference results in the build/tests/meta/ folder. Naturally, these tests depend on executing the corresponding example first. Therefore, running a test does not run the example again, but it simply compares the output to the reference file.

Again ctest can be used,

ctest -R integration_meta_*
ctest -R integration_meta_cpp-binary_classifier-kernel_svm
ctest -R integration_meta_python-binary_classifier-kernel_svm

See the CMakeLists.txt in tests/meta for details on the mechanics.

Adding tests

CMake automatically creates a test for every reference result file that it finds. Therefore, if you want to add a new test, for example after having added an example as described in EXAMPLES.md, then you need to copy its generated output to the reference file folder, e.g.

cp build/tests/meta/generated_results/cpp/regression/kernel_ridge_regression.dat data/testsuite/meta/regression/

Note we usually use the output of the C++ example as reference.

Once that is done, it would be good if you sent us a patch with the new test. This is done via first sending a PR against the shogun-data, just like the standard development cycle, after doing (in the data directory)

git commit testsuite/meta/regression/kernel_ridge_regression.dat -m "Integration testing data for kernel ridge regression"
git push origin

After this PR is merged, you need to send a second PR against the main repository, after committing the updated version hash of the submodule (in the main shogun directory)

git commit data -m "Updated to including kernel ridge regression test data"
git push origin

If everything worked, then the travis build in the second PR will include your test in all interface languages. Please check the logs!

Benchmarking

In Shogun we use the google-benchmark library to test the performance of the modules in the library. To compile and run the benchmarks set the BUILD_BENCHMARKS cmake flags to true, i.e. run the cmake command with -DBUILD_BENCHMARKS=ON flag.

To run all the benchmarks simply run:

ctest -L benchmark

Adding benchmarks

We aim to provide an easy way to benchmark modules in Shogun. Hence, whenever you send us new C++ implementation, please consider writing benchmarks for it.

To add a benchmark to a class, simply create a file with _benchmark.cc suffix, next to the implementation. For example, for the RandomFourierDotFeatures class the benchmarks are available in the RandomFourierDotFeatures_benchmark.cc file. On top of this, you need to explicitly specify in CMakeLists.txt that you have added a new benchmarking file. The ADD_SHOGUN_BENCHMARK cmake helper function should be used in src/shogun/CMakeLists.txt to add a benchmark implementation. For example, to add a benchmark with the path src/classifier/YOLOClassifier_benchmark.cc, use the ADD_SHOGUN_BENCHMARK cmake function in the following way in src/shogun/CMakeLists.txt file within the benchmarking section:

ADD_SHOGUN_BENCHMARK(classifier/YOLOClassifier_benchmark)

Adding benchmarks

We aim to write clear, minimal, yet exhaustive tests of basic building blocks in Shogun. Whenever you send us a C++ code, we will ask you for a unit test for it. We do test numerical results as compared to reference implementations (e.g. in Python), as well as corner cases, consistency, etc. Read on test-driven development, and search the web for tips on unit tests, e.g. googletest's tips.

Build farm

We run two types of buildfarms that are automatically triggered

  1. Travis, and Azure Pipeline executed in a third-party cloud when opening a PR
  2. Buildbot, executed in our own cloud after every merged PR or commit

Also, we have a few hooks on PRs that are executed along with travis, such as a preview of API examples. You will see a list of checks in your PR.

Travis

This is to do basic sanity checks on every PR. All interfaces have a different build, see .travis.yml in the repository. The Docker image that runs the travis tests is based on configs/shogun/Dockerfile and can be found here.

If you obey the dev cycle, in particular, if you run tests before sending a PR, travis should never fail.

If travis fails

  1. Read the logs, find the error message
  2. Try to identify the problem
  3. Find out whether you caused it
  4. If so, reproduce locally
  5. Fix it and update your PR

Buildbot

This service builds and tests Shogun in a large number of different configurations, OS, interfaces, etc. It ensures Shogun is portable, the build is backward compatible. It analysis Shogun's memory usage and performs static code analysis. It often catches very subtle errors

After one of your PR is merged, check the status of the buildbot for a while. The waterfall view is most useful. Again, check the logs if there are problems.

CMake options for developers

See also INSTALL.md. Options for developers (debugging symbols on, optimization off, etc.):

cmake -DCMAKE_BUILD_TYPE=Debug -DENABLE_TESTING=ON -DBUILD_DASHBOARD_REPORTS=ON ..

Options for building the final binaries (debugging off, optimization on):

cmake -DCMAKE_BUILD_TYPE=Release ..

API documentation

Shogun uses doxygen for its API documentation. Every bit of C++ code that is added to Shogun needs doxygen compatible source-code comments.

  • Every class needs a description of what it implements. If possible, use LaTeX for math.
  • Every method needs a description, plus all parameters and return values documented.

Check existing code for inspiration. Documentation is important, so polish as good as you can!

If you have doxygen installed, you can generate the documentation locally via running

make doxygen

and then opening build/doc/doxygen/html/index.html with the browser.

With the DOXYGEN_HTML_OUTPUT cmake flag one can turn off HTML generation of the API documentation (-DDOXYGEN_GENERATE_HTML=NO).