Project Ideas Improve Debian package license detection
The goal of this project is to improve license detection for Debian, Ubuntu, and derivative packages across the board. scancode-toolkit's Debian package detection is adequate, yet there are several cases where license information is not properly gathered from Debian copyright files. Usually this happens because the copyright file is not structured or because some license notice is not detected correctly.
This project would be a mix of adding new license detection rules to scancode, adding new and improved code to handle specific license patterns, creating new license mappings, and possibly working with upstream maintainers to improve their license declarations. The approach should be to start with a complete data set of all Debian copyright files, find patterns of license issues, and establish a baseline, possibly with classifiers and ML. The end result should be a significant improvement in license detection quality for Debian packages. As a bonus, this would be exposed upstream in the Debian PTS.
This is a reasonably complex project. For extra details, see also https://github.com/nexB/scancode.io/issues/103
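As a rough illustration of the baseline step described above, the sketch below splits a set of copyright files into structured and unstructured ones and tallies the license short names found in the structured ones. It assumes that a structured file follows the Debian machine-readable copyright format, whose header stanza carries a `Format` field. The function names (`is_structured`, `license_short_names`, `baseline`) are hypothetical and are not part of scancode-toolkit; a real implementation would likely rely on a proper Debian control file parser rather than a regular expression.

```python
import re
from collections import Counter

# Matches a "License:" field line and captures the rest of that line.
LICENSE_FIELD = re.compile(r"^License:[ \t]*(.*)$", re.MULTILINE | re.IGNORECASE)


def is_structured(text):
    """Heuristic: True if the first stanza contains a Format field."""
    first_stanza = text.lstrip().split("\n\n", 1)[0]
    return any(
        line.lower().startswith("format:")
        for line in first_stanza.splitlines()
    )


def license_short_names(text):
    """Return the non-empty first-line values of all License fields."""
    return [m.strip() for m in LICENSE_FIELD.findall(text) if m.strip()]


def baseline(copyright_texts):
    """Tally structured vs. unstructured files and License short names."""
    kinds = Counter()
    names = Counter()
    for text in copyright_texts:
        if is_structured(text):
            kinds["structured"] += 1
            names.update(license_short_names(text))
        else:
            kinds["unstructured"] += 1
    return kinds, names
```

Running `baseline` over the full data set would give a first count of how many files are unstructured and which short names are most common, which is one possible starting point for deciding where new detection rules or license mappings are needed.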
Level
- Advanced
Tech
- Python
- URLS
Mentors
- @pombredanne https://github.com/pombredanne
- @majurg https://github.com/majurg