-
-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ml sklearn structure (integration to the AB) #385
Open
tzamalisp
wants to merge
66
commits into
metabrainz:master
Choose a base branch
from
tzamalisp:ml_sklearn_structure
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
PEP 8 issues fix 02 remove gaia best models params issue fixed - tracks in gt file do not exist in low-level predict with MBID only (not the whole API low-level url)) add sklearn requirements.txt create gaia evaluation method change name of the model directory load updated files change the path saves - reduce print/logging messages in evaluation requirements.txt --> lowercase remove requirements.txt add requirements.txt with lowercase use of now.isoformat() reports creation - datetime to start of the report, code improvements review updates 01 syntax for documentation strings as in AB fixed ground_truth file typo fixed use of os.makedirs(full_path, exist_ok=True) create results_dict = {....} in a single call simplify processing step dict creation - documentation added add lower-case in all process steps params save best model after training to the whole data - add README split evaluation in separate methods split grid classification into separate methods single logger setup change logging set from int to str dynamic project yaml saving - update readme predict section add new arguments in readme readme - add predict invoking readme MBID in prediction create project - default values declaration - documentation relative imports update predict readme and script arguments import classification_project readme path file required readme, how ti works session - remove requirements jupyter notebook requirements.txt - add requests, remove tensorflow readme - add how training and predicting modes work add new dockerfile for ML tool readme - update doc only for parameters add exports_directory in config when specified in arguments sklearn model inserted in AB
Hello @tzamalisp! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2021-07-08 11:27:54 UTC |
tzamalisp
force-pushed
the
ml_sklearn_structure
branch
from
August 24, 2020 12:20
36eb75e
to
7b81889
Compare
Also move the selection of the tool to the advanced svm settings
This reverts commit 89d3276.
The export_path parameter should contain the complete path where the results should be stored. Everywhere we use exports_dir, we append it to the export_path. If its a matter of user convenience, we can add it back of the command line but merge it with export_path asap.
Making a class to just call a util method does not seem useful. Also, the directories are always created by the ClassifcationTaskManager and DatasetExplorer so we can just refer to their path with os.path.join elsewher. We already have a similar function in AB utils but that is used on Python 2.7 which does not have exist_ok arg available. Hence, keeping this.
Revert the change from previous commit for logging directory because we setup logging before trying to create all other output directories.
Instead of setting up logger again and again, we pass the logger around manually. In subsequent steps, this will be entirely replaced to utilise python's hierarchical logging. Can't do this just now because the file handler requires the dataset name.
This reverts commit 581b713.
Eliminate unused methods and move useful ones to individual methods.
We never really access these dicts other than iterating by calling `.items()` so the defaultdict are redundant and only add confusion.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New Machine Learning Infrastructure - Tool Integration (Google Summer of Code 2020)
In this Pull Request (PR), the main goal is the integration of the Machine Learning tool that is built, into the AcousticBrainz platform. Several aspects have been completed or have to be completed in order for the tool to be fully integrated into the AcousticBrainz project. This PR follows the previous PR, "New Machine Learning Infrastructure - Tool Development and Code Review by the Mentor" (closed), which is related to the phase of the GSoC of the tool development and the code review by the Mentor.
The implemented Machine Learning tool and the corresponding branch where the integration part takes place can be found at the forked repository of the AcousticBrainz platform of the link below. This link includes also a detailed description of how the tool works:
Till now, the tool has been developed successfully, and several models have been trained offline based on some datasets that were provided by the Mentor. The training process works successfully with the performance results (e.g. the accuracy of the model) being almost the same with the old machine learning infrastructure which is provided by the gaia tool.
In the integration part, a new docker image, the
server_dataset_evaluator_sklearn
is created in order for the new ML infrastructure can run in Python 3 with the corresponding libraries dependencies. The previous tool runs in Python 2. Several other compatibility issues (as the commits depict) are solved so there will be no conflicts between the simultaneous execution of Python 2 and Python 3, based on the tool that the AcoustiBrainz platform uses. Additionally, the user interface with the relevant back-end services is updated. Now, the user has the ability to choose which ML tool will execute the music classification task and train the model (gaia or sklearn).The next steps that work have to be done are:
predictions
functionality.Finally, during the GSoC project period, some other PRs for code changes were created and implemented that correspond to: