Starter code: Kaggle Toxic Comment Classification Challenge
Here at Neptune, we enjoy participating in Kaggle competitions. The Toxic Comment Classification Challenge is especially interesting because it touches on the important issue of online harassment.
You need to be registered at neptune.ml to use our predictions for your ensemble models.
- click start notebook
- choose the browse button
- select the neptune_ensembling.ipynb file from this repository
- choose worker type: gcp-large takes over an hour, gcp-gpu-medium takes less than 20 min
- run the first few cells to load our predictions on the held-out validation set along with the labels
- train your second-level ensemble model
- load our predictions on the test set
- feed our test set predictions to your ensemble model and get final predictions
- save your submission file
- click on browse files and find your submission file to download it
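The ensembling steps above (load first-level predictions, train a second-level model, predict on the test set, save a submission) can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: the data is synthetic, the number of first-level models (3) is made up, and using one scikit-learn LogisticRegression per label is an assumed choice of second-level model. The six label names and the id column follow the competition's submission format.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def make_fake_predictions(n_rows, n_models, rng):
    """Synthetic stand-in for first-level predictions and labels (not real data)."""
    y = rng.integers(0, 2, size=(n_rows, len(LABELS)))
    # Each hypothetical model outputs a noisy copy of the true label, clipped to [0, 1].
    noise = rng.normal(0.0, 0.35, size=(n_rows, n_models, len(LABELS)))
    preds = np.clip(y[:, None, :] + noise, 0.0, 1.0)
    return preds, y


def fit_ensemble(valid_preds, valid_labels):
    """Fit one logistic regression per label on the first-level predictions."""
    models = []
    for j in range(len(LABELS)):
        clf = LogisticRegression()
        clf.fit(valid_preds[:, :, j], valid_labels[:, j])
        models.append(clf)
    return models


def predict_ensemble(models, test_preds):
    """Return an (n_rows, n_labels) array of second-level probabilities."""
    columns = [clf.predict_proba(test_preds[:, :, j])[:, 1] for j, clf in enumerate(models)]
    return np.column_stack(columns)


rng = np.random.default_rng(0)
valid_preds, valid_labels = make_fake_predictions(500, 3, rng)  # held-out validation set
test_preds, _ = make_fake_predictions(200, 3, rng)              # test set, labels unused

models = fit_ensemble(valid_preds, valid_labels)
final = predict_ensemble(models, test_preds)

# Build the submission table: an id column followed by one column per label.
submission = pd.DataFrame(final, columns=LABELS)
submission.insert(0, "id", [f"fake_{i}" for i in range(len(submission))])
submission_path = os.path.join(tempfile.gettempdir(), "submission.csv")
submission.to_csv(submission_path, index=False)
```

In the notebook, the synthetic arrays would be replaced by the predictions loaded in the first cells; any other second-level model with a fit/predict_proba interface could be swapped in for the logistic regression.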
Running the notebook as is got 0.9849 on the LB.
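To compare your own ensemble against this number before submitting, you can score it on the held-out validation set. The sketch below assumes the leaderboard score is the competition's mean column-wise ROC AUC; the helper name mean_columnwise_auc and the tiny arrays are made up for illustration.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def mean_columnwise_auc(y_true, y_pred):
    """Average the ROC AUC computed separately for each label column."""
    scores = [roc_auc_score(y_true[:, j], y_pred[:, j]) for j in range(y_true.shape[1])]
    return float(np.mean(scores))


# Toy example with two label columns (illustrative values only).
y_true = np.array([[0, 1], [1, 0], [0, 1], [1, 0]])
y_pred = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.3], [0.7, 0.4]])

score = mean_columnwise_auc(y_true, y_pred)
print(score)  # 0.875: the first column is ranked perfectly, the second has one misordered pair
```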
We are contributing starter code that is easy to use and extend. We did it before with Cdiscount’s Image Classification Challenge, and we believe that this is the right way to open data science to the wider community and encourage more people to participate in challenges. This starter is a ready-to-use, end-to-end solution. Since all computations are organized in separate steps, it is also easy to extend. Check devbook.ipynb for more information about the different pipelines.
Now we want to go one step further and invite you to participate in the development of this analysis pipeline. At a later stage of the competition (early February), we will invite top contributors to join our team on Kaggle.
You are welcome to extend this pipeline and contribute your own models or procedures. Please refer to CONTRIBUTING for more details.
on the neptune site
- register to receive $5 in GPU and storage time (contact us directly, if you want to receive more credits for training)
- log in: neptune login
- create new project named toxic: follow the Projects link (top bar, left side), then click the New project button. This action will generate the project key TOX, which is already listed in neptune.yaml.
run setup commands
$ git clone https://github.com/neptune-ml/kaggle-toxic-starter.git
$ pip3 install neptune-cli
$ neptune login
start experiment
$ neptune send --environment keras-2.0-gpu-py3 --worker gcp-gpu-medium -- train_evaluate_predict_pipeline --pipeline_name glove_lstm
Happy Training :)
Refer to Neptune documentation and Getting started: Neptune Cloud for more.
Please refer to Getting started: local instance for the installation procedure.
The end-to-end pipeline is visualized below. You can run exactly this one!
We have also prepared something simpler to just get you started:
There are several ways to seek help:
- Read the project's Wiki, where we publish descriptions of the code, pipelines and Neptune.
- Kaggle discussion is our primary way of communication.
- You can submit an issue directly in this repo.