tabnet (pytorch-tabnet) #52
Setup:
Dockerfile:
Starter code (adapted from https://github.com/dreamquark-ai/tabnet/blob/develop/census_example.ipynb):
To change training data size:
Compare to XGBoost sample code:
Results on p3.2xlarge (8 CPU cores, 60 GB RAM, Tesla V100-SXM2 GPU with 16 GB GPU memory), 0.1M rows of data:
XGBoost, CPU:
GPU:
More info on the data/tabnet:
Results on 1M rows: Very slow, stopped after 1 hour:
XGBoost, 1M rows, CPU (8 cores only):
GPU:
Trying out other hyperparameter values on the 0.1M-row data. Default parameter values:
Very similar to the results above:
On CPU rather than GPU (8 cores only, though):
m5.4xlarge CPU only (no GPU) (16 CPU cores):
After normalizing the numeric variables:
epoch 29 | loss: 0.60077 | train_auc: 0.73926 | valid_auc: 0.70858 | 0:03:35s
Early stopping occurred at epoch 33 with best_epoch = 13 and best_valid_auc = 0.70874
In [32]: print(metrics.roc_auc_score(y_test, y_pred))
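A minimal sketch of the normalization step, assuming scikit-learn's `StandardScaler` (fit on the training split only, then applied to every split):

```python
# Standardize numeric columns to zero mean / unit variance using
# statistics computed on the training data only, to avoid leakage.
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X_train = rng.normal(loc=50, scale=10, size=(100, 3))
X_test = rng.normal(loc=50, scale=10, size=(20, 3))

scaler = StandardScaler().fit(X_train)     # learn mean/std on train
X_train_n = scaler.transform(X_train)
X_test_n = scaler.transform(X_test)        # reuse train statistics
```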
Hello @szilard, nice work on the benchmark! Here is a list of thoughts/questions I had while reading through all your comments:
EDIT: make sure to set
I guess I'm out of ideas for now. There is no guarantee of beating XGBoost at all, but it would be nice to end with a closer score and a closer training time. I definitely agree with you :) Cheers!
Wow, thanks @Optimox for the detailed comments and suggestions, I really appreciate it. I'll try tweaking the params per your many suggestions above and post the results here as I get them. Yeah, surely which algo is best will depend on the dataset, and on this one tabnet might not be able to beat xgboost, but it's already not far off. I'll especially try to see whether any of your suggestions can make it faster. Thanks again for the detailed feedback.
m5.2xlarge CPU only (no GPU) (8 CPU cores):
Reducing the number of epochs as per @Optimox's suggestions:
(
AUC is still ~0.71, but 4x faster.
@szilard did you change the scheduler when reducing the number of epochs? With a low number of epochs you can also set patience == MAX_EPOCHS, which ensures you complete all the epochs and select the best one on the validation set for inference, or you can set patience = 0 so that you won't early stop at all and keep the latest version (I would recommend the first option).
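The patience semantics described above can be illustrated with a toy loop. pytorch-tabnet implements this internally; this standalone sketch only mirrors the logic:

```python
# Early-stopping patience: stop once the validation metric has not
# improved for `patience` consecutive epochs; track the best epoch.
def train_with_patience(valid_aucs, patience):
    """valid_aucs: per-epoch validation AUCs. Returns (stop_epoch, best_epoch)."""
    best, best_epoch = float("-inf"), -1
    for epoch, auc in enumerate(valid_aucs):
        if auc > best:
            best, best_epoch = auc, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch          # early stop triggered
    return len(valid_aucs) - 1, best_epoch    # ran all epochs

aucs = [0.60, 0.70, 0.71, 0.70, 0.69, 0.68]
# patience == number of epochs: never early-stops, keeps the best epoch
print(train_with_patience(aucs, patience=len(aucs)))  # (5, 2)
print(train_with_patience(aucs, patience=2))          # (4, 2)
```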
Not yet, I haven't changed anything yet; I actually want to try the R package first before doing all this tweaking. I like posting a lot of info in GitHub issues (and often), sorry for the many notifications.
New baseline: no early stopping, m5.2xlarge, CPU only (no GPU, 8 CPU cores):
tabnet in R: Setup:
For some reason, it seems the R lib is 15x slower than the Python implementation (though in theory both call into the same C++ code). Some of the default parameters might differ (TBD), but still.
Removing evaluation speeds up things:
Do not try to balance the data (
New baseline (params taking their default values commented out):
New ec2 instance (still m5.2xlarge):
Batch size:
16K is too large and degrades AUC; 4K speeds things up a bit and degrades AUC only a little. We'll keep 1024.
OneCycleLearningRate as suggested above by @Optimox:
Very similar runtime and AUC to the previous run.
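The one-cycle schedule mentioned above can be sketched with plain PyTorch. pytorch-tabnet accepts a scheduler via its `scheduler_fn`/`scheduler_params` arguments (exact wiring is version-dependent), so this standalone sketch only shows the learning-rate shape:

```python
# OneCycleLR ramps the LR up to max_lr (by default over the first 30%
# of steps), then anneals it down far below the starting LR.
import torch

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
steps = 100
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.02, total_steps=steps)

lrs = []
for _ in range(steps):
    optimizer.step()       # in real training: after loss.backward()
    scheduler.step()
    lrs.append(scheduler.get_last_lr()[0])

peak = max(lrs)            # reaches max_lr at the end of the warm-up
```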
Without LR scheduler: Remove:
Code:
New simplified baseline:
Faster and better AUC. Also
New baseline:
Not enough AUC, keep
So after tweaking based on @Optimox's suggestions, here we go:
Nice work @szilard! May I request one last experiment?
Sure, will do.
On CPU (same m5.2xlarge):
So it's about the same as after 10 epochs (but much better than after 5 epochs, which is why I kept 10). I'll run it on GPU as well.
I shouldn't do this and it's evil, but here it is with the test set in the eval set (to see the test AUC after each epoch):
It seems to get close to the best AUC (~0.72) after 8 epochs, so stopping at 10 epochs is sensible. Note: the test set has a slightly different distribution than the train set (time-gapped split of the original data).
On 1M rows dataset:
It reaches AUC ~0.735 after 3-4 epochs (2-3 min).
Compare that with a standard feed-forward neural net:
TODO: Try out various setups such as here: https://github.com/szilard/benchm-ml#deep-neural-networks |
So max AUC ~0.73, more precisely 0.734 with some complex rate annealing and momentum trickery, but almost as good (0.732) with just 2 hidden layers (
So slightly lower AUC than with tabnet (0.735).
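For reference, a 2-hidden-layer feed-forward net of the kind described above can be sketched in PyTorch. The layer sizes are illustrative assumptions; the AUC numbers quoted came from different tooling:

```python
# A plain 2-hidden-layer MLP for binary classification: two ReLU
# hidden layers and a single output logit passed through a sigmoid.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(
    nn.Linear(10, 200), nn.ReLU(),
    nn.Linear(200, 200), nn.ReLU(),
    nn.Linear(200, 1),            # one logit for the positive class
)

X = torch.randn(32, 10)           # a dummy batch of 32 rows
logits = model(X)
proba = torch.sigmoid(logits)     # probabilities in [0, 1]
```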
With XGBoost:
or:
m5.2xlarge (8 cores), 1M rows:
If you don't do this, you'll run into the following error riiiight at the end of your training run:
P.S. Don't ask me how I know this. |
https://github.com/dreamquark-ai/tabnet