Dask xgboost #50
Comments
XGBoost setup:
plain XGBoost (without Dask):
results:
XGBoost with Dask:
Results:
Thanks for conducting the bench here.
Hmm, that's not optimal. Splitting the computation across processes instead of threads creates a lot of overhead for XGBoost, because the workers then have to communicate over TCP. If you are using a single machine, use a single worker.
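For reference, a minimal sketch of the "single worker, many threads" setup suggested above. The helper names (`single_machine_layout`, `train_single_worker`), the chunking choice, and the default `num_boost_round` are illustrative assumptions, not taken from the benchmark code in this issue.

```python
def single_machine_layout(n_cores):
    """One worker process that owns every core as a thread."""
    return {"n_workers": 1, "threads_per_worker": n_cores}


def train_single_worker(X, y, n_cores, params, num_boost_round=100):
    """Train via xgboost.dask on a one-worker LocalCluster (illustrative only)."""
    # Imported lazily so the layout helper works without Dask/XGBoost installed.
    import dask.array as da
    from dask.distributed import Client, LocalCluster
    from xgboost import dask as dxgb

    layout = single_machine_layout(n_cores)
    with LocalCluster(**layout) as cluster, Client(cluster) as client:
        # Row-wise chunks only; one chunk per thread is an arbitrary choice here.
        rows_per_chunk = max(1, len(X) // n_cores)
        dX = da.from_array(X, chunks=(rows_per_chunk, X.shape[1]))
        dy = da.from_array(y, chunks=rows_per_chunk)
        dtrain = dxgb.DaskDMatrix(client, dX, dy)
        out = dxgb.train(client, params, dtrain, num_boost_round=num_boost_round)
    # dxgb.train returns a dict holding the booster and the evaluation history.
    return out["booster"]
```

With one worker there is no inter-process traffic during training, so the threads share memory the same way plain XGBoost does.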
Changing number of workers, threads, partitions:
|
@trivialfis WIP on that, see above (I'm running different setups right now, will fill out results in a few minutes as I get them).
@trivialfis see the updated results above. I guess a more realistic comparison would be Dask on N servers with C cores each, with 1 worker per server and C threads per worker. I'm not sure if the number of partitions should be N or N*C. And bigger data (say 10M rows), of course.
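To make the proposed layout concrete, here is a small sketch that turns N servers and C cores into a worker/thread configuration and lists both partition counts under consideration. The function name and dictionary keys are hypothetical, and it deliberately does not pick between N and N*C, since that question is still open above.

```python
def multi_server_layout(n_servers, cores_per_server):
    """1 worker per server, C threads per worker; both partition candidates."""
    if n_servers < 1 or cores_per_server < 1:
        raise ValueError("n_servers and cores_per_server must be positive")
    return {
        "n_workers": n_servers,
        "threads_per_worker": cores_per_server,
        # Either one partition per worker (N) or one per thread (N*C).
        "partition_candidates": [n_servers, n_servers * cores_per_server],
    }


# Example: 4 servers with 16 cores each.
layout = multi_server_layout(4, 16)
```

Running the benchmark once per entry in `partition_candidates` would be one way to settle the N versus N*C question empirically.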
10M rows:
m5.16xlarge (4x previous box), 10M rows
[plain XGBoost 16c and workers=1 pinned to cores 0-7,32-39, XGBoost 32c and workers=2 pinned to 0-15,32-47]
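The thread does not say how the pinning was done (a tool such as `taskset` is one possibility). As a rough sketch of one alternative, the core lists above can be parsed and applied from Python with `os.sched_setaffinity`, which is Linux-only. The helper names `parse_cpu_list` and `pin_current_process` are made up for this example.

```python
import os


def parse_cpu_list(spec):
    """Turn a string like '0-7,32-39' into a set of CPU ids."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))  # ranges are inclusive
        else:
            cpus.add(int(part))
    return cpus


def pin_current_process(spec):
    """Restrict the calling process to the given CPUs (Linux only)."""
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("os.sched_setaffinity is not available on this platform")
    os.sched_setaffinity(0, parse_cpu_list(spec))  # pid 0 means this process
```

On a machine with two hardware threads per core, a list like `0-7,32-39` would cover 8 physical cores plus their hyperthread siblings, assuming the usual Linux CPU numbering on that instance.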
m5.4xlarge 16c (8+8HT)
1M rows
integer encoding for simplicity
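As a rough illustration, assuming "integer encoding" here means mapping each categorical level to an integer code instead of one-hot encoding it, a dependency-free sketch could look like this. The function name is hypothetical and this is not the preprocessing code used for the benchmark.

```python
def integer_encode(values):
    """Map each distinct category to an integer code (sorted order)."""
    categories = sorted(set(values))
    code_of = {category: code for code, category in enumerate(categories)}
    return [code_of[v] for v in values], categories


codes, categories = integer_encode(["b", "a", "b", "c"])
```

Each categorical column becomes a single numeric column, which keeps the feature matrix narrow at the cost of imposing an arbitrary order on the levels.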