Dask xgboost #50

Open
szilard opened this issue Jan 21, 2021 · 9 comments

szilard (Owner) commented Jan 21, 2021

m5.4xlarge, 16 vCPUs (8 physical cores + 8 hyperthreads)

1M rows

integer encoding for simplicity

szilard commented Jan 21, 2021

XGBoost setup:

```sh
sudo docker run --rm -ti -p 8787:8787 continuumio/anaconda3 /bin/bash

pip3 install -U xgboost

ipython
```

szilard commented Jan 21, 2021

plain XGBoost (without Dask):

```python
import pandas as pd
from sklearn import metrics

import xgboost as xgb


d_train = pd.read_csv("https://raw.githubusercontent.com/szilard/benchm-ml--data/master/int_enc/train-1m-intenc.csv")
d_test = pd.read_csv("https://raw.githubusercontent.com/szilard/benchm-ml--data/master/int_enc/test-1m-intenc.csv")

X_train = d_train.iloc[:, :-1].to_numpy()
y_train = d_train.iloc[:, -1:].to_numpy()
X_test = d_test.iloc[:, :-1].to_numpy()
y_test = d_test.iloc[:, -1:].to_numpy()


dxgb_train = xgb.DMatrix(X_train, label=y_train)
dxgb_test = xgb.DMatrix(X_test)

param = {'max_depth': 10, 'eta': 0.1, 'objective': 'binary:logistic', 'tree_method': 'hist'}
%time md = xgb.train(param, dxgb_train, num_boost_round=100)

y_pred = md.predict(dxgb_test)
print(metrics.roc_auc_score(y_test, y_pred))
```

results:

```
Wall time: 3.35 s
0.7527781837199401
```
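Note that `%time` above is an IPython magic; in a plain Python script the same wall-clock measurement can be taken with the standard library's `time.perf_counter` (a sketch; the `xgb.train` call from above is elided):

```python
import time

t0 = time.perf_counter()
# md = xgb.train(param, dxgb_train, num_boost_round=100)  # training call from above
elapsed = time.perf_counter() - t0
print(f"Wall time: {elapsed:.2f} s")
```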

szilard commented Jan 21, 2021

XGBoost with Dask:

```python
import pandas as pd
from sklearn import metrics

from dask.distributed import Client, LocalCluster
import dask.dataframe as dd

import xgboost as xgb


cluster = LocalCluster(n_workers=16, threads_per_worker=1)
client = Client(cluster)

d_train = pd.read_csv("https://raw.githubusercontent.com/szilard/benchm-ml--data/master/int_enc/train-1m-intenc.csv")
d_test = pd.read_csv("https://raw.githubusercontent.com/szilard/benchm-ml--data/master/int_enc/test-1m-intenc.csv")

dx_train = dd.from_pandas(d_train, npartitions=16)
dx_test = dd.from_pandas(d_test, npartitions=1)

X_train = dx_train.iloc[:, :-1].to_dask_array(lengths=True)
y_train = dx_train.iloc[:, -1:].to_dask_array(lengths=True)
X_test = dx_test.iloc[:, :-1].to_dask_array(lengths=True)
y_test = dx_test.iloc[:, -1:].to_dask_array(lengths=True)

# persist() returns the persisted collection; without reassignment it is a no-op
X_train = X_train.persist()
y_train = y_train.persist()

client.has_what()


dxgb_train = xgb.dask.DaskDMatrix(client, X_train, y_train)
dxgb_test = xgb.dask.DaskDMatrix(client, X_test)


param = {'objective': 'binary:logistic', 'tree_method': 'hist', 'max_depth': 10, 'eta': 0.1}
%time md = xgb.dask.train(client, param, dxgb_train, num_boost_round=100)


y_pred = xgb.dask.predict(client, md, dxgb_test)
y_pred_loc = y_pred.compute()
y_test_loc = y_test.compute()
print(metrics.roc_auc_score(y_test_loc, y_pred_loc))
```

Results:

```
Wall time: 22.2 s
0.7491229723842621
```

trivialfis commented Jan 21, 2021

Thanks for running the benchmark here.

> cluster = LocalCluster(n_workers=16, threads_per_worker=1)

Em... that's not optimal. Splitting the computation across processes instead of threads creates a lot of overhead for XGBoost due to TCP communication. If you are using a single machine, use a single worker.
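A minimal sketch of the single-worker setup suggested here, assuming the 16-core box used above (Dask then runs one process and XGBoost parallelizes internally across threads):

```python
from dask.distributed import Client, LocalCluster

# One worker holding all partitions keeps the data in a single process,
# so XGBoost's histogram aggregation needs no inter-process TCP traffic.
cluster = LocalCluster(n_workers=1, threads_per_worker=16)
client = Client(cluster)
print(len(client.scheduler_info()["workers"]))  # 1 worker
```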

szilard commented Jan 21, 2021

Changing the number of workers, threads, and partitions:

| n_workers | n_threads | n_partitions | Time (s) | AUC |
|-----------|-----------|--------------|----------|----------|
| 16 | 1 | 16 | 22.2 | 0.749122 |
| 1 | 16 | 16 | 4.9 | 0.752778 |
| 1 | 16 | 1 | 4.8 | 0.752778 |
| 1 | 1 | 1 | 21.3 | 0.752778 |
| 4 | 4 | 16 | 11.6 | 0.752098 |
| no dask | 16 | – | 3.3 | 0.752778 |
| no dask | 1 | – | 19.7 | 0.752778 |
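One way to read the timings above: comparing the best Dask configuration and the process-based one against plain XGBoost quantifies Dask's scheduling/communication overhead (numbers taken from the rows above):

```python
# Wall times (s) from the 1M-row timings above
plain_16t = 3.3    # no dask, 16 threads
dask_1w16t = 4.9   # 1 worker x 16 threads (best Dask config)
dask_16w1t = 22.2  # 16 workers x 1 thread (process-based)

print(f"threaded Dask overhead: +{dask_1w16t / plain_16t - 1:.0%}")
print(f"process-based slowdown: {dask_16w1t / plain_16t:.1f}x")
```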

szilard commented Jan 21, 2021

@trivialfis WIP on that, see above (I'm running different setups right now, will fill out results in a few minutes as I get them).

szilard commented Jan 21, 2021

@trivialfis see the updated results above. I guess a more realistic comparison would be Dask on N servers with C cores each, with 1 worker per server and C threads per worker. I'm not sure whether the number of partitions should be N or N*C.

And bigger data (say 10M rows), of course.

szilard commented Jan 21, 2021

10M rows:

```python
d_train = pd.read_csv("https://benchm-ml--int-enc.s3-us-west-2.amazonaws.com/train-10m-intenc.csv")
d_test = pd.read_csv("https://benchm-ml--int-enc.s3-us-west-2.amazonaws.com/test-10m-intenc.csv")
```

| n_workers | n_threads | n_partitions | Time (s) | AUC |
|-----------|-----------|--------------|----------|----------|
| 16 | 1 | 16 | 46.3 | 0.757480 |
| 1 | 16 | 16 | 33 | 0.759228 |
| 4 | 4 | 16 | 34.9 | 0.757468 |
| no dask | 16 | – | 28.8 | 0.759228 |
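A tentative observation from the two tables: the relative Dask overhead shrinks as data grows, since the fixed coordination cost is amortized over more work. Comparing the best Dask configuration (1 worker x 16 threads) against plain XGBoost at both sizes:

```python
# Wall times (s) from the 1M- and 10M-row tables above
overhead_1m = 4.9 / 3.3 - 1    # ~48%
overhead_10m = 33 / 28.8 - 1   # ~15%
print(f"1M rows:  +{overhead_1m:.0%}")
print(f"10M rows: +{overhead_10m:.0%}")
```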

szilard commented Jan 21, 2021

m5.16xlarge (4x the previous box), 10M rows:

| n_workers | n_threads | n_partitions | Time (s) | AUC | total cores |
|-----------|-----------|--------------|----------|----------|-------------|
| 1 | 16 | 1 | 31.5 | 0.759228 | 16 |
| 2 | 16 | 2 | 23.5 | 0.759020 | 32 |
| 4 | 16 | 4 | 19.3 | 0.757872 | 64 |
| no dask | 16 | – | 27 | 0.759228 | 16 |
| no dask | 32 | – | 18.2 | 0.759228 | 32 |
| no dask | 64 | – | 15 | 0.759228 | 64 |

[plain XGBoost 16c and workers=1 pinned to cores 0-7,32-39; XGBoost 32c and workers=2 pinned to cores 0-15,32-47]
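The core pinning in the note above was presumably done externally (e.g. with `taskset`); on Linux the same effect can be sketched from Python with the standard library's `os.sched_setaffinity`. The core IDs below mirror the 16c case and are illustrative only:

```python
import os

# Pin this process (and the threads it spawns) to cores 0-7 and 32-39,
# i.e. 8 physical cores plus their hyperthread siblings. Linux-only.
wanted = set(range(0, 8)) | set(range(32, 40))
allowed = os.sched_getaffinity(0)
os.sched_setaffinity(0, (wanted & allowed) or allowed)
print(sorted(os.sched_getaffinity(0)))
```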
