Dask xgboost #50

Open
szilard opened this issue Jan 21, 2021 · 9 comments

szilard (Owner) commented Jan 21, 2021

m5.4xlarge, 16 vCPUs (8 physical cores + 8 hyperthreads)

1M rows

integer encoding for simplicity

szilard commented Jan 21, 2021

XGBoost setup:

```sh
sudo docker run --rm -ti -p 8787:8787 continuumio/anaconda3 /bin/bash

pip3 install -U xgboost

ipython
```

szilard commented Jan 21, 2021

plain XGBoost (without Dask):

```python
import pandas as pd
from sklearn import metrics

import xgboost as xgb


d_train = pd.read_csv("https://raw.githubusercontent.com/szilard/benchm-ml--data/master/int_enc/train-1m-intenc.csv")
d_test = pd.read_csv("https://raw.githubusercontent.com/szilard/benchm-ml--data/master/int_enc/test-1m-intenc.csv")

X_train = d_train.iloc[:, :-1].to_numpy()
y_train = d_train.iloc[:, -1:].to_numpy()
X_test = d_test.iloc[:, :-1].to_numpy()
y_test = d_test.iloc[:, -1:].to_numpy()


dxgb_train = xgb.DMatrix(X_train, label=y_train)
dxgb_test = xgb.DMatrix(X_test)

param = {'max_depth': 10, 'eta': 0.1, 'objective': 'binary:logistic', 'tree_method': 'hist'}
%time md = xgb.train(param, dxgb_train, num_boost_round=100)

y_pred = md.predict(dxgb_test)
print(metrics.roc_auc_score(y_test, y_pred))
```

results:

```
Wall time: 3.35 s
0.7527781837199401
```
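Note that `%time` above is an IPython magic; in a plain Python script the same wall-clock measurement can be taken with the standard library's `time.perf_counter` (a sketch; the `xgb.train` call from above is elided):

```python
import time

t0 = time.perf_counter()
# md = xgb.train(param, dxgb_train, num_boost_round=100)  # training call from above
elapsed = time.perf_counter() - t0
print(f"Wall time: {elapsed:.2f} s")
```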

szilard commented Jan 21, 2021

XGBoost with Dask:

```python
import pandas as pd
from sklearn import metrics

from dask.distributed import Client, LocalCluster
import dask.dataframe as dd

import xgboost as xgb


cluster = LocalCluster(n_workers=16, threads_per_worker=1)
client = Client(cluster)

d_train = pd.read_csv("https://raw.githubusercontent.com/szilard/benchm-ml--data/master/int_enc/train-1m-intenc.csv")
d_test = pd.read_csv("https://raw.githubusercontent.com/szilard/benchm-ml--data/master/int_enc/test-1m-intenc.csv")

dx_train = dd.from_pandas(d_train, npartitions=16)
dx_test = dd.from_pandas(d_test, npartitions=1)

X_train = dx_train.iloc[:, :-1].to_dask_array(lengths=True)
y_train = dx_train.iloc[:, -1:].to_dask_array(lengths=True)
X_test = dx_test.iloc[:, :-1].to_dask_array(lengths=True)
y_test = dx_test.iloc[:, -1:].to_dask_array(lengths=True)

# persist() returns the persisted collection; without reassignment it is a no-op
X_train = X_train.persist()
y_train = y_train.persist()

client.has_what()


dxgb_train = xgb.dask.DaskDMatrix(client, X_train, y_train)
dxgb_test = xgb.dask.DaskDMatrix(client, X_test)


param = {'objective': 'binary:logistic', 'tree_method': 'hist', 'max_depth': 10, 'eta': 0.1}
%time md = xgb.dask.train(client, param, dxgb_train, num_boost_round=100)


y_pred = xgb.dask.predict(client, md, dxgb_test)
y_pred_loc = y_pred.compute()
y_test_loc = y_test.compute()
print(metrics.roc_auc_score(y_test_loc, y_pred_loc))
```

Results:

```
Wall time: 22.2 s
0.7491229723842621
```

trivialfis commented Jan 21, 2021

Thanks for running the benchmark here.

> cluster = LocalCluster(n_workers=16, threads_per_worker=1)

Em... that's not optimal. Splitting the computation across processes instead of threads creates a lot of overhead for XGBoost due to TCP communication. If you are using a single machine, use a single worker.
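A minimal sketch of the single-worker setup suggested here, assuming the 16-core box used above (Dask then runs one process and XGBoost parallelizes internally across threads):

```python
from dask.distributed import Client, LocalCluster

# One worker holding all partitions keeps the data in a single process,
# so XGBoost's histogram aggregation needs no inter-process TCP traffic.
cluster = LocalCluster(n_workers=1, threads_per_worker=16)
client = Client(cluster)
print(len(client.scheduler_info()["workers"]))  # 1 worker
```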

szilard commented Jan 21, 2021

Changing the number of workers, threads, and partitions:

| n_workers | n_threads | n_partitions | Time (s) | AUC |
|-----------|-----------|--------------|----------|----------|
| 16 | 1 | 16 | 22.2 | 0.749122 |
| 1 | 16 | 16 | 4.9 | 0.752778 |
| 1 | 16 | 1 | 4.8 | 0.752778 |
| 1 | 1 | 1 | 21.3 | 0.752778 |
| 4 | 4 | 16 | 11.6 | 0.752098 |
| no dask | 16 | – | 3.3 | 0.752778 |
| no dask | 1 | – | 19.7 | 0.752778 |
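One way to read the timings above: comparing the best Dask configuration and the process-based one against plain XGBoost quantifies Dask's scheduling/communication overhead (numbers taken from the rows above):

```python
# Wall times (s) from the 1M-row timings above
plain_16t = 3.3    # no dask, 16 threads
dask_1w16t = 4.9   # 1 worker x 16 threads (best Dask config)
dask_16w1t = 22.2  # 16 workers x 1 thread (process-based)

print(f"threaded Dask overhead: +{dask_1w16t / plain_16t - 1:.0%}")
print(f"process-based slowdown: {dask_16w1t / plain_16t:.1f}x")
```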

szilard commented Jan 21, 2021

@trivialfis WIP on that, see above (I'm running different setups right now, will fill out results in a few minutes as I get them).

szilard commented Jan 21, 2021

@trivialfis see the updated results above. I guess a more realistic comparison would be Dask on N servers with C cores each, with 1 worker per server and C threads per worker. I'm not sure whether the number of partitions should be N or N*C.

And bigger data (say 10M rows), of course.

szilard commented Jan 21, 2021

10M rows:

```python
d_train = pd.read_csv("https://benchm-ml--int-enc.s3-us-west-2.amazonaws.com/train-10m-intenc.csv")
d_test = pd.read_csv("https://benchm-ml--int-enc.s3-us-west-2.amazonaws.com/test-10m-intenc.csv")
```

| n_workers | n_threads | n_partitions | Time (s) | AUC |
|-----------|-----------|--------------|----------|----------|
| 16 | 1 | 16 | 46.3 | 0.757480 |
| 1 | 16 | 16 | 33 | 0.759228 |
| 4 | 4 | 16 | 34.9 | 0.757468 |
| no dask | 16 | – | 28.8 | 0.759228 |
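A tentative observation from the two tables: the relative Dask overhead shrinks as data grows, since the fixed coordination cost is amortized over more work. Comparing the best Dask configuration (1 worker x 16 threads) against plain XGBoost at both sizes:

```python
# Wall times (s) from the 1M- and 10M-row tables above
overhead_1m = 4.9 / 3.3 - 1    # ~48%
overhead_10m = 33 / 28.8 - 1   # ~15%
print(f"1M rows:  +{overhead_1m:.0%}")
print(f"10M rows: +{overhead_10m:.0%}")
```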

szilard commented Jan 21, 2021

m5.16xlarge (4x the previous box), 10M rows:

| n_workers | n_threads | n_partitions | Time (s) | AUC | total cores |
|-----------|-----------|--------------|----------|----------|-------------|
| 1 | 16 | 1 | 31.5 | 0.759228 | 16 |
| 2 | 16 | 2 | 23.5 | 0.759020 | 32 |
| 4 | 16 | 4 | 19.3 | 0.757872 | 64 |
| no dask | 16 | – | 27 | 0.759228 | 16 |
| no dask | 32 | – | 18.2 | 0.759228 | 32 |
| no dask | 64 | – | 15 | 0.759228 | 64 |

[plain XGBoost 16c and workers=1 pinned to cores 0-7,32-39; XGBoost 32c and workers=2 pinned to cores 0-15,32-47]
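The core pinning in the note above was presumably done externally (e.g. with `taskset`); on Linux the same effect can be sketched from Python with the standard library's `os.sched_setaffinity`. The core IDs below mirror the 16c case and are illustrative only:

```python
import os

# Pin this process (and the threads it spawns) to cores 0-7 and 32-39,
# i.e. 8 physical cores plus their hyperthread siblings. Linux-only.
wanted = set(range(0, 8)) | set(range(32, 40))
allowed = os.sched_getaffinity(0)
os.sched_setaffinity(0, (wanted & allowed) or allowed)
print(sorted(os.sched_getaffinity(0)))
```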
