cudaErrorMemoryAllocation out of memory when using compute() #5988
Comments
I am encountering the exact same issue here and would love to hear a potential solution.

Code snippet:

from dask.distributed import Client
from dask_cuda import LocalCUDACluster
import dask_cudf

cluster = LocalCUDACluster()
client = Client(cluster)

ddf_crypto_market_entries = ...  # load from disk
assert type(ddf_crypto_market_entries) == dask_cudf.core.DataFrame

# Self-join on the 'cc' column, then materialize the result
ddf_crypto_self_merged = ddf_crypto_market_entries.merge(ddf_crypto_market_entries, on='cc')
ddf_crypto_self_merged.compute()

The compute() call fails with the same cudaErrorMemoryAllocation out-of-memory stack trace.
@Divyanshupy, in your example you are doing a series of chained left outer joins. @kdhageman, it looks like you are doing a self-join, which usually causes significant expansion as well. Without seeing the actual data, it's quite possible that these outer and self joins are expanding your dataframe to the point where it cannot fit on a single GPU. I would recommend a different approach; please see the second half of this comment, which may (hopefully) provide a solution: #5829 (comment)
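The expansion caused by a self-join can be estimated up front from the key counts. Below is a minimal sketch (the tiny cudf frame and the 'cc' key are illustrative, not the actual data from this issue): a self-join on a key column produces the sum of the squared per-key counts, which grows quickly when keys repeat.

import cudf

# Illustrative data: three rows share key 'a', one row has key 'b'
df = cudf.DataFrame({'cc': ['a', 'a', 'a', 'b'], 'v': [1, 2, 3, 4]})

# A self-join on 'cc' yields sum(n_k ** 2) rows, where n_k is the count of key k
counts = df['cc'].value_counts()
expected_rows = int((counts.astype('int64') ** 2).sum())
print(expected_rows)  # 3*3 + 1*1 = 10 output rows from only 4 input rows

If a few keys are very common in the real data, this number can easily exceed what fits on one GPU even when the input itself is small.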
It looks like in the above stack traces the OOM error is happening in the worker computation itself, as opposed to in the transfer of results to the client process, which is where switching from compute() to persist() would make a difference.
@kkraus14 The un-merged dataframe itself is small:

>>> ddf_crypto_market_entries.compute().shape
(873343, 2)

I tried enabling GPU memory spilling via the device_memory_limit option on the cluster. Following @beckernick's suggestion, I also used persist():

result = ddf_crypto_self_merged.persist()

but the only thing I get back is the task graph:

>>> result.dask
<dask.highlevelgraph.HighLevelGraph at 0x7f2685263550>

How do I get concrete results out of this HighLevelGraph?
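For context, a persisted collection still looks lazy on the client; concrete values come from calling an action on it. A minimal sketch of the usual options, where result stands for the persisted DataFrame above and the output path is hypothetical:

from dask.distributed import wait

result = ddf_crypto_self_merged.persist()   # schedule the merge on the workers
wait(result)                                # block until all partitions are materialized

print(len(result))      # total row count, computed on the workers
print(result.head())    # a small concrete cudf.DataFrame pulled back to the client

# Writing out partition-by-partition avoids ever collecting the full result on one GPU
result.to_parquet('/some/output/dir')       # hypothetical path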
Calling compute() collects the entire distributed dask_cudf DataFrame into a single cudf DataFrame, so the full result has to fit in the memory of one GPU. Calling persist() instead materializes the result while keeping it partitioned across the workers' GPUs. Based on your above example, I would remove the compute() call and write the output directly from the workers. Putting all of this together:

from dask.distributed import Client
from dask_cuda import LocalCUDACluster
import dask_cudf

cluster = LocalCUDACluster()
# cluster = LocalCUDACluster(device_memory_limit="6GB")  # Do this second to allow spilling from GPU memory to host memory in the workers
client = Client(cluster)

sh = dask_cudf.read_csv(path_sh, npartitions=4)
al = dask_cudf.read_csv(path_al, npartitions=4)
pn = dask_cudf.read_csv(path_pn, npartitions=4)
uk = dask_cudf.read_csv(path_uk, npartitions=4)

# Do this third to reduce the size of each piece of work
# sh = dask_cudf.read_csv(path_sh, npartitions=8)
# al = dask_cudf.read_csv(path_al, npartitions=8)
# pn = dask_cudf.read_csv(path_pn, npartitions=8)
# uk = dask_cudf.read_csv(path_uk, npartitions=8)

sh = sh[['ID', 'Label', 'FramesAnnotated', 'TotalFrames']]
pn = pn[['ID', 'Label', 'FramesAnnotated', 'TotalFrames']]
uk = uk[['ID', 'Label', 'FramesAnnotated', 'TotalFrames']]
al = al[['ID', 'Label', 'FramesAnnotated', 'TotalFrames']]

def merge_mine(left, right, suffixes):
    # Chained outer joins on the 'ID' column
    return left.merge(right, on='ID', how='outer', suffixes=suffixes)

m1 = merge_mine(sh, al, suffixes=('_sh', '_al'))
m2 = merge_mine(m1, uk, suffixes=('_m1', '_uk'))
m3 = merge_mine(m2, pn, suffixes=('_m2', '_pn'))

m3.compute().to_csv('/Data')
# m3.to_csv('/Data')  # Do this first as this makes all of the work properly run on the workers
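As a usage note on the "do this first" suggestion above: calling to_csv directly on the dask_cudf DataFrame writes one file per partition from the worker that holds it, so the merged result never has to be gathered onto a single GPU. A minimal sketch with a hypothetical output pattern:

# Each partition becomes its own CSV file, written by the worker holding it
m3.to_csv('/Data/merged-*.csv')   # hypothetical glob-style output pattern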
@kkraus14 thanks for the response.

When spinning up a Dask cluster from the terminal instead (rather than creating a LocalCUDACluster inside the script), I see something different.

With this terminal-started Dask cluster, and with the same DataFrame as above, the graph values of the persisted result are futures, some pending and some already finished:

>>> result = ddf_crypto_self_merged.persist()
>>> result.dask.values()
[<Future: pending, key: ('drop_by_shallow_copy-5eefecbbf971d04e1179e03ddd745955', 0)>,
<Future: pending, key: ('drop_by_shallow_copy-5eefecbbf971d04e1179e03ddd745955', 1)>,
<Future: pending, key: ('drop_by_shallow_copy-5eefecbbf971d04e1179e03ddd745955', 2)>,
<Future: pending, key: ('drop_by_shallow_copy-5eefecbbf971d04e1179e03ddd745955', 3)>,
<Future: pending, key: ('drop_by_shallow_copy-5eefecbbf971d04e1179e03ddd745955', 4)>,
<Future: pending, key: ('drop_by_shallow_copy-5eefecbbf971d04e1179e03ddd745955', 5)>,
<Future: pending, key: ('drop_by_shallow_copy-5eefecbbf971d04e1179e03ddd745955', 6)>,
<Future: pending, key: ('drop_by_shallow_copy-5eefecbbf971d04e1179e03ddd745955', 7)>,
<Future: pending, key: ('drop_by_shallow_copy-5eefecbbf971d04e1179e03ddd745955', 8)>,
<Future: pending, key: ('drop_by_shallow_copy-5eefecbbf971d04e1179e03ddd745955', 9)>,
<Future: pending, key: ('drop_by_shallow_copy-5eefecbbf971d04e1179e03ddd745955', 10)>,
<Future: pending, key: ('drop_by_shallow_copy-5eefecbbf971d04e1179e03ddd745955', 11)>,
<Future: finished, type: cudf.DataFrame, key: ('drop_by_shallow_copy-5eefecbbf971d04e1179e03ddd745955', 12)>,
<Future: finished, type: cudf.DataFrame, key: ('drop_by_shallow_copy-5eefecbbf971d04e1179e03ddd745955', 13)>,
<Future: finished, type: cudf.DataFrame, key: ('drop_by_shallow_copy-5eefecbbf971d04e1179e03ddd745955', 14)>,
...
]
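When some of those futures stay pending, it can help to block until the persisted partitions are done and then check which workers hold them. A minimal sketch, assuming client and result are the objects from the snippets above:

from dask.distributed import wait

wait(result)                          # block until every persisted partition is finished

# The graph values of a persisted collection are futures (as shown above);
# who_has() maps each finished piece to the worker(s) holding it.
futs = list(result.dask.values())
held = client.who_has(futs)           # future key -> list of worker addresses
per_worker = {}
for key, workers in held.items():
    for w in workers:
        per_worker[w] = per_worker.get(w, 0) + 1
print(per_worker)                     # how many result partitions each worker holds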
As a follow-up to my previous comment: is there any explanation for, or insight into, this behavior?
I'm seeing the exact same issue. The following code:

# notebook cell 1
from dask_cuda import LocalCUDACluster
cluster = LocalCUDACluster(CUDA_VISIBLE_DEVICES=[0, 1, 2, 3])

from dask.distributed import Client, wait
client = Client(cluster)

# notebook cell 2
import dask_cudf

ddf = dask_cudf.read_parquet('/mnt/data/2019-taxi-dataset/')
ddf = ddf.repartition(npartitions=120)  # optional
ddf = ddf.persist()
wait(ddf)

completely saturates just one of my four GPUs while the other three sit essentially idle. Attempting to go any further then runs into the same out-of-memory error.

I am seeing the same behavior that @kdhageman is seeing with persist(). To me it really looks like the work is not being distributed across the GPUs at all.
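One way to check whether the persisted partitions are actually spread across the four workers is to ask the scheduler what each worker holds. A minimal sketch, assuming client and ddf are the objects from the cells above:

# Number of in-memory keys (persisted partitions) held by each GPU worker;
# 120 partitions spread over 4 workers should give roughly 30 keys per worker.
counts = {worker: len(keys) for worker, keys in client.has_what().items()}
print(counts)

# Rough size check: rows per partition, computed on the workers
print(ddf.map_partitions(len).compute())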
There is a bug in Numba that contributes to this behavior.
With 3000 partitions, you've likely broken the problem down into such small pieces that there isn't enough work in each piece to saturate a GPU. I would try using fewer, larger partitions, and I would also move where you set the partitioning: rather than adjusting it with a repartition call after the data is loaded, control the partition size when the data is read in the first place.
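For example, the partition size can usually be chosen at read time instead of with a later repartition. A minimal sketch, assuming a dask_cudf version whose read_csv accepts a bytes-per-partition argument (named chunksize in older releases and blocksize in newer ones); the path and the size are illustrative:

import dask_cudf

# Read directly into ~1 GiB partitions instead of repartitioning afterwards.
# NOTE: depending on the dask_cudf version the keyword is 'chunksize' or 'blocksize';
# '/path/to/data-*.csv' and '1 GiB' are placeholders.
ddf = dask_cudf.read_csv('/path/to/data-*.csv', chunksize='1 GiB')

print(ddf.npartitions)  # now driven by the data size rather than a manual repartition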
This has been answered and the Numba issues are resolved, so I'm closing this. Feel free to open a new issue if there are further issues.
What is your question?
Hello, I am trying to merge 3 very large dataframes using multiple GPUs. I am able to merge them, but when I try to save the resulting dataframe as a CSV or as a pandas DataFrame using the compute() method, it gives an out-of-memory error, even though only about 400 MB of memory is in use on each GPU. I have four 2080 Ti Max-Q GPUs, each with 12 GB of memory. I have observed this error whenever I use the compute() function.
Code Snippets: