Add a barrier before cugraph Graph creation #4046

Merged

Conversation

VibhuJawa
Member

@VibhuJawa VibhuJawa commented Dec 5, 2023

This PR introduces a short-term fix for #4037.

CC: @jnke2016, @rlratzel
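
For readers unfamiliar with the term in the PR title, a barrier makes every participant wait until all of them have reached the same point before any of them continues. The sketch below is only a conceptual illustration: it uses Python threads and `threading.Barrier` as stand-ins for Dask workers, and the names `run_workers`, `prepared`, and `create_graph` are hypothetical. It is not the change made in this PR, whose diff is not shown in this conversation.

```python
import threading


def run_workers(num_workers):
    """Run toy workers that must all finish preparing before any creates a graph."""
    events = []
    lock = threading.Lock()
    barrier = threading.Barrier(num_workers)

    def worker(rank):
        # Phase 1: per-worker preparation (stand-in for staging edge data).
        with lock:
            events.append(f"prepared:{rank}")
        # Barrier: block until every worker has finished phase 1.
        barrier.wait()
        # Phase 2: the step that needs all workers to be in sync.
        with lock:
            events.append(f"create_graph:{rank}")

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events


events = run_workers(4)
print(events)
```

Whatever order the threads are scheduled in, all four `prepared` entries are recorded before the first `create_graph` entry, which is the ordering guarantee a barrier is meant to provide.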

@VibhuJawa VibhuJawa requested a review from a team as a code owner December 5, 2023 21:39

copy-pr-bot bot commented Dec 5, 2023

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the python label Dec 5, 2023
@VibhuJawa
Member Author

VibhuJawa commented Dec 5, 2023

Memory footprint measurements remain the same as on main.

PR (cugraph_sync_graph_creation): Max Peak to avg input graph ratio = 2.95x for directed graphs.
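
The 2.95x figure matches the "Max Peak to avg input graph ratio across workers" line in the logs below. As a rough check, and assuming that ratio is peak worker memory divided by edge list memory (the benchmark script itself is not shown here), it can be recomputed from the rounded GB values the logs print. The helper name `peak_to_input_ratio` is hypothetical.

```python
# Values copied from the PR log output in this comment (rounded GB figures).
# Each entry: (input edges, directed, peak GB, edge list GB, ratio reported in log)
runs = [
    (134_217_728, True, 5.9, 2.0, 2.95),
    (134_217_728, False, 11.6, 2.0, 5.80),
    (268_435_456, True, 11.8, 4.0, 2.96),
]


def peak_to_input_ratio(peak_gb, edge_list_gb):
    """Assumed definition: peak worker memory over input edge list memory."""
    return peak_gb / edge_list_gb


ratios = [peak_to_input_ratio(peak, edge_list) for _, _, peak, edge_list, _ in runs]

for (edges, directed, _, _, reported), ratio in zip(runs, ratios):
    print(f"edges={edges:,} directed={directed} computed={ratio:.2f} reported={reported:.2f}")
```

The recomputed values agree with the logged ratios to within 0.01. The small gap on the larger directed graph is presumably because the log computes its ratio from unrounded byte counts.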

Dask client/cluster created using LocalCUDACluster
...
Number of input edges = 134,217,728, directed = True, renumber = True
Graph creation time = 3.848442316055298 s
function:  construct_graph
function args: (<dask_cudf.DataFrame | 4 tasks | 1 npartitions>,) kwargs: {'directed': True, 'renumber': True}
execution_time: 3.8484749794006348
allocation_counts:
{   'tcp://127.0.0.1:37301': {   'current_bytes': '1.0GB',
                                 'peak_bytes': '5.9GB',
                                 'total_bytes': '43.8GB'}}
Edge List Memory = 2.0GB
Peak Memory across workers = 5.9GB
Max Peak to output graph ratio across workers = 5.64
Max Peak to avg input graph ratio across workers = 2.95
Number of edges in final graph = 131,163,339
--------------------------------------------------------------------------------
Number of input edges = 134,217,728, directed = True, renumber = False
Graph creation time = 3.710846424102783 s
function:  construct_graph
function args: (<dask_cudf.DataFrame | 4 tasks | 1 npartitions>,) kwargs: {'directed': True, 'renumber': False}
execution_time: 3.710878849029541
allocation_counts:
{   'tcp://127.0.0.1:41629': {   'current_bytes': '1.0GB',
                                 'peak_bytes': '5.9GB',
                                 'total_bytes': '43.8GB'}}
Edge List Memory = 2.0GB
Peak Memory across workers = 5.9GB
Max Peak to output graph ratio across workers = 5.64
Max Peak to avg input graph ratio across workers = 2.95
Number of edges in final graph = 131,163,339
--------------------------------------------------------------------------------
----------------------------------------renumber completed----------------------------------------
Number of input edges = 134,217,728, directed = False, renumber = True
Graph creation time = 4.768225193023682 s
function:  construct_graph
function args: (<dask_cudf.DataFrame | 4 tasks | 1 npartitions>,) kwargs: {'directed': False, 'renumber': True}
execution_time: 4.768261909484863
allocation_counts:
{   'tcp://127.0.0.1:32867': {   'current_bytes': '2.0GB',
                                 'peak_bytes': '11.6GB',
                                 'total_bytes': '78.2GB'}}
Edge List Memory = 2.0GB
Peak Memory across workers = 11.6GB
Max Peak to output graph ratio across workers = 5.81
Max Peak to avg input graph ratio across workers = 5.80
Number of edges in final graph = 129,338,598
--------------------------------------------------------------------------------
Number of input edges = 134,217,728, directed = False, renumber = False
Graph creation time = 4.8399293422698975 s
function:  construct_graph
function args: (<dask_cudf.DataFrame | 4 tasks | 1 npartitions>,) kwargs: {'directed': False, 'renumber': False}
execution_time: 4.8399598598480225
allocation_counts:
{   'tcp://127.0.0.1:42681': {   'current_bytes': '2.0GB',
                                 'peak_bytes': '11.6GB',
                                 'total_bytes': '78.2GB'}}
Edge List Memory = 2.0GB
Peak Memory across workers = 11.6GB
Max Peak to output graph ratio across workers = 5.81
Max Peak to avg input graph ratio across workers = 5.80
'NoneType' object has no attribute 'unrenumbered_id_type'
2023-12-05 13:39:15,440 - distributed.nanny - WARNING - Restarting worker
----------------------------------------renumber completed----------------------------------------
----------------------------------------scale = 23 completed----------------------------------------
Number of input edges = 268,435,456, directed = True, renumber = True
Graph creation time = 6.711274862289429 s
function:  construct_graph
function args: (<dask_cudf.DataFrame | 4 tasks | 1 npartitions>,) kwargs: {'directed': True, 'renumber': True}
execution_time: 6.711305618286133
allocation_counts:
{   'tcp://127.0.0.1:38367': {   'current_bytes': '2.1GB',
                                 'peak_bytes': '11.8GB',
                                 'total_bytes': '87.8GB'}}
Edge List Memory = 4.0GB
Peak Memory across workers = 11.8GB
Max Peak to output graph ratio across workers = 5.65
Max Peak to avg input graph ratio across workers = 2.96
Number of edges in final graph = 263,433,821
--------------------------------------------------------------------------------
Number of input edges = 268,435,456, directed = True, renumber = False
Graph creation time = 6.605293273925781 s
function:  construct_graph
function args: (<dask_cudf.DataFrame | 4 tasks | 1 npartitions>,) kwargs: {'directed': True, 'renumber': False}
execution_time: 6.605325222015381
allocation_counts:
{   'tcp://127.0.0.1:42737': {   'current_bytes': '2.1GB',
                                 'peak_bytes': '11.8GB',
                                 'total_bytes': '87.8GB'}}
Edge List Memory = 4.0GB
Peak Memory across workers = 11.8GB
Max Peak to output graph ratio across workers = 5.65
Max Peak to avg input graph ratio across workers = 2.96
Number of edges in final graph = 263,433,821
--------------------------------------------------------------------------------

Main:

....
Dask client/cluster created using LocalCUDACluster
Number of input edges = 134,217,728, directed = True, renumber = True
Graph creation time = 3.636697292327881 s
function:  construct_graph
function args: (<dask_cudf.DataFrame | 4 tasks | 1 npartitions>,) kwargs: {'directed': True, 'renumber': True}
execution_time: 3.6367363929748535
allocation_counts:
{   'tcp://127.0.0.1:34813': {   'current_bytes': '1.0GB',
                                 'peak_bytes': '5.9GB',
                                 'total_bytes': '43.8GB'}}
Edge List Memory = 2.0GB
Peak Memory across workers = 5.9GB
Max Peak to output graph ratio across workers = 5.64
Max Peak to avg input graph ratio across workers = 2.95
Number of edges in final graph = 131,163,339
--------------------------------------------------------------------------------
Number of input edges = 134,217,728, directed = True, renumber = False
Graph creation time = 3.5798535346984863 s
function:  construct_graph
function args: (<dask_cudf.DataFrame | 4 tasks | 1 npartitions>,) kwargs: {'directed': True, 'renumber': False}
execution_time: 3.5798895359039307
allocation_counts:
{   'tcp://127.0.0.1:43881': {   'current_bytes': '1.0GB',
                                 'peak_bytes': '5.9GB',
                                 'total_bytes': '43.8GB'}}
Edge List Memory = 2.0GB
Peak Memory across workers = 5.9GB
Max Peak to output graph ratio across workers = 5.64
Max Peak to avg input graph ratio across workers = 2.95
Number of edges in final graph = 131,163,339
--------------------------------------------------------------------------------
Number of input edges = 134,217,728, directed = False, renumber = True
Graph creation time = 4.675522565841675 s
function:  construct_graph
function args: (<dask_cudf.DataFrame | 4 tasks | 1 npartitions>,) kwargs: {'directed': False, 'renumber': True}
execution_time: 4.675558090209961
allocation_counts:
{   'tcp://127.0.0.1:35815': {   'current_bytes': '2.0GB',
                                 'peak_bytes': '11.6GB',
                                 'total_bytes': '78.2GB'}}
Edge List Memory = 2.0GB
Peak Memory across workers = 11.6GB
Max Peak to output graph ratio across workers = 5.81
Max Peak to avg input graph ratio across workers = 5.80
Number of edges in final graph = 129,338,598
--------------------------------------------------------------------------------
Number of input edges = 134,217,728, directed = False, renumber = False
Graph creation time = 4.703786849975586 s
function:  construct_graph
function args: (<dask_cudf.DataFrame | 4 tasks | 1 npartitions>,) kwargs: {'directed': False, 'renumber': False}
execution_time: 4.703828573226929
allocation_counts:
{   'tcp://127.0.0.1:36339': {   'current_bytes': '2.0GB',
                                 'peak_bytes': '11.6GB',
                                 'total_bytes': '78.2GB'}}
Edge List Memory = 2.0GB
Peak Memory across workers = 11.6GB
Max Peak to output graph ratio across workers = 5.81
Max Peak to avg input graph ratio across workers = 5.80
'NoneType' object has no attribute 'unrenumbered_id_type'
2023-12-05 13:46:52,990 - distributed.nanny - WARNING - Restarting worker
----------------------------------------renumber completed----------------------------------------
----------------------------------------scale = 23 completed----------------------------------------

Number of input edges = 268,435,456, directed = True, renumber = True
Graph creation time = 6.533435344696045 s
function:  construct_graph
function args: (<dask_cudf.DataFrame | 4 tasks | 1 npartitions>,) kwargs: {'directed': True, 'renumber': True}
execution_time: 6.533491373062134
allocation_counts:
{   'tcp://127.0.0.1:35793': {   'current_bytes': '2.1GB',
                                 'peak_bytes': '11.8GB',
                                 'total_bytes': '87.8GB'}}
Edge List Memory = 4.0GB
Peak Memory across workers = 11.8GB
Max Peak to output graph ratio across workers = 5.65
Max Peak to avg input graph ratio across workers = 2.96
Number of edges in final graph = 263,433,821
--------------------------------------------------------------------------------
Number of input edges = 268,435,456, directed = True, renumber = False
Graph creation time = 6.50528359413147 s
function:  construct_graph
function args: (<dask_cudf.DataFrame | 4 tasks | 1 npartitions>,) kwargs: {'directed': True, 'renumber': False}
execution_time: 6.505335569381714
allocation_counts:
{   'tcp://127.0.0.1:40855': {   'current_bytes': '2.1GB',
                                 'peak_bytes': '11.8GB',
                                 'total_bytes': '87.8GB'}}
Edge List Memory = 4.0GB
Peak Memory across workers = 11.8GB
Max Peak to output graph ratio across workers = 5.65
Max Peak to avg input graph ratio across workers = 2.96
Number of edges in final graph = 263,433,821
--------------------------------------------------------------------------------
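
The two log dumps above also allow a rough timing comparison between the PR branch and main. The sketch below copies the "Graph creation time" values (rounded to six decimals) and computes the relative difference per configuration. Each configuration was run once, so these numbers say nothing about run-to-run noise and should be read as indicative only.

```python
# Graph creation times in seconds, copied from the logs above.
# Keys are (input edges, directed, renumber).
pr_times = {
    (134_217_728, True, True): 3.848442,
    (134_217_728, True, False): 3.710846,
    (134_217_728, False, True): 4.768225,
    (134_217_728, False, False): 4.839929,
    (268_435_456, True, True): 6.711275,
    (268_435_456, True, False): 6.605293,
}
main_times = {
    (134_217_728, True, True): 3.636697,
    (134_217_728, True, False): 3.579854,
    (134_217_728, False, True): 4.675523,
    (134_217_728, False, False): 4.703787,
    (268_435_456, True, True): 6.533435,
    (268_435_456, True, False): 6.505284,
}

# Relative slowdown of the PR branch versus main for each configuration.
overhead = {
    key: (pr_times[key] - main_times[key]) / main_times[key]
    for key in pr_times
}

for (edges, directed, renumber), frac in overhead.items():
    print(f"edges={edges:,} directed={directed} renumber={renumber} overhead={frac:+.1%}")
```

In every configuration the PR run is slightly slower than the main run, by a few percent of graph creation time.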

@VibhuJawa VibhuJawa added the non-breaking Non-breaking change label Dec 5, 2023
@VibhuJawa VibhuJawa self-assigned this Dec 5, 2023
@VibhuJawa VibhuJawa added the bug Something isn't working label Dec 5, 2023
@rlratzel
Contributor

rlratzel commented Dec 5, 2023

/ok to test

@rlratzel rlratzel linked an issue Dec 6, 2023 that may be closed by this pull request
@rlratzel
Contributor

rlratzel commented Dec 6, 2023

/merge

@rapids-bot rapids-bot bot merged commit a5718c6 into rapidsai:branch-24.02 Dec 6, 2023
75 checks passed
Labels
bug Something isn't working non-breaking Non-breaking change python
Development

Successfully merging this pull request may close these issues.

[BUG]: cuGraph MNMG Hangs
3 participants