Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix OOB error, BFS C API should validate that the source vertex is a valid vertex #4077

Merged
merged 10 commits into from
Jan 12, 2024

Conversation

ChuckHastings
Copy link
Collaborator

@ChuckHastings ChuckHastings commented Jan 4, 2024

  • Added error check to be sure that the source vertex is a valid vertex in the graph.
  • Updated nx_cugraph.Graph class to create PLC graphs using vertices_array in order to include isolated vertices. This is now needed since the error check added in this PR prevents NetworkX tests from passing if isolated vertices are treated as invalid, so this change prevents that.
  • This resolves the problem that required the test workarounds done here in 4029, so those workarounds have been removed in this PR.

Closes #4067

@ChuckHastings ChuckHastings self-assigned this Jan 4, 2024
@ChuckHastings ChuckHastings added bug Something isn't working non-breaking Non-breaking change and removed cuGraph labels Jan 4, 2024
@ChuckHastings ChuckHastings added this to the 24.02 milestone Jan 4, 2024
@ChuckHastings ChuckHastings marked this pull request as ready for review January 4, 2024 00:14
@ChuckHastings ChuckHastings requested a review from a team as a code owner January 4, 2024 00:14
size_t local_count = thrust::count_if(
handle.get_thrust_policy(), span.begin(), span.end(), [value] __device__(data_t d) {
return d == value;
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Comment on lines 242 to 245
if constexpr (multi_gpu) {
local_count = cugraph::host_scalar_allreduce(
handle.get_comms(), local_count, raft::comms::op_t::SUM, handle.get_stream());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the other functions in this file are SG functions, and this function is an MG function. This might be confusing...

Should we better resolve this?

  1. No.
  2. Create another file for MG utility wrapper functions
  3. Just call thrust::count() and host_scalar_allreduce() instead of creating this function.
  4. Rename functions to better hint that this function performs allerduce if MG
  5. Something else?

What do you think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look at option 2.

While I like option 3, there are many cases where we don't need to create a .cu file except for common utilities like this. This allows us to leave many files as .cpp files instead.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think what I'll do is move the all reduce (which doesn't use thrust, so can be called in a .cpp file) out of the function, making it an SG function. So I'll leave it here but remove the multi_gpu flag.

Copy link
Contributor

@seunghwak seunghwak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

…l case testing and skipped tests from prior PR needed due to the bug this PR fixes.
@rlratzel rlratzel requested a review from eriknw January 6, 2024 05:48
Comment on lines -114 to -118
# Individually run tests that are skipped above b/c they may run out of memory
PYTEST_NO_SKIP=True ./run_nx_tests.sh --cov-append -k "TestDAG and test_antichains"
PYTEST_NO_SKIP=True ./run_nx_tests.sh --cov-append -k "TestMultiDiGraph_DAGLCA and test_all_pairs_lca_pairs_without_lca"
PYTEST_NO_SKIP=True ./run_nx_tests.sh --cov-append -k "TestDAGLCA and test_all_pairs_lca_pairs_without_lca"
PYTEST_NO_SKIP=True ./run_nx_tests.sh --cov-append -k "TestEfficiency and test_using_ego_graph"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ops-codeowners: these lines are removed because the tests are all running as part of the same invocation again, so we're not losing any tests, we're just running them the standard way again (ie. removing a special case no longer needed)

@rlratzel
Copy link
Contributor

/merge

1 similar comment
@rlratzel
Copy link
Contributor

/merge

@rapids-bot rapids-bot bot merged commit 24d02a5 into rapidsai:branch-24.02 Jan 12, 2024
98 checks passed
rapids-bot bot pushed a commit that referenced this pull request Jan 16, 2024
…4092)

Hooray for removing and cleaning code! Tests also added (we already tested isolated nodes for Louvain).

nx-cugraph was updated to handle isolated nodes by passing the node set to PLC in #4077

Authors:
  - Erik Welch (https://github.com/eriknw)

Approvers:
  - Rick Ratzel (https://github.com/rlratzel)

URL: #4092
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working ci cuGraph non-breaking Non-breaking change python
Projects
None yet
Development

Successfully merging this pull request may close these issues.

OOM error raised after subsequent calls to plc.bfs with different SGGraphs
5 participants