[FEA]: Add parameter to prevent persisted edgelists in datasets API #4241

Closed
2 tasks done
nv-rliu opened this issue Mar 14, 2024 · 0 comments · Fixed by #4256
Labels: feature request, improvement, python

nv-rliu commented Mar 14, 2024

Is this a new feature, an improvement, or a change to existing functionality?

Improvement

How would you describe the priority of this feature request

Low (would be nice)

Please provide a clear description of problem this feature solves

When cugraph.datasets objects are used to clean up MG tests (e.g. #4197), the same object often needs to provide edge-lists for both SG (cudf) and MG (dask_cudf) usage. However, the current implementation keeps the first edge-list it loads, so tests must call unload repeatedly to avoid getting back an edge-list of the wrong type.

This also interfered with CI, because edge-lists persisted between test files.

Describe your ideal solution

Similar to how MG algorithms have a flag that developers use for testing/debugging (perform_expensive_check), perhaps the datasets API should also have a flag that is set when used for testing purposes in order to automatically check for preexisting edge-lists and unload them.

from cugraph.datasets import karate
df = karate.get_edgelist()  # loads a cudf.DataFrame and persists it on the dataset object
ddf = karate.get_dask_edgelist()  # returns the persisted cudf.DataFrame instead of a dask_cudf.DataFrame

# proposed solution (auto_unload is a suggested parameter, not an existing one)
df = karate.get_edgelist(auto_unload=True)  # prevents the edge-list from persisting, for test usage
ddf = karate.get_dask_edgelist(auto_unload=True)

Describe any alternatives you have considered

Since this issue only affects tests, an alternative could be to use fixtures that perform the "check and unload" steps in each unit test.
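A minimal sketch of the "check and unload" alternative, assuming only that a dataset object caches its edge-list in `_edgelist` and exposes `unload()`. `StubDataset` and `clean_dataset` are hypothetical names used for illustration; the stub stands in for a real cugraph.datasets object so the example runs without a GPU.

```python
import contextlib


class StubDataset:
    """Hypothetical stand-in for a cugraph.datasets object (no GPU needed)."""

    def __init__(self, name):
        self.name = name
        self._edgelist = None  # cached edge-list, persists across calls

    def get_edgelist(self):
        # Load once, then keep returning the cached object.
        if self._edgelist is None:
            self._edgelist = [(0, 1), (1, 2)]
        return self._edgelist

    def unload(self):
        # Drop the cached edge-list.
        self._edgelist = None


@contextlib.contextmanager
def clean_dataset(dataset):
    """Unload any preexisting edge-list before and after the enclosed code."""
    dataset.unload()
    try:
        yield dataset
    finally:
        dataset.unload()
```

In a pytest suite the same logic could be wrapped in a yield-style fixture, so each unit test starts and ends with an unloaded dataset without calling `unload()` by hand.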

Additional context

This is part of a general effort to improve readability of the MG tests #4187

Code of Conduct

  • I agree to follow cuGraph's Code of Conduct
  • I have searched the open feature requests and have found no duplicates for this feature request
nv-rliu added the "feature request" and "? - Needs Triage" labels on Mar 14, 2024
nv-rliu added this to the 24.06 milestone on Mar 14, 2024
nv-rliu added the "improvement" and "python" labels and removed the "? - Needs Triage" label on Mar 14, 2024
nv-rliu self-assigned this on Mar 15, 2024
rapids-bot closed this as completed in #4256 on May 2, 2024
rapids-bot pushed a commit that referenced this issue on May 2, 2024
Closes #4241 

This PR adds an additional check to the `get_edgelist()` and `get_dask_edgelist()` functions in the Datasets API.

When an edge-list is retrieved, the type of the internal object (`self._edgelist`) is now verified to be the SG or MG type, as appropriate.
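One way such a type check might behave is sketched below. This is an illustration under assumptions, not the code merged in #4256: `SGFrame` and `MGFrame` are hypothetical stand-ins for the single-GPU and multi-GPU edge-list types, and reloading on a type mismatch is an assumed behavior.

```python
class SGFrame:
    """Hypothetical stand-in for the single-GPU edge-list type."""


class MGFrame:
    """Hypothetical stand-in for the multi-GPU edge-list type."""


class StubDataset:
    """Illustrative dataset whose getters verify the cached edge-list type."""

    def __init__(self):
        self._edgelist = None
        self.load_count = 0  # counts (re)loads, for demonstration only

    def _load(self, frame_type):
        self.load_count += 1
        self._edgelist = frame_type()

    def get_edgelist(self):
        # Reload if nothing is cached or the cached object is not SG.
        if not isinstance(self._edgelist, SGFrame):
            self._load(SGFrame)
        return self._edgelist

    def get_dask_edgelist(self):
        # Reload if nothing is cached or the cached object is not MG.
        if not isinstance(self._edgelist, MGFrame):
            self._load(MGFrame)
        return self._edgelist
```

With a check like this, calling the SG getter and then the MG getter on the same object no longer hands back a cached edge-list of the wrong type, and repeated calls to the same getter still reuse the cache.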

In addition, minor improvements have been made to `utils/test_dataset.py` so that its type checks are more thorough.

Authors:
  - Ralph Liu (https://github.com/nv-rliu)

Approvers:
  - Rick Ratzel (https://github.com/rlratzel)

URL: #4256