Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a few igraph algorithms to run via scripts/bench.py #54

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

eriknw
Copy link
Member

@eriknw eriknw commented Mar 11, 2023

See #53 (this PR maybe closes it).

CC @szhorvat. I'm not at all familiar with using igraph, so I'm hoping you can take a look at this. What other algorithms do you think would be easy to compare against? I think we can compare some SSSP and BFS algorithms once #51 is in. Also, how is one supposed to load Matrix Market files into an igraph Graph? Going from *.mtx file to scipy.sparse to networkx to igraph seems less than ideal.

I'm not sure the most fair way to compare PageRank. On one hand, comparing PageRank with default arguments is fair because that's how they will typically get used in practice. On the other hand, we can view PageRank and other centrality measures as ranking metrics (i.e., order them by score), so a fair comparison could be to run iterative versions of the algorithm until the same ranking is achieved.

@szhorvat
Copy link

szhorvat commented Mar 12, 2023

Including @ntamas here you might have more feedback. I'll also take a look next week.

I wanted to make it clear again that I asked about benchmarking against igraph only out of curiosity, and to better understand what performance is possible with GraphBLAS. This is not part of the pyOpenSci review.

@eriknw
Copy link
Member Author

eriknw commented Mar 13, 2023

I wanted to make it clear again that I asked about benchmarking against igraph only out of curiosity, and to better understand what performance is possible with GraphBLAS. This is not part of the pyOpenSci review.

Understood. We have your attention, so I'd like to use it 😉 . I think many people are curious about how igraph compares.

Part of the value proposition of python-graphblas is it provides high performance sparse data structures and operations on them. python-graphblas provides the foundation from which algorithms can be created; graphblas-algorithms is a collection of such algorithms. Users aren't limited to use graphblas-algorithms though--it is possible to explore and invent custom, bespoke, and hopefully fast algorithms that specifically suite ones needs. Some of our earliest (and slightly messy) examples of this are here: https://github.com/metagraph-dev/grblas-recipes/ (someday we'd like to clean up and update some of these to include in our documentation).

So, yeah, I appreciate that you aren't expecting us to benchmark against igraph as part of the review. Nevertheless, if we are able to do so even in a limited fashion, it can help support our claimed value proposition that GraphBLAS can be high-performance, and, hence, a useful tool for people.

Here are results from my initial benchmarking comparing GraphBLAS and igraph using this PR running on a DGX. One should be skeptical of benchmarks in general including these, and I'm sure they don't tell the full story, but the results may be interesting nevertheless:

graphblas_vs_igraph1

@codecov-commenter
Copy link

Codecov Report

Patch coverage has no change and project coverage change: -1.25 ⚠️

Comparison is base (6de1fd6) 69.16% compared to head (8e58900) 67.91%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #54      +/-   ##
==========================================
- Coverage   69.16%   67.91%   -1.25%     
==========================================
  Files          77       77              
  Lines        3152     3151       -1     
  Branches      602      597       -5     
==========================================
- Hits         2180     2140      -40     
- Misses        786      832      +46     
+ Partials      186      179       -7     
Impacted Files Coverage Δ
graphblas_algorithms/nxapi/_utils.py 57.47% <0.00%> (ø)

... and 7 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

if backend == "networkx":
return G
if backend == "igraph":
# TODO: is there a better way for igraph to read MM files or scipy.sparse arrays?
Copy link

@szhorvat szhorvat Mar 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, how is one supposed to load Matrix Market files into an igraph Graph? Going from *.mtx file to scipy.sparse to networkx to igraph seems less than ideal.

You can use igraph.Graph.Adjacency to convert from a SciPy sparse matrix.

There is no direct support for reading MM files in igraph.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just what I was looking for, thanks!

implementation="prpack",
):
if personalization is not None:
raise NotImplementedError

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personalization is provided through the reset and reset_vertices parameters of personalized_pagerank()

https://python.igraph.org/en/stable/api/igraph.GraphBase.html#personalized_pagerank

if nstart is not None:
raise NotImplementedError
if dangling is not None:
raise NotImplementedError

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just confirming that the behaviour is the same as with dangling=None in NetworkX.

Comment on lines +44 to +45
# TODO: check results when `count_zeros=False`
mode = "zero" if count_zeros else "nan"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is correct.

@szhorvat
Copy link

I'm not sure the most fair way to compare PageRank. On one hand, comparing PageRank with default arguments is fair because that's how they will typically get used in practice.

Probably PageRank is just not the best type of computation for benchmarking ... The results from using the default arguments are still practically useful though.

I just checked what PRPACK does exactly. For graph with less than 128 vertices, it uses Gaussian elimination (which is exact, no tolerance comes into play). For larger graphs, it seems to use a variant of Gauss-Seidel iteration. igraph uses a hard-coded tolerance of $10^{-10}$ here, but tolerances are not comparable between different iterative algorithms anyway.

@szhorvat
Copy link

Here are results from my initial benchmarking comparing GraphBLAS and igraph using this PR running on a DGX.

Thanks for sharing these results! I assume these were run in parallel, which is fair, since this seems to be one of the major advantages of using GraphBLAS. But it would be interesting to see a serial comparison too.

In igraph, PRPACK can use some parallelism, but it's not very effective. This can be disabled by setting the OMP_NUM_THREADS=1 environment variable. Most other parts of igraph don't run in parallel, except when relying on a parallelized external library (e.g. BLAS).

@eriknw
Copy link
Member Author

eriknw commented Mar 17, 2023

But it would be interesting to see a serial comparison too.

Agreed! I'll try to get new results this weekend.

This can be disabled by setting the OMP_NUM_THREADS=1 environment variable

Setting this environment variable works for python-graphblas as well, or one can do gb.ss.config["nthreads"] = 1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants