Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problems with finding diameter of network prior to implementation of Diameter() in dossier.py #91

Open
ghost opened this issue Aug 10, 2021 · 0 comments

Comments

@ghost
Copy link

ghost commented Aug 10, 2021

Suppose reactions with corresponding reactant-product tuples are given.

>>> reactions = [[('M73', 'M293'), ('M128', 'M175'), ('M156', 'M54'), ('M257', 'M167'), ('M98', 'M15'), ('M184', 'M56'), ('M204', 'M19'), ('M157', 'M253')]]
>>> G = nx.Graph()
>>> G.add_edges_from([r for r in reactions])
>>> nx.diameter(G)

Execution of the last statement gives a nx.NetworkXError exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/arghomaitra/.pyenv/versions/3.9.6/lib/python3.9/site-packages/networkx/algorithms/distance_measures.py", line 299, in diameter
    e = eccentricity(G)
  File "/Users/arghomaitra/.pyenv/versions/3.9.6/lib/python3.9/site-packages/networkx/algorithms/distance_measures.py", line 264, in eccentricity
    raise nx.NetworkXError(msg)
networkx.exception.NetworkXError: Found infinite path length because the graph is not connected

So the first thing is that the graph is not connected and we can't find diameter just by doing nx.diameter(G) like we did for local efficiency and global efficiency.

Looked at this SO post and saw that we can do two things:

  1. check if the graph is connected using nx.is_connected(G). If nx.is_connected(G) is False (which in our case is definitely false), then find connected components in the graph using nx.connected_components(G).
  2. after finding connected components, make subgraphs of components and find diameter of each component.

So here it is as follows:

>>> nx.is_connected(G)
False
>>> S = [G.subgraph(c).copy() for c in nx.connected_components(G)]
>>> d = [nx.diameter(i) for i in S]
>>> d
[7, 2, 3, 1, 4, 1, 1, 1, 4, 2, 2, 2, 5, 1, 2, 1, 2, 1, 1, 3, 1, 1, 1, 2, 2, 2, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1]

We used nx.local_efficiency(G) and nx.global_efficiency(G) and got desired value every time a sequence was processed. However, for diameter, every time a single genome is processed, we get the diameter of its connected components via subgraphs like the list you see above.

So, how do we choose the diameter? Do we choose the max value from the list of diameters or something like that? Or something else entirely?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

0 participants