-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implemented MinkowskiKNNEdges #630
Implemented MinkowskiKNNEdges #630
Conversation
There was a bug where source/target was swapped.
@AMHermansen I think this could be a nice addition to the repo! It would be fantastic if you could add a unit test that assert that the code works, and some kind of quantification of computation speed for future reference (See attached plot as an example) The overhead of measuring the execution speed as shown above is quite small, and I'd be happy to share the code via slack that I used to make the plot. |
If my memory of algorithmic complexity serves me right, the time complexity should be O(B*n^2), where B is number of batches and n is largest number of pulses in a batch, but I'll make a concrete benchmark that shows speeds on the hardware that I'm using. Do you have a suggestion for values of k and n, for which I should try it on (I believe only n should matter, but I can probably test it both), and should I also benchmark on cpu vs gpu? I'm assuming that randomly generated data suffices for the computational benchmark? |
I think it will be sufficient to probe
Yes, that is perfectly fine. |
Hello @RasmusOrsoe For a run where I was cropping graphs to have at-most 256 nodes I didn't notice any particular slowdown from this definition, so the increased computation time seems to be acceptable at least up to 256-nodes. I would imagine that a custom cpp/cu implementation would reach similar speeds as the knn-benchmark, should we continue with this implementation or do you find the slowdown to be too great? |
@AMHermansen Thank you for this very thorough test! I think the execution times here are well within the margins of acceptance - great job! I don't think it's necessary to optimize this further. |
@RasmusOrsoe Could you have a look at this PR. I'm still not entirely sure that the row/coll order is correct. I think the easiest way for you to verify this is in the unit test, where I include |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @AMHermansen,
Sorry for the wait; I had overseen your request for review.
It looks to me like the format of the edge_index
is correct and with the right dtype
. Generally, the format should be [2, n_edges]
. I took a look at your expected edge indices and I do think they look correct to me!
I've left a few superficial comments.
|
||
Returns: Matrix of shape (n, m) of all pairwise Minkowski distances. | ||
""" | ||
space_coords = space_coords or [0, 1, 2] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a particular reason why you don't let space_coords
default to [0,1,2]
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand your question right, you're asking why I have space_coords=None
in the function definition, and then later do space_coords = space_coords or [0, 1, 2]
.
The reason is that I generally avoid having mutable objects as default parameters, since IMO python treats mutable default objects in a non-intuitive way, and it can create some hard-to-debug bugs.
An example from the Python Documentation would be:
def f(a, L=[]):
L.append(a)
return L
print(f(1)) # prints [1]
print(f(2)) # prints [1, 2]
print(f(3)) # prints [1, 2, 3]
The above behavior is related to how python treats function definitions. More information can be found in Section 4.8.1 and the related warning in the link-above.
The same behavior was what was changed in #633
self.nb_nearest_neighbours = nb_nearest_neighbours | ||
self.c = c | ||
self.time_like_weight = time_like_weight | ||
self.space_coords = space_coords or [0, 1, 2] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment above
Implementation of MinkowskiKNNEdges.
Current implementation is written purely in python, and has a larger than necessary memory overhead. Current memory consumption is O(n^2), where n is the largest number of pulses in a single event. A custom implementation in cpp/cuda would reduce this to O(n*k) where k is number of edges.
The code is currently working, but I'll try to construct some sort of unit test to verify it.