Hi @baregawi, that's a nice idea. It would be great if you created a pull request with your changes; I'd be happy to review it then. Looking through the code here, it already looks quite good to me. Regarding the initialization of …

Looking forward to your pull request. You can also open it as a draft pull request so that we can give feedback early on. 👍
---
Hello haystack team,
I just came across this awesome package while building systems like this at work, and after working with it for a bit I wanted to contribute a few changes. The first one is a very simple change to how `InMemoryDocumentStore` implements `query_by_embedding`.
Currently the code loops and does an `np.dot` or a `scipy.spatial.distance.cosine` for each prospective document. This is the most computationally intensive step in that function, and it executes something like 1000x slower than it could: one dot product at a time does not allow `numpy` to behave in a cache-friendly manner, and the fact that `numpy` is used means the CPU does the work even when a GPU is available on the machine.

The change would look like this. The following code block from https://github.com/deepset-ai/haystack/blob/master/haystack/document_stores/memory.py#L202:
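(paraphrasing the loop there rather than quoting it verbatim; `documents`, `query_emb`, and `similarity` stand in for the method's local state)

```python
import numpy as np
from scipy.spatial.distance import cosine

# query_emb: 1-D query embedding; documents: candidate Documents
# similarity: "dot_product" or "cosine" (the store's configured metric)
candidate_docs = []
for doc in documents:
    if similarity == "dot_product":
        # one tiny dot product per candidate document
        score = float(np.dot(query_emb, doc.embedding))
    else:  # "cosine"; scipy returns a distance, so flip it to a similarity
        score = 1.0 - cosine(query_emb, doc.embedding)
    doc.score = score
    candidate_docs.append(doc)
```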
Would be changed to something like:
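A sketch of the idea (exact integration into `query_by_embedding` is left for the PR; the cosine branch assumes it's fine to normalize both sides up front):

```python
import numpy as np

max_samples_at_once = 5000  # bound the per-batch memory footprint

# Stack all candidate embeddings into one (num_docs, emb_dim) matrix
# so BLAS can do one cache-friendly matrix-vector product per chunk.
embeddings = np.stack([doc.embedding for doc in documents])

if similarity == "cosine":
    # Normalizing both sides turns the dot product into cosine similarity.
    embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    query = query_emb / np.linalg.norm(query_emb)
else:  # "dot_product"
    query = query_emb

# Score at most max_samples_at_once documents per matrix-vector product.
scores = np.concatenate([
    embeddings[i : i + max_samples_at_once] @ query
    for i in range(0, len(embeddings), max_samples_at_once)
])

for doc, score in zip(documents, scores):
    doc.score = float(score)
```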
This would execute MUCH quicker, and since we only process `max_samples_at_once = 5000` rows at a time, we wouldn't use too much GPU memory at once. I also use this pattern for dot products in a lot of places; it is memory-stable on a GPU in a production environment.

What do people think?
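P.S. For reference, the GPU variant of the pattern I mentioned looks roughly like this (illustrative sketch using `torch`, which is my choice here and not part of the proposed change):

```python
import torch

def chunked_scores(embeddings: torch.Tensor, query: torch.Tensor,
                   max_samples_at_once: int = 5000) -> torch.Tensor:
    """Chunked matrix-vector scores; runs on whatever device the tensors are on."""
    parts = [
        embeddings[i : i + max_samples_at_once] @ query
        for i in range(0, embeddings.shape[0], max_samples_at_once)
    ]
    return torch.cat(parts)

# e.g. on a GPU, peak memory stays bounded by the chunk size:
# scores = chunked_scores(embeddings.cuda(), query.cuda())
```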