Replies: 11 comments
-
Thank you @fukidzon I've reworked the code that you provided as a Colab notebook: https://gist.github.com/ceteri/f3bfac641cffb61e10af5aae7eefc9dd so people can view and interact with the problem. The root issue appears to be that the noun chunks are not produced by the spacy_udpipe pipeline.
To address your main question: if you have an example of an extension that implements the approach described in explosion/spaCy#3856, then yes, we could add support for that in the next release.
-
Also, there's another implied question: could pytextrank build its candidate phrases itself, without relying on doc.noun_chunks?
While that's possible, and somewhat closer to the original algorithm description, it would be a larger job to refactor the code. I'll take a look and try to scope it. We may be able to add support for that in a later release. Back to your original question on StackOverflow: could you provide a brief example text in Slovak, along with the expected output?
-
Another issue that was mentioned: why does the pipeline only report ['textrank']?
See the gist: it appears that nlp.pipe_names only lists ['textrank'], with no tagger or parser components registered.
-
The points above identify two issues in the spacy_udpipe integration.
-
@ceteri You can find the explanation for only ['textrank'] showing up in nlp.pipe_names in the spacy_udpipe repository: the UDPipe model performs tokenization, tagging, lemmatization, and dependency parsing inside the wrapper itself, so no separate spaCy components are added to the pipeline. @fukidzon @ceteri implementing a syntax iterator for Slovak would be the cleanest way to get doc.noun_chunks working.
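A quick way to see that behavior (a minimal sketch, assuming the spaCy 2.x-era spacy_udpipe and pytextrank 2.x APIs and the "sk" UDPipe model code):

```python
import spacy_udpipe
import pytextrank

# download and load the UDPipe model for Slovak (assumed model code "sk")
spacy_udpipe.download("sk")
nlp = spacy_udpipe.load("sk")

# add pytextrank as the only explicit pipeline component (pytextrank 2.x API)
tr = pytextrank.TextRank()
nlp.add_pipe(tr.PipelineComponent, name="textrank", last=True)

# UDPipe runs tokenization, tagging, and parsing inside its tokenizer wrapper,
# so they never appear as separate spaCy components
print(nlp.pipe_names)  # expected: ['textrank']
```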
-
Thank you kindly @asajatovic, that's good to know and it makes a lot of sense to use that approach. @fukidzon I can help with a syntax iterator implementation. To start, I'd need a language sample and the expected output -- the core models in spaCy for other languages include syntax_iterators we could use as a reference.
-
@ceteri @asajatovic thank you for the comments! I created a Colab notebook with a custom noun_chunks example for Slovak: https://colab.research.google.com/drive/1tLMUMpFTGvxvp32YQYF5LC-nlTlUdtYz Creating syntax_iterators for the Slovak language would be the best solution - I was already looking into it, but I think it needs a deeper look at the language structure to do it correctly (ideally it would become part of the spaCy code, not just a local workaround).
I also like the idea that it could be possible to provide some other source of "noun_chunks".
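For illustration only (this is not the notebook's code), a Slovak syntax_iterators-style function could mirror the shape of spaCy's English implementation; the dependency labels below are an assumption and would need checking against Slovak UD treebank output:

```python
# sketch of a syntax_iterators.py-style noun_chunks function for Slovak,
# modeled on the structure used by spaCy's English implementation
from spacy.symbols import NOUN, PROPN, PRON

def noun_chunks(doclike):
    doc = doclike.doc
    # assumed set of UD dependency labels that can head a noun chunk
    labels = ["nsubj", "obj", "iobj", "obl", "appos", "nmod", "ROOT"]
    np_deps = [doc.vocab.strings.add(label) for label in labels]
    np_label = doc.vocab.strings.add("NP")
    prev_end = -1
    for word in doclike:
        if word.pos not in (NOUN, PROPN, PRON):
            continue
        # skip tokens already covered by a previous chunk
        if word.left_edge.i <= prev_end:
            continue
        if word.dep in np_deps:
            prev_end = word.i
            yield word.left_edge.i, word.i + 1, np_label

SYNTAX_ITERATORS = {"noun_chunks": noun_chunks}
```

In spaCy 2.x such a function only takes effect if it is exposed through the Slovak language class's Defaults.syntax_iterators (or shipped as spacy/lang/sk/syntax_iterators.py), which is what makes an upstream contribution preferable to a local patch.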
-
I found a solution:
I'm not sure how clean this workaround is.
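One shape such a workaround could take (a sketch only, not the code from the comment above; it assumes spaCy 2.x, where Doc.noun_chunks looks its iterator up from the language class Defaults, a Slovak language class at spacy.lang.sk.Slovak, and a Matcher pattern of optional adjectives plus nouns):

```python
import spacy_udpipe
from spacy.lang.sk import Slovak
from spacy.matcher import Matcher

def matcher_noun_chunks(doclike):
    """Yield (start, end, label) triples for simple ADJ* (NOUN|PROPN)+ spans."""
    doc = doclike.doc
    np_label = doc.vocab.strings.add("NP")
    matcher = Matcher(doc.vocab)
    # assumed pattern: optional adjectives followed by one or more (proper) nouns
    pattern = [{"POS": "ADJ", "OP": "*"},
               {"POS": {"IN": ["NOUN", "PROPN"]}, "OP": "+"}]
    matcher.add("noun_phrase", None, pattern)  # spaCy 2.x Matcher.add signature
    # overlapping matches are not filtered here; a real implementation
    # would keep only the longest non-overlapping spans
    for _, start, end in matcher(doc):
        yield start, end, np_label

# assumption: patching the Slovak defaults before any Doc is created makes
# doc.noun_chunks (and therefore pytextrank) produce candidates for Slovak
Slovak.Defaults.syntax_iterators = {"noun_chunks": matcher_noun_chunks}

spacy_udpipe.download("sk")
nlp = spacy_udpipe.load("sk")
```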
-
I have much the same problem using the
Adding a syntax_iterator seems like the cleanest thing to do. The only concern I would have with the presented solution is that it requires the parser to run after the tagger in the pipeline.
-
Hi, I'm trying to use pytextrank with spaCy 3.x. Does this only work with the English models?
-
Hi @andremacola, could you help us by showing some example code for the pipeline you're building with spaCy 3.x? The code for noun_chunks lives in each language's syntax_iterators in spaCy, so it depends on which model you load. Also, if this is more of a spaCy question, we could move this thread to https://github.com/explosion/spaCy/discussions/ to get more help.
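For context, a minimal sketch of how pytextrank attaches to a spaCy 3.x pipeline (the Portuguese model name is just an example; any model whose language provides noun_chunks should behave the same):

```python
import spacy
import pytextrank  # registers the "textrank" pipeline factory on import (pytextrank 3.x)

# example model; substitute the trained pipeline for your language
nlp = spacy.load("pt_core_news_sm")
nlp.add_pipe("textrank", last=True)

doc = nlp("O PageRank é um algoritmo de análise de redes usado para ranquear páginas web.")

# pytextrank exposes ranked phrases on the custom doc._.phrases attribute
for phrase in doc._.phrases[:5]:
    print(phrase.text, phrase.rank)
```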
-
I wanted to use pytextrank together with spacy_udpipe to get keywords from texts in other languages (see https://stackoverflow.com/questions/59824405/spacy-udpipe-with-pytextrank-to-extract-keywords-from-non-english-text), but I realized that spacy_udpipe somehow "overrides" the original spaCy pipeline, so the noun_chunks are not generated. (By the way, the noun_chunks are created in lang/en/syntax_iterators.py, which doesn't exist for all languages, so even where it is called it doesn't work, e.g. for the Slovak language.)
Pytextrank takes its keyword candidates from spaCy's doc.noun_chunks, but if the noun_chunks are not generated, pytextrank doesn't work.
Sample code:
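A sketch of the combination described (assuming the spaCy 2.x-era spacy_udpipe and pytextrank 2.x APIs, with Slovak as the example language; not the exact code from the question):

```python
import spacy_udpipe
import pytextrank

spacy_udpipe.download("sk")            # download the UDPipe model once
nlp = spacy_udpipe.load("sk")

tr = pytextrank.TextRank()             # pytextrank 2.x pipeline component
nlp.add_pipe(tr.PipelineComponent, name="textrank", last=True)

doc = nlp("Text v slovenčine, z ktorého chceme extrahovať kľúčové slová.")

# with no noun_chunks produced for Slovak, this prints nothing - the problem
for phrase in doc._.phrases:
    print(phrase.text, phrase.rank)
```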
Would it be possible for pytextrank to take the "noun_chunks" (candidates for keywords) from a custom extension (a function which uses a Matcher, with the result available e.g. as doc._.custom_noun_chunks - see explosion/spaCy#3856)?
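Such an extension might look like this (a sketch; the attribute name custom_noun_chunks comes from the question, the Matcher pattern is an assumption, and pytextrank would still need explicit support for reading the attribute):

```python
from spacy.matcher import Matcher
from spacy.tokens import Doc

def custom_noun_chunks(doc):
    """Getter: return Matcher-based noun-phrase spans for this doc."""
    # building the Matcher on every access is fine for a sketch
    matcher = Matcher(doc.vocab)
    # assumed pattern: optional adjectives followed by one or more (proper) nouns
    pattern = [{"POS": "ADJ", "OP": "*"},
               {"POS": {"IN": ["NOUN", "PROPN"]}, "OP": "+"}]
    matcher.add("custom_noun_chunks", None, pattern)  # spaCy 2.x Matcher.add signature
    # return Span objects so they can be consumed like doc.noun_chunks
    return [doc[start:end] for _, start, end in matcher(doc)]

# expose the result as doc._.custom_noun_chunks, as suggested in the question
Doc.set_extension("custom_noun_chunks", getter=custom_noun_chunks, force=True)
```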