Replies: 4 comments
-
Hi @imeano, given how TextRank works, there are strict requirements on what the parser needs to produce.
The noun chunking was an extension I added (along with the use of lemmatization) to make the algorithm more effective. Does that help? Also, does https://spacy.io/models/pt provide an effective parser for Portuguese?
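As a rough illustration of why lemmatization makes a TextRank-style ranking more effective (a hypothetical sketch, not pytextrank's internals): without lemmas, surface variants of the same word become separate graph nodes and split their score, while keying nodes on (lemma, POS) merges them.

```python
# Hypothetical sketch: the effect of lemmatized node keys on a TextRank-style graph.
# Tokens are (surface, lemma, pos) triples; the Portuguese words are illustrative.
from collections import Counter

tokens = [
    ("algoritmos", "algoritmo", "NOUN"),  # plural surface form
    ("algoritmo", "algoritmo", "NOUN"),   # singular surface form
    ("eficaz", "eficaz", "ADJ"),
]

surface_nodes = Counter(t[0] for t in tokens)        # 3 distinct nodes
lemma_nodes = Counter((t[1], t[2]) for t in tokens)  # noun variants merge into 1
```

With surface keys the two noun forms would each get a separate node; with (lemma, POS) keys they share one node and accumulate a single score.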
-
Thanks for the response. It does answer my question, even though I didn't ask it as clearly as I could have. I used spaCy's terminology without specifying it: because spaCy's DependencyParser, as a pipeline component, is called simply "parser", I tend to call it that too. From testing, I came up with the following:
So, assuming those features are the only ones pytextrank needs to work properly, it seems I can disable the DependencyParser as long as I include noun chunking and sentence segmentation pipeline components. I was fairly sure I could get it to work with these changes, but was afraid of getting different results.
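The POS-based chunking described above can be sketched, independently of spaCy, as a greedy pattern over (token, POS) pairs. This is a hypothetical illustration of the idea, not the actual component from issue #54:

```python
def pos_noun_chunks(tagged):
    """Approximate parser-based noun_chunks from POS tags alone:
    collect maximal runs of DET/ADJ/NOUN/PROPN that contain at least
    one NOUN or PROPN. `tagged` is a list of (word, pos) pairs."""
    allowed = {"DET", "ADJ", "NOUN", "PROPN"}
    nouns = {"NOUN", "PROPN"}
    chunks, run = [], []
    for word, pos in tagged + [("", "X")]:  # sentinel flushes the last run
        if pos in allowed:
            run.append((word, pos))
        else:
            if any(p in nouns for _, p in run):
                chunks.append(" ".join(w for w, _ in run))
            run = []
    return chunks
```

For Portuguese, where adjectives typically follow the noun, this keeps spans like "o gato preto" together; a parser-based chunker would of course handle harder cases (coordination, embedded PPs) that this pattern misses.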
Mostly effective, I would say. I've worked with linguists, and they couldn't make much use of the syntactic trees produced (errors in the syntactic parse tend to accumulate the farther from ROOT you get). Sentence segmentation is quite good for sentences that aren't too long. As for the Tagger, POS accuracy is quite good, but the TAG_MAP is too large, in my opinion.
-
Can you please elaborate on this more explicitly? i.e., if we can't remove the parser, then which components are redundant? Please spell out the redundant parts more explicitly 🙇 It means a lot when the text is big, and removing any redundant pipeline component would help a lot, memory-wise.
-
Hi @guy4261, no: none of the textgraph algorithms will work with the parser disabled. Disabling NER might be an option. It depends on the language, the versions of the other pipeline components, etc., so you'd need to experiment.
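A minimal sketch of that kind of experimentation (my example, not from the thread): spaCy's rule-based sentencizer gives sentence boundaries without a dependency parser. A blank pipeline is used here only so the snippet runs without downloading a model; in practice you would load a real model such as `pt_core_news_sm` and pass `disable=["ner"]` to `spacy.load()` to trim memory.

```python
import spacy

# Blank Portuguese pipeline: no parser, no NER, no model download needed.
nlp = spacy.blank("pt")
nlp.add_pipe("sentencizer")  # rule-based sentence boundaries

doc = nlp("Olá. Tudo bem?")
sentences = [sent.text for sent in doc.sents]  # two sentences, no parser involved
```

Checking `nlp.pipe_names` before and after dropping components is a quick way to confirm what is actually loaded.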
-
Hello,
I'm using pytextrank with texts in Portuguese. Thanks to issue #54, I'm able to use POS information to produce some basic noun chunking, instead of syntactic information from the parser.
My question is: in this case, where I'm producing chunks from POS, am I losing anything if I disable the parser and create a new pipeline component just for chunking? Is there other relevant information from the parser being used?