Hello,
thank you very much for your work. We are using HeidelTime in a dynamic setting and have run into several problems. We will list them here as issues, together with suggested design changes that should address them. Most should be straightforward to implement for someone familiar with the project.
Is this software still under active development? If not, would you mind translating these high-level proposals to a lower level and pointing out which parts of the implementation would need to change?
Standalone's dependencies
Regarding the standalone version: as far as I understand, HeidelTime needs tokenized text to work, but it does not accept pretokenized text as input. Instead, it has hard-coded dependencies on external taggers (for tokenization as well as POS tagging), which need to be installed separately.
This has several disadvantages:
Tokenization gets out of sync unless you use exactly the same tokenizer (and even then you have to run the tokenizer twice).
The tokens used internally are discarded, as the TimeML version in use does not support explicit token tags.
The dependencies are hard-coded (use those specific tokenizers/taggers or none at all).
It is not standalone.
Generating the TimeML for a single text file currently involves loading a large language model for tokenization/POS tagging; tagging another file repeats the whole procedure.
Especially in dynamic contexts, this introduces a huge cost that could easily be avoided.
Parsing tokenized text is quite simple: it could be given in a "one token per line" format, or in something like CoNLL. It should not be much harder to also accept already POS-tagged text, which would remove the hard-coded external dependencies entirely, without necessarily reducing performance.
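To illustrate how little parsing is involved, here is a minimal sketch in Python (not HeidelTime code; `Token` and `read_sentences` are hypothetical names invented for this example). It assumes one token per line, blank lines between sentences, and tab-separated columns for CoNLL-like input. The column indices are parameters because CoNLL variants place the word form and the POS tag in different columns.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional


@dataclass
class Token:
    """Hypothetical container for one pretokenized (optionally POS-tagged) token."""
    text: str
    pos: Optional[str] = None


def read_sentences(
    lines: Iterable[str],
    token_col: int = 0,
    pos_col: Optional[int] = None,
) -> Iterator[List[Token]]:
    """Yield sentences from one-token-per-line input.

    Assumptions of this sketch: blank lines separate sentences and
    columns are tab-separated. With the defaults, the first column
    is the token and no POS tag is read.
    """
    sentence: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # Blank line: close the current sentence, if any.
            if sentence:
                yield sentence
                sentence = []
            continue
        cols = line.split("\t")
        needed = max(token_col, pos_col if pos_col is not None else 0)
        if len(cols) <= needed:
            raise ValueError(f"expected at least {needed + 1} columns: {line!r}")
        pos = cols[pos_col] if pos_col is not None else None
        sentence.append(Token(cols[token_col], pos))
    # Input may end without a trailing blank line.
    if sentence:
        yield sentence
```

Called as `read_sentences(open(path, encoding="utf-8"))`, this handles the plain "one token per line" case; passing, for example, `token_col=1, pos_col=2` would read a layout with an index column followed by word form and tag.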
Solution:
Provide a way to parse pretokenized text instead of invoking an external tokenizer internally.
Add a CLI option to define the input data format (raw / pretokenized / POS-tagged (CoNLL)).
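The proposed option could look roughly like the following sketch, written in Python only for brevity. The flag name `--input-format`, the program name, and the helper `required_external_steps` are assumptions made up for illustration; none of them exist in HeidelTime today.

```python
import argparse
from typing import Tuple

# Hypothetical format names, taken from the proposal above.
FORMATS = ("raw", "pretokenized", "conll")


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI parser with a hypothetical --input-format option."""
    parser = argparse.ArgumentParser(prog="tagger-sketch")
    parser.add_argument("document", help="path to the input file")
    parser.add_argument(
        "--input-format",
        choices=FORMATS,
        default="raw",  # current behaviour: raw text, external tools required
        help="how the input is already preprocessed",
    )
    return parser


def required_external_steps(input_format: str) -> Tuple[bool, bool]:
    """Return (needs_tokenizer, needs_pos_tagger) for a given input format."""
    return {
        "raw": (True, True),            # nothing done yet
        "pretokenized": (False, True),  # tokens given, tags still missing
        "conll": (False, False),        # tokens and POS tags both given
    }[input_format]
```

With such a switch, the external tokenizer and tagger would only be loaded when the chosen format actually requires them.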