Dear Tianyu Liu,
I really like the paper and the idea! Thank you also for releasing the code base!
I am currently working on my master's thesis and I am planning to augment this architecture with knowledge infusion.
While doing so, I encountered an issue with the code that converts the CoNLL03 dataset to the required JSON structure.
In the tables below, you can see that your code (denoted eth_asp) misses 27 entities across the train, dev and test sets.
Your code does not check for an entity that is still open at the end of a document, so such entities are not recognized.
I propose the following changes to your code:
    if line == "-DOCSTART- -X- -X- O":  # new doc
        if doc is not None:
            # when extended is not the same as tokens
            # mark where to copy from with <extra_id_22> and <extra_id_23>
            # E.g.
            # Extract entities such as apple, orange, lemon <extra_id_22> Give me a mango . <extra_id_23>
            # See ace05_to_json.py for example of extending the input
            # FIX: missing entities <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
            if start is not None:
                doc['entities'].append({
                    "type": current_type,
                    "start": start,
                    "end": idx if idx > start else idx + 1
                })
            # FIX: missing entities >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
            doc["extended"] = doc["tokens"]
            dataset.append(doc)
        doc = {
            "tokens": [],  # list of tokens for the model to copy from
            "extended": [],  # list of input tokens. Prompts, instructions, etc. go here
            "entities": []  # list of dict: {"type": type, "start": start, "end": end}, format: [start, end)
        }
        idx, start = -1, None
        continue
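To illustrate the failure mode independently of the repository code, here is a minimal, self-contained sketch. It is not the actual converter: `bio_to_spans` and its `flush_at_end` flag are hypothetical names I made up for this example, and I assume BIO-style tags with half-open `[start, end)` spans as in the JSON format above. An entity that is still open when the document ends is only emitted if it is flushed explicitly after the loop.

```python
def bio_to_spans(tags, flush_at_end=True):
    """Convert the BIO tags of one document into [start, end) entity spans.

    Hypothetical helper for illustration only; not the repository's converter.
    """
    spans = []
    start, current_type = None, None
    for idx, tag in enumerate(tags):
        # An I- tag of the same type continues the open entity.
        if tag.startswith("I-") and start is not None and tag[2:] == current_type:
            continue
        # Any other tag closes the entity that is currently open.
        if start is not None:
            spans.append({"type": current_type, "start": start, "end": idx})
            start, current_type = None, None
        # A B- tag (or a stray I- tag) opens a new entity.
        if tag != "O":
            start, current_type = idx, tag[2:]
    # Without this flush, an entity ending on the last token is lost.
    if flush_at_end and start is not None:
        spans.append({"type": current_type, "start": start, "end": len(tags)})
    return spans


# The document ends with an entity on its last token.
tags = ["B-PER", "I-PER", "O", "B-LOC"]
without_flush = bio_to_spans(tags, flush_at_end=False)
with_flush = bio_to_spans(tags, flush_at_end=True)
print(len(without_flush), len(with_flush))
```

With the flush disabled, the final `LOC` entity is silently dropped, which is the same kind of loss I see at document boundaries in the converted dataset.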
Best regards,
Robin