parser doesn't produce amr-unknown #14

Open
PolKul opened this issue Aug 26, 2021 · 5 comments

@PolKul

PolKul commented Aug 26, 2021

I was able to train the parser following your instructions. But when testing the trained model, I found that it doesn't produce an amr-unknown node. For example:

Text: Which architect of Marine Corps Air Station Kaneohe Bay was also tenant of New Sanno hotel?
# ::node	1	person	1-2
# ::node	2	architect-01	1-2
# ::node	3	facility	3-9
# ::node	5	also	10-11
# ::node	6	reside-01	11-12
# ::node	7	company	13-16
# ::node	10	name	3-9
# ::node	11	"Marine"	3-9
# ::node	12	"Corps"	3-9
# ::node	13	"Air"	3-9
# ::node	14	"Station"	3-9
# ::node	15	"Kaneohe"	3-9
# ::node	16	"Bay"	3-9
# ::node	18	name	13-16
# ::node	19	"New"	13-16
# ::node	20	"Sanno"	13-16
# ::node	21	"Hotel"	13-16
# ::root	6	reside-01
# ::edge	person	ARG0-of	architect-01	1	2	
# ::edge	architect-01	ARG1	facility	2	3	
# ::edge	reside-01	mod	also	6	5	
# ::edge	reside-01	ARG0	person	6	1	
# ::edge	reside-01	ARG1	company	6	7	
# ::edge	facility	name	name	3	10	
# ::edge	name	op1	"Marine"	10	11	
# ::edge	name	op2	"Corps"	10	12	
# ::edge	name	op3	"Air"	10	13	
# ::edge	name	op4	"Station"	10	14	
# ::edge	name	op5	"Kaneohe"	10	15	
# ::edge	name	op6	"Bay"	10	16	
# ::edge	company	name	name	7	18	
# ::edge	name	op1	"New"	18	19	
# ::edge	name	op2	"Sanno"	18	20	
# ::edge	name	op3	"Hotel"	18	21	
# ::short	{1: 'p', 2: 'a', 3: 'f', 5: 'a2', 6: 'r', 7: 'c', 10: 'n', 11: 'x0', 12: 'x1', 13: 'x2', 14: 'x3', 15: 'x4', 16: 'x5', 18: 'n2', 19: 'x6', 20: 'x7', 21: 'x8'}	
(r / reside-01
      :ARG0 (p / person
            :ARG0-of (a / architect-01
                  :ARG1 (f / facility
                        :name (n / name
                              :op1 "Marine"
                              :op2 "Corps"
                              :op3 "Air"
                              :op4 "Station"
                              :op5 "Kaneohe"
                              :op6 "Bay"))))
      :ARG1 (c / company
            :name (n2 / name
                  :op1 "New"
                  :op2 "Sanno"
                  :op3 "Hotel"))
      :mod (a2 / also))
@PolKul
Author

PolKul commented Aug 27, 2021

Parsing the same sentence with the amrlib parser, for example, gives me this result, which does include amr-unknown:

# ::snt Which architect of Marine Corps Air Station Kaneohe Bay was also tenant of New Sanno hotel?
(t / tenant-01
      :ARG0 (a / amr-unknown
            :ARG0-of (a2 / architect-01
                  :ARG1 (f / facility
                        :name (n / name
                              :op1 "Marine"
                              :op2 "Corps"
                              :op3 "Air"
                              :op4 "Station"
                              :op5 "Kaneohe"
                              :op6 "Bay"))))
      :ARG1 (h / hotel
            :name (n2 / name
                  :op1 "New"
                  :op2 "Sanno"))
      :mod (a3 / also))

@ramon-astudillo
Member

It should produce amr-unknown; we use it often for question parsing.

What did you train it with? I just checked on a v0.4.2 deploy and it parses correctly. Also, do you tokenize?

@PolKul
Author

PolKul commented Aug 28, 2021

Hi @ramon-astudillo, I followed your setup and training instructions with the default action-pointer network config (bash run/run_experiment.sh configs/amr2.0-action-pointer.sh). This is the code I use for inference:

import string
from transition_amr_parser.parse import AMRParser

amr_parser_checkpoint = "/DATA/AMR2.0/models/exp_cofill_o8.3_act-states_RoBERTa-large-top24/_act-pos-grh_vmask1_shiftpos1_ptr-lay6-h1_grh-lay123-h2-allprev_1in1out_cam-layall-h2-abuf/ep120-seed42/checkpoint_best.pt"
parser = AMRParser.from_checkpoint(amr_parser_checkpoint)
text = "Which architect of Marine Corps Air Station Kaneohe Bay was also tenant of New Sanno hotel?"
# Split on whitespace and strip punctuation from each word (this also drops the final "?")
words = [word.strip(string.punctuation) for word in text.split()]
annotations = parser.parse_sentences([words])
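
On the tokenization question: the snippet above strips punctuation from every word, so the final "?" never reaches the parser. Below is a minimal sketch of a tokenizer that keeps punctuation as separate tokens instead. It assumes, without confirmation in this thread, that the model expects tokenized input where "?" is its own token. `simple_tokenize` is a hypothetical helper, not part of transition_amr_parser.

```python
import re


def simple_tokenize(text):
    """Split text into word tokens and single-character punctuation tokens.

    Hypothetical helper for illustration; not the parser's own tokenizer.
    """
    # \w+ matches a run of letters, digits or underscores;
    # [^\w\s] matches exactly one punctuation character.
    return re.findall(r"\w+|[^\w\s]", text)


text = "Which architect of Marine Corps Air Station Kaneohe Bay was also tenant of New Sanno hotel?"
tokens = simple_tokenize(text)
# The question mark is kept as the last token instead of being stripped.
print(tokens[-2:])
```

The resulting `tokens` list could be passed to `parser.parse_sentences([tokens])` in place of `words`. This regex also splits contractions such as "don't" into three tokens, so a proper tokenizer may be preferable for real data.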

@PolKul
Author

PolKul commented Aug 28, 2021

Would you mind sharing your trained checkpoint to see if it makes any difference?

@ramon-astudillo
Member

> Would you mind sharing your trained checkpoint to see if it makes any difference?

I am certain it should. We are looking into sharing pre-trained models, but I cannot say anything at this point.

Also, FYI, we will update to v0.5.1 soon (after the EMNLP preprint submission deadline). The new model (Structured-BART) is the new SoTA for AMR2.0 and will be published at EMNLP 2021; a not-yet-updated preprint is here: https://openreview.net/forum?id=qjDQCHLXCNj

From experience in parsing questions, I can say silver-data fine-tuning works well. You can parse some text corpus with questions, filter it with a couple of rules*, and then use it as additional training data. The training scheme of silver+gold pre-training followed by gold fine-tuning seems to work best; see e.g. https://aclanthology.org/2020.findings-emnlp.288/

(*) For example, ignore all parses that contain :rel (which indicates a detached subgraph) or that are missing amr-unknown (if you are certain the parse should have one).
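
The two filter rules above can be sketched as a plain string check on each PENMAN-formatted parse. This is only an illustration under assumptions: `keep_silver_parse` and its `expect_unknown` parameter are hypothetical names, and the check works on raw text rather than a parsed graph, so it is not the filtering code used by the maintainers.

```python
import re


def keep_silver_parse(amr, expect_unknown=True):
    """Return True if a silver AMR (PENMAN string) passes both filter rules.

    Hypothetical helper for illustration only.
    """
    # Ignore metadata lines such as "# ::snt ..." so that only the graph is checked.
    graph = "\n".join(
        line for line in amr.splitlines() if not line.lstrip().startswith("#")
    )
    # Rule 1: a :rel edge indicates a detached subgraph, so drop the parse.
    if re.search(r":rel(?![\w-])", graph):
        return False
    # Rule 2: a question is expected to contain an amr-unknown node.
    if expect_unknown and "amr-unknown" not in graph:
        return False
    return True


good = "(t / tenant-01 :ARG0 (a / amr-unknown) :ARG1 (h / hotel))"
no_unknown = "(r / reside-01 :ARG0 (p / person) :ARG1 (c / company))"
detached = "(t / tenant-01 :ARG0 (a / amr-unknown) :rel (h / hotel))"

silver = [amr for amr in (good, no_unknown, detached) if keep_silver_parse(amr)]
print(len(silver))
```

Setting `expect_unknown=False` applies only the :rel rule, which may suit corpora that mix questions with declarative sentences.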

ramon-astudillo pushed a commit that referenced this issue Jun 13, 2022