Problem of the input data format in preprocessing. #25

Open
Rainbow0625 opened this issue Jun 24, 2020 · 1 comment

Comments

@Rainbow0625

"The input data format for parsing should be raw document with one sentence per line."

I put a single sentence ending in a period, like the one above, into a file with no extension, but the files produced by preprocessing are all 0 bytes.
Why is that?

Please help me, thank you very much!!!!
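
For reference, here is a minimal sketch of how an input file with one sentence per line could be produced and checked before running preprocessing. The helper names `write_sentences` and `count_nonempty_lines` and the file name are hypothetical; they are not part of AMRParsing.

```python
def write_sentences(path, sentences):
    """Write one non-empty sentence per line (hypothetical helper)."""
    # Collapse internal whitespace and newlines so each sentence stays on one line.
    cleaned = [" ".join(s.split()) for s in sentences if s.strip()]
    with open(path, "w", encoding="utf-8") as f:
        for sentence in cleaned:
            f.write(sentence + "\n")
    return len(cleaned)


def count_nonempty_lines(path):
    """Count the lines that actually contain text."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())
```

Comparing the return value of `write_sentences` with `count_nonempty_lines` on the same path is a quick way to confirm the file really holds one sentence per line before it is handed to the parser.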

@Rainbow0625
Author

After I changed the extension of the input file to 'xxx.sent', I got a new error:

Start Stanford CoreNLP...
java -Xmx2500m -cp stanfordnlp/stanford-corenlp-full-2015-04-20/stanford-corenlp-3.5.2.jar:stanfordnlp/stanford-corenlp-full-2015-04-20/stanford-corenlp-3.5.2-models.jar:stanfordnlp/stanford-corenlp-full-2015-04-20/joda-time.jar:stanfordnlp/stanford-corenlp-full-2015-04-20/xom.jar:stanfordnlp/stanford-corenlp-full-2015-04-20/jollyday.jar:stanfordnlp/stanford-corenlp-full-2015-04-20/protobuf.jar:stanfordnlp/stanford-corenlp-full-2015-04-20/javax.json.jar:stanfordnlp/stanford-corenlp-full-2015-04-20/ejml-0.23.jar edu.stanford.nlp.pipeline.StanfordCoreNLP -props stanfordnlp/default.properties
Loading Models: 4/4
Read token,lemma,name entity file rawData.sent.prp...

[ERROR] Timeout
Traceback (most recent call last):
  File "/Users/Rainbow/Desktop/AMR/AMRParsing/stanfordnlp/corenlp.py", line 508, in parse
    data = parse_parser_results_new(result)
  File "/Users/Rainbow/Desktop/AMR/AMRParsing/stanfordnlp/corenlp.py", line 154, in parse_parser_results_new
    seqs = re.split("\r\n", text)
  File "/anaconda3/lib/python3.7/re.py", line 213, in split
    return _compile(pattern, flags).split(string, maxsplit)
TypeError: expected string or bytes-like object

Traceback (most recent call last):
  File "amr_parsing.py", line 437, in <module>
    main()
  File "amr_parsing.py", line 170, in main
    instances = preprocess(amr_file,START_SNLP=True,INPUT_AMR=args.amrfmt, PRP_FORMAT=args.prpfmt)
  File "/Users/Rainbow/Desktop/AMR/AMRParsing/preprocessing.py", line 439, in preprocess
    instances = proc1.parse(tmp_sent_filename)
  File "/Users/Rainbow/Desktop/AMR/AMRParsing/stanfordnlp/corenlp.py", line 511, in parse
    raise e
  File "/Users/Rainbow/Desktop/AMR/AMRParsing/stanfordnlp/corenlp.py", line 508, in parse
    data = parse_parser_results_new(result)
  File "/Users/Rainbow/Desktop/AMR/AMRParsing/stanfordnlp/corenlp.py", line 154, in parse_parser_results_new
    seqs = re.split("\r\n", text)
  File "/anaconda3/lib/python3.7/re.py", line 213, in split
    return _compile(pattern, flags).split(string, maxsplit)
TypeError: expected string or bytes-like object
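
One possible reading of the traceback, offered as an assumption since the body of `corenlp.py` is not shown here: the `[ERROR] Timeout` line suggests the CoreNLP call returned no text, so `re.split` received something other than a string (for example `None`). The sketch below reproduces that `TypeError` in isolation and shows a defensive split. The function `split_parser_output` is hypothetical and is not the project's code.

```python
import re


def split_parser_output(text):
    """Split parser output on CRLF, tolerating a missing result (hypothetical helper)."""
    if text is None:
        # Assumed case: the parser timed out and produced no output.
        return []
    return re.split("\r\n", text)


# Reproduce the error from the traceback: re.split rejects a None input.
try:
    re.split("\r\n", None)
except TypeError as exc:
    error_message = str(exc)
```

If this reading is right, the `TypeError` is only a symptom, and the thing to investigate is why the CoreNLP process timed out on the input file.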
