
AttributeError #38

Open · Jeevi10 opened this issue Mar 9, 2020 · 7 comments

Comments

Jeevi10 commented Mar 9, 2020

When I run my own data set through decode, it gives an error:

for tool in tools:
    shuf = tool.decode(docs)

AttributeError Traceback (most recent call last)
<ipython-input> in <module>
1 for tool in tools:
----> 2 shuf = tool.decode(docs)

~/anaconda3/lib/python3.7/site-packages/elit/component/tagger/pos_tagger.py in decode(self, docs, **kwargs)
61 if isinstance(docs, Document):
62 docs = [docs]
---> 63 samples = NLPTaskDataFetcher.convert_elit_documents(docs)
64 with self.context:
65 sentences = self.tagger.predict(samples)

~/anaconda3/lib/python3.7/site-packages/elit/component/tagger/corpus.py in convert_elit_documents(docs)
1298 dataset = []
1299 for d in docs:
-> 1300 for s in d.sentences:
1301 sentence = Sentence()
1302

AttributeError: 'str' object has no attribute 'sentences'

but it works fine for the example given in the documentation! Please help me figure this out.

hankcs (Contributor) commented Mar 9, 2020

I'm not sure what you put in docs, but it's supposed to be a str (it can contain many sentences; the tokenizer will then split it into several).
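
To spell out where the error comes from: per the traceback, decode only wraps a single Document in a list and then iterates, so a raw str that skips the tokenizer falls straight through to the sentence loop. A paraphrased sketch (not the actual ELIT source):

from elit.structure import Document

docs = 'france currency intervention debt ...'  # a raw str handed straight to the POS tagger
if isinstance(docs, Document):                  # False for a str, so nothing wraps it in a list
    docs = [docs]
for d in docs:                                  # iterating a str yields single characters
    s = d.sentences                             # 'f'.sentences -> AttributeError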

Jeevi10 (Author) commented Mar 9, 2020

> I'm not sure what you put in docs, but it's supposed to be a str (it can contain many sentences; the tokenizer will then split it into several).

This is one of the samples from my docs (the R8 dataset):

'france currency intervention debt france today repaid billion francs short term currency intervention debt european monetary cooperation fund finance ministry said said debt part billion franc liability incurred swap facilities defend franc january european monetary system realignment realignment following several weeks speculative pressure produced three pct revaluation west german mark dutch guilder french franc two pct revaluation belgian franc reuter'

hankcs (Contributor) commented Mar 9, 2020

Did you put a tokenizer in your tools? I just ran the POS tagger and it works fine.

from elit.component import NERFlairTagger
from elit.component.tokenizer import EnglishTokenizer
from elit.structure import Document

tagger = NERFlairTagger()
tagger.load()
components = [EnglishTokenizer(), tagger]
docs = 'france currency intervention debt france today repaid billion francs short term currency intervention debt european monetary cooperation fund finance ministry said said debt part billion franc liability incurred swap facilities defend franc january european monetary system realignment realignment following several weeks speculative pressure produced three pct revaluation west german mark dutch guilder french franc two pct revaluation belgian franc reuter'
for c in components:
    docs = c.decode(docs)
for d in docs:  # type: Document
    print(d)

{'sens': [{'tok': ['france', 'currency', 'intervention', 'debt', 'france', 'today', 'repaid', 'billion', 'francs', 'short', 'term', 'currency', 'intervention', 'debt', 'european', 'monetary', 'cooperation', 'fund', 'finance', 'ministry', 'said', 'said', 'debt', 'part', 'billion', 'franc', 'liability', 'incurred', 'swap', 'facilities', 'defend', 'franc', 'january', 'european', 'monetary', 'system', 'realignment', 'realignment', 'following', 'several', 'weeks', 'speculative', 'pressure', 'produced', 'three', 'pct', 'revaluation', 'west', 'german', 'mark', 'dutch', 'guilder', 'french', 'franc', 'two', 'pct', 'revaluation', 'belgian', 'franc', 'reuter'], 'off': [(0, 6), (7, 15), (16, 28), (29, 33), (34, 40), (41, 46), (47, 53), (54, 61), (62, 68), (69, 74), (75, 79), (80, 88), (89, 101), (102, 106), (107, 115), (116, 124), (125, 136), (137, 141), (142, 149), (150, 158), (159, 163), (164, 168), (169, 173), (174, 178), (179, 186), (187, 192), (193, 202), (203, 211), (212, 216), (217, 227), (228, 234), (235, 240), (241, 248), (249, 257), (258, 266), (267, 273), (274, 285), (286, 297), (298, 307), (308, 315), (316, 321), (322, 333), (334, 342), (343, 351), (352, 357), (358, 361), (362, 373), (374, 378), (379, 385), (386, 390), (391, 396), (397, 404), (405, 411), (412, 417), (418, 421), (422, 425), (426, 437), (438, 445), (446, 451), (452, 458)], 'sid': 0, 'ner': [(0, 1, 'GPE'), (4, 5, 'GPE'), (5, 6, 'DATE'), (7, 9, 'MONEY'), (14, 15, 'NORP'), (24, 26, 'MONEY'), (32, 33, 'DATE'), (39, 41, 'DATE'), (44, 45, 'CARDINAL')]}], 'doc_id': 0}
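
As a usage note: to decode many documents, one option is to push each raw str through the same pipeline, tokenizer first, so the tagger never sees a bare str. A sketch, where r8_texts stands in for your own list of strings:

r8_texts = ['france currency intervention debt ...', 'another r8 document ...']  # hypothetical stand-ins
results = []
for text in r8_texts:
    doc = text
    for c in components:    # tokenizer first, then the tagger
        doc = c.decode(doc)
    results.extend(doc)     # decode yields an iterable of Document, as in the loop above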

Jeevi10 (Author) commented Mar 9, 2020

> Did you put a tokenizer in your tools? I just ran the POS tagger and it works fine. […]

Yes, I put it there:

tok = SpaceTokenizer()
tools = [tok, POS, sdp]

Jeevi10 (Author) commented Mar 9, 2020


It works now for a single sentence, but I couldn't pass many instances; if I do, it throws the same error I mentioned earlier.

hankcs (Contributor) commented Mar 9, 2020

> It works now for a single sentence, but I couldn't pass many instances […]

docs = 'Sentence one. Sentence two.'
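
That is, a single str can already carry many instances; per the earlier comment, the tokenizer splits it into sentences before the tagger runs. A sketch with the same pipeline as above:

docs = 'Sentence one. Sentence two.'
for c in components:     # EnglishTokenizer first, then the tagger
    docs = c.decode(docs)
# the tokenizer should split the str into two sentences, so 'sens' gets two entries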

Jeevi10 (Author) commented Mar 10, 2020

tools = [tok, POS]
doc = shuffle_doc_words_list[0]
for tool in tools:
    doc = tool.decode(doc)
print(doc)

The code above works fine, but

for tool in tools:
    doc = tool.decode(shuffle_doc_words_list[0])
print(doc)

the piece of code above gives me an error:

AttributeError Traceback (most recent call last)
<ipython-input> in <module>
2 #doc=shuffle_doc_words_list[0]
3 for tool in tools:
----> 4 doc = tool.decode(shuffle_doc_words_list[0])
5 print(doc)

~/anaconda3/lib/python3.7/site-packages/elit/component/tagger/pos_tagger.py in decode(self, docs, **kwargs)
61 if isinstance(docs, Document):
62 docs = [docs]
---> 63 samples = NLPTaskDataFetcher.convert_elit_documents(docs)
64 with self.context:
65 sentences = self.tagger.predict(samples)

~/anaconda3/lib/python3.7/site-packages/elit/component/tagger/corpus.py in convert_elit_documents(docs)
1298 dataset = []
1299 for d in docs:
-> 1300 for s in d.sentences:
1301 sentence = Sentence()
1302

AttributeError: 'str' object has no attribute 'sentences'

I don't really understand the difference.
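
The difference is what the second tool receives. In the first snippet, doc is reassigned on every iteration, so the POS tagger gets the Documents produced by the tokenizer; in the second, every tool is handed the raw str, so the POS tagger hits the same 'str' object has no attribute 'sentences' error. A minimal sketch of the two flows:

raw = shuffle_doc_words_list[0]  # a raw str

# works: each tool consumes the previous tool's output
doc = raw
for tool in tools:
    doc = tool.decode(doc)       # tok turns the str into Documents; POS gets Documents

# fails: every tool, including the POS tagger, receives the raw str
for tool in tools:
    doc = tool.decode(raw)       # POS gets a str -> AttributeError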
