
AttributeError #38

Open · Jeevi10 opened this issue Mar 9, 2020 · 7 comments

Comments

Jeevi10 commented Mar 9, 2020

When I run my own data set through decode, it gives an error:

for tool in tools:
    shuf = tool.decode(docs)

AttributeError Traceback (most recent call last)
<ipython-input> in <module>
1 for tool in tools:
----> 2 shuf = tool.decode(docs)

~/anaconda3/lib/python3.7/site-packages/elit/component/tagger/pos_tagger.py in decode(self, docs, **kwargs)
61 if isinstance(docs, Document):
62 docs = [docs]
---> 63 samples = NLPTaskDataFetcher.convert_elit_documents(docs)
64 with self.context:
65 sentences = self.tagger.predict(samples)

~/anaconda3/lib/python3.7/site-packages/elit/component/tagger/corpus.py in convert_elit_documents(docs)
1298 dataset = []
1299 for d in docs:
-> 1300 for s in d.sentences:
1301 sentence = Sentence()
1302

AttributeError: 'str' object has no attribute 'sentences'

but it works fine for the example given in the documentation! Please help me figure this out.

hankcs (Contributor) commented Mar 9, 2020

I'm not sure what you put in docs, but it's supposed to be a str (it can contain many sentences; the tokenizer will then split it into several).
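
To spell out where the error comes from: per the traceback, decode only wraps a single Document in a list and then iterates, so a raw str that skips the tokenizer falls straight through to the sentence loop. A paraphrased sketch (not the actual ELIT source):

from elit.structure import Document

docs = 'france currency intervention debt ...'  # a raw str handed straight to the POS tagger
if isinstance(docs, Document):                  # False for a str, so nothing wraps it in a list
    docs = [docs]
for d in docs:                                  # iterating a str yields single characters
    s = d.sentences                             # 'f'.sentences -> AttributeError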

Jeevi10 (Author) commented Mar 9, 2020

> I'm not sure what you put in docs, but it's supposed to be a str (it can contain many sentences; the tokenizer will then split it into several).

This is one of the samples from my docs (the R8 dataset):

'france currency intervention debt france today repaid billion francs short term currency intervention debt european monetary cooperation fund finance ministry said said debt part billion franc liability incurred swap facilities defend franc january european monetary system realignment realignment following several weeks speculative pressure produced three pct revaluation west german mark dutch guilder french franc two pct revaluation belgian franc reuter'

hankcs (Contributor) commented Mar 9, 2020

Did you put a tokenizer in your tools? I just ran the POS tagger and it works fine.

from elit.component import NERFlairTagger
from elit.component.tokenizer import EnglishTokenizer
from elit.structure import Document

tagger = NERFlairTagger()
tagger.load()
components = [EnglishTokenizer(), tagger]
docs = 'france currency intervention debt france today repaid billion francs short term currency intervention debt european monetary cooperation fund finance ministry said said debt part billion franc liability incurred swap facilities defend franc january european monetary system realignment realignment following several weeks speculative pressure produced three pct revaluation west german mark dutch guilder french franc two pct revaluation belgian franc reuter'
for c in components:
    docs = c.decode(docs)
for d in docs:  # type: Document
    print(d)

{'sens': [{'tok': ['france', 'currency', 'intervention', 'debt', 'france', 'today', 'repaid', 'billion', 'francs', 'short', 'term', 'currency', 'intervention', 'debt', 'european', 'monetary', 'cooperation', 'fund', 'finance', 'ministry', 'said', 'said', 'debt', 'part', 'billion', 'franc', 'liability', 'incurred', 'swap', 'facilities', 'defend', 'franc', 'january', 'european', 'monetary', 'system', 'realignment', 'realignment', 'following', 'several', 'weeks', 'speculative', 'pressure', 'produced', 'three', 'pct', 'revaluation', 'west', 'german', 'mark', 'dutch', 'guilder', 'french', 'franc', 'two', 'pct', 'revaluation', 'belgian', 'franc', 'reuter'], 'off': [(0, 6), (7, 15), (16, 28), (29, 33), (34, 40), (41, 46), (47, 53), (54, 61), (62, 68), (69, 74), (75, 79), (80, 88), (89, 101), (102, 106), (107, 115), (116, 124), (125, 136), (137, 141), (142, 149), (150, 158), (159, 163), (164, 168), (169, 173), (174, 178), (179, 186), (187, 192), (193, 202), (203, 211), (212, 216), (217, 227), (228, 234), (235, 240), (241, 248), (249, 257), (258, 266), (267, 273), (274, 285), (286, 297), (298, 307), (308, 315), (316, 321), (322, 333), (334, 342), (343, 351), (352, 357), (358, 361), (362, 373), (374, 378), (379, 385), (386, 390), (391, 396), (397, 404), (405, 411), (412, 417), (418, 421), (422, 425), (426, 437), (438, 445), (446, 451), (452, 458)], 'sid': 0, 'ner': [(0, 1, 'GPE'), (4, 5, 'GPE'), (5, 6, 'DATE'), (7, 9, 'MONEY'), (14, 15, 'NORP'), (24, 26, 'MONEY'), (32, 33, 'DATE'), (39, 41, 'DATE'), (44, 45, 'CARDINAL')]}], 'doc_id': 0}
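
As a usage note: to decode many documents, one option is to push each raw str through the same pipeline, tokenizer first, so the tagger never sees a bare str. A sketch, where r8_texts stands in for your own list of strings:

r8_texts = ['france currency intervention debt ...', 'another r8 document ...']  # hypothetical stand-ins
results = []
for text in r8_texts:
    doc = text
    for c in components:    # tokenizer first, then the tagger
        doc = c.decode(doc)
    results.extend(doc)     # decode yields an iterable of Document, as in the loop above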

Jeevi10 (Author) commented Mar 9, 2020

> Did you put a tokenizer in your tools? I just ran the POS tagger and it works fine. […]

Yes, I put it there:

tok = SpaceTokenizer()
tools = [tok, POS, sdp]

Jeevi10 (Author) commented Mar 9, 2020


It works now for a single sentence, but I couldn't pass many instances; if I do, it throws the same error I mentioned earlier.

hankcs (Contributor) commented Mar 9, 2020

> It works now for a single sentence, but I couldn't pass many instances […]

docs = 'Sentence one. Sentence two.'
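
That is, a single str can already carry many instances; per the earlier comment, the tokenizer splits it into sentences before the tagger runs. A sketch with the same pipeline as above:

docs = 'Sentence one. Sentence two.'
for c in components:     # EnglishTokenizer first, then the tagger
    docs = c.decode(docs)
# the tokenizer should split the str into two sentences, so 'sens' gets two entries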

Jeevi10 (Author) commented Mar 10, 2020

tools = [tok, POS]
doc = shuffle_doc_words_list[0]
for tool in tools:
    doc = tool.decode(doc)
print(doc)

The code above works fine, but

for tool in tools:
    doc = tool.decode(shuffle_doc_words_list[0])
print(doc)

the piece of code above gives me an error:

AttributeError Traceback (most recent call last)
<ipython-input> in <module>
2 #doc=shuffle_doc_words_list[0]
3 for tool in tools:
----> 4 doc = tool.decode(shuffle_doc_words_list[0])
5 print(doc)

~/anaconda3/lib/python3.7/site-packages/elit/component/tagger/pos_tagger.py in decode(self, docs, **kwargs)
61 if isinstance(docs, Document):
62 docs = [docs]
---> 63 samples = NLPTaskDataFetcher.convert_elit_documents(docs)
64 with self.context:
65 sentences = self.tagger.predict(samples)

~/anaconda3/lib/python3.7/site-packages/elit/component/tagger/corpus.py in convert_elit_documents(docs)
1298 dataset = []
1299 for d in docs:
-> 1300 for s in d.sentences:
1301 sentence = Sentence()
1302

AttributeError: 'str' object has no attribute 'sentences'

I don't really understand the difference.
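
The difference is what the second tool receives. In the first snippet, doc is reassigned on every iteration, so the POS tagger gets the Documents produced by the tokenizer; in the second, every tool is handed the raw str, so the POS tagger hits the same 'str' object has no attribute 'sentences' error. A minimal sketch of the two flows:

raw = shuffle_doc_words_list[0]  # a raw str

# works: each tool consumes the previous tool's output
doc = raw
for tool in tools:
    doc = tool.decode(doc)       # tok turns the str into Documents; POS gets Documents

# fails: every tool, including the POS tagger, receives the raw str
for tool in tools:
    doc = tool.decode(raw)       # POS gets a str -> AttributeError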
