No index phone_doc-0520-3 found in Elastic Search #27

mayankagrawal93 · 2021-05-21T07:40:19Z

I have been trying to run the code locally. All installation steps, including Spark, completed successfully. When I run shell_chat, the bot replies and I am able to chat with it. But an error gets printed: 'No index phone_doc-0520-3 was found in Elastic Search'.
I searched the Spark code I ran, and there is no mention of 'phone_doc-0520-3' being created in 'upload.py'.
What should I do to resolve this? Or do I even need to resolve it, given that the bot is already up and running?
It's searching for phone_doc-0520-3 in the file chirpy/core/asr/search_phone_to_ent.py.

Separately (instead of raising another issue, I am asking here), the bot's replies are okay but not exactly the same as in the live demo. The bot does not seem to understand some utterances that the live demo handles. Am I missing something here?
All Docker images have been pulled and the containers started. The only thing I have not set up is the Twitter opinion database in Postgres (for which an error is shown in the terminal).
Are these two errors (1. No index phone_doc-0520-3 found, 2. No Postgres) responsible for the reduced accuracy of my bot?

Thanks in advance for your reply!

AshwinParanjape · 2021-05-22T15:25:57Z

It is a phoneme-to-entity index that's primarily useful for correcting ASR errors. Since you aren't going to be running the bot with voice, you can ignore the error. But if you want, the following Python script can create the index for you:
https://github.com/stanfordnlp/chirpycardinal/blob/main/chirpy/core/asr/index_phone_to_ent.py
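
Before deciding whether to build the index, it can help to confirm that it really is missing. The following is a minimal sketch, not code from the chirpy repository. It assumes Elasticsearch is reachable over plain HTTP at a placeholder address (http://localhost:9200, no authentication) and uses the standard index-exists request (HEAD on the index name). The `opener` parameter is a hypothetical hook added only so the function can be exercised without a live cluster.

```python
import urllib.error
import urllib.request

# Assumption: placeholder endpoint; replace with your own host, port, and auth.
ES_URL = "http://localhost:9200"


def index_exists(index_name, es_url=ES_URL, opener=urllib.request.urlopen):
    """Return True if the index exists, False if Elasticsearch answers 404."""
    request = urllib.request.Request(f"{es_url}/{index_name}", method="HEAD")
    try:
        with opener(request, timeout=10) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # auth failures, server errors, etc. should not be hidden
```

Calling `index_exists("phone_doc-0520-3")` against your own endpoint should show whether the index named in the error message is present.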

Neither of the two errors sounds serious enough to keep the bot from working as expected. What kind of replies are you getting? If you can post a couple of examples, I can probably take a guess.

mayankagrawal93 · 2021-05-23T07:48:06Z

Thanks for your quick reply!
I see! Then I will just leave out the voice part.

My guess is that the bot is not able to understand entity names. Here are a few examples from both the live demo and my installation:

Asking about musicians
i. Live demo

ii. My installation
Asking about famous persons
i. Live demo

ii. My installation

The bot doesn't seem to understand famous persons, musicians, etc., and sometimes also misses context such as 'Can we talk about music', which the live demo handles.

AshwinParanjape · 2021-05-23T16:06:01Z

There seems to be something wrong with entity linking.

Can you confirm if the elasticsearch instances are running fine?
How many docs are in the indices? (This is to confirm that the uploading and indexing has happened correctly)
Can you add a logtofile_path here (chirpycardinal/servers/local/shell_chat.py, line 55 in 6359578):

logtofile_level=LOGTOFILE_LEVEL, logtofile_path='',

Or equivalently, change the logtoscreen_level so that it shows more info? You should be able to see if any entities are detected at all.
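
For the document counts, one option is Elasticsearch's `_cat/indices` API, which reports a `docs.count` value per index. This is a rough sketch under assumptions, not part of the chirpy tooling: the endpoint (http://localhost:9200, no authentication) is a placeholder, and the helper names are made up for illustration.

```python
import json
import urllib.request

# Assumption: placeholder endpoint; adjust host, port, and auth for your setup.
ES_URL = "http://localhost:9200"


def fetch_cat_indices(es_url=ES_URL):
    """Fetch one row per index from the _cat/indices API as parsed JSON."""
    url = f"{es_url}/_cat/indices?format=json&h=index,docs.count"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def doc_counts(rows):
    """Map index name -> document count (None when no count is reported)."""
    counts = {}
    for row in rows:
        raw = row.get("docs.count")
        counts[row["index"]] = int(raw) if raw not in (None, "") else None
    return counts


def print_counts(rows):
    """Print one 'index: count' line per index, sorted by index name."""
    for name, count in sorted(doc_counts(rows).items()):
        print(f"{name}: {count}")
```

Running `print_counts(fetch_cat_indices())` lists every index with its count, which makes it easy to spot an index that is far smaller than expected.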

Since we already know phone_to_doc is throwing an error, that is a suspect and might be interfering with entity linking, so maybe just try indexing using https://github.com/stanfordnlp/chirpycardinal/blob/main/chirpy/core/asr/index_phone_to_ent.py to see if the error goes away.

mayankagrawal93 · 2021-05-24T07:22:42Z

Tried this,

Elasticsearch Instance seems to be running fine.
When I query to count the documents,

Does this count seem okay, or is it too low?
The files I used as input to preprocess.py were:
i. In https://dumps.wikimedia.org/wikidatawiki/entities/ , latest-all.json.bz2 => 1 file in total
ii. In https://dumps.wikimedia.org/enwiki/20210220/, enwiki-20210220-pages-articles-multistream.xml.bz2 18.0 GB and enwiki-20210220-pages-articles-multistream-index.txt.bz2 219.6 MB => 2 files in total
iii. In https://dumps.wikimedia.org/other/pagecounts-ez/merged/, pagecounts-2020-08-views-ge-5-totals.bz2 => 1 file in total
This is the output of the log file. I have pasted logs only for the particular utterances where the issue occurs. I had to manually add a line to the code to print the Elasticsearch results (results['hits']['hits']). To jump to that part of the logs, search for 'results of elastic search are'.
logs.txt

The output seems to be coming through, but maybe some entities are missing. Please let me know whether the input to preprocess.py was correct.

Meanwhile, I will index phone_to_doc and let you know if that improves the accuracy.

mayankagrawal93 · 2021-05-26T10:50:51Z

To create the index for phone_to_doc, it's trying to find WIKI_ENTITIES = "/u/scr/nlp/data/Wikipedia/enwiki-20200520-pages-articles-multistream-spans.json.bz2", but I don't see any spans file in the Spark dump of Wikidata.
These are the files in the output:

Please let me know which one to give as input for WIKI_ENTITIES.

AshwinParanjape · 2021-06-21T18:52:27Z

Sorry for the super late reply. I was unavailable in the meantime.

It seems that you only have 700k articles in the elasticsearch index, which means not everything got uploaded. I think that's the problem: there is no "Linkin Park" Wikipedia page to link to.

@anumpamme helped fix an error with the indexing, and I just merged it in this commit: ccff9b9

Can you do this?
1 - Pull the latest version
2 - Rerun wiki-es-dump/upload.py with the appropriate args
3 - Get the counts (in particular articles)
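
Beyond the overall counts, a spot check for one specific page (such as "Linkin Park") can confirm that the re-upload filled the gap. The sketch below uses Elasticsearch's `_count` API with a `match_phrase` query. Everything specific in it is an assumption: the endpoint is a placeholder, and the index name and field name must be replaced with whatever your upload actually created, since neither is stated in this thread.

```python
import json
import urllib.request

# Assumption: placeholder endpoint; adjust host, port, and auth for your setup.
ES_URL = "http://localhost:9200"


def build_count_request(index_name, field, text, es_url=ES_URL):
    """Build a POST /<index>/_count request with a match_phrase query."""
    body = json.dumps({"query": {"match_phrase": {field: text}}}).encode("utf-8")
    return urllib.request.Request(
        f"{es_url}/{index_name}/_count",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def count_matches(index_name, field, text, es_url=ES_URL):
    """Return how many documents in the index match the phrase."""
    request = build_count_request(index_name, field, text, es_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["count"]
```

For example, `count_matches("<your articles index>", "<your title field>", "Linkin Park")` returning 0 would suggest the page is still missing from the index.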
