
running time of get_contexts #36

Open
mikabr opened this issue Sep 29, 2018 · 1 comment

Comments

@mikabr
Member

mikabr commented Sep 29, 2018

Email from package user:

It's just that when I run get_contexts for a word like "tapa" in Spanish, it runs really quickly. But when I run it for words like "cap" in English, it can take many hours. I've tried running it on a cluster but it doesn't work for some reason (Princeton has been trying to help me with that but they seem stumped).

To be clear, it definitely works! I just leave it running overnight, but it's hard to run multiple queries this way. For "tapa" it's only 348 children in 18 corpora, while for "cap" it's 522 children in 49 corpora, so maybe that's it? But the time difference seems pretty disproportionate to the child/corpus difference.

Anyway, any thoughts would be appreciated if you have the time!
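A minimal way to reproduce the timing gap described in the email, using the call form shown later in this thread. The English collection name "Eng-NA" is an assumption, not something stated in the email; substitute whatever collection the original query used.

```r
library(childesr)

# Fast case reported above: Spanish "tapa" (~348 children, 18 corpora)
system.time(tapa_contexts <- get_contexts(collection = "Spanish", token = "tapa"))

# Slow case reported above: English "cap" (~522 children, 49 corpora).
# "Eng-NA" is an assumed collection name, not confirmed in the thread.
system.time(cap_contexts <- get_contexts(collection = "Eng-NA", token = "cap"))
```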

@amsan7
Contributor

amsan7 commented Oct 2, 2018

Hmm, get_contexts(collection = "Spanish", token = "tapa") still isn't super fast for me (about 10-15 min?). The two calls inside to get_tokens and get_utterances are pretty fast when run individually, so I'm guessing the slowdown is happening at https://github.com/langcog/childesr/blob/master/R/childesr.R#L656, though I'm not totally sure what that code is doing. I also notice there are two dplyr::collect() calls? Maybe that isn't necessary?
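For reference, here is a generic dbplyr sketch of the "collect once" idea that comment is hinting at. It is not childesr's actual schema or code, just toy tables on an in-memory SQLite connection, but it shows the difference between joining lazily on the database side (one collect()) and collecting each table separately and joining in R (two collect()s).

```r
# Sketch only: toy tables, illustrative column names, not childesr internals.
library(dplyr)
library(dbplyr)

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con,
        data.frame(utterance_id = 1:3, gloss = c("cap", "hat", "cap")),
        "tokens")
copy_to(con,
        data.frame(utterance_id = 1:3,
                   speaker_role = c("Mother", "Target_Child", "Mother")),
        "utterances")

tokens_tbl     <- tbl(con, "tokens")
utterances_tbl <- tbl(con, "utterances")

# One collect(): the join is translated to SQL and runs on the database
# side, so only the final result crosses the connection.
contexts <- tokens_tbl %>%
  inner_join(utterances_tbl, by = "utterance_id") %>%
  collect()

# Two collect()s: each table is pulled into R first and joined locally;
# for large collections this transfers far more data per query.
contexts_local <- inner_join(collect(tokens_tbl), collect(utterances_tbl),
                             by = "utterance_id")

DBI::dbDisconnect(con)
```

If the real bottleneck is in the code linked above, timing or profiling that step (e.g. with profvis) on the actual childes-db connection should show whether the extra collect() is what dominates the "cap" query.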
