
running time of get_contexts #36

Open
mikabr opened this issue Sep 29, 2018 · 1 comment

Comments

@mikabr
Member

mikabr commented Sep 29, 2018

Email from package user:

It's just that when I run get_contexts for a word like "tapa" in Spanish, it runs really quickly. But when I run it for words like "cap" in English, it can take many hours. I've tried running it on a cluster but it doesn't work for some reason (Princeton has been trying to help me with that but they seem stumped).

To be clear, it definitely works! I just leave it running overnight, but it's hard to run multiple queries this way. For "tapa" it's only 348 children in 18 corpora, while for "cap" it's 522 children in 49 corpora, so maybe that's it? But the time difference seems pretty disproportionate to the child/corpus difference.

Anyway, any thoughts would be appreciated if you have the time!
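A minimal way to reproduce the timing gap described in the email, using the call form shown later in this thread. The English collection name "Eng-NA" is an assumption, not something stated in the email; substitute whatever collection the original query used.

```r
library(childesr)

# Fast case reported above: Spanish "tapa" (~348 children, 18 corpora)
system.time(tapa_contexts <- get_contexts(collection = "Spanish", token = "tapa"))

# Slow case reported above: English "cap" (~522 children, 49 corpora).
# "Eng-NA" is an assumed collection name, not confirmed in the thread.
system.time(cap_contexts <- get_contexts(collection = "Eng-NA", token = "cap"))
```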

@amsan7
Contributor

amsan7 commented Oct 2, 2018

Hmm, get_contexts(collection = "Spanish", token = "tapa") still isn't super fast for me (about 10-15 min?). The two calls inside to get_tokens and get_utterances are pretty fast when run individually, so I'm guessing the slowdown is happening at https://github.com/langcog/childesr/blob/master/R/childesr.R#L656, though I'm not totally sure what that code is doing. I also notice there are two dplyr::collect() calls? Maybe that isn't necessary?
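For reference, here is a generic dbplyr sketch of the "collect once" idea that comment is hinting at. It is not childesr's actual schema or code, just toy tables on an in-memory SQLite connection, but it shows the difference between joining lazily on the database side (one collect()) and collecting each table separately and joining in R (two collect()s).

```r
# Sketch only: toy tables, illustrative column names, not childesr internals.
library(dplyr)
library(dbplyr)

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con,
        data.frame(utterance_id = 1:3, gloss = c("cap", "hat", "cap")),
        "tokens")
copy_to(con,
        data.frame(utterance_id = 1:3,
                   speaker_role = c("Mother", "Target_Child", "Mother")),
        "utterances")

tokens_tbl     <- tbl(con, "tokens")
utterances_tbl <- tbl(con, "utterances")

# One collect(): the join is translated to SQL and runs on the database
# side, so only the final result crosses the connection.
contexts <- tokens_tbl %>%
  inner_join(utterances_tbl, by = "utterance_id") %>%
  collect()

# Two collect()s: each table is pulled into R first and joined locally;
# for large collections this transfers far more data per query.
contexts_local <- inner_join(collect(tokens_tbl), collect(utterances_tbl),
                             by = "utterance_id")

DBI::dbDisconnect(con)
```

If the real bottleneck is in the code linked above, timing or profiling that step (e.g. with profvis) on the actual childes-db connection should show whether the extra collect() is what dominates the "cap" query.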
