Dropping nearly all documents in dfm2stm #291

Open
michellerh330 opened this issue Sep 4, 2024 · 0 comments

Hello, my code runs fine for several subsets of my data, but for one subset (newspapers from the UK) I keep getting this warning:

```
Warning: There were 10 warnings in mutate().
The first warning was:
ℹ In argument: topic_model = future_map(K, ~stm(dfm_uk, K = ., verbose = FALSE, seed = 2461)).
Caused by warning in dfm2stm():
! Dropped 21,185 empty document(s)
ℹ Run dplyr::last_dplyr_warnings() to see the 9 remaining warnings.
```

Here is my code:

```r
conflicts_prefer(stopwords::stopwords)
toks <-
  tokens(text_corp2, remove_numbers = TRUE, remove_punct = TRUE, remove_symbols = TRUE) %>%
  tokens_wordstem %>%
  tokens_remove(c(stopwords("en", source = "stopwords-iso")))

dfm_uk <- dfm(toks) %>%
  dfm_trim(min_docfreq = 0.01, docfreq_type = "prop")
```
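The `dfm2stm()` message is a warning rather than an error: it reports that 21,185 documents had no features left when `dfm_uk` was converted for `stm()`. One plausible contributor (an assumption, not confirmed in this issue) is the `dfm_trim(min_docfreq = 0.01, docfreq_type = "prop")` step, which removes every feature that occurs in fewer than 1% of documents, so a document built only from rarer stems ends up empty. The base-R sketch below reproduces that mechanism on a small hypothetical matrix; it does not call quanteda and is not quanteda's implementation.

```r
# Hypothetical toy document-feature matrix (base R only, no quanteda):
# 160 documents; "common" appears in documents 1-150, and each of
# documents 151-160 contains a single word that occurs nowhere else.
n_docs <- 160
rare <- paste0("rare", 151:160)
m <- matrix(0L, nrow = n_docs, ncol = 1 + length(rare),
            dimnames = list(paste0("doc", seq_len(n_docs)), c("common", rare)))
m[1:150, "common"] <- 2L
for (i in 151:160) m[i, paste0("rare", i)] <- 1L

# Document frequency as a proportion of all documents. Assumption: this
# mirrors min_docfreq = 0.01 with docfreq_type = "prop", i.e. features
# below the threshold are removed and the rest are kept.
docfreq_prop <- colSums(m > 0) / n_docs
trimmed <- m[, docfreq_prop >= 0.01, drop = FALSE]

# Rows with no remaining tokens are the kind of document dfm2stm() drops.
empty <- rowSums(trimmed) == 0
cat("features kept:", ncol(trimmed), "\n")          # features kept: 1
cat("documents emptied:", sum(empty), "\n")        # documents emptied: 10

# Dropping them explicitly keeps any per-document metadata aligned.
kept <- trimmed[!empty, , drop = FALSE]
```

On the real object, `sum(ntoken(dfm_uk) == 0)` should report how many documents are empty after trimming, and `dfm_subset(dfm_uk, ntoken(dfm_uk) > 0)` removes them before calling `stm()`. Comparing that count with and without the `dfm_trim()` step would show whether the threshold is responsible.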

The warnings are raised by this call:

```r
many_models <- data_frame(K = c(20, 30, 40, 50, 60)) %>%
  mutate(topic_model = future_map(K, ~stm(dfm_uk, K = .,
                                          verbose = FALSE, seed = 2461)))
```

```
dplyr::last_dplyr_warnings()
[[1]]
<warning/rlang_warning>
Warning in mutate():
ℹ In argument: topic_model = future_map(K, ~stm(dfm_uk, K = ., verbose = FALSE, seed = 2461)).
Caused by warning in dfm2stm():
! Dropped 21,185 empty document(s)

Backtrace:

  1. ├─data_frame(K = c(20, 30, 40, 50, 60)) %>% ...
  2. ├─dplyr::mutate(...)
  3. └─dplyr:::mutate.data.frame(...)

[[2]]
<warning/rlang_warning>
Warning in mutate():
ℹ In argument: topic_model = future_map(K, ~stm(dfm_uk, K = ., verbose = FALSE, seed = 2461)).
Caused by warning:
! UNRELIABLE VALUE: Future (‘’) unexpectedly generated random numbers without specifying argument 'seed'. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify 'seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use 'seed=NULL', or set option 'future.rng.onMisuse' to "ignore".

Backtrace:

  1. ├─data_frame(K = c(20, 30, 40, 50, 60)) %>% ...
  2. ├─dplyr::mutate(...)
  3. └─dplyr:::mutate.data.frame(...)

[[3]]
<warning/rlang_warning>
Warning in mutate():
ℹ In argument: topic_model = future_map(K, ~stm(dfm_uk, K = ., verbose = FALSE, seed = 2461)).
Caused by warning in dfm2stm():
! Dropped 21,185 empty document(s)

Backtrace:

  1. ├─data_frame(K = c(20, 30, 40, 50, 60)) %>% ...
  2. ├─dplyr::mutate(...)
  3. └─dplyr:::mutate.data.frame(...)

[[4]]
<warning/rlang_warning>
Warning in mutate():
ℹ In argument: topic_model = future_map(K, ~stm(dfm_uk, K = ., verbose = FALSE, seed = 2461)).
Caused by warning:
! UNRELIABLE VALUE: Future (‘’) unexpectedly generated random numbers without specifying argument 'seed'. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify 'seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use 'seed=NULL', or set option 'future.rng.onMisuse' to "ignore".

Backtrace:

  1. ├─data_frame(K = c(20, 30, 40, 50, 60)) %>% ...
  2. ├─dplyr::mutate(...)
  3. └─dplyr:::mutate.data.frame(...)

[[5]]
<warning/rlang_warning>
Warning in mutate():
ℹ In argument: topic_model = future_map(K, ~stm(dfm_uk, K = ., verbose = FALSE, seed = 2461)).
Caused by warning in dfm2stm():
! Dropped 21,185 empty document(s)

Backtrace:

  1. ├─data_frame(K = c(20, 30, 40, 50, 60)) %>% ...
  2. ├─dplyr::mutate(...)
  3. └─dplyr:::mutate.data.frame(...)

... with 5 more warnings.
```
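Warnings [[2]] and [[4]] are a separate matter: the future framework noticed random numbers being generated inside `future_map()` (presumably by `stm()` itself) without furrr having been asked to manage seeds. furrr's `furrr_options(seed = ...)` handles this by giving each element its own L'Ecuyer-CMRG stream. The sketch below shows the option on a toy `rnorm()` call instead of `stm()`; it assumes the future and furrr packages are installed, and the seed value is arbitrary.

```r
library(future)
library(furrr)

# Sequential plan for the demo; the seed handling is the same under multisession.
plan(sequential)

# Passing a seed through .options tells furrr to generate parallel-safe
# RNG streams, so no "UNRELIABLE VALUE" warning is raised and repeated
# runs give identical draws.
draw <- function() {
  future_map(1:3, ~ rnorm(1), .options = furrr_options(seed = 2461L))
}
a <- draw()
b <- draw()
print(identical(a, b))  # [1] TRUE
```

Applied to the call above, this would mean passing `.options = furrr_options(seed = TRUE)` to `future_map()`, which should silence the "UNRELIABLE VALUE" warning; stm's own `seed = 2461` argument can stay as it is.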
