I originally went with a sequential PubMed load because it loads publications in update order, so that e.g. a publication that was later retitled overwrites its earlier title. However, loading all 1,628 PubMed files sequentially takes a long time!
Some better ideas:
Convert all files into individual DuckDB databases, then run one query over all of them to ascertain the most recent title/publication status for each publication.
Convert all files into e.g. TSV files in parallel, then run through the smaller TSV files sequentially to figure out the most recent title/publication status for each publication.
???
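A minimal sketch of the TSV idea, to make the "last update wins" step concrete. Everything here is an assumption for illustration: the record tuples `(pmid, title, status)` stand in for whatever gets parsed out of each PubMed file, `seq` is assumed to be the file's position in update order, and the function names are hypothetical. The key point is that once every row carries its file's sequence number, the conversion can run in any order and in parallel, and the merge only has to keep the row with the highest `seq` per PMID.

```python
import csv
from concurrent.futures import ThreadPoolExecutor


def convert_to_tsv(job):
    """Write one file's records to a TSV, tagging each row with the file's sequence number.

    `job` is (seq, records, out_path); `records` is an iterable of
    (pmid, title, status) tuples standing in for parsed PubMed data.
    """
    seq, records, out_path = job
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh, delimiter="\t")
        for pmid, title, status in records:
            writer.writerow([seq, pmid, title, status])
    return out_path


def convert_all(jobs, max_workers=4):
    """Convert all files concurrently; output order matches input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(convert_to_tsv, jobs))


def latest_per_pmid(tsv_paths):
    """Scan the TSVs and keep, for each PMID, the row from the highest-numbered file."""
    latest = {}
    for path in tsv_paths:
        with open(path, newline="", encoding="utf-8") as fh:
            for seq, pmid, title, status in csv.reader(fh, delimiter="\t"):
                seq = int(seq)
                if pmid not in latest or seq >= latest[pmid][0]:
                    latest[pmid] = (seq, title, status)
    return latest
```

Because the merge compares `seq` rather than relying on read order, the TSVs can be scanned in any order. Threads are used here only to keep the sketch short; since parsing is likely CPU-bound, `ProcessPoolExecutor` from the same module would probably be the better fit in practice.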