Save Model after every X Number of Iterations #2090
-
Hello! Hoping anyone can help me out. I am currently working on what I would consider a very large topic model (over 1 million large documents), and it currently takes many hours to run on my university's HPC cluster. If something fails in the middle (or I hit the walltime limit), I would like to be able to pick up training where I left off. So, is there a way to save the model every X number of iterations? I have looked through the discussions and documentation but can't seem to find a good way to make this happen. I am also happy to try to make this change myself if need be. Thanks in advance!
Replies: 1 comment 2 replies
-
If it's a bit over 1 million documents, you could use cuML instead, and I believe it would all run within the hour. Have you checked the guide on GPU acceleration? It demonstrates running BERTopic quite fast on, I believe, 1 million documents.

With respect to your question: there is no notion of "iterations" in BERTopic itself, since its runtime depends heavily on the underlying dimensionality reduction and clustering algorithms.
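Since there are no iterations to checkpoint, one partial workaround (my own suggestion, not a built-in BERTopic feature) is to cache the document embeddings, which are usually the most expensive step on large corpora. If a run dies, the resumed job can load the saved embeddings instead of recomputing them. A minimal sketch, where `EMB_PATH`, `get_embeddings`, and `embed_fn` are hypothetical names:

```python
import os
import numpy as np

EMB_PATH = "doc_embeddings.npy"  # hypothetical checkpoint path on shared storage

def get_embeddings(docs, embed_fn):
    """Load cached embeddings if a checkpoint exists; otherwise compute and save them.

    `embed_fn` is whatever embedding callable you use, e.g. a
    SentenceTransformer's `.encode` method.
    """
    if os.path.exists(EMB_PATH):
        # Resuming: skip the expensive embedding pass entirely.
        return np.load(EMB_PATH)
    embeddings = np.asarray(embed_fn(docs))
    np.save(EMB_PATH, embeddings)  # checkpoint before the rest of the pipeline
    return embeddings
```

On resume you can pass the cached array to `topic_model.fit_transform(docs, embeddings=embeddings)`, so only the (faster) dimensionality reduction and clustering steps are repeated.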