Value Error when tuning MaximalMarginalRelevance #2266

rhys-thompson-deel · 2025-01-16T09:20:10Z

Have you searched existing issues? 🔎

I have searched and found no existing issues

Describe the bug

I am trying to do some representation-model hyperparameter tuning on a BERTopic model by varying the diversity parameter of MaximalMarginalRelevance.
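For context, Maximal Marginal Relevance (MMR) re-ranks candidate keywords by trading off relevance to the topic against redundancy with words already picked, and `diversity` sets that trade-off. The sketch below is a generic, pure-Python version of greedy MMR, not BERTopic's own code; `mmr_select`, `cosine` and the toy vectors are hypothetical. One property worth noting: it never returns more words than there are candidates, so a topic with fewer than `top_n_words` candidate words gets a shorter list.

```python
from math import sqrt
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(
    topic_vec: Sequence[float],
    word_vecs: Dict[str, Sequence[float]],
    top_n: int,
    diversity: float,
) -> List[str]:
    """Greedy MMR: pick words relevant to the topic but dissimilar to each other.

    Hypothetical helper for illustration; not BERTopic's implementation.
    """
    candidates = list(word_vecs)
    if top_n <= 0 or not candidates:
        return []
    relevance = {w: cosine(topic_vec, word_vecs[w]) for w in candidates}
    # Start with the single most relevant word
    selected = [max(candidates, key=relevance.get)]
    candidates.remove(selected[0])

    def mmr_score(w: str) -> float:
        # Penalise similarity to the closest already-selected word
        redundancy = max(cosine(word_vecs[w], word_vecs[s]) for s in selected)
        return (1 - diversity) * relevance[w] - diversity * redundancy

    # Stops early when candidates run out, so the result can be < top_n
    while candidates and len(selected) < top_n:
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


topic = [1.0, 0.0]
words = {"cat": [1.0, 0.0], "kitten": [0.99, 0.1], "dog": [0.7, 0.7]}
print(mmr_select(topic, words, top_n=2, diversity=0.0))  # ['cat', 'kitten']
print(mmr_select(topic, words, top_n=2, diversity=0.9))  # ['cat', 'dog']
print(len(mmr_select(topic, words, top_n=5, diversity=0.5)))  # 3, not 5
```

With low diversity the near-duplicate "kitten" wins; with high diversity the less redundant "dog" is chosen instead.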

I am setting top_n_words in MMR to the same value as in the topic model. However, I keep encountering

ValueError: Length of weights not compatible with specified axis.

in certain trials when running update_topics.

The error is caused only by the MMR step (when I remove it, the code works fine), and which trials fail varies randomly between runs. Do you know why this might be happening?
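The error text itself comes from NumPy: `np.average` raises a ValueError with this message when a 1-D `weights` array is not the same length as the axis being averaged (this is the NumPy 1.x wording; newer releases may phrase it differently). Below is a minimal sketch of one plausible mechanism, assuming (this thread does not confirm it) that per-topic word embeddings are stored in one flat array and sliced with a fixed stride taken from the first topic. If a representation step such as MMR returns a different number of words for some topic, the slices and the score lists stop lining up. The helper `average_topic_vectors` and the toy data are hypothetical.

```python
import numpy as np


def average_topic_vectors(word_embeddings, topic_scores):
    """Weighted-average word embeddings per topic using a fixed stride.

    Hypothetical helper: assumes every topic has as many words as the first one.
    """
    n = len(topic_scores[0])  # stride inferred from the first topic only
    vectors = []
    for i, scores in enumerate(topic_scores):
        block = word_embeddings[i * n:(i + 1) * n]
        # Raises ValueError if len(scores) != block.shape[0]
        vectors.append(np.average(block, weights=scores, axis=0))
    return np.vstack(vectors)


# Every topic has three words: slices and weights line up
even = [[0.5, 0.3, 0.2], [0.6, 0.3, 0.1]]
emb_even = np.arange(12, dtype=float).reshape(6, 2)
print(average_topic_vectors(emb_even, even).shape)  # (2, 2)

# The middle topic came back with only two words: slice of 3 rows vs 2 weights
uneven = [[0.5, 0.3, 0.2], [0.7, 0.3], [0.4, 0.4, 0.2]]
emb_uneven = np.arange(16, dtype=float).reshape(8, 2)
try:
    average_topic_vectors(emb_uneven, uneven)
except ValueError as err:
    print("ValueError:", err)
```

Under this assumption the failure would depend on whether any topic ends up with an unusual word count, which could differ from trial to trial.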

I am using v0.16.4.

Reproduction

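from typing import Callable, List
import copy

import optuna
from bertopic import BERTopic
from bertopic.representation import MaximalMarginalRelevance
from bertopic.vectorizers import ClassTfidfTransformer


# Method excerpt from the reporter's tuning class (hence `self`).
def _execute_representation_tuning(
    self, topic_model: BERTopic, docs: List[str]
) -> Callable:
    """
    Execute BERTopic topic representation tuning using Optuna.

    Args:
        topic_model (BERTopic): Fitted topic model to optimize
        docs (List[str]): Documents from which to extract topics

    Returns:
        (Callable) Objective function to be executed for each trial.
    """
    def _inner_objective(trial: optuna.trial.Trial) -> float:
        """
        Objective function for Optuna.
        """
        # Work on a copy so every trial starts from the same fitted model
        topic_model_copy = copy.deepcopy(topic_model)
        top_n_words = topic_model_copy.top_n_words

        # Sample the c-TF-IDF settings for this trial
        ctfidf_model = ClassTfidfTransformer(
            reduce_frequent_words=trial.suggest_categorical(
                "reduce_frequent_words", [True, False]
            ),
            bm25_weighting=trial.suggest_categorical(
                "bm25_weighting", [True, False]
            ),
        )

        # Sample the MMR diversity; keep top_n_words in sync with the model
        mmr = MaximalMarginalRelevance(
            diversity=trial.suggest_float("diversity", 0.1, 0.9),
            top_n_words=top_n_words,
        )

        # Recompute topic representations; in some trials this raises
        # "ValueError: Length of weights not compatible with specified axis."
        topic_model_copy.update_topics(
            docs=docs,
            top_n_words=top_n_words,
            ctfidf_model=ctfidf_model,
            representation_model=mmr,
        )

        score = ...  # scorer function (not shown)

        return score

    return _inner_objective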
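To check whether a failing trial coincides with topics of unequal length, a small diagnostic can be run on the topic representations. A minimal sketch, assuming the representations are a dict mapping topic id to a list of `(word, score)` pairs (the shape returned by `BERTopic.get_topics()`); `find_uneven_topics` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, List, Tuple

Topics = Dict[int, List[Tuple[str, float]]]


def find_uneven_topics(topics: Topics) -> Dict[int, int]:
    """Return {topic_id: word_count} for topics whose word count differs
    from the most common count across all topics (hypothetical helper)."""
    counts = {topic: len(words) for topic, words in topics.items()}
    if not counts:
        return {}
    expected, _ = Counter(counts.values()).most_common(1)[0]
    return {topic: n for topic, n in counts.items() if n != expected}


example = {
    -1: [("the", 0.9), ("and", 0.8), ("of", 0.7)],
    0: [("cat", 0.5), ("kitten", 0.4), ("pet", 0.3)],
    1: [("dog", 0.6), ("puppy", 0.5)],  # one word short
}
print(find_uneven_topics(example))  # {1: 2}
```

Inside `_inner_objective` it could be called on `topic_model_copy.get_topics()` in an `except ValueError` block around `update_topics`, assuming the copy's representations have already been updated when the error surfaces (not verified here), and logged together with the trial's `diversity` value.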

BERTopic Version

0.16.4


rhys-thompson-deel added the bug label ("Something isn't working") on Jan 16, 2025.
