Value Error when tuning MaximalMarginalRelevance #2266

Open
rhys-thompson-deel opened this issue Jan 16, 2025 · 0 comments
Labels
bug Something isn't working

rhys-thompson-deel commented Jan 16, 2025

Have you searched existing issues? 🔎

  • I have searched and found no existing issues

Describe the bug

I am trying to do some representation-model hyperparameter tuning on a BERTopic model by varying the diversity parameter of MaximalMarginalRelevance.
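For context on what the diversity parameter controls, here is a minimal NumPy sketch of the standard greedy MMR selection. At each step it picks the candidate word that maximises (1 - diversity) * relevance - diversity * redundancy. This is an illustration, not BERTopic's implementation; `mmr` and its arguments are hypothetical names. One property matters for this issue: it can return at most min(top_n, number of candidates) words.

```python
import numpy as np


def mmr(doc_embedding, word_embeddings, words, diversity=0.1, top_n=10):
    """Greedy Maximal Marginal Relevance over candidate words (illustrative sketch).

    doc_embedding: shape (d,), embedding of the topic.
    word_embeddings: shape (n_words, d), one row per candidate word.
    diversity: 0.0 ranks purely by relevance; 1.0 purely penalises redundancy.
    """
    # MMR can never return more words than there are candidates.
    k = min(top_n, len(words))
    if k == 0:
        return []

    def cosine(a, b):
        # Pairwise cosine similarity between the rows of a and the rows of b.
        a = a / np.linalg.norm(a, axis=1, keepdims=True)
        b = b / np.linalg.norm(b, axis=1, keepdims=True)
        return a @ b.T

    word_doc_sim = cosine(word_embeddings, doc_embedding.reshape(1, -1)).ravel()
    word_word_sim = cosine(word_embeddings, word_embeddings)

    # Start with the single most relevant word.
    selected = [int(np.argmax(word_doc_sim))]
    candidates = [i for i in range(len(words)) if i != selected[0]]

    while len(selected) < k:
        relevance = word_doc_sim[candidates]
        # Highest similarity of each candidate to any already-selected word.
        redundancy = word_word_sim[np.ix_(candidates, selected)].max(axis=1)
        scores = (1 - diversity) * relevance - diversity * redundancy
        best = candidates[int(np.argmax(scores))]
        selected.append(best)
        candidates.remove(best)

    return [words[i] for i in selected]


doc = np.array([1.0, 0.0])
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
print(mmr(doc, emb, ["a", "b", "c"], diversity=0.0, top_n=2))  # ['a', 'b']
print(mmr(doc, emb, ["a", "b", "c"], diversity=1.0, top_n=2))  # ['a', 'c']
```

With diversity 0 the near-duplicate "b" is chosen second; with diversity 1 the unrelated "c" wins because "b" is redundant with "a".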

I am setting top_n_words in MMR to the same value as in the topic model. However, I keep encountering

ValueError: Length of weights not compatible with specified axis.

in certain trials when running update_topics.
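That message comes from NumPy's `np.average` when a 1-D `weights` array does not have one entry per row along the averaged axis. The sketch below reproduces it with a hypothetical helper, `weighted_topic_vector`, that takes a weighted average of word embeddings. It is a sketch of the error condition only. It assumes, but does not confirm, that a weighted average like this is where the error is raised inside `update_topics`.

```python
import numpy as np


def weighted_topic_vector(word_embeddings: np.ndarray, word_weights) -> np.ndarray:
    """Hypothetical helper: weighted average of word embeddings (one row per word).

    np.average raises ValueError if len(word_weights) != word_embeddings.shape[0].
    """
    return np.average(word_embeddings, axis=0, weights=word_weights)


embeddings = np.arange(20, dtype=float).reshape(5, 4)  # 5 words, dimension 4

print(weighted_topic_vector(embeddings, [1, 1, 1, 1, 1]))  # 5 weights: works

try:
    weighted_topic_vector(embeddings, [1, 1, 1])  # only 3 weights for 5 rows
except ValueError as err:
    # NumPy 1.x reports "Length of weights not compatible with specified axis.";
    # other NumPy versions may word the message differently.
    print("ValueError:", err)
```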

It only happens because of the MMR part (the code works fine once MMR is removed), and it occurs seemingly at random, in different trials from one run to the next. Do you know why this might be happening?
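One hypothesis fits this "random" pattern, though nothing here confirms it. For some hyperparameter combinations, the representation step (e.g. MMR) may return a word list whose length differs from top_n_words for some topics. A later step that averages word embeddings may assume every topic has the same length n and slice one flattened embedding matrix in fixed-size chunks. The toy NumPy sketch below reproduces that failure pattern. It is not BERTopic's code; `topic_vectors_fixed_stride`, `topic_vectors_per_topic`, and `embed` are hypothetical. It also shows why such a bug can look random: a short topic placed last happens to line up, while the same topic placed first raises the error.

```python
import numpy as np


def topic_vectors_fixed_stride(topic_words, embed, n):
    """Hypothetical: weighted-average word embeddings per topic, assuming every
    topic contributes exactly n words to one flattened embedding matrix."""
    flat_words = [word for words in topic_words for word, _ in words]
    flat_emb = np.array([embed(word) for word in flat_words])
    vectors = []
    for i, words in enumerate(topic_words):
        weights = [score for _, score in words]
        # Fixed-stride slice: only lines up if every topic has exactly n words.
        chunk = flat_emb[i * n:(i + 1) * n]
        vectors.append(np.average(chunk, axis=0, weights=weights))
    return vectors


def topic_vectors_per_topic(topic_words, embed):
    """Same weighted average, but each topic uses only its own words."""
    vectors = []
    for words in topic_words:
        emb = np.array([embed(word) for word, _ in words])
        weights = [score for _, score in words]
        vectors.append(np.average(emb, axis=0, weights=weights))
    return vectors


def embed(word):
    # Toy 2-d "embedding" for illustration only.
    return np.array([float(len(word)), 1.0])


short_first = [[("a", 1.0), ("bb", 1.0)], [("c", 1.0), ("dd", 1.0), ("eee", 1.0)]]
short_last = list(reversed(short_first))

print(len(topic_vectors_fixed_stride(short_last, embed, n=3)))  # 2: no error
try:
    topic_vectors_fixed_stride(short_first, embed, n=3)
except ValueError as err:
    print("ValueError:", err)
# The per-topic version handles both orderings.
print(topic_vectors_per_topic(short_first, embed)[0])
```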

I am using v0.16.4.

Reproduction

import copy
from typing import Callable, List

import optuna
from bertopic import BERTopic
from bertopic.representation import MaximalMarginalRelevance
from bertopic.vectorizers import ClassTfidfTransformer


def _execute_representation_tuning(
    topic_model: BERTopic, docs: List[str]
) -> Callable[[optuna.trial.Trial], float]:
    """
    Execute BERTopic topic representation tuning using Optuna.

    Args:
        topic_model (BERTopic): Fitted topic model to optimize
        docs (List[str]): Documents from which to extract topics

    Returns:
        (Callable) Objective function to execute for each trial.
    """

    def _inner_objective(trial: optuna.trial.Trial) -> float:
        """
        Objective function for Optuna.
        """
        # Work on a copy so every trial starts from the same fitted model.
        topic_model_copy = copy.deepcopy(topic_model)
        top_n_words = topic_model_copy.top_n_words

        # Tune the c-TF-IDF weighting options.
        ctfidf_model = ClassTfidfTransformer(
            reduce_frequent_words=trial.suggest_categorical(
                "reduce_frequent_words", [True, False]
            ),
            bm25_weighting=trial.suggest_categorical(
                "bm25_weighting", [True, False]
            ),
        )

        # Tune MMR diversity, keeping top_n_words equal to the model's value.
        mmr = MaximalMarginalRelevance(
            diversity=trial.suggest_float("diversity", 0.1, 0.9),
            top_n_words=top_n_words,
        )

        # In some trials this raises:
        # ValueError: Length of weights not compatible with specified axis.
        topic_model_copy.update_topics(
            docs=docs,
            top_n_words=top_n_words,
            ctfidf_model=ctfidf_model,
            representation_model=mmr,
        )

        # The scorer function was elided in the original report.
        score = ...  # ...scorer function...

        return score

    return _inner_objective
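To test the length-mismatch hypothesis on real data, one could inspect the representations of a trial that completes, for example the dictionary returned by `topic_model_copy.get_topics()`, which maps each topic id to a list of (word, score) pairs. The thing to look for is topics that are not exactly top_n_words long or that contain empty-string placeholder words, if any. The helper below is a hypothetical sketch of that check, shown on made-up data. It does not call BERTopic itself.

```python
from typing import Dict, List, Tuple


def representation_length_report(
    topics: Dict[int, List[Tuple[str, float]]], top_n_words: int
) -> Dict[int, Tuple[int, int]]:
    """Hypothetical diagnostic: map topic id -> (number of words, number of
    empty-string words) for topics whose representation is not exactly
    top_n_words non-empty words long."""
    report = {}
    for topic, words in topics.items():
        n_blank = sum(1 for word, _ in words if word == "")
        if len(words) != top_n_words or n_blank > 0:
            report[topic] = (len(words), n_blank)
    return report


# Made-up representations in the (word, score) shape described above.
example = {
    -1: [("price", 0.5), ("refund", 0.4)],
    0: [("invoice", 0.3)],
    1: [("payroll", 0.2), ("", 1e-05)],
}
print(representation_length_report(example, top_n_words=2))  # {0: (1, 0), 1: (2, 1)}
```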

BERTopic Version

0.16.4

rhys-thompson-deel added the bug label Jan 16, 2025