Should raise an Exception when tokenizer is not defined #1977

Open
timo-obrecht opened this issue May 7, 2024 · 2 comments
@timo-obrecht

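In bertopic/representation/_utils.py, line 57, `tokenizer` may be None. When that happens and `doc_length` is set, none of the branches below match, so `truncated_document` is never assigned and the function fails with an unhelpful UnboundLocalError. An exception asking the user to set `tokenizer` explicitly should be raised instead.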

    if doc_length is not None:
        if tokenizer == "char":
            truncated_document = document[:doc_length]
        elif tokenizer == "whitespace":
            truncated_document = " ".join(document.split()[:doc_length])
        elif tokenizer == "vectorizer":
            tokenizer = topic_model.vectorizer_model.build_tokenizer()
            truncated_document = " ".join(tokenizer(document)[:doc_length])
        elif hasattr(tokenizer, 'encode') and hasattr(tokenizer, 'decode'):
            encoded_document = tokenizer.encode(document)
            truncated_document = tokenizer.decode(encoded_document[:doc_length])
        return truncated_document
    return document
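
A minimal sketch of what such a check could look like, assuming the same four arguments the snippet above uses. The function name `truncate_document_strict` and the error message are hypothetical and not taken from the library; the four tokenizer branches are copied from the snippet, with a final `raise` added for anything that matches none of them.

```python
def truncate_document_strict(topic_model, doc_length, tokenizer, document):
    """Truncate `document` to `doc_length` tokens.

    Hypothetical variant of the snippet above: instead of falling through
    silently, it raises a ValueError when `tokenizer` is None or unsupported.
    """
    if doc_length is None:
        # No truncation requested, so the tokenizer is irrelevant.
        return document

    if tokenizer == "char":
        return document[:doc_length]
    if tokenizer == "whitespace":
        return " ".join(document.split()[:doc_length])
    if tokenizer == "vectorizer":
        # Relies on the topic model exposing a scikit-learn style vectorizer.
        tokenize = topic_model.vectorizer_model.build_tokenizer()
        return " ".join(tokenize(document)[:doc_length])
    if hasattr(tokenizer, "encode") and hasattr(tokenizer, "decode"):
        encoded_document = tokenizer.encode(document)
        return tokenizer.decode(encoded_document[:doc_length])

    # Reached only when no branch matched, e.g. tokenizer is None.
    raise ValueError(
        "`doc_length` is set but `tokenizer` is missing or unsupported. "
        "Set `tokenizer` to 'char', 'whitespace', 'vectorizer', or an object "
        "with `encode` and `decode` methods."
    )
```

With this shape, a call that sets `doc_length` but leaves `tokenizer` as None fails immediately with a message that names the accepted values.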
@MaartenGr
Owner

Thanks for this! It would indeed be much nicer to raise an exception and explain what the user should do. If you want, a PR would be much appreciated.

SSivakumar12 mentioned this issue Oct 14, 2024
@SSivakumar12
Contributor

Hi, I believe I have a PR for this if that is okay: #2181

MaartenGr pushed a commit that referenced this issue Dec 9, 2024