Textblob not finding the downloaded corpora #474

cagan-elden · 2024-09-26T18:20:02Z

python -m textblob.download_corpora

I downloaded the corpora as the error message instructed, but it still does not work.
I'm not sure whether the NLTK library is the cause, since I have installed that too.

doctorsketch · 2024-09-29T19:48:10Z

I found that upgrading from NLTK 3.8.1 to 3.9.1 broke my project. I now get errors asking me to run:

python -m textblob.download_corpora

Previously you could download the textblob corpora under one account and they would be found by another account. This is no longer the case.

Moving back to NLTK 3.8.1 fixed it. I can reproduce the issue by upgrading to 3.9.1 again.
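
To see where the corpora actually ended up for each account, a small diagnostic can help. This is only a sketch: `locate_resource` and `report_nltk_paths` are hypothetical helper names, not part of TextBlob or NLTK, and the only NLTK pieces it relies on are `nltk.__version__` and `nltk.data.path` (the list of directories NLTK searches).

```python
import os


def locate_resource(search_paths, relative_name):
    """Return every location under search_paths where the resource exists.

    A resource counts as present if it is an unpacked directory or file
    (e.g. corpora/brown) or a zip archive beside it (corpora/brown.zip).
    """
    hits = []
    for base in search_paths:
        candidate = os.path.join(base, *relative_name.split("/"))
        for path in (candidate, candidate + ".zip"):
            if os.path.exists(path):
                hits.append(path)
    return hits


def report_nltk_paths(relative_name="corpora/brown"):
    """Print each directory NLTK searches and whether the resource is there."""
    import nltk  # imported lazily so locate_resource works without NLTK

    print(f"NLTK version: {nltk.__version__}")
    for base in nltk.data.path:
        found = locate_resource([base], relative_name)
        print(f"{base}: {'found' if found else 'missing'}")
```

Running `report_nltk_paths()` under each account should show whether both accounts search the same directories, and whether the corpus sits in any of them.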

Ajaychaki2004 · 2024-11-15T13:30:18Z

The problem is caused by the NLTK version. Moving back to NLTK 3.8.1 can help to rectify the error.
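
If you want a project to flag this at startup, a version check along these lines might do. It is a rough sketch under assumptions: the helper names are made up, and `LAST_KNOWN_GOOD` is simply the version reported as working in this thread, not an official compatibility boundary.

```python
import re
from importlib import metadata

# Assumption: 3.8.1 is the last version reported as working in this thread.
LAST_KNOWN_GOOD = (3, 8, 1)


def parse_version(text):
    """Turn '3.9.1' into (3, 9, 1); a missing patch number becomes 0."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part or 0) for part in match.groups())


def installed_nltk_version():
    """Return the installed NLTK version as a tuple, or None if NLTK is absent."""
    try:
        return parse_version(metadata.version("nltk"))
    except metadata.PackageNotFoundError:
        return None


def newer_than_known_good(version, known_good=LAST_KNOWN_GOOD):
    """True when version is strictly newer than the last known good release."""
    return version is not None and version > known_good
```

A caller could log a warning when `newer_than_known_good(installed_nltk_version())` is true, instead of failing later with a corpus lookup error.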

doctorsketch · 2024-11-27T11:52:37Z

To follow up on this, I fixed it by specifying the NLTK data path and telling NLTK where to look like this:

def download_nltk_resources(self):
    """
    Downloads required NLTK resources if not already present.
    """
    import nltk
    import os
    
    # Use the environment variable or fall back to default
    nltk_data_path = os.getenv('NLTK_DATA', '/usr/local/share/nltk_data')
    
    # Ensure the directory exists
    os.makedirs(nltk_data_path, exist_ok=True)
    
    # Add our path to NLTK's data path
    nltk.data.path.insert(0, nltk_data_path)
    
    print(f"Using NLTK data path: {nltk_data_path}")
    
    required_resources = {
        'averaged_perceptron_tagger': ('taggers', 'averaged_perceptron_tagger'),
        'averaged_perceptron_tagger_eng': ('taggers', 'averaged_perceptron_tagger_eng'),
        'punkt': ('tokenizers', 'punkt'),
        'punkt_tab': ('tokenizers/punkt_tab', 'english'),
        'movie_reviews': ('corpora', 'movie_reviews'),
        'brown': ('corpora', 'brown'),
        'conll2000': ('corpora', 'conll2000'),
        'wordnet': ('corpora', 'wordnet')
    }
    
    # Download and verify all resources
    for resource, (folder, name) in required_resources.items():
        try:
            nltk.data.find(f'{folder}/{name}')
        except LookupError:
            print(f"Downloading {resource}...")
            nltk.download(resource, download_dir=nltk_data_path, quiet=True)

This assumes NLTK_DATA is specified as an environment variable; if it is not set, the function falls back to /usr/local/share/nltk_data.
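
If the variable cannot be set in the shell or service configuration, it can also be set from Python. As far as I can tell, NLTK reads NLTK_DATA once, when `nltk` is first imported, and treats it as a list of directories separated by `os.pathsep`, so it has to be set before that import. A minimal sketch, with `prepend_nltk_data_dir` being a hypothetical helper name:

```python
import os


def prepend_nltk_data_dir(path, environ=os.environ):
    """Put path first in NLTK_DATA, keeping any directories already listed."""
    current = [p for p in environ.get("NLTK_DATA", "").split(os.pathsep) if p]
    if path in current:
        current.remove(path)  # avoid listing the same directory twice
    current.insert(0, path)
    environ["NLTK_DATA"] = os.pathsep.join(current)
    return current
```

Call it before the first `import nltk`, for example `prepend_nltk_data_dir('/usr/local/share/nltk_data')`.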

Then do something like this:

# Inside a TextParser method: download resources only once per process
if not hasattr(TextParser, '_resources_checked'):
    self.download_nltk_resources()
    TextParser._resources_checked = True
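
For anyone who wants to see the once-only check in a complete class, here is a self-contained sketch. The `TextParser` below is a hypothetical stand-in, not my real class, and the `downloader` argument is an assumption added so the example runs without NLTK installed; in practice it would be something like the `download_nltk_resources` function.

```python
class TextParser:
    """Hypothetical stand-in for the class used in the snippet above."""

    _resources_checked = False  # class-level flag shared by all instances

    def __init__(self, downloader):
        # downloader is any zero-argument callable that fetches the resources
        self._downloader = downloader
        self._ensure_resources()

    def _ensure_resources(self):
        # Run the potentially slow check only for the first instance
        if not TextParser._resources_checked:
            self._downloader()
            TextParser._resources_checked = True
```

Because the flag is set only after the downloader returns, a failed download leaves it unset and the next instance tries again.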

idlip mentioned this issue Nov 24, 2024

pythonPackages.textblob: remove with lib NixOS/nixpkgs#358687

