bestrecipes.com.au scraper broken #1354

Open
2 tasks done
Surfoo opened this issue Nov 3, 2024 · 6 comments
Labels
bots-protection (A form of bot protection is preventing the fetching of the recipe's HTML), bug

Comments

@Surfoo

Surfoo commented Nov 3, 2024

Pre-filing checks

  • I have searched for open issues that report the same problem
  • I have checked that the bug affects the latest version of the library

The URL of the recipe(s) that are not being scraped correctly

The results you expect to see
I don't know.

The results (including any Python error messages) that you are seeing
I couldn't run the scraper; the import itself fails:

$ python -m pipx install recipe-scrapers --include-deps
'recipe-scrapers' already seems to be installed. Not modifying existing installation in '/home/johndoe/.local/pipx/venvs/recipe-scrapers'. Pass '--force' to force installation.

$ python
Python 3.12.6 (main, Sep  8 2024, 13:18:56) [GCC 14.2.1 20240805] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> from recipe_scrapers import scrape_html
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'recipe_scrapers'
>>> 

@Surfoo Surfoo added the bug label Nov 3, 2024
@jknndy
Collaborator

jknndy commented Nov 3, 2024

Hi @Surfoo, it looks like the issue you’re experiencing is related to importing the recipe_scrapers library rather than the specific URLs. The ModuleNotFoundError: No module named 'recipe_scrapers' error suggests that Python wasn’t able to locate recipe_scrapers at all before attempting to access bestrecipes.

Could you share any additional output or error messages, if available, that might clarify the environment setup? It may also help to check if recipe_scrapers is installed in the same environment where you’re running the script.
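One way to check this, as a minimal sketch: pipx installs each package into its own isolated virtual environment (the path in the output above, `/home/johndoe/.local/pipx/venvs/recipe-scrapers`, is one such environment), so the system `python` would not normally be able to import it. The helper name `describe_install` below is hypothetical and uses only the standard library; it reports which interpreter is running and whether that interpreter can find `recipe_scrapers`.

```python
import importlib.util
import sys


def describe_install(module_name="recipe_scrapers"):
    """Report which interpreter is running and whether it can import module_name."""
    # find_spec returns None when a top-level module cannot be located.
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,
        "importable": spec is not None,
        "location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    for key, value in describe_install().items():
        print(f"{key}: {value}")
```

If `importable` comes back `False`, the interpreter shown is not the one the package was installed into; running the same snippet with the Python executable inside the pipx environment should show the difference.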

@Surfoo
Author

Surfoo commented Nov 3, 2024

I tried to help by following the Getting Started section of the README. Which command would you like me to run? I don't know Python.

I originally hit the bug in the Mealie app, which uses recipe_scrapers in its backend. Here is the log:

mealie            | INFO     2024-11-03T19:03:47 - HTTP Request: GET https://www.bestrecipes.com.au/recipes/mini-marsala-fruit-cakes-recipe/kwlyzyae "HTTP/1.1 403 Forbidden"

@jknndy
Collaborator

jknndy commented Nov 7, 2024

Sorry for the delay here, I am traveling for work so free time is rare! The 403 error makes me believe this could be related to the way mealie is attempting to access the site.

@jayaddison could you weigh in here? I believe this is similar to the other bots-protection issue opened recently.

@jayaddison
Collaborator

Initially: yes, it seems likely that this could be some form of bot protection (network request filtering). I'll try to confirm that soon. @Surfoo did you manage to find a way to get that import to work? We generally suggest using pip here, but I'd expect pipx to work equally well.

@jayaddison jayaddison added the bots-protection (A form of bot protection is preventing the fetching of the recipe's HTML) label Nov 7, 2024
@Surfoo
Author

Surfoo commented Nov 8, 2024

Hello,
No, I haven't tried it since last time.

@jayaddison
Collaborator

I can confirm that I'm able to scrape the first recipe (the Satay Chicken one) from HTML successfully, so this does indeed seem to be some kind of network-request-filtering problem (aka bots-protection).
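To separate the two failure modes (fetching the page versus parsing it), a rough sketch along these lines may help. It uses only the standard library; the helper name `fetch_html` and the `User-Agent` value are hypothetical, and setting a header is not expected to get past bot protection. The point is only to surface a 403 clearly instead of passing an error page on to the parser.

```python
import urllib.error
import urllib.request


def fetch_html(url, opener=urllib.request.urlopen, timeout=10):
    """Fetch url and return (html, problem); exactly one of the two is None."""
    # Hypothetical User-Agent, for illustration only.
    request = urllib.request.Request(url, headers={"User-Agent": "recipe-debug/0.1"})
    try:
        with opener(request, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace"), None
    except urllib.error.HTTPError as error:
        # HTTPError is a subclass of URLError, so it must be handled first.
        if error.code == 403:
            return None, "403 Forbidden: the site refused the request (possible bot protection)"
        return None, f"HTTP {error.code}: {error.reason}"
    except urllib.error.URLError as error:
        return None, f"network error: {error.reason}"
```

When the fetch is blocked, the page's HTML can instead be saved from a browser and handed to the `scrape_html` function imported earlier in this thread, for example `scrape_html(html, org_url=url)`; that call assumes the `org_url` argument of current recipe_scrapers releases.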
