Kitchenstories scraper not detected #1261

hhopke · 2024-09-23T20:42:29Z

Pre-filing checks

I have searched for open issues that report the same problem
I have checked that the bug affects the latest version of the library

The URL of the recipe(s) that are not being scraped correctly

"https://www.kitchenstories.com/de/rezepte/susskartoffel-curry"

The results you expect to see

A successfully scraped recipe

The results (including any Python error messages) that you are seeing

import requests
from recipe_scrapers import scrape_html

url = "https://www.kitchenstories.com/de/rezepte/susskartoffel-curry"
name = input('What is your name, risotto sampler?\n')
html = requests.get(url, headers={"User-Agent": f"Risotto Sampler {name}"}).content
scraper = scrape_html(html, org_url=url, wild_mode=False)
scraper.host()
scraper.title()
scraper.total_time()
scraper.image()
scraper.ingredients()
scraper.ingredient_groups()
scraper.instructions()
scraper.instructions_list()
scraper.yields()
scraper.to_json()
scraper.links()
scraper.nutrients()  # not always available
scraper.canonical_url()  # not always available
scraper.equipment()  # not always available
scraper.cooking_method()  # not always available
scraper.keywords()  # not always available
scraper.dietary_restrictions()  # not always available

Traceback (most recent call last):
  File "...\scratches\scratch_7.py", line 11, in <module>
    scraper.title()
  File "~\recipe_scrapers\plugins\exception_handling.py", line 63, in decorated_method_wrapper
    return decorated(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~\recipe_scrapers\plugins\html_tags_stripper.py", line 74, in decorated_method_wrapper
    decorated_func_result = decorated(self, *args, **kwargs)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~\recipe_scrapers\plugins\normalize_string.py", line 33, in decorated_method_wrapper
    return normalize_string(decorated(self, *args, **kwargs))
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~\recipe_scrapers\plugins\schemaorg_fill.py", line 66, in decorated_method_wrapper
    raise e
  File "~\recipe_scrapers\plugins\schemaorg_fill.py", line 57, in decorated_method_wrapper
    return decorated(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~\recipe_scrapers\_abstract.py", line 95, in title
    raise NotImplementedError("This should be implemented.")
NotImplementedError: This should be implemented.
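The traceback passes through the `schemaorg_fill` plugin and ends in the base `title()` in `_abstract.py`, which suggests (an inference, not something confirmed in this thread) that the fetched HTML contained no usable schema.org Recipe data, as would happen if a bot-protection page came back instead of the recipe. The stdlib-only sketch below checks a page for a JSON-LD block whose `@type` is `Recipe`; `JsonLdCollector` and `has_schemaorg_recipe` are hypothetical helpers, not part of recipe-scrapers.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        # Only keep text that sits inside a JSON-LD script element.
        if self._in_jsonld:
            self.blocks[-1] += data


def _iter_nodes(node):
    """Yield every dict in a JSON-LD document, including nested/@graph members."""
    if isinstance(node, list):
        for item in node:
            yield from _iter_nodes(item)
    elif isinstance(node, dict):
        yield node
        for value in node.values():
            if isinstance(value, (list, dict)):
                yield from _iter_nodes(value)


def has_schemaorg_recipe(html: str) -> bool:
    """Return True if any JSON-LD block declares an @type of Recipe."""
    parser = JsonLdCollector()
    parser.feed(html)
    parser.close()
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than failing
        for node in _iter_nodes(data):
            types = node.get("@type")
            if types == "Recipe" or (isinstance(types, list) and "Recipe" in types):
                return True
    return False
```

The function expects a `str`, so pass `requests.get(...).text` rather than the `.content` bytes used in the script above. A `False` result would point at the downloaded page rather than at the Kitchenstories scraper itself.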


jayaddison · 2024-10-01T15:14:32Z

Hi @hhopke - thank you for the bug report! I haven't been able to replicate this problem locally; could you check whether any parts of the code I used below differ from yours?

>>> import requests
>>> from recipe_scrapers import scrape_html
>>> url = "https://www.kitchenstories.com/de/rezepte/susskartoffel-curry"
>>> name = input('What is your name, risotto sampler?\n')
What is your name, risotto sampler?
James
>>> html = requests.get(url, headers={"User-Agent": f"Risotto Sampler {name}"}).content
>>> scraper = scrape_html(html, org_url=url, wild_mode=False)
>>> scraper.title()
'Süßkartoffel-Curry'

hhopke · 2024-10-16T19:27:55Z

Hi @jayaddison,
I was on vacation, hence the late reply. I used exactly the same code; I also just tried copying and pasting yours and got the same result. Interestingly, I am getting this for multiple sites, as if the pages were blocking me.
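One way to test the blocking theory is to look at the HTTP status code and body of the response before handing it to `scrape_html`. The rough heuristic below is only a sketch: the status codes and marker phrases in the hypothetical `looks_blocked` helper are assumptions chosen for illustration, not a list that recipe-scrapers or these sites are known to use.

```python
# Hypothetical heuristic: these status codes and marker phrases are assumptions,
# not values that recipe-scrapers or the sites in this thread are known to use.
BLOCKING_STATUS_CODES = {401, 403, 429, 503}
BLOCKING_MARKERS = ("captcha", "access denied", "are you a robot")


def looks_blocked(status_code: int, html: str) -> bool:
    """Guess whether a response is a bot-protection page rather than a recipe."""
    if status_code in BLOCKING_STATUS_CODES:
        return True
    lowered = html.lower()
    # Case-insensitive substring search for any of the marker phrases.
    return any(marker in lowered for marker in BLOCKING_MARKERS)
```

For example, call `looks_blocked(response.status_code, response.text)` on the `requests.get(...)` response from the script above; a `True` result on a page that loads fine in a browser would be consistent with the request being filtered.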

For instance, this page worked: https://fitmencook.com/recipes/mexican-tortilla-soup/

jayaddison · 2024-11-01T01:46:03Z

@hhopke no problem at all, thanks for responding. I have one idea, although it may be something you've already considered: do you know whether the relevant pages display as expected when opened in a popular web browser? That could provide a useful piece of information, and perhaps a workaround:

Info: it may help confirm whether the problem is related to the script used to retrieve the recipe page (for example, a difference in user-agent).
Workaround: if the page does load correctly in a browser, you should be able to save the page's source HTML from your browser to a file, and then update the script to read that file and scrape from it instead.
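The workaround in the second point might look like the following minimal sketch, assuming the page was saved from the browser as a UTF-8 `.html` file. `read_saved_html` and `scrape_saved_page` are hypothetical helper names; the `scrape_html(html, org_url=url, wild_mode=False)` call mirrors the one used earlier in this thread.

```python
from pathlib import Path


def read_saved_html(path: str) -> str:
    """Read an HTML file saved from a browser."""
    # Assumes the file was saved as UTF-8; errors="replace" stops a stray
    # undecodable byte from aborting the read.
    return Path(path).read_text(encoding="utf-8", errors="replace")


def scrape_saved_page(path: str, url: str):
    """Scrape a locally saved copy of a recipe page instead of fetching it."""
    # Imported here so read_saved_html() also works without recipe-scrapers.
    from recipe_scrapers import scrape_html

    html = read_saved_html(path)
    # org_url selects the site-specific scraper, so pass the page's original
    # URL here, not the local file path.
    return scrape_html(html, org_url=url, wild_mode=False)
```

Usage would then be something like `scrape_saved_page("susskartoffel-curry.html", "https://www.kitchenstories.com/de/rezepte/susskartoffel-curry").title()`, where the file name is just an example.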

Unfortunately, there's often not much we can do about transient network errors and network/server filtering, so I can't guarantee a successful result; but if the page does load in other browsers, then, in theory at least, we have more options.

hhopke added the bug label Sep 23, 2024

jayaddison added the bots-protection label (A form of bot protection is preventing the fetching of the recipe's HTML) Oct 17, 2024
