[SITES] https://www.bizjournals.com #648

tbrox · 2024-08-23T10:38:22Z

First, please check that this is really an issue with the library and not a special case of the website:

[ X ] There is no paywall
[ X ] You do not have to be logged in to see the articles
[ X ] You tried using a common browser user agent in your configuration / call
[ X ] The website is not in the list of well known problematic sites

Your report as follows:

Website that does not parse correctly:

https://www.bizjournals.com

Some sample URLs that I have tried:

https://www.bizjournals.com/boston/news/2024/08/23/irobot-roomba-cleaning-station.html?ana=brss_4650
https://www.bizjournals.com/sanfrancisco/inno/stories/news/2024/08/22/bracing-for-impact-bay-area-investors-bullish-dei.html?ana=brss_4650

The exact code I used to test these articles/this website:

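from newspaper import Article

# url is one of the sample URLs listed above
url = "https://www.bizjournals.com/boston/news/2024/08/23/irobot-roomba-cleaning-station.html?ana=brss_4650"
article = Article(url, fetch_images=False, follow_meta_refresh=True)
article.download()
article.parse()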

Other information, remarks, messages, etc:

newspaper.exceptions.ArticleException: Article download() failed with Status code 403 for url None
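
A 403 at the download step may mean the server is rejecting the request itself, before any parsing happens. One way to check this is to request the same URL outside the library with browser-like headers. The following is a minimal sketch using only the Python standard library; `build_browser_request` and `fetch_status` are hypothetical helper names, and the header values are assumptions, not settings taken from the library.

```python
import urllib.error
import urllib.request

# Assumed header values; any current desktop browser string would do.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}


def build_browser_request(url):
    """Hypothetical helper: build a GET request carrying browser-like headers."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS)


def fetch_status(url, timeout=10):
    """Hypothetical helper: return the HTTP status code the server sends back."""
    try:
        with urllib.request.urlopen(build_browser_request(url), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; the code is still informative.
        return err.code
```

If `fetch_status(url)` also returns 403 for the sample URLs, the rejection is probably happening on the server side (for example, bot protection) rather than in the article parser, though this check alone cannot confirm the cause.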

tbrox added the sites not working label Aug 23, 2024
