[SITES] https://www.bizjournals.com #648

Open
tbrox opened this issue Aug 23, 2024 · 0 comments

tbrox commented Aug 23, 2024

First, please check that it is really an issue with the library and not a special case of the website:

  • [ X ] There is no paywall
  • [ X ] You do not have to be logged in to see the articles
  • [ X ] You tried using a common browser user agent in your configuration / call
  • [ X ] The website is not in the list of well known problematic sites

Your report as follows:

Website that does not parse correctly:

https://www.bizjournals.com

Some sample URLs that I have tried:

https://www.bizjournals.com/boston/news/2024/08/23/irobot-roomba-cleaning-station.html?ana=brss_4650
https://www.bizjournals.com/sanfrancisco/inno/stories/news/2024/08/22/bracing-for-impact-bay-area-investors-bullish-dei.html?ana=brss_4650

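The exact code I used to test these articles/this website:

from newspaper import Article

# url is one of the sample URLs listed above
url = "https://www.bizjournals.com/boston/news/2024/08/23/irobot-roomba-cleaning-station.html?ana=brss_4650"
article = Article(url, fetch_images=False, follow_meta_refresh=True)
article.download()
article.parse()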

Other information, remarks, messages, etc:

newspaper.exceptions.ArticleException: Article download() failed with Status code 403 for url None
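
To help tell whether the 403 comes from the site itself rather than from the library, the same URL can be requested directly with a browser-like user agent. The following is a minimal sketch using only the Python standard library; the `BROWSER_UA` string and the helper names `build_request` and `fetch_status` are assumptions made for illustration and are not part of newspaper.

```python
import urllib.error
import urllib.request

# Hypothetical browser-like user agent string; any common browser UA could be used.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/127.0.0.0 Safari/537.36"
)


def build_request(url, user_agent=BROWSER_UA):
    """Build a GET request that carries the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url, user_agent=BROWSER_UA, timeout=10):
    """Return the HTTP status code for url, including error codes such as 403."""
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; the code is still useful.
        return err.code
```

Calling `fetch_status(url)` on one of the sample URLs above would show whether a plain request with this user agent is also rejected. If it is, the block is likely applied by the site to non-browser clients in general and not caused by the parsing code.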
