links that read-art can not crawl #1

Open
Tjatse opened this issue Nov 28, 2014 · 5 comments

Tjatse (Owner) commented Nov 28, 2014

No description provided.

Tjatse added the bug label Nov 28, 2014

mxr576 commented Sep 20, 2015

Hi!

I'm using your module in my web crawler, Web page Content Extractor (wce), and I recently discovered that read-art returns "Error: 400 Bad Request" for these URLs, whereas node-readability handles them without any problem. Could you please check them?

Tjatse (Owner, Author) commented Sep 21, 2015

Hi @mxr576, thanks a lot. There was a bug in how req-fast set the Host header. I've fixed it and added your URLs as a test case under the test directory, and they work fine now. Just update read-art to the latest version and try again.
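
For context, a Host header that does not match the requested URL is a common reason for a server to answer with 400 Bad Request. The following is a minimal sketch of deriving the header from the target URL using Node's built-in `url` module. It is not req-fast's actual code; the helper names `hostHeaderFor` and `buildHeaders` are hypothetical and exist only for illustration.

```javascript
// Illustrative sketch only: NOT the req-fast implementation.
// hostHeaderFor and buildHeaders are hypothetical helper names.
const { URL } = require('url');

// Derive the Host header value from the URL being requested.
// URL#host keeps the port only when it is not the scheme's default.
function hostHeaderFor(targetUrl) {
  return new URL(targetUrl).host;
}

// Copy caller-supplied headers, dropping any existing Host entry
// (header names are case-insensitive), then set Host from the URL.
function buildHeaders(targetUrl, extra = {}) {
  const headers = {};
  for (const [name, value] of Object.entries(extra)) {
    if (name.toLowerCase() !== 'host') headers[name] = value;
  }
  headers.Host = hostHeaderFor(targetUrl);
  return headers;
}

console.log(buildHeaders('http://mp.weixin.qq.com/s?idx=1', { Accept: 'text/html' }));
```

Deriving the value from the URL on every request, rather than reusing a previously stored one, avoids sending a stale host after a redirect to a different domain.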

mxr576 commented Sep 21, 2015

Thanks for the fast reaction! I also suspected that this was a req-fast issue. I can confirm that content extraction now works fine on these links with read-art.

mxr576 added a commit to mxr576/webpage-content-extractor that referenced this issue Sep 21, 2015

entertainyou (Contributor) commented

@Tjatse, for the URL http://mp.weixin.qq.com/s?__biz=MjYyMzc1Mjk4MA==&mid=400815255&idx=1&sn=d91b630394b8ba70209406bbf44b41e8&scene=0#wechat_redirect, whose article consists of pictures, the result is:

<div> <strong class="profile_nickname">搞笑集中营</strong>
<p class="profile_meta"> <span class="profile_meta_value">WeiGaoXiao</span> </p>
<p class="profile_meta"> <span class="profile_meta_value">搞笑段子、搞笑视频、搞笑幽默、搞笑糗事、内涵漫画……等等搞笑的搞笑,这里是搞笑集中营,一网打尽所有的搞笑,让你天天笑哈哈哈哈哈哈哈~</span>
