links that read-art can not crawl #1

Tjatse · 2014-11-28T07:39:08Z

No description provided.

mxr576 · 2015-09-20T16:54:42Z

Hi!

I'm using your module in my web crawler, Web page Content Extractor (wce), and I recently discovered that read-art returns "Error: 400 Bad Request" for these URLs, whereas node-readability handles them without any problem. Could you please check them?

Tjatse · 2015-09-21T03:48:18Z

Hi @mxr576, thanks a lot. There was a bug in how req-fast set the host on the request headers. I've fixed it and added your issue as a test case under the test directory. It works fine now; just update read-art to the latest version and try it out.

mxr576 · 2015-09-21T05:28:27Z

Thanks for the fast reaction! I also suspected that this was a req-fast issue. I can confirm that content extraction now works fine on these links with read-art.

entertainyou · 2016-02-18T09:38:02Z

@Tjatse, for the URL http://mp.weixin.qq.com/s?__biz=MjYyMzc1Mjk4MA==&mid=400815255&idx=1&sn=d91b630394b8ba70209406bbf44b41e8&scene=0#wechat_redirect, where the article consists of pictures, the result is:

<div> <strong class="profile_nickname">搞笑集中营</strong>
<p class="profile_meta"> <span class="profile_meta_value">WeiGaoXiao</span> </p>
<p class="profile_meta"> <span class="profile_meta_value">搞笑段子、搞笑视频、搞笑幽默、搞笑糗事、内涵漫画……等等搞笑的搞笑，这里是搞笑集中营，一网打尽所有的搞笑，让你天天笑哈哈哈哈哈哈哈~</span>

FarmaanElahi · 2018-08-14T08:39:02Z

https://medium.com/google-developers/drawing-a-rounded-corner-background-on-text-5a610a95af5
The entire article is not crawled.

Tjatse added the bug label Nov 28, 2014

mxr576 added a commit to mxr576/webpage-content-extractor that referenced this issue Sep 21, 2015

Read-art updated to resolve this issue: Tjatse/node-readability#1 (co…

8ce67bf
