- Enhance analysis of the related articles list.
- Fixed: check if the uri is a cheerio Object in an ugly way.
- DO NOTHING BUT RELEASE 0.5.0
- Enhance JSON output: paragraphs
- Fixed: remove \r\t\n paragraph
- Simply expose Reader and Article.
- Update packages, add Node.js 4 and 6 for Travis and remove unnecessary … @miduga
- Do some refactoring and cleaning @miduga
- Merge branch 'cleaning-reader' @miduga
- Fix promises documentation @miduga
- Promise support
- remove useless anchors, e.g. paginations, contactors...
- remove related links if necessary
- improve link density algorithm
- options recognition
- cheerio object can be passed in
- more test cases
- improve `selectors` option
- improve performance
- fixed typo @entertainyou
- `debug` mode
- `extract` of `selectors` could be `function` now
- `betterTitle` option
- #18 #20 #21 #27 #28
- `forceDecode` option, let `cheerio`/`htmlparser2` handle the encodings now.
- `minParagraphs` option.
- Make images regexp extendable. #14 @entertainyou
- feature: threshold
- fixed: #5 #10
- `imgFallback` option @entertainyou
- fix scoreRule on grandparent node
- only fetch body when uri is provided but html is empty
- fix the links of img,a,object,embed... - relative to absolute
- customize selectors
- refactor: extract data
- Supports `cheerio` output type. @entertainyou
- Add `tidyAttrs` option.
- Add response data to callback arguments
- Remove empty content in JSON output mode
- Split content by `<br />` in JSON output mode
- Support custom score rules
- Support Options.minTextLength
- Update dependencies
- RegExp of videos
- Update documentation
- Decode HTML entities manually
- Update documentation
- Remove broken test sites
- Update dependencies
- Identify Ads.
- Scores are now calculated with a reduced weight if the content has unlikely candidates.
- Grab article content more accurately.