A Go port of the boilerpipe Java library written by Christian Kohlschütter.
Boilerpipe removes boilerplate and extracts text content from HTML documents.
Currently, it supports only article extraction, which covers the title, the date, and the content.
This project makes a best effort to follow Semantic Versioning 2.0.0, but the API is not guaranteed to be stable until version 1.0.0.
Existing tags will not change, however, so vendored copies will not break.
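To make the versioning policy concrete, here is a minimal sketch of how a consumer might check whether a tag falls in the pre-1.0.0 range, where Semantic Versioning 2.0.0 allows the API to change. The function name `inInitialDevelopment`, the assumed `vMAJOR.MINOR.PATCH` tag format, and the example tags are all hypothetical and are not part of this package.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// inInitialDevelopment reports whether tag has major version 0.
// Semantic Versioning 2.0.0 reserves 0.y.z for initial development,
// during which the public API may change at any time.
// Assumption: tags look like "v0.3.1" or "0.3.1". Only the major
// component is parsed; minor and patch are not validated.
func inInitialDevelopment(tag string) (bool, error) {
	core := strings.TrimPrefix(tag, "v")
	parts := strings.SplitN(core, ".", 3)
	if len(parts) != 3 {
		return false, fmt.Errorf("tag %q is not MAJOR.MINOR.PATCH", tag)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil || major < 0 {
		return false, fmt.Errorf("tag %q has an invalid major version", tag)
	}
	return major == 0, nil
}

func main() {
	// Hypothetical tags, chosen only to exercise both branches.
	for _, tag := range []string{"v0.3.1", "v1.0.0"} {
		unstable, err := inInitialDevelopment(tag)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s: API may change = %t\n", tag, unstable)
	}
}
```

A check like this could run in a dependency audit to flag pinned tags that carry no API stability promise yet.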
To install: go get -u github.com/jlubawy/go-boilerpipe/...
$ boilerpipe help
Boilerpipe removes boilerplate and extracts text content from HTML documents.
Usage:
    boilerpipe command [arguments]
The commands are:
    extract     extract text content from an HTML document
    serve       start a HTTP server for extracting text from HTML documents
    version     print boilerpipe version
Use "boilerpipe help [command]" for more information about a command.
See examples in filter_test.go.