Golang port of the boilerpipe Java library used for the removal of boilerplate and extraction of text content from HTML documents.

jlubawy/go-boilerpipe

go-boilerpipe

Golang port of the boilerpipe Java library written by Christian Kohlschütter

Boilerpipe removes boilerplate and extracts text content from HTML documents.
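
To give a feel for what "removing boilerplate" means, here is a minimal, self-contained sketch of a shallow text heuristic based on block length and link density. It is not this library's implementation or API: the Block type, the IsContent and ExtractContent functions, and both thresholds are hypothetical and chosen only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Block is a hypothetical, simplified text block: a run of text taken from
// one region of an HTML page, plus how many of its words sit inside links.
type Block struct {
	Text        string
	LinkedWords int
}

// Hypothetical thresholds, chosen for illustration only.
const (
	minWords       = 10
	maxLinkDensity = 0.33
)

// IsContent reports whether a block looks like article text: it must be
// long enough and must not be dominated by link text.
func IsContent(b Block) bool {
	words := len(strings.Fields(b.Text))
	if words < minWords {
		return false
	}
	density := float64(b.LinkedWords) / float64(words)
	return density <= maxLinkDensity
}

// ExtractContent joins the text of every block classified as content.
func ExtractContent(blocks []Block) string {
	var kept []string
	for _, b := range blocks {
		if IsContent(b) {
			kept = append(kept, b.Text)
		}
	}
	return strings.Join(kept, "\n")
}

func main() {
	blocks := []Block{
		// Navigation bar: short and fully linked.
		{Text: "Home About Contact", LinkedWords: 3},
		// Article paragraph: long and free of links.
		{Text: "The quick brown fox jumps over the lazy dog near the quiet river bank.", LinkedWords: 0},
		// Related-links teaser: long, but every word is link text.
		{Text: "Read more related stories from our partners and sponsors around the web today", LinkedWords: 13},
	}
	fmt.Println(ExtractContent(blocks))
}
```

With these sample blocks only the middle paragraph survives: the navigation bar is too short and the teaser consists entirely of link text.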

Currently it supports only article extraction, which covers the title, the date, and the content.

Best attempts will be made to follow Semantic Versioning 2.0.0 rules, but no API guarantees will be made until version 1.0.0.

Existing tags will not change, however, so vendored copies pinned to a tag will not break.

Getting Started

To install: go get -u github.com/jlubawy/go-boilerpipe/...

Command-Line Tool

$ boilerpipe help

Boilerpipe removes boilerplate and extracts text content from HTML documents.

Usage:

       boilerpipe command [arguments]

The commands are:

       extract    extract text content from an HTML document
       serve      start a HTTP server for extracting text from HTML documents
       version    print boilerpipe version

Use "boilerpipe help [command]" for more information about a command.

Using the library

See examples in filter_test.go.
