go-boilerpipe

A Go port of boilerpipe, the Java library written by Christian Kohlschütter.

Boilerpipe removes boilerplate and extracts text content from HTML documents.

Currently it supports only article extraction, which returns the title, the date, and the content of an article.

Every effort will be made to follow Semantic Versioning 2.0.0, but the API is not guaranteed to be stable until version 1.0.0.

However, existing tags will not change, so vendored copies will not break.

Getting Started

To install: go get -u github.com/jlubawy/go-boilerpipe/...

Command-Line Tool

$ boilerpipe help

Boilerpipe removes boilerplate and extracts text content from HTML documents.

Usage:

       boilerpipe command [arguments]

The commands are:

       extract    extract text content from an HTML document
       serve      start a HTTP server for extracting text from HTML documents
       version    print boilerpipe version

Use "boilerpipe help [command]" for more information about a command.

Using the library

See examples in filter_test.go.