This is a Haskell clone of the Python jusText project. It removes boilerplate content from HTML pages, leaving just the main content. jusText applies a set of heuristics to identify the main content of a page. You can read more about it in Jan Pomikálek's thesis.
stack install
haskell-jusText <htmlFile> <stopwordsFile>
Stopword files for different languages are available in the original jusText repository.
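To illustrate why a stopword list is needed, here is a minimal sketch of the kind of stopword-density heuristic jusText relies on. It is not the code of this project: the names `BlockClass`, `tokenize`, `stopwordDensity` and `classify` are hypothetical, the thresholds are assumed values modelled on the defaults of the Python implementation, and the real algorithm also considers link density and the classes of neighbouring blocks, which are left out here.

```haskell
module Main where

import Data.Char (isAlpha, toLower)

-- Hypothetical labels for this sketch; not taken from haskell-jusText.
data BlockClass = Good | NearGood | Short | Bad
  deriving (Show, Eq)

-- Split text into lowercase words, treating every non-letter as a separator.
tokenize :: String -> [String]
tokenize = words . map (\c -> if isAlpha c then toLower c else ' ')

-- Fraction of the words in a block that appear in the stopword list.
-- The stopword list is expected to be lowercase.
stopwordDensity :: [String] -> String -> Double
stopwordDensity stopwords text
  | null ws   = 0
  | otherwise = fromIntegral hits / fromIntegral (length ws)
  where
    ws   = tokenize text
    hits = length (filter (`elem` stopwords) ws)

-- Classify one text block by its length and stopword density.
-- All four thresholds below are assumptions made for this sketch.
classify :: [String] -> String -> BlockClass
classify stopwords text
  | len < lengthLow     = Short
  | density >= stopHigh = if len > lengthHigh then Good else NearGood
  | density >= stopLow  = NearGood
  | otherwise           = Bad
  where
    len        = length text
    density    = stopwordDensity stopwords text
    lengthLow  = 70    -- assumed: shorter blocks are too short to judge
    lengthHigh = 200   -- assumed: longer blocks can be rated Good
    stopLow    = 0.30  -- assumed lower stopword-density bound
    stopHigh   = 0.32  -- assumed upper stopword-density bound

main :: IO ()
main = do
  -- A tiny inline list stands in for a real stopwords file.
  let stopwords = ["the", "a", "of", "and", "is", "to", "in", "it", "that"]
  -- A navigation-style block is under 70 characters, so it is Short.
  print (classify stopwords "Home | About | Contact")
  -- A prose-like block; its class depends on the thresholds above.
  print (classify stopwords
    "It is the main content of the page that a reader wants to keep, \
    \and it is the boilerplate around it that is removed in the end.")
```

With a real stopwords file, assuming it holds one word per line, the list could be loaded with `lines <$> readFile path` instead of the inline list. Running text tends to contain many stopwords, while menus, link lists and footers contain few, which is why the density is a useful signal.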