Krptkn, pronounced /kroh-pot-kin/, is a metadata extraction framework for websites. It contains a spider, a metadata extractor, a metadata analyzer, and a report generator.
Build both Krptkn and PostgreSQL:
$ docker-compose build
Building phoenix
[+] Building 235.2s (25/25) FINISHED
Run both services, then open the web UI at the address printed in the log:
$ docker-compose up
phoenix_1 | [info] Access KrptknWeb.Endpoint at http://localhost:4000
- Spider
  - Extract URLs from HTML
  - Dictionary of common directories
  - Extract URLs from robots.txt and other files
- Metadata extraction
  - NIF with libextractor
  - Metadata filter
- Report generator
  - Plot generator
  - Metadata frequency analyzer
  - URL frequency analyzer
  - Add report generation to the web UI
- Control panel
  - Start URL
  - Pause / Stop / Resume
  - Clear RAM / Clear DB
  - Information LEDs
- Quality control
  - Module documentation
  - Function documentation
  - Schematics
  - Create tests
  - Continuous integration
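
To make the spider and URL frequency items above more concrete, here is a minimal sketch of the general technique in Python. It is not Krptkn's code (Krptkn runs on Phoenix and extracts metadata through a libextractor NIF); the names `LinkExtractor`, `extract_urls`, `urls_from_robots` and `host_frequency` are hypothetical and exist only for this illustration.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Hypothetical helper: collect absolute URLs from href/src attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative links against the page URL.
                self.urls.append(urljoin(self.base_url, value))


def extract_urls(html, base_url):
    """Return every linked URL found in an HTML document, in order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.urls


def urls_from_robots(text, base_url):
    """Turn Allow/Disallow/Sitemap entries of a robots.txt into URLs.

    Simplification: entries containing wildcards (* or $) are skipped.
    """
    urls = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        value = value.strip()
        if key.strip().lower() not in ("allow", "disallow", "sitemap"):
            continue
        if not value or "*" in value or "$" in value:
            continue
        urls.append(urljoin(base_url, value))
    return urls


def host_frequency(urls):
    """Count how often each host appears in a list of URLs."""
    return Counter(urlparse(url).netloc for url in urls)
```

A crawler built this way would feed each fetched page through `extract_urls`, seed its queue with `urls_from_robots`, and pass the collected URLs to `host_frequency` when building a report.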
Admin Web UI:
Report example:
Krptkn's data flow: