Spider, metadata extractor, metadata analyzer and report generator.

solanav/krptkn

Krptkn

Krptkn, pronounced /kroh-pot-kin/, is a metadata extraction framework for websites. It contains a spider, a metadata extractor, a metadata analyzer, and a report generator.

Installation (Docker)

Build both Krptkn and PostgreSQL:

$ docker-compose build
Building phoenix
[+] Building 235.2s (25/25) FINISHED

Run:

$ docker-compose up
phoenix_1  | [info] Access KrptknWeb.Endpoint at http://localhost:4000

Roadmap

  • Spider
    • Extract URLs from HTML
    • Dictionary of common directories
    • Extract from robots.txt and other files
  • Metadata extraction
    • NIF with libextractor
  • Metadata filter
  • Report generator
    • Plot generator
    • Metadata frequency analyzer
    • URL frequency analyzer
    • Add report generation to the Web UI
  • Control panel
    • Start URL
    • Pause / Stop / Resume
    • Clear RAM / Clear DB
    • Information LEDs
  • Quality control
    • Module documentation
    • Function documentation
    • Schematics
    • Create tests
    • Continuous integration
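
As a rough illustration of two roadmap items above ("Extract URLs from HTML" and "URL frequency analyzer"), the following is a minimal sketch in Python using only the standard library. It is not Krptkn's code: the names `LinkExtractor`, `extract_urls`, and `host_frequency` are hypothetical, and the choice to read only `href` and `src` attributes is an assumption made for the example.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Hypothetical helper: collect URLs found in href/src attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Assumption: only href and src are treated as link sources.
            if name in ("href", "src") and value:
                # Resolve relative links against the page URL.
                self.urls.append(urljoin(self.base_url, value))


def extract_urls(html, base_url):
    """Return the absolute URLs referenced by an HTML document, in order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.urls


def host_frequency(urls):
    """Count how often each host appears in a list of URLs."""
    return Counter(urlparse(url).netloc for url in urls)


if __name__ == "__main__":
    page = (
        '<a href="/about">About</a>'
        '<img src="https://cdn.example.org/logo.png">'
        '<a href="docs/index.html">Docs</a>'
    )
    found = extract_urls(page, "https://example.com/")
    print(found)
    print(host_frequency(found))
```

A real spider would also need deduplication, scope limits, and error handling for malformed pages, none of which are shown here.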

Images

Admin Web UI:

[Image: Web UI]

Report example:

[Image: Web UI]

Schematics

Krptkn's data flow:

[Image: Data flow for krptkn]