
Add IP column to Athena table for reverse IP search with WARC-IP-Address data #30

Open
cirosantilli opened this issue Jun 15, 2023 · 0 comments


cirosantilli commented Jun 15, 2023

Historical hostname -> IP and IP -> hostname (reverse IP) datasets are currently quite hard to come by (see https://opendata.stackexchange.com/questions/1951/dataset-of-domain-names). The only really convenient options are websites such as https://viewdns.info/reverseip/, which are expensive and do not document their methodology.

Would it be possible to add an IP column to the Athena table that stores the WARC-IP-Address of each record? With that column, it would be trivial for someone to export the data from Common Crawl at relatively low cost and make it available to everyone, for example as a CSV file hosted on GitHub.
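To illustrate the kind of export such a column would enable, here is a minimal sketch of building an IP -> hostname (reverse IP) CSV from WARC headers. It assumes a simplified, uncompressed text dump containing only record headers (real Common Crawl WARC files are gzip-compressed and include payloads, so a proper WARC reader would be needed in practice). The function names and the sample records, which use documentation-only IP ranges and example domains, are hypothetical.

```python
import csv
import io
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical sample: header blocks only, payloads omitted.
SAMPLE = """WARC/1.0
WARC-Type: response
WARC-Target-URI: https://www.example.com/
WARC-IP-Address: 192.0.2.10

WARC/1.0
WARC-Type: response
WARC-Target-URI: http://example.org/about
WARC-IP-Address: 192.0.2.10

WARC/1.0
WARC-Type: response
WARC-Target-URI: https://example.net/
WARC-IP-Address: 198.51.100.7
"""


def parse_warc_headers(raw: str) -> list:
    """Split a simplified header-only dump into one dict per WARC record.

    Assumption: each record starts with a 'WARC/x.y' line followed by
    'Name: value' lines; blank lines between records are ignored.
    """
    records = []
    current = None
    for line in raw.splitlines():
        if line.startswith("WARC/"):
            current = {}
            records.append(current)
        elif current is not None and ":" in line:
            # Split on the first colon only, so URIs keep their 'https:' part.
            name, _, value = line.partition(":")
            current[name.strip()] = value.strip()
    return records


def reverse_ip_index(records) -> dict:
    """Map each WARC-IP-Address to the set of hostnames captured from it."""
    index = defaultdict(set)
    for rec in records:
        ip = rec.get("WARC-IP-Address")
        uri = rec.get("WARC-Target-URI")
        if ip and uri:
            host = urlsplit(uri).hostname
            if host:
                index[ip].add(host)
    return index


def to_csv(index) -> str:
    """Render the reverse index as 'ip,hostname' rows, sorted for stable diffs."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["ip", "hostname"])
    for ip in sorted(index):
        for host in sorted(index[ip]):
            writer.writerow([ip, host])
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(reverse_ip_index(parse_warc_headers(SAMPLE))), end="")
```

With an IP column in the Athena table, the parsing step would disappear: a query grouping hostnames by IP would produce the same rows directly over the whole crawl, which is why the tabular approach looks so much cheaper than processing raw WARC files.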

Such data can be of great value for OSINT purposes; for example, I needed it for this project: https://cirosantilli.com/cia-2010-covert-communication-websites

There is apparently a tool made for this, https://github.com/CAIDA/commoncrawl-host-ip-mapper, but I don't think it can run quickly or cheaply; the tabular approach would really be ideal here.
