This repository automatically builds IPv4 and IPv6 data for testing IP address databases. Because of how this data is collected, it can be considered known-good and may also be valuable as supplemental data when building a database.
The data is built exclusively from data that the providers publish themselves. No third-party data is used, because it is considered inherently unreliable for the purposes of this dataset.
- Pingdom probe server data
  - IP address types: IPv4, IPv6
  - Data available: Country Code, Country Name, City
- Hetrix Monitoring IPs
  - IP address types: IPv4
  - Data available: Country Code, City, Subdivision Code
  - Note: Utilizes a hand-built mapping between Hetrix's hostnames and their locations.
- Updown.io Monitoring IPs
  - IP address types: IPv4, IPv6
  - Data available: Country Code, City, Latitude, Longitude
- AWS IP Address Ranges
  - IP address types: IPv4, IPv6
  - Data available: Country Code
  - Note: Utilizes a hand-built mapping between AWS's region IDs and their locations.
- Oracle Cloud IP Address Ranges
  - IP address types: IPv4
  - Data available: Country Code
  - Note: Utilizes a hand-built mapping between Oracle's region IDs and their locations.
- Linode Geofeed
  - IP address types: IPv4, IPv6
  - Data available: Country Code, Subdivision Code, City Name, Postal Code
- DigitalOcean Geofeed
  - IP address types: IPv4, IPv6
  - Data available: Country Code, Subdivision Code, City Name, Postal Code
- Vultr Geofeed
  - IP address types: IPv4, IPv6
  - Data available: Country Code, Subdivision Code, City Name, Postal Code
- Starlink Geofeed
  - IP address types: IPv4, IPv6
  - Data available: Country Code, Subdivision Code, City Name
- Google Cloud Geofeed
  - IP address types: IPv4, IPv6
  - Data available: Country Code, Subdivision Code, City Name
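As a rough illustration of how a geofeed source could be read, here is a minimal Python sketch. It assumes the feed follows the RFC 8805 CSV layout (`prefix,country,region,city,postal`, with `#` comment lines); the function name `parse_geofeed` and the output field names are hypothetical and are not taken from this repository's code.

```python
import csv
import ipaddress

# Hypothetical output field names, in RFC 8805 column order after the prefix.
FIELDS = ("country_code", "subdivision_code", "city", "postal_code")


def parse_geofeed(lines):
    """Parse geofeed CSV lines into a dict of CIDR string -> properties."""
    records = {}
    # Skip blank lines and '#' comment lines before handing rows to csv.
    data_lines = (
        line for line in lines
        if line.strip() and not line.lstrip().startswith("#")
    )
    for row in csv.reader(data_lines):
        try:
            # strict=False masks any host bits instead of raising.
            network = ipaddress.ip_network(row[0].strip(), strict=False)
        except ValueError:
            continue  # not a valid prefix, ignore the row
        values = [value.strip() for value in row[1:5]]
        # Keep only the fields the provider actually filled in.
        props = {name: value for name, value in zip(FIELDS, values) if value}
        records[str(network)] = props
    return records
```

Calling `parse_geofeed(open("geofeed.csv"))` on a local copy of a feed (the filename is only an example) would return one entry per valid prefix, with empty columns omitted.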
Each release will go through a few "processing" steps to ensure the generated data is of good quality.
The order of processing is as follows:
- During each parsing step, deduplication is performed. Identical CIDRs are merged if the properties they share match; otherwise, the existing entry is retained.
- The complete list is then sorted in descending order by the number of IP addresses in each CIDR.
- Any CIDRs which are private networks are discarded.
- Any CIDRs which have no data associated with them are discarded.
- Any 3-letter country codes are converted to 2-letter country codes.
- Next, all CIDRs are looped through and compared against previous CIDRs to identify any overlaps or subnets.
  - A subnet is retained, and any data that differs from its parent (supernet) network is considered valid.
  - For now, any overlapping CIDRs are simply discarded with a message.
  - If a subnet has identical information to its supernet, it is removed from the dataset.
- The final dataset after processing is written to the JSON file and then uploaded to the release.
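The earlier steps in this list (deduplication, sorting, and the three filtering and conversion steps) can be sketched roughly as follows. This is a minimal illustration using Python's standard `ipaddress` module, not the repository's actual code: the function names, the `country_code` key, and the tiny `ALPHA3_TO_ALPHA2` table are all assumptions made for the example.

```python
import ipaddress

# Hypothetical and deliberately incomplete lookup table; a real one
# would need to cover every ISO 3166-1 alpha-3 code.
ALPHA3_TO_ALPHA2 = {"USA": "US", "DEU": "DE", "JPN": "JP"}


def merge_entry(dataset, cidr, props):
    """Deduplicate while parsing: merge when shared properties agree,
    otherwise keep the entry that already exists."""
    existing = dataset.get(cidr)
    if existing is None:
        dataset[cidr] = dict(props)
    elif all(existing[key] == props[key] for key in existing.keys() & props.keys()):
        existing.update(props)


def clean(dataset):
    """Sort by CIDR size (largest first), then filter and normalise."""
    entries = [
        (ipaddress.ip_network(cidr, strict=False), dict(props))
        for cidr, props in dataset.items()
    ]
    entries.sort(key=lambda entry: entry[0].num_addresses, reverse=True)

    cleaned = []
    for network, props in entries:
        if network.is_private:
            continue  # private networks are discarded
        if all(value in (None, "") for value in props.values()):
            continue  # no data associated with this CIDR
        code = props.get("country_code") or ""
        if len(code) == 3:
            # Unknown 3-letter codes are left unchanged in this sketch.
            props["country_code"] = ALPHA3_TO_ALPHA2.get(code.upper(), code)
        cleaned.append((network, props))
    return cleaned
```

Each parser would call `merge_entry` for every CIDR it reads, and `clean` would run once over the combined dataset before the overlap and subnet comparison.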
Unfortunately, this final step is proving to be quite slow due to its time complexity, which limits the size of the dataset we can easily build. If you have ideas on how to optimize this, please share!
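One possible direction, assuming the slow part is the comparison of every CIDR against all previously seen CIDRs: because the list is already sorted from largest to smallest, any supernet of a CIDR has been seen before the CIDR itself. Instead of scanning all previous entries, each CIDR can probe a hash map for its own candidate supernets, which is at most 32 lookups for IPv4 and 128 for IPv6. The sketch below is untested against the real dataset and has not been benchmarked; `drop_redundant_subnets` is a hypothetical name, and "identical information" is interpreted here as strict equality of the property dictionaries.

```python
import ipaddress


def drop_redundant_subnets(entries):
    """entries: list of (network, props) sorted by num_addresses, descending.

    Returns the entries with every subnet removed whose properties are
    identical to those of its nearest retained supernet.
    """
    kept = {}  # retained network -> props, for constant-time lookups
    result = []
    for network, props in entries:
        parent_props = None
        # Walk from the most specific candidate supernet to the least specific.
        for prefixlen in range(network.prefixlen - 1, -1, -1):
            candidate = network.supernet(new_prefix=prefixlen)
            if candidate in kept:
                parent_props = kept[candidate]
                break
        if parent_props is not None and parent_props == props:
            continue  # adds nothing over its supernet
        kept[network] = props
        result.append((network, props))
    return result
```

With this approach the work grows with the number of CIDRs times the address width rather than with the square of the number of CIDRs. It only covers the subnet case; how overlapping CIDRs should be detected and reported is left as it is today.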