Allows you to input a TXT file of addresses and output a CSV of those addresses with geocode data

eric-hendrickson/geocode_address_collector

Geocode Address Collector

This is an address collector script that takes a .txt file of [what are assumed to be] unique addresses, geocodes them with Nominatim, and creates a CSV containing the original raw address, the geocoded address, the latitude, and the longitude.
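The overall flow (geocode each raw address, then write one CSV row per match) can be sketched in Python, the language of geocode_address_collector.py. This is a minimal sketch, not the script's actual code: the function name write_geocoded_csv, the output header names, and the choice to pass the geocoder in as a plain callable are all assumptions made so the example runs offline. In the real script that callable would wrap a Nominatim lookup.

```python
import csv
import io
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical geocoder type: maps a query string to
# (display_address, latitude, longitude), or None when nothing matches.
Geocoder = Callable[[str], Optional[Tuple[str, float, float]]]

# Hypothetical header names for the four columns the README describes.
OUTPUT_HEADERS = ["raw_address", "geocoded_address", "latitude", "longitude"]


def write_geocoded_csv(addresses: Iterable[str], geocode: Geocoder, out) -> int:
    """Geocode each raw address and write one CSV row per successful match.

    Returns the number of data rows written (the header is not counted).
    """
    writer = csv.writer(out)
    writer.writerow(OUTPUT_HEADERS)
    written = 0
    for raw in addresses:
        result = geocode(raw)
        if result is None:
            continue  # unmatched addresses are handled separately
        display, lat, lon = result
        writer.writerow([raw, display, lat, lon])
        written += 1
    return written


if __name__ == "__main__":
    # Stand-in geocoder backed by made-up canned data, so no network is used.
    canned = {"1 Main St, Reno, NV 89501": ("1, Main Street, Reno, Nevada", 39.5, -119.8)}
    buf = io.StringIO()
    print(write_geocoded_csv(canned.keys(), canned.get, buf))
    print(buf.getvalue())
```

Injecting the geocoder keeps the CSV logic testable without network access; swapping in a real Nominatim client changes only the callable.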

Usage

To run the script, execute geocode_address_collector.py. However, it is strongly recommended that you pass a user agent as an argument when running it (e.g., geocode_address_collector.py check). Read the footnotes under the Restrictions heading to see why.
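Reading the optional user agent from the command line might look like the following. This is a hypothetical sketch: the function get_user_agent and the fallback string are placeholders, because the README says only that a default user_agent exists, not what its value is. If the script uses geopy (an assumption, not stated here), the returned value would be passed as Nominatim(user_agent=...).

```python
import sys
from typing import List

# Placeholder fallback; the script's real default value is not documented.
DEFAULT_USER_AGENT = "geocode_address_collector"


def get_user_agent(argv: List[str]) -> str:
    """Return the first command-line argument as the user agent, if one is given."""
    # argv[0] is the script name, so a user-supplied agent would be argv[1].
    if len(argv) > 1 and argv[1].strip():
        return argv[1].strip()
    return DEFAULT_USER_AGENT


if __name__ == "__main__":
    print(get_user_agent(sys.argv))
```

With this sketch, running it as geocode_address_collector.py check would select "check" as the user agent, and running it with no argument would fall back to the placeholder.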

Addresses will be taken from unique_addresses.csv. This file must have the following headers:

  • Street (the house number and street name)
  • City
  • State (two-letter code, such as CA for California or NV for Nevada)
  • ZIP (avoid ZIP+4)
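Loading unique_addresses.csv and checking for the four required headers can be sketched with the standard library csv module alone. The helper names read_addresses and to_query, and the single-line query format "Street, City, State ZIP", are assumptions for illustration; the script may build its queries differently (Nominatim's search API also accepts structured street/city/state/postalcode parameters).

```python
import csv
import io
from typing import Dict, List, TextIO

# Column names required by the README.
REQUIRED_HEADERS = ("Street", "City", "State", "ZIP")


def read_addresses(f: TextIO) -> List[Dict[str, str]]:
    """Load all rows, failing early if any required header is missing."""
    reader = csv.DictReader(f)
    fieldnames = reader.fieldnames or []
    missing = [h for h in REQUIRED_HEADERS if h not in fieldnames]
    if missing:
        raise ValueError("missing required header(s): " + ", ".join(missing))
    # Short rows yield None for absent fields, so normalise those to "".
    return [{h: (row[h] or "").strip() for h in REQUIRED_HEADERS} for row in reader]


def to_query(row: Dict[str, str]) -> str:
    """Build a hypothetical free-form query such as '1 Main St, Reno, NV 89501'."""
    zip5 = row["ZIP"].split("-")[0]  # drop a +4 suffix if one slipped through
    return f"{row['Street']}, {row['City']}, {row['State']} {zip5}"


if __name__ == "__main__":
    # Made-up sample input, not real data from the project.
    sample = io.StringIO("Street,City,State,ZIP\n1 Main St,Reno,NV,89501\n")
    for row in read_addresses(sample):
        print(to_query(row))
```

Validating headers before any requests are sent avoids spending rate-limited Nominatim calls on a file that will fail anyway.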

Addresses that are properly geocoded are written to geocoded_addresses.csv. Addresses that cannot be geocoded are written to no_geocodes.csv, which has the same columns as unique_addresses.csv and can be used for debugging. If a precision level is set (house_number is the default) and a geocoded address lacks that component, the address is written to partially_geocoded_addresses.csv, which can also be used for debugging.
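The three-way split described above can be sketched as a small classifier. This assumes each result is a Nominatim JSON result requested with address details, where components such as house_number appear under an "address" key; classify and route are hypothetical names, and the return values simply name the destination files.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Optional

Result = Optional[Dict[str, Any]]


def classify(result: Result, precision: Optional[str] = "house_number") -> str:
    """Name the output file that a single geocoding result should go to."""
    if result is None:
        return "no_geocodes.csv"
    # Assumed shape: Nominatim's address-details components under "address".
    components = result.get("address", {})
    if precision and precision not in components:
        return "partially_geocoded_addresses.csv"
    return "geocoded_addresses.csv"


def route(queries: Iterable[str], lookup: Callable[[str], Result],
          precision: Optional[str] = "house_number") -> Dict[str, List[str]]:
    """Group queries by the output file their result belongs in."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for q in queries:
        buckets[classify(lookup(q), precision)].append(q)
    return dict(buckets)
```

Passing precision=None in this sketch disables the precision check, so any match at all counts as fully geocoded.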

This script was developed using anaconda3-2021.11 with Python 3.9.19. It will probably work with any version of Python 3.9 or higher that has the required packages installed, but I cannot guarantee that it will.

Restrictions

This script uses Nominatim, a service provided by OpenStreetMap, who deserve all the credit for making this script possible.

OpenStreetMap has a usage policy for Nominatim, which you should review in full; here are the highlights (as of June 18, 2024; footnotes are mine):

  • No heavy uses (an absolute maximum of 1 request per second).[1]
  • Provide a valid HTTP Referer or User-Agent identifying the application (stock User-Agents as set by HTTP libraries will not do).[2]
  • Clearly display attribution as suitable for your medium.
  • Data is provided under the ODbL license, which requires share-alike (although small extractions are likely to be covered by fair usage / fair dealing).
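The one-request-per-second limit can be enforced by spacing calls apart, in the spirit of the break_time=1.1 parameter mentioned in footnote 1. This is a hypothetical helper named throttled_map, not the script's geocode_addresses() function; clock and sleep are injectable only so the timing can be tested without real delays. If you use geopy, its geopy.extra.rate_limiter.RateLimiter (with min_delay_seconds) serves the same purpose.

```python
import time
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def throttled_map(func: Callable[[T], R], items: Iterable[T], break_time: float = 1.1,
                  clock: Callable[[], float] = time.monotonic,
                  sleep: Callable[[float], None] = time.sleep) -> List[R]:
    """Call func on each item, keeping at least break_time seconds between call starts."""
    if break_time < 1.0:
        # Nominatim's policy allows at most one request per second.
        raise ValueError("break_time must be at least 1 second")
    results: List[R] = []
    last_start: Optional[float] = None
    for item in items:
        if last_start is not None:
            # Sleep only for whatever part of the interval has not yet elapsed.
            wait = break_time - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        results.append(func(item))
    return results
```

Measuring from the start of the previous call, rather than sleeping a fixed amount after it, keeps the spacing at break_time even when a request itself is slow, without adding unnecessary delay.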

What this boils down to, in my interpretation, is: don't get greedy, and give credit where credit is due.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

License

This script is covered by the MIT license, but any use of data coming from OpenStreetMap (i.e., geocoded addresses that this script produces) is covered by the ODbL.

Footnotes

  1. This is why the geocode_addresses() function has the parameter break_time=1.1. DO NOT make it lower than 1 second.

  2. There is a default user_agent value in the application, but I strongly recommend passing your own as an argument when running the script (see Usage).
