This is an address collector script that takes a .txt file of (what are assumed to be) unique addresses, uses Nominatim to geocode the addresses, and creates a CSV containing the original raw address, the geocoded address, the latitude, and the longitude.
To run, simply run geocode_address_collector.py. However, it is strongly recommended to name a user agent when running this script (e.g., geocode_address_collector.py check). Read the footnotes under the Restrictions heading to see why.
Addresses will be taken from unique_addresses.csv. This file must have the following headers:
- Street (composed of the house number and street name)
- City
- State (two-letter code, such as CA for California or NV for Nevada)
- ZIP (avoid using ZIP+4)
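Because a missing header is an easy mistake to make, it can help to check the input file before geocoding anything. The following is a minimal sketch, not the script's own code: the function name `missing_headers` is hypothetical, and the sample row is made up purely for illustration.

```python
import csv
import io

# The four headers the input file is required to have.
REQUIRED_HEADERS = ["Street", "City", "State", "ZIP"]


def missing_headers(csv_file):
    """Return the required headers that are absent from the file's header row."""
    reader = csv.DictReader(csv_file)
    # fieldnames is None when the file is completely empty.
    present = set(reader.fieldnames or [])
    return [header for header in REQUIRED_HEADERS if header not in present]


# Illustrative in-memory file; a real run would open unique_addresses.csv instead.
sample = io.StringIO("Street,City,State,ZIP\n123 Main St,Reno,NV,89501\n")
print(missing_headers(sample))  # an empty list means nothing is missing
```

To check a real file, open it with `open("unique_addresses.csv", newline="")` and pass the file object to the function.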
Those that are properly geocoded will be written to geocoded_addresses.csv, whereas addresses that cannot be geocoded will be written to no_geocodes.csv, which will have the same columns as unique_addresses.csv and can be used for debugging. If a certain precision level is set, like house_number (which is set by default), and a geocoded address does not have that value, the geocoded address will be written to partially_geocoded_addresses.csv, which can also be used for debugging.
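The routing described above can be sketched as a single decision function. This is an illustration under assumptions rather than the script's actual logic: the function name `choose_output_file` is hypothetical, and it assumes each result is a dict with an `address` mapping of component names (such as `house_number` or `road`), the shape Nominatim's JSON output uses when address details are requested. A failed lookup is represented here as `None`.

```python
def choose_output_file(result, precision="house_number"):
    """Pick the output CSV for one geocoding result.

    result    -- dict for a geocoded address, or None if geocoding failed
    precision -- address component that must be present (None disables the check)
    """
    if result is None:
        # Nothing came back at all: keep the raw address for debugging.
        return "no_geocodes.csv"
    components = result.get("address", {})
    if precision is not None and precision not in components:
        # Geocoded, but not down to the requested level of detail.
        return "partially_geocoded_addresses.csv"
    return "geocoded_addresses.csv"


# Hypothetical results, made up for illustration.
full = {"address": {"house_number": "123", "road": "Main St"}}
street_only = {"address": {"road": "Main St"}}

for candidate in (full, street_only, None):
    print(choose_output_file(candidate))
```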
This script was developed using anaconda3-2021.11, with Python 3.9.19. While it will probably work with any version of Python 3.9 or higher that has all the correct packages installed, I cannot guarantee that it will.
This script uses Nominatim, a service provided by OpenStreetMap, who deserve all the credit for making this script possible.
OpenStreetMap has a usage policy for Nominatim, which you should review here, but here are the highlights (as of June 18, 2024; footnotes are mine):
- No heavy uses (an absolute maximum of 1 request per second).1
- Provide a valid HTTP Referer or User-Agent identifying the application (stock User-Agents as set by http libraries will not do).2
- Clearly display attribution as suitable for your medium.
- Data is provided under the ODbL license which requires to share alike (although small extractions are likely to be covered by fair usage / fair dealing).
What this basically boils down to, according to my interpretation, is: don't get greedy, and give credit where credit is due.
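The first two points of the policy (at most one request per second, and an identifying User-Agent) can be sketched with the standard library alone. This is not how the script itself is implemented. The `Throttle` class and `build_request` function are hypothetical names, and the endpoint URL and query parameters (`q`, `format`, `addressdetails`) are assumptions based on the public Nominatim search API. The block only builds requests and does not send any.

```python
import time
import urllib.parse
import urllib.request

# Assumed public endpoint; adjust if you run your own Nominatim instance.
NOMINATIM_SEARCH_URL = "https://nominatim.openstreetmap.org/search"


class Throttle:
    """Block so that consecutive calls to wait() are at least min_interval apart."""

    def __init__(self, min_interval=1.0):
        self.min_interval = min_interval
        self._last_call = None  # monotonic timestamp of the previous call

    def wait(self):
        if self._last_call is not None:
            remaining = self.min_interval - (time.monotonic() - self._last_call)
            if remaining > 0:
                time.sleep(remaining)
        self._last_call = time.monotonic()


def build_request(query, user_agent):
    """Build (but do not send) a search request carrying a custom User-Agent."""
    params = urllib.parse.urlencode(
        {"q": query, "format": "jsonv2", "addressdetails": 1}
    )
    return urllib.request.Request(
        f"{NOMINATIM_SEARCH_URL}?{params}",
        headers={"User-Agent": user_agent},
    )


throttle = Throttle(min_interval=1.0)
for raw_address in ["123 Main St, Reno, NV 89501"]:  # made-up example address
    throttle.wait()  # never more than one request per second
    request = build_request(raw_address, user_agent="check")
    print(request.full_url)
```

Passing each `request` to `urllib.request.urlopen` would perform the actual lookup. Keeping the throttle separate from the request builder makes the rate limit easy to apply around whatever HTTP client is used.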
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
This script is covered by the MIT license, but any use of data coming from OpenStreetMap (i.e., geocoded addresses that this script produces) is covered by the ODbL.