Parses Japanese addresses into prefecture, city, and street.
pip install japanese-addresses
from japanese_addresses import separate_address
parsed_address = separate_address('宮城県仙台市泉区市名坂字東裏97-1')
print(parsed_address)
"""
ParsedAddress(prefecture='宮城県', city='仙台市泉区', street='市名坂')
"""
parsed_address = separate_address('鹿児島県志布志市志布志町志布志')
print(parsed_address)
"""
ParsedAddress(prefecture='鹿児島県', city='志布志市', street='志布志町志布志')
"""
Usage with pandas:
import pandas as pd
from japanese_addresses import separate_address
df = pd.read_csv('sample.csv')
df.head()
"""
address
0 宮城県仙台市泉区市名坂字東裏97-1
1 鹿児島県志布志市志布志町志布志
2 東京都 神津島村284番
"""
target_col = 'address'
# Parse each address into a (prefecture, city, street) tuple, then unzip the tuples into three new columns.
# https://stackoverflow.com/questions/16236684/apply-pandas-function-to-column-to-create-multiple-new-columns
def get_separate_address(address):
    parsed_address = separate_address(address)
    return parsed_address.prefecture, parsed_address.city, parsed_address.street
df['prefecture'], df['city'], df['street'] = zip(*df[target_col].map(get_separate_address))
df.head()
"""
address prefecture city street
0 宮城県仙台市泉区市名坂字東裏97-1 宮城県 仙台市泉区 市名坂
1 鹿児島県志布志市志布志町志布志 鹿児島県 志布志市 志布志町志布志
2 東京都 神津島村284番 東京都 神津島村
"""
To run the test suite with Poetry:
pip install poetry
poetry install
poetry run pytest
japanese_addresses is licensed under the MIT License.
prefecture2city2street.pkl is a modified derivative work of geolonia / japanese-addresses and was generated with csv_to_dict.py.
The original work is available at:
https://geolonia.github.io/japanese-addresses/
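This README does not document the exact structure of prefecture2city2street.pkl or the contents of csv_to_dict.py. Below is a minimal sketch of how a nested dictionary like that could be built from Geolonia's CSV and pickled. It assumes the column headers 都道府県名 (prefecture), 市区町村名 (city), and 大字町丁目名 (street). The function name `build_prefecture2city2street` and the list-of-streets layout are hypothetical, not the project's actual script.

```python
import csv
import io
import pickle
from collections import defaultdict


def build_prefecture2city2street(csv_file):
    """Group CSV rows into {prefecture: {city: [street, ...]}}."""
    nested = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(csv_file):
        # Assumed Geolonia column names; adjust if the real CSV headers differ.
        nested[row["都道府県名"]][row["市区町村名"]].append(row["大字町丁目名"])
    # Convert to plain dicts: pickle cannot serialize the lambda default factory.
    return {pref: dict(cities) for pref, cities in nested.items()}


# Tiny inline sample built from the README's own addresses.
sample_csv = io.StringIO(
    "都道府県名,市区町村名,大字町丁目名\n"
    "宮城県,仙台市泉区,市名坂\n"
    "鹿児島県,志布志市,志布志町志布志\n"
)
table = build_prefecture2city2street(sample_csv)
payload = pickle.dumps(table)  # bytes that could be written to a .pkl file
print(table)
```

With the real data you would open the downloaded CSV with `encoding="utf-8", newline=""` and write `payload` to disk with `open(..., "wb")`.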
The following description is taken from the original work (translated from Japanese).
Geolonia Address Data (Geolonia 住所データ)
Geolonia updates this data every month, based on the following data: