Add unit numbers to addresses #618
Comments
One of the thoughts was to use the interpolation for all address queries instead of checking with ES first. However, there seems to be a problem with finding these units in the interpolation engine as well. The common house numbers are present in the data, but the individual units are missing. It would be good to investigate why that's happening. |
can you please provide some examples of the unit numbers you are expecting to find with links to the source data? |
The OA source can be found here:
$ grep ",50,NE VILLAGE" ./us/or/portland_metro.csv
-122.404822,45.4982275,50,NE VILLAGE SQUIRE AVE,17,,,,97030,,96e580ae3ca840af
-122.4047135,45.4982497,50,NE VILLAGE SQUIRE AVE,16,,,,97030,,ed938477c2cc0919
-122.404545,45.4982479,50,NE VILLAGE SQUIRE AVE,15,,,,97030,,f484e126402e1c60
-122.4044473,45.498247,50,NE VILLAGE SQUIRE AVE,14,,,,97030,,13a4f0a76e89e233
-122.4042684,45.4982459,50,NE VILLAGE SQUIRE AVE,13,,,,97030,,6fb303c178eb2d4d
-122.4041579,45.4982448,50,NE VILLAGE SQUIRE AVE,12,,,,97030,,5e779bf6aa4968c2
-122.4040506,45.4982453,50,NE VILLAGE SQUIRE AVE,11,,,,97030,,bbee091d599933db
-122.4038379,45.4982622,50,NE VILLAGE SQUIRE AVE,10,,,,97030,,488a5c92cda4a1de
-122.4037477,45.4982356,50,NE VILLAGE SQUIRE AVE,9,,,,97030,,59f45a36968b37e1
-122.4037464,45.4979097,50,NE VILLAGE SQUIRE AVE,8,,,,97030,,45c0b9388786681c
-122.4038362,45.4978826,50,NE VILLAGE SQUIRE AVE,7,,,,97030,,0444a51b841983da
-122.4040366,45.4979048,50,NE VILLAGE SQUIRE AVE,6,,,,97030,,f441a312426edac1
-122.4041496,45.4979059,50,NE VILLAGE SQUIRE AVE,5,,,,97030,,8dd7da5d8d425743
-122.4043476,45.4979073,50,NE VILLAGE SQUIRE AVE,4,,,,97030,,ea0fc8be8f797872
-122.4044461,45.4979082,50,NE VILLAGE SQUIRE AVE,3,,,,97030,,5a64fb3515a5a4d2
-122.4046453,45.4978743,50,NE VILLAGE SQUIRE AVE,2,,,,97030,,adbe53f124a09122
-122.404757,45.4978864,50,NE VILLAGE SQUIRE AVE,1,,,,97030,,b9d1dabd3d0e577b
Similarly, in the same dataset there are 213 total addresses for the following house number and street:
$ grep ",21000,NW QUATAMA" ./us/or/portland_metro.csv
-122.8960946,45.5218934,21000,NW QUATAMA RD,SPC 91,,,,97006,,ded6db4d505ffbe6
-122.8960946,45.5218934,21000,NW QUATAMA RD,SPC 100,,,,97006,,53b0af8ee4d2bbba
-122.8967017,45.5213388,21000,NW QUATAMA RD,,,,,97006,,3414cdafc2f4b992
-122.8978935,45.5208051,21000,NW QUATAMA RD,SPC 188,,,,97006,,6436a9d6677f5999
-122.8980784,45.5207527,21000,NW QUATAMA RD,SPC 189,,,,97006,,9eddaa43b4596a0b
-122.8982544,45.5207805,21000,NW QUATAMA RD,SPC 190,,,,97006,,a53cebd6d1013548
-122.8984162,45.5208571,21000,NW QUATAMA RD,SPC 191,,,,97006,,a43f79e326eb23f7
-122.8986217,45.5210072,21000,NW QUATAMA RD,SPC 192,,,,97006,,376a2cb7a0bc19b9
... |
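To make the rows above easier to reason about, here is a minimal Python sketch that groups them by house number and street and lists the unit values found for each. It assumes the usual OpenAddresses column order (LON, LAT, NUMBER, STREET, UNIT, CITY, DISTRICT, REGION, POSTCODE, ID, HASH), which is consistent with the 11 fields in each row shown here but should be checked against the header of the actual CSV. The function name `units_by_address` is made up for illustration and is not part of Pelias.

```python
import csv
import io
from collections import defaultdict

# Assumed OpenAddresses column order; verify against the header of your CSV.
COLUMNS = ["LON", "LAT", "NUMBER", "STREET", "UNIT", "CITY",
           "DISTRICT", "REGION", "POSTCODE", "ID", "HASH"]

# A few rows copied from the grep output above.
SAMPLE = """\
-122.404822,45.4982275,50,NE VILLAGE SQUIRE AVE,17,,,,97030,,96e580ae3ca840af
-122.4047135,45.4982497,50,NE VILLAGE SQUIRE AVE,16,,,,97030,,ed938477c2cc0919
-122.8960946,45.5218934,21000,NW QUATAMA RD,SPC 91,,,,97006,,ded6db4d505ffbe6
-122.8967017,45.5213388,21000,NW QUATAMA RD,,,,,97006,,3414cdafc2f4b992
"""


def units_by_address(text):
    """Map (house number, street) to the list of unit values seen for it."""
    grouped = defaultdict(list)
    reader = csv.DictReader(io.StringIO(text), fieldnames=COLUMNS)
    for row in reader:
        grouped[(row["NUMBER"], row["STREET"])].append(row["UNIT"])
    return dict(grouped)


units = units_by_address(SAMPLE)
for (number, street), unit_list in units.items():
    print(number, street, unit_list)
```

Note that one of the 21000 NW QUATAMA RD rows has an empty UNIT field, so any importer change would need to treat the unit as optional.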
Seeing the same issue for Denmark, where the official unique addresses include not only unit but also level. See openaddresses/openaddresses#3511 for more details. Looking in Pelias for Nikolaj Plads 26 in Copenhagen, of course, results in only 1 address returned. http://pelias.github.io/compare/#/v1/search%3Ftext=Nikolaj%20Plads,%2026,%20Copenhagen,%20 To resolve this, both level and unit need to be added in Pelias as well. |
Hey @sweco-semhul, One more important one is the Elasticsearch schema which tells Elasticsearch what type of data to expect and how to store it. A unit number field should be able to fit in with the rest of our address components. Right now we have custom Elasticsearch analyzers for each of the other fields (housenumber, street, postalcode), and we'll likely need another one for unit number, even if it doesn't end up being very complex. The custom analyzers for the other address components are in the same repo. We'd be super happy to assist you in any way to get started on this. It will be touching lots of different areas of the code, and so will probably take some trial and error to get right, but we'll be here. Let us know if you have questions or get stuck on anything. |
Sweet. Thanks for pointing that out, will add it to the list. I have probably missed some more things, and as you mentioned this will need some trial and error and testing. |
@orangejulius A lot of parts have been changed, but from what I can see it's all running together and I'm able to load and search for Danish addresses with unit attributes. I have created a pull request for each change and referenced this issue. I'm not certain how npm dependencies are handled; for now they reference my fork, which includes the changes needed for it to run, e.g. https://github.com/pelias/api/pull/1052/files#diff-b9cfc7f2cdf78a7f4b91a753d10865a2 Hopefully some of you are able to test this, and it will at least help on the way to getting the unit attribute into Pelias. Get back to me on anything I can do to help move this further. |
Hey @sweco-semhul, So far, I was successfully able to use your schema, OA importer, and API changes to import Portland, OR, USA addresses and query for them. However, the queries have to take the form From your initial issue in the openaddresses repo, I gather that's not the correct format in either Denmark or the United States. I think we can change that line to We already have code that queries the individual components of an address to deal with differing address formats for the housenumber and street parts of an address (which does differ between Finland and the United States), so perhaps we'll have to extend that code to deal with unit numbers (@trescube what do you think?). But right now I think it's ok as long as we change the order in Oh and just for proof we are getting close, here's one of the queries @dianashk linked to above: |
For how to deal with the modules, once we are ready, we can begin merging the pull requests starting from "the bottom" of the tree of dependencies. We have Greenkeeper and semantic-release which together will make those changes come through easily. Then we can either rebase or merge your other pull requests and it will all work out. |
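Merging "from the bottom" of the dependency tree amounts to a topological ordering: a module is merged only after everything it depends on. A minimal sketch using Python's standard `graphlib` follows. The dependency edges below are illustrative assumptions, not the real `package.json` graph of the Pelias modules, and `merge_order` is a hypothetical helper name.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each key depends on the modules in its set.
# These edges are assumptions for illustration, not the actual Pelias graph.
DEPENDS_ON = {
    "api": {"query", "schema"},
    "openaddresses-importer": {"model"},
    "model": {"schema"},
    "query": set(),
    "schema": set(),
}


def merge_order(graph):
    """Return modules ordered so that dependencies come before dependents."""
    return list(TopologicalSorter(graph).static_order())


order = merge_order(DEPENDS_ON)
print(order)
```

Several valid orders can exist for the same graph; the only guarantee is that each module appears after all of its dependencies.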
@orangejulius sorry for the late response. Been busy doing other things. That seems like a reasonable change. The hardest part for me when doing these changes was to get unit into the query parts. So maybe I have misunderstood something. Another thing would be getting labels to look like "Nikolaj Plads 23 02, København K, Denmark" (street housenumber unit ...), but I guess that would be up to this method to fix https://github.com/pelias/api/pull/1052/files#diff-35694d3f8946406d873df923bf730703R47 Am I understanding that correctly? |
Yes, I think we're in agreement and you're understanding perfectly. Based on the link to address formats that you mentioned a while back, and your comments now, we want to be able to display and search for addresses in the American format (housenumber, street, unit) or the European format (street, housenumber, unit). So I think the "standard" string in that function you linked (which corresponds to the American format) needs to be updated. Otherwise I believe it's OK. We generally store things in American format, but our query logic is smart enough to handle either format. I'll put more concrete recommendations in the PRs themselves to help keep things clear :) |
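The two component orders discussed here can be sketched roughly as follows. This is only an illustration in Python, not the Pelias label code: `format_address` and its `style` argument are hypothetical names, and a real implementation would also need to handle locality, country, and punctuation.

```python
def format_address(housenumber, street, unit="", style="american"):
    """Join address components in American or European order.

    american: housenumber street unit
    european: street housenumber unit
    Empty components (for example a missing unit) are skipped.
    """
    if style == "american":
        parts = [housenumber, street, unit]
    elif style == "european":
        parts = [street, housenumber, unit]
    else:
        raise ValueError(f"unknown style: {style!r}")
    return " ".join(p for p in parts if p)


print(format_address("50", "NE VILLAGE SQUIRE AVE", "17"))
print(format_address("23", "Nikolaj Plads", "02", style="european"))
print(format_address("21000", "NW QUATAMA RD"))
```

The second call produces the "Nikolaj Plads 23 02" ordering from the label example above, and the third shows that an address without a unit still formats cleanly.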
Cool, thanks for that. |
Sorry it took a while, but the changes to get street and unit in the right order should be there now, and from what I can see it all still holds together. :) |
No worries about the delay. I think all these look good and we can start merging. I'll handle merging the dependencies in the right order so that greenkeeper can help us out. :) |
Thanks, happy to be able to give something back to a great project! |
Hey team!
I was using your awesome geocoding engine when I noticed something interesting.
Let me tell you more about it.
background
In Portland, as well as other US cities, there are planned communities that have a single house number assigned to the whole development and unit numbers assigned to each house lot in the development. So querying for an address with just the shared house number should result in a list of addresses in that development with unique unit numbers.
current state
Currently, Pelias imports OpenAddresses data without its unit numbers, even though the source has the appropriate unit number for each address. So our data has numerous records with what looks to be the exact same address but slightly different locations.
At query time, the API gets back a long list of all of these addresses and then decides to deduplicate them all down to a single record because they lack any unique characteristics.
/v1/search?text=50 NE VILLAGE SQUIRE, PORTLAND
/v1/search?text=21000 NW Quatama Rd, Portland
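The effect described above can be reproduced with a small sketch. This is not the actual Pelias deduplication code; `dedupe` and the record field names are assumptions chosen for illustration. It shows that when the comparison key has no unit, every record at the shared house number collapses into one, while including the unit keeps them distinct.

```python
def dedupe(records, key_fields):
    """Keep the first record for each distinct combination of key_fields."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(record.get(f, "").strip().lower() for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical records shaped after the OpenAddresses rows in this thread.
records = [
    {"housenumber": "50", "street": "NE VILLAGE SQUIRE AVE", "unit": "1"},
    {"housenumber": "50", "street": "NE VILLAGE SQUIRE AVE", "unit": "2"},
    {"housenumber": "50", "street": "NE VILLAGE SQUIRE AVE", "unit": "3"},
]

without_unit = dedupe(records, ["housenumber", "street"])
with_unit = dedupe(records, ["housenumber", "street", "unit"])
print(len(without_unit), len(with_unit))
```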
desired behavior