Tracker: Offline geocoding #58

ellenhp · 2022-06-05T04:08:14Z

Offline geocoding would probably be best accomplished by getting Carmen to run in the browser, per #50. This seems like it will require removing rayon and rocksdb from carmen-core, rewriting or repackaging vtquery and probably also other things. I'm guessing Carmen itself will need to be ported over to use local storage instead of the node filesystem APIs too. After the dust settles performance will also need to be evaluated in terms of response time, latency from a cold cache, and subjective quality of the results.

ellenhp · 2022-06-06T17:58:38Z

I got carmen to ingest some sample data, which is good. Next I'm going to try and ingest a small OSM extract.

ellenhp · 2022-06-07T05:24:34Z

Carmen seems to be doing geocoding in my test setup, but I'm really unhappy with the size of the index. The fuzzy phrase store and grid store combined are 1 MB compressed for the Seattle metro area.

ellenhp · 2022-06-08T03:45:36Z

I have fuzzy_phrase building for wasm, complete with dynamic loading of the search index, which is a very unexpected turn of events. https://github.com/ellenhp/fuzzy-phrase/tree/wasm

This was only possible because of the work done here: https://github.com/phiresky/tantivy-fst
I'm so thankful that I found this fork. :)

Currently working on converting carmen-core to use sqlite instead of rocksdb so we can leverage sql.js-httpvfs to dynamically load the gridstore too. After that I'll perform some analysis to determine how ping time to the server affects geocoding performance. I'm expecting tens of serial HTTP range requests, so ping time will probably be critical. If this does end up working well enough to deploy, it will probably make sense to use a CDN to serve the index. As much as I hate letting cloudflare MITM my TLS connections, there might not be any other way to have a good user experience. And even though access patterns would leak information to the CDN operator, it beats the heck out of sending a free-text geocoding query to $MAPS_COMPANY.
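As a rough illustration of why ping time matters here: if each range request depends on the result of the previous one, the total wait is at least the number of requests multiplied by the round-trip time. The sketch below is a back-of-the-envelope model only. The function name, the figure of 30 requests, and the ping values are assumptions chosen for illustration, not measurements from this project.

```python
def estimated_lookup_latency_ms(
    serial_requests: int,
    round_trip_ms: float,
    server_time_ms: float = 0.0,
) -> float:
    """Lower bound on wall-clock time for a chain of dependent requests.

    Each request must finish before the next can be issued, so the
    per-request costs add up instead of overlapping.
    """
    if serial_requests < 0:
        raise ValueError("serial_requests must be non-negative")
    return serial_requests * (round_trip_ms + server_time_ms)


if __name__ == "__main__":
    # Hypothetical numbers: 30 serial range requests per geocoding query.
    for ping_ms in (5, 20, 100):
        total = estimated_lookup_latency_ms(30, ping_ms)
        print(f"ping {ping_ms} ms -> at least {total:.0f} ms per query")
```

Under this model a nearby CDN edge (single-digit ping) keeps a query well under a second, while a distant origin server does not, which is the motivation for serving the index from a CDN.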

ellenhp · 2022-06-10T15:02:13Z

carmen-core is working with a sqlite backend so now in theory lazy loading is unblocked, but I'm not convinced sqlite is the correct path forward. I'd like to avoid it because simultaneous interop between javascript, C/C++ and Rust sounds kind of hard and I don't understand the emscripten virtual filesystem stuff. Also sql.js is like a megabyte of wasm. I want to build my own key-value store that will eagerly download the index then lazily download the data blocks. It also gives me much more control over latency that way compared to implementing a lazy filesystem for sqlite.
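The "eager index, lazy data blocks" idea can be sketched roughly as follows. This is not the design that was built for Headway, just a minimal illustration in Python for brevity. It assumes a JSON index mapping each key to an (offset, length) pair in one data file, and it takes a `read_range` callback standing in for an HTTP range request. All names here (`LazyBlockStore`, `build_store_files`, `make_in_memory_reader`) are hypothetical.

```python
import json
from typing import Callable, Dict, Optional, Tuple

# (offset, length) -> bytes; in a browser this would be an HTTP range request.
RangeReader = Callable[[int, int], bytes]


class LazyBlockStore:
    """Key-value store with an eagerly loaded index and lazily loaded values."""

    def __init__(self, index_bytes: bytes, read_range: RangeReader) -> None:
        # The whole index is downloaded up front (assumed JSON: key -> [offset, length]).
        raw = json.loads(index_bytes)
        self._index: Dict[str, Tuple[int, int]] = {
            key: (int(offset), int(length)) for key, (offset, length) in raw.items()
        }
        self._read_range = read_range
        self._cache: Dict[str, bytes] = {}
        self.fetch_count = 0  # number of range reads issued so far

    def get(self, key: str) -> Optional[bytes]:
        span = self._index.get(key)
        if span is None:
            # A miss is answered from the local index, with no range read.
            return None
        if key not in self._cache:
            offset, length = span
            self._cache[key] = self._read_range(offset, length)
            self.fetch_count += 1
        return self._cache[key]


def build_store_files(records: Dict[str, bytes]) -> Tuple[bytes, bytes]:
    """Pack records into (index_bytes, data_bytes) in the layout assumed above."""
    data = bytearray()
    index = {}
    for key in sorted(records):
        index[key] = [len(data), len(records[key])]
        data += records[key]
    return json.dumps(index).encode("utf-8"), bytes(data)


def make_in_memory_reader(blob: bytes) -> RangeReader:
    """Stand-in for a network range reader, backed by an in-memory blob."""
    def read_range(offset: int, length: int) -> bytes:
        return blob[offset:offset + length]
    return read_range
```

With this layout, a lookup costs at most one range read after the index has loaded, and repeated or missing keys cost none. That is the kind of latency control that is harder to get from a lazy filesystem layered under sqlite.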

After that I think all that remains is building a new wasm_bindgen interface for carmen-core, building a lazy fst::FakeArr, then building vtquery with emscripten and using that instead of the vtquery node package. Inevitably there will be issues but this doesn't seem like more than another week of work. I have a 10-day vacation coming up though, so my original estimate of 1 month might end up being accurate after all.

ellenhp · 2022-07-28T15:02:19Z

At this point I'm pretty convinced that Mapbox Carmen won't work as-is, which is a bummer. I've started exploring other options but I think it makes sense to get Headway into a working state as originally scoped. A lot of people were excited about it as originally scoped and I don't think I want to block its completion on me writing a geocoder from scratch.

ellenhp · 2022-08-08T23:09:25Z

I'm sure if I spent a few months on this I could get it to tech demo levels of functionality, but I want more for this project than that, so I'm going to move forward with a traditional geocoder stack. Expectations for privacy can be managed in some other way. I think it may eventually be reasonable to build a privacy-preserving replacement for nominatim but it is not reasonable IMO to try to replicate the performance or usability characteristics of photon. There's just like, so much work that's gone into making that fast, generalized and typo-tolerant.

I'm going to keep pursuing offline routing though. There are a few user stories that could preserve privacy better if offline routing were to work (route me home, or to any other location I've cached the lat/lng for).

3nprob · 2022-08-08T23:57:46Z

So I think as long as endpoints are straightforward enough to configure at both build time and runtime, offline geocoding and routing are not that crucial from a pure privacy perspective, as those whose threat models are strict enough that this is a concern can also figure out something that works.

Not having to trust any server is less important if it's easy enough to set up a new server or use a friend's.

(There are still reasons why these, including routing, are interesting features)

ellenhp mentioned this issue Jun 7, 2022: Investigate moving towards a federated network of Headway instances #50 (Open, 9 tasks)

ellenhp mentioned this issue Jun 8, 2022: [Bug] Error when hosting behind https reverse proxy #59 (Closed)

ellenhp closed this as completed Aug 8, 2022
