This repository has been archived by the owner on Jan 20, 2022. It is now read-only.

add python lib imposm.parser #2

Closed. missinglink wants to merge 4 commits.


missinglink

First stab at adding imposm.parser to benchmarks in response to #1

I couldn't get the script to output exactly what I wanted (a single line per node, whether it arrives through the coords or the nodes callback), so it currently outputs 2 lines for every tagged node: one from the coords_callback and one from the nodes_callback.

$ grep 564571123 tmpfile2
{"type":"node","id":564571123,"lat":-41.26042920000001,"lon":174.86484979999992}
{"type":"node","id":564571123,"lat":-41.26042920000001,"lon":174.86484979999992,"tags":{"tourism":"viewpoint"}}

I've never written any Python before, so if someone can fix it, please do.
Maybe @olt can help out?

@naoliv here are some preliminary results so at least you can get an impression of relative speed.

$ ./run.sh 

--- pbf stats ---

file: /mnt/london_england.osm.pbf

lon min: -1.1149997
lon max: 0.8949991
lat min: 50.9410000
lat max: 51.9839997
nodes: 9861732
ways: 1390940
relations: 29157
node id min: 19
node id max: 3257561964
way id min: 73
way id max: 319342625
relation id min: 58
relation id max: 4436318
keyval pairs max: 255
keyval pairs max object: relation 62149
noderefs max: 1760
noderefs max object: way 204596511
relrefs max: 1143
relrefs max object: relation 2793118

--- go-osmpbf ---

real    1m40.932s
user    1m20.594s
sys     0m20.242s
total lines: 11252672
total nodes: 9861732
total ways: 1390940
shasum: (ec01984a0286c3ebb2ccb930de527c55c716e3b5  tmpfile)

--- py-imposm-parser ---

real    6m17.258s
user    7m11.600s
sys     0m4.261s
total lines: 11858571
total nodes: 10467631
total ways: 1390940
shasum: (2e33c1389e9bd168f1adf2e13b832a31e9208978  tmpfile)

I think the way I'm viewing CPU usage is wrong: for all the libs I see 100% on one core and little work being done on the other cores, even with concurrency: 8.

[image: cpu usage]

@olt
olt commented Jan 9, 2015

Sorry, imposm.parser returns all nodes as coords and all nodes with tags as nodes. That's by design and reflects how imposm itself works.
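
One way to get a single line per node despite this design is to run two passes: collect the tags from the nodes callback first, then emit every node from the coords callback and attach tags where they exist. The sketch below is a hedged illustration, not the code from this PR. `NodeMerger` and its method names are hypothetical, and the batch shapes, `(osm_id, tags, (lon, lat))` for nodes and `(osm_id, lon, lat)` for coords, are assumed from the imposm.parser documentation. The batches are simulated here so the example runs without the library.

```python
import json


class NodeMerger(object):
    """Hypothetical helper: merge tagged nodes into the coords stream."""

    def __init__(self, write):
        self.write = write        # any callable that accepts one output line
        self.tags_by_id = {}      # osm_id -> tags, filled during pass 1

    def collect_tags(self, nodes):
        # Pass 1: intended as nodes_callback.
        # Assumed batch shape: [(osm_id, tags, (lon, lat)), ...]
        for osm_id, tags, _lonlat in nodes:
            self.tags_by_id[osm_id] = tags

    def emit_coords(self, coords):
        # Pass 2: intended as coords_callback.
        # Assumed batch shape: [(osm_id, lon, lat), ...]
        for osm_id, lon, lat in coords:
            record = {"type": "node", "id": osm_id, "lat": lat, "lon": lon}
            tags = self.tags_by_id.get(osm_id)
            if tags:
                record["tags"] = tags
            self.write(json.dumps(record, sort_keys=True) + "\n")


# Simulated batches standing in for what the parser would deliver.
lines = []
merger = NodeMerger(lines.append)
merger.collect_tags([(564571123, {"tourism": "viewpoint"}, (174.86, -41.26))])
merger.emit_coords([(564571123, 174.86, -41.26), (19, 0.1, 51.5)])
```

Two passes avoid relying on the order in which the two callbacks fire when concurrency is above 1, at the price of reading the file twice and holding the tags of every tagged node in memory.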

And you only see 100% on one CPU because you are actually benchmarking json.dumps :-)
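
A rough way to see how much of the per-node cost comes from serialization is to time it on its own, away from the parser. This is only a sketch: the two formatter functions are hypothetical stand-ins for what a benchmark script might do, and the numbers will vary by machine and Python version.

```python
import json
import timeit
from collections import OrderedDict

SAMPLE = (564571123, 174.86484979999992, -41.26042920000001)


def line_with_json(osm_id, lon, lat):
    # Serialization via OrderedDict + json.dumps, keeping key order stable.
    record = OrderedDict(
        [("type", "node"), ("id", osm_id), ("lat", lat), ("lon", lon)]
    )
    return json.dumps(record, separators=(",", ":"))


def line_with_format(osm_id, lon, lat):
    # Hand-formatted line; only safe here because every field is numeric.
    return '{"type":"node","id":%d,"lat":%r,"lon":%r}' % (osm_id, lat, lon)


def compare(number=100000):
    # Returns seconds spent producing `number` lines with each approach.
    return {
        "json": timeit.timeit(lambda: line_with_json(*SAMPLE), number=number),
        "format": timeit.timeit(lambda: line_with_format(*SAMPLE), number=number),
    }


timings = compare(number=10000)
print("json.dumps + OrderedDict: %.4fs" % timings["json"])
print("string formatting:        %.4fs" % timings["format"])
```

Hand formatting does no string escaping, so tag keys and values would still need a real JSON encoder or an explicit escaping step.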

@missinglink

so after playing with this for far more time than I wanted to, you're right, there is a bunch of synchronous code using up all the CPU in the main thread...

since json.dumps and OrderedDict are so slow it's not fair to say this is benchmarking only the PBF parsing, as that bit is actually very fast.

if you just output a newline in each callback, you get much more even results, although that's not really useful to anyone.
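
A newline-only callback of that kind could look like the sketch below. It is an illustration rather than the code used for these numbers: `NewlineCounter` is a hypothetical name, and it assumes one newline per element in each batch, which may not be exactly what was measured.

```python
import io


class NewlineCounter(object):
    """Hypothetical callback that does almost no work per element."""

    def __init__(self, out):
        self.out = out      # any file-like object with write()
        self.lines = 0

    def callback(self, batch):
        # One newline per element, no serialization at all, so the timing
        # mostly reflects the parser itself rather than the output code.
        self.out.write(u"\n" * len(batch))
        self.lines += len(batch)


out = io.StringIO()
counter = NewlineCounter(out)
counter.callback([(19, 0.1, 51.5), (73, 0.2, 51.6)])
print(counter.lines)
```

The same bound method can be reused for the coords, nodes and ways callbacks, since it never looks inside the elements.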

--- go-osmpbf ---

real    0m42.266s
user    0m20.756s
sys     0m21.231s
total lines: 10947646
total nodes: 0
total ways: 0
shasum: (fe33124fdc3a7f4e983ad078173b58431ddcaa8e  tmpfile)

--- py-imposm-parser ---

real    0m39.956s
user    2m12.416s
sys     0m6.674s
total lines: 11537171
total nodes: 0
total ways: 0
shasum: (86ce397ec22a42d4317300dc7f63f94650dd4615  tmpfile)

@missinglink

after my changes above (to get rid of json and OrderedDict) this code outperforms the golang one, which is awesome!

$ ./run.sh 

--- pbf stats ---

file: /mnt/london_england.osm.pbf

lon min: -1.1149997
lon max: 0.8949991
lat min: 50.9410000
lat max: 51.9839997
nodes: 9861732
ways: 1390940
relations: 29157
node id min: 19
node id max: 3257561964
way id min: 73
way id max: 319342625
relation id min: 58
relation id max: 4436318
keyval pairs max: 255
keyval pairs max object: relation 62149
noderefs max: 1760
noderefs max object: way 204596511
relrefs max: 1143
relrefs max object: relation 2793118

--- go-osmpbf ---

real    1m38.972s
user    1m19.659s
sys     0m19.713s
total lines: 11252672
total nodes: 9861732
total ways: 1390940
shasum: (ec01984a0286c3ebb2ccb930de527c55c716e3b5  tmpfile)

--- py-imposm-parser ---

real    0m33.489s
user    1m39.732s
sys     0m4.643s
total lines: 11858571
total nodes: 10467631
total ways: 1390940
shasum: (21c12ce638e595a299d8c49857a7a331f59eb4b2  tmpfile)
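
The `time` figures above also give a rough idea of how many cores each run kept busy: dividing user plus sys time by real time yields the average number of cores in use. The sketch below applies that arithmetic to the numbers from this run. The helper names are hypothetical, and the ratio is only an approximation, since it also counts time spent in the output code.

```python
import re


def seconds(stamp):
    # Convert a `time`-style stamp such as "1m38.972s" into seconds.
    match = re.match(r"^(\d+)m(\d+(?:\.\d+)?)s$", stamp)
    if match is None:
        raise ValueError("unexpected time format: %r" % stamp)
    return int(match.group(1)) * 60 + float(match.group(2))


def cores_busy(real, user, system):
    # Average number of cores kept busy: CPU time divided by wall time.
    return (seconds(user) + seconds(system)) / seconds(real)


go_cores = cores_busy("1m38.972s", "1m19.659s", "0m19.713s")
py_cores = cores_busy("0m33.489s", "1m39.732s", "0m4.643s")
print("go-osmpbf:        %.2f cores busy on average" % go_cores)
print("py-imposm-parser: %.2f cores busy on average" % py_cores)
```

With these inputs the ratio is about 1.0 for go-osmpbf and about 3.1 for py-imposm-parser, which suggests that the Go run was effectively single-threaded while the Python run spread its work over several cores.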

@olt
olt commented Jan 9, 2015

I guess that the go parser is not working in parallel.
But I doubt that the PBF parsing is the real bottleneck of your ~20 day process (from README.md). Imposm3 parses and caches all nodes, ways and relations in around 15 minutes on a modern quad core. The whole import process takes about 4-5 hours.

@trescube
Closing because this repo is no longer supported.

trescube closed this May 30, 2017