-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parametrized parser for speed #30
Comments
We also know when it contains a small union, and a function with a conditional like this should allow both optimisations: function wrap_wkb(::Type{Traits}, ::Val{HasZ}, ::Val{HasM}, ::Val{BigEndian}, wkb) where {Traits, HasZ, HasM, BigEndian}
if PolygonTrait <: Traits && _get_wkb_typenum(wkb) == POLYGON_CODE
return WKBPolygon{HasZ,HasM,BigEndian}(wkb)
elseif MultiPolygonTrait <: Traits && _get_wkb_typenum(wkb) == MULTIPOLYGON_CODE
return WKBMultiPolygon{HasZ,HasM,BigEndian}(wkb)
elseif MultiPoint <: Traits && _get_wkb_typenum(wkb) == MULTIPOINT_CODE
return WKBMultiPoint{HasZ,HasM,BigEndian}(wkb)
... # Repeat for all GeoInterface traits
else
error("type code not recognised")
end
end The compiler should be able to delete the code for branches not in the The idea is this would be used on a vector/column of geometries after a function barrier where the types were worked out for the whole column before the function barrier. In GeoPacakage this can be done with Nested geoms can use views into the wkb, maybe with some type of runtime information to deal with the fact some data is missing from child objects (like endianness) that is present if they're at the top layer. This should be able to mean all the cost is at the top layer and (The trait types should probably go in a wrapper rather than a raw type union, as in GeometryOps, but I left it simple for clarity) As you said @asinghvi17 we would just error in the nighmare datasets where e.g. endianness or HasZ changed down the column. Then the type parameter |
That sounds good! Another thing - I am not sure if this is the case, but a quick rematerializer for linestrings using For example, this works:
and we could do the same to get a quick reinterpretation of an entire linestring, rather than going point-by-point. I haven't benchmarked this for an entire linestring, but it's 13% slower (1.7ns vs 1.5ns) than indexing into a pre-constructed vector of tuples - which is not bad! |
Oh cool. Absolutely. Is it already stored like that in wkb? Things like this is where knowing |
You need to parse some stuff for the dimensions and the number of coordinates, but sure. I use reinterpret all over (in serializing mostly), so this would work fine. |
I think we want to do that outside the function barrier and then maybe check at each geometry that it matches, and either error or |
At least for values of "maybe has M" or "maybe has Z" it would have to be dynamic, but when it's defined that the geometry definitely has M or Z, that approach should be fine AFAICT. We would have to allow the user to choose whether to use a strict or dynamic parser in this case, since it's possible that people made flawed datasets. |
Totally. We could have |
# Parsing approach 2 - reduce to reinterpret
@benchmark collect(reinterpret(Tuple{Float64, Float64}, view($ls_as_wkb, (1+4+4+1):length($ls_as_wkb))))
# BenchmarkTools.Trial: 10000 samples with 334 evaluations.
# Range (min … max): 258.111 ns … 3.375 μs ┊ GC (min … max): 0.00% … 89.28%
# Time (median): 274.826 ns ┊ GC (median): 0.00%
# Time (mean ± σ): 280.685 ns ± 97.172 ns ┊ GC (mean ± σ): 1.11% ± 2.94%
# ▁▄█▇▅▂ ▁ ▂▁
# ▁▁▂▃▃▃▄▃▅███████▇████▆▆▄▄▄▄▃▃▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
# 258 ns Histogram: frequency by time 326 ns <
# Memory estimate: 160 bytes, allocs estimate: 4. which is not bad! |
Nice! Can we create a repo, or attach a few of these binary test/benchmark files to a release, so we can add this to CI, and test ourselves as well? |
In this case it's as simple as: import GeoFormatTypes as GFT, WellKnownGeometry as WKG, GeoInterface as GI
tups = tuple.(rand(300_000), rand(300_000))
geoms_as_points = GFT.val.(WKG.getwkb.(GI.Point.(tups)))
geoms_as_linestring = GFT.val(WKG.getwkb(GI.LineString(tups))) and try the different parsing approaches from there. # parse geoms as points, each WKB individually
@benchmark begin
map($(geoms_as_points)) do geom
only(reinterpret(Tuple{Float64, Float64}, view(geom, (5+1):length(geom)))) # this has to be parsed differently in EWKB and GeoPkg wkb
end
end # median time 3ms v/s regular parsing 300ms
# parse the whole linestring
@benchmark collect(reinterpret(Tuple{Float64, Float64}, view($ls_as_wkb, (1+4+4+1):length($ls_as_wkb))))
# median time 290ns vs unknown (still running)
# we could even drop collect here (there was an actual test geopkg file for reproducible coordinates, but I figure that doesn't matter so much for parsing). |
Looks like We could optimize that pretty easily no? |
If you know it's a line string, or know the dimension, yes, absolutely. |
It seems we haven't defined ncoord on things other than Point in GeoInterface? Did you fix this locally? julia> WKG.getwkb(GI.LineString(tups))
ERROR: MethodError: no method matching ncoord(::GeoInterface.LineStringTrait, ::GeoInterface.Wrappers.LineString{false, false, Vector{Tuple{Float64, Float64}}, Nothing, Nothing})
Closest candidates are:
ncoord(::Any)
@ GeoInterface ~/.julia/packages/GeoInterface/tuF7K/src/interface.jl:104
ncoord(::Union{GeoInterface.GeometryCollectionTrait, GeoInterface.LineStringTrait, GeoInterface.LinearRingTrait, GeoInterface.MultiLineStringTrait, GeoInterface.MultiPointTrait, GeoInterface.MultiPolygonTrait, GeoInterface.PointTrait, GeoInterface.PolygonTrait}, ::GeoFormatTypes.WellKnownBinary{GeoFormatTypes.Geom, <:AbstractVector{UInt8}})
@ WellKnownGeometry ~/code/WellKnownGeometry/src/wkb.jl:164
ncoord(::Union{GeoInterface.GeometryCollectionTrait, GeoInterface.LineStringTrait, GeoInterface.LinearRingTrait, GeoInterface.MultiLineStringTrait, GeoInterface.MultiPointTrait, GeoInterface.MultiPolygonTrait, GeoInterface.PointTrait, GeoInterface.PolygonTrait}, ::GeoFormatTypes.WellKnownText{GeoFormatTypes.Geom})
@ WellKnownGeometry ~/code/WellKnownGeometry/src/wkt.jl:136
Stacktrace:
[1] ncoord(geom::GeoInterface.Wrappers.LineString{false, false, Vector{Tuple{Float64, Float64}}, Nothing, Nothing})
@ GeoInterface ~/.julia/packages/GeoInterface/tuF7K/src/interface.jl:104
[2] _getwkb!(data::Vector{…}, type::GeoInterface.LineStringTrait, geom::GeoInterface.Wrappers.LineString{…}, first::Bool, repeat::Bool)
@ WellKnownGeometry ~/code/WellKnownGeometry/src/wkb.jl:113
[3] getwkb!
@ ~/code/WellKnownGeometry/src/wkb.jl:129 [inlined]
[4] getwkb(geom::GeoInterface.Wrappers.LineString{false, false, Vector{Tuple{Float64, Float64}}, Nothing, Nothing})
@ WellKnownGeometry ~/code/WellKnownGeometry/src/wkb.jl:56
Some type information was truncated. Use `show(err)` to see complete types. |
Huh, no this just worked for me...let me try it in a fresh session! |
No, I'm on latest released GeoInterface and WKG v0.2.2 (one patch version behind) Edit: trying this on WKG v0.2.3 fails as you posted. |
|
Monkeypatching GI.ncoord(::GI.Wrappers.WrapperGeometry{false, false}) = 2
GI.ncoord(::GI.Wrappers.WrapperGeometry{false, true}) = 3
GI.ncoord(::GI.Wrappers.WrapperGeometry{true, false}) = 3
GI.ncoord(::GI.Wrappers.WrapperGeometry{true, true}) = 4 fixes it for now, but this probably needs a better check. Maybe |
For what it's worth,
Which while still a bit unstable with ncoord, is already doable with 8.958 μs (15 allocations: 469.34 KiB). |
@asinghvi17 do you want to PR your patch to GI? I must have forgotten to add it ( or maybe we never had that, but we should) @evetion does I thin we need |
Yeah, it probably needs to be a tuple. And if the type has noop ncoord it should be pretty fast, even without refactoring it everywhere. |
It would be great to have a parameterized parser object for speed. Things we could specialize on are:
bswap
on the result Float64 or Int)The text was updated successfully, but these errors were encountered: