WFS paging and parallelization support #70
@salvafern make sure to use WFS version 2.0.0. AFAIK, paging in WFS is only supported from WFS 2.0 onwards, and I see you used 1.1.0.
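For context, WFS 2.0 paging works by sending successive GetFeature requests with the standard startIndex (0-based offset) and count (page size) parameters, with the total typically taken from the numberMatched attribute of a resultType=hits response. A minimal base-R sketch of that page arithmetic follows; wfs_pages() is a hypothetical helper for illustration, not part of ows4R, and ows4R's own paging code may work differently.

```r
# Hypothetical helper (not part of ows4R): split `total` matched features
# into WFS 2.0 pages, each described by startIndex (0-based) and count.
wfs_pages <- function(total, page_length) {
  stopifnot(total >= 0, page_length > 0)
  if (total == 0) {
    return(data.frame(startIndex = integer(0), count = integer(0)))
  }
  starts <- seq(0, total - 1, by = page_length)
  data.frame(
    startIndex = as.integer(starts),
    # the last page may be shorter than page_length
    count = as.integer(pmin(page_length, total - starts))
  )
}

# 408603 matched features (the numberMatched value seen in this thread),
# fetched 10000 at a time
pages <- wfs_pages(408603, 10000)
nrow(pages)     # 41 requests
tail(pages, 1)  # last page: startIndex 400000, count 8603
```

Each row corresponds to one GetFeature request with `&startIndex=...&count=...` appended, which is also the unit of work a parallel handler would distribute.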
Try setting version 2.0.0, like this:
wfs <- WFSClient$
new("https://geo.vliz.be/geoserver/Dataportal/wfs", "2.0.0", logger = "INFO")$
getCapabilities()$
findFeatureTypeByName("Dataportal:eurobis-obisenv_basic")
params <- "where%3A%28%28up.geoobjectsids+%26%26+ARRAY%5B2350%5D%29%29+AND+datasetid+IN+%28216%29%3Bcontext%3A0100%3Baphiaid%3A104464"
# With pagination
system.time(feature_pagination <- wfs$getFeatures(viewParams = params, paging = TRUE, paging_length = 1000))
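The viewParams strings in these examples are pre-encoded, as copied from the EMODnet-Biology download toolbox. If it helps to build or inspect them in R, base R's utils::URLencode() with reserved = TRUE reproduces this encoding once %20 is turned into +. encode_viewparams() and decode_viewparams() are hypothetical helpers for illustration, not ows4R functions, and this sketch does not imply how ows4R itself encodes vendor parameters.

```r
# Hypothetical helpers (not part of ows4R) to build or inspect the
# URL-encoded viewParams strings used in this thread.
encode_viewparams <- function(x) {
  # reserved = TRUE also escapes ':', ';', '(', ')', '[', ']' and '&'.
  # Note: URLencode() returns x unchanged if it already contains %xx escapes.
  encoded <- utils::URLencode(x, reserved = TRUE)
  # the toolbox URLs use '+' rather than '%20' for spaces
  gsub("%20", "+", encoded, fixed = TRUE)
}

decode_viewparams <- function(x) {
  # URLdecode() leaves '+' untouched, so turn it into a space first
  utils::URLdecode(gsub("+", " ", x, fixed = TRUE))
}

encode_viewparams("where:datasetid IN (8020)")
# "where%3Adatasetid+IN+%288020%29"
decode_viewparams("where%3A%28%28up.geoobjectsids+%26%26+ARRAY%5B2350%5D%29%29+AND+datasetid+IN+%28216%29%3Bcontext%3A0100%3Baphiaid%3A104464")
# "where:((up.geoobjectsids && ARRAY[2350])) AND datasetid IN (216);context:0100;aphiaid:104464"
```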
Just tested the pagination, and it worked.
Indeed, it works now, thanks a lot! I have now also tried using the parallel options.
Using parallelization and pagination together
Probably I'm doing something wrong. I expected that a separate request would be sent for each chunk, but I just ran into an error.
library(ows4R)
library(parallel)
wfs <- WFSClient$
new("https://geo.vliz.be/geoserver/Dataportal/wfs", "2.0.0", logger = "INFO")$
getCapabilities()$
findFeatureTypeByName("Dataportal:eurobis-obisenv_basic")
# Querying dataset: https://www.emodnet-biology.eu/data-catalog?module=dataset&dasid=8020
# ~500K rows
params <- "where%3Adatasetid+IN+%288020%29"
# With pagination and parallelization
cl <- makeCluster(detectCores() - 1)
cl
#> socket cluster with 15 nodes on host ‘localhost’
debug(wfs$getFeatures)
system.time(feature_parallel <- wfs$getFeatures(viewParams = params, resultType="results",
paging = TRUE, paging_length = 10000,
parallel = TRUE, parallel_handler = parallel::mclapply, cl = cl))
#> Error in CPL_read_ogr(dsn, layer, query, as.character(options), quiet, :
#> No layers in datasource.
#> Timing stopped at: 0.023 0 11.45
The response was:
<?xml version="1.0" encoding="UTF-8"?>
<wfs:FeatureCollection
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:fes="http://www.opengis.net/fes/2.0"
xmlns:wfs="http://www.opengis.net/wfs/2.0"
xmlns:gml="http://www.opengis.net/gml/3.2"
xmlns:ows="http://www.opengis.net/ows/1.1"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" numberMatched="408603" numberReturned="0" timeStamp="2022-03-31T07:57:57.251Z" xsi:schemaLocation="http://www.opengis.net/wfs/2.0 http://schemas.opengis.net/wfs/2.0/wfs.xsd"/>
Using only parallelization
I tried comparing no parallelization against parallelization with two different handlers:
# Neither pagination nor parallelization
system.time(feature <- wfs$getFeatures(viewParams = params, resultType="results"))
#> [ows4R][INFO] WFSGetFeature - Fetching https://geo.vliz.be/geoserver/Dataportal/wfs?service=WFS&version=2.0.0&typeNames=Dataportal:eurobis-obisenv_basic&viewParams=where%3Adatasetid+IN+%288020%29&resultType=results&request=GetFeature
#> user system elapsed
#> 26.718 2.080 67.476
# Parallelization parLapply
system.time(feature_parallel <- wfs$getFeatures(viewParams = params, resultType="results",
parallel = TRUE, parallel_handler = parallel::parLapply, cl = cl))
#> [ows4R][INFO] WFSGetFeature - Fetching https://geo.vliz.be/geoserver/Dataportal/wfs?service=WFS&version=2.0.0&typeNames=Dataportal:eurobis-obisenv_basic&viewParams=where%3Adatasetid+IN+%288020%29&resultType=results&request=GetFeature
#> user system elapsed
#> 27.457 2.477 65.883
# Parallelization mclapply
system.time(feature_parallel2 <- wfs$getFeatures(viewParams = params, resultType="results",
parallel = TRUE, parallel_handler = parallel::mclapply, cl = cl))
#> [ows4R][INFO] WFSGetFeature - Fetching https://geo.vliz.be/geoserver/Dataportal/wfs?service=WFS&version=2.0.0&typeNames=Dataportal:eurobis-obisenv_basic&viewParams=where%3Adatasetid+IN+%288020%29&resultType=results&request=GetFeature
#> user system elapsed
#> 26.226 2.274 63.895
Many thanks again for the help! Let me know if there is anything I can do.
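In the three runs above, each log shows a single Fetching request and the elapsed times are nearly identical (about 64 to 67 s), which suggests that without paging there is only one request to distribute. The two handlers also expect different arguments: parallel::mclapply(X, FUN, ..., mc.cores) forks the R session and takes no cluster object (a cl argument passed to mclapply() itself would just be forwarded to FUN through ...), and mc.cores above 1 is not supported on Windows, while parallel::parLapply(cl, X, fun, ...) requires a cluster from makeCluster() as its first argument. A minimal sketch of dispatching pages through either handler follows; fetch_page() and fetch_all() are hypothetical stand-ins, not ows4R functions, and fetch_page() only returns the page bounds instead of calling the server.

```r
library(parallel)

# Hypothetical stand-in for one paged GetFeature request. A real version
# would send startIndex/count to the server and parse the features; this
# stub only returns the bounds of the requested page.
fetch_page <- function(start, count) {
  data.frame(first = start, last = start + count - 1)
}

# Hypothetical dispatcher (not the ows4R implementation) showing how the
# two parallel handlers differ in the arguments they expect.
fetch_all <- function(starts, count,
                      handler = c("lapply", "mclapply", "parLapply"),
                      cl = NULL, cores = 1L) {
  handler <- match.arg(handler)
  pages <- switch(handler,
    lapply = lapply(starts, fetch_page, count = count),
    # mclapply() forks the current R session: no cluster, only mc.cores
    # (values above 1 are not supported on Windows)
    mclapply = mclapply(starts, fetch_page, count = count, mc.cores = cores),
    # parLapply() takes the cluster as its first argument
    parLapply = {
      stopifnot(!is.null(cl))
      parLapply(cl, starts, fetch_page, count = count)
    }
  )
  # bind the per-page results into a single data frame
  do.call(rbind, pages)
}

starts <- c(0, 10000, 20000)
res <- fetch_all(starts, 10000)
res_mc <- fetch_all(starts, 10000, "mclapply", cores = 1L)

# With a socket cluster instead:
# cl <- makeCluster(2)
# res_cl <- fetch_all(starts, 10000, "parLapply", cl = cl)
# stopCluster(cl)
```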
Yes, it sounds like there are issues with the parallelization; I will have a look ASAP.
If you want to use the cluster approach, you can use this handler:
I got the same error :(
feature_parallel <- wfs$getFeatures(viewParams = params, resultType="results",
paging = TRUE, paging_length = 10000,
parallel = TRUE, parallel_handler = parallel::parLapply, cl = cl)
#> Error in CPL_read_ogr(dsn, layer, query, as.character(options), quiet, :
#> No layers in datasource.
@salvafern I haven't forgotten this. I started working on it, but I am still looking into the best way to fix the parallel handlers.
Hi @eblondel,
I have been giving ows4R a try to query biological occurrence data from EMODnet-Biology.
In the example below, I requested:
I got the WFS request from the EMODnet-Biology download toolbox (at the end of the selection, you can copy the WFS request under "Get webservice url").
The good news is that viewParams via vendor params work like a charm (although I have to watch out for the encoding, see lifewatch/eurobis#15 (comment)).
However, I am having trouble with the paging and parallel options. After some debugging, I think the issue might be that ows4R relies on an attribute named numberMatched when using resultType = "hits" (at https://github.com/eblondel/ows4R/blob/master/R/WFSFeatureType.R#L240), and this is not being returned by geo.vliz.be (it should happen around https://github.com/eblondel/ows4R/blob/master/R/WFSFeatureType.R#L291).
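To check whether a given server response actually carries numberMatched, the attributes of the root wfs:FeatureCollection element can be inspected directly. A base-R sketch follows; wfs_counts() is a hypothetical helper, a regular expression is used only to avoid extra dependencies, and a real XML parser (for example xml2::xml_attr()) would be more robust. The sample string is the root element of the GetFeature response posted in this thread, with most namespace declarations dropped.

```r
# Hypothetical helper (not part of ows4R): read the numberMatched and
# numberReturned attributes from a WFS 2.0 FeatureCollection response.
wfs_counts <- function(xml_text) {
  get_attr <- function(name) {
    pattern <- paste0("[[:space:]]", name, '="([^"]*)"')
    hit <- regmatches(xml_text, regexec(pattern, xml_text))[[1]]
    if (length(hit) == 0) return(NA_real_)  # attribute not present
    # WFS 2.0 allows numberMatched="unknown"; that becomes NA here
    suppressWarnings(as.numeric(hit[2]))
  }
  c(numberMatched = get_attr("numberMatched"),
    numberReturned = get_attr("numberReturned"))
}

resp <- paste0(
  '<wfs:FeatureCollection xmlns:wfs="http://www.opengis.net/wfs/2.0" ',
  'numberMatched="408603" numberReturned="0" ',
  'timeStamp="2022-03-31T07:57:57.251Z"/>'
)
wfs_counts(resp)  # numberMatched = 408603, numberReturned = 0
```

A missing or NA numberMatched in a resultType=hits response would leave a paging client without a total to split into pages.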
Could you have a look and see what is happening?
Thanks a lot!
Created on 2022-03-29 by the reprex package (v2.0.1)
This issue partly follows up on #29.