Bug with a layer of biology_occurrence_data #164

Closed
maelle opened this issue Mar 21, 2024 · 13 comments · Fixed by #170
Labels
documentation Improvements or additions to documentation

Comments

@maelle
Collaborator

maelle commented Mar 21, 2024

@salvafern is this expected? The layer is "abiotic_observations".

library("EMODnetWFS")
service <- "biology_occurrence_data"
layers_info <- emodnet_get_wfs_info(service = service)
#> Loading ISO 19139 XML schemas...
#> Loading ISO 19115 codelists...
#> ✔ WFS client created successfully
#> ℹ Service: "https://geo.vliz.be/geoserver/Dataportal/wfs"
#> ℹ Version: "2.0.0"
layer <- layers_info[layers_info[["format"]] == "sf", ][1, ][["layer_name"]]
emodnet_get_layers(service = service, layers = layer)[[1]] |>
  sf::st_crs() |>
  print()
#> ✔ WFS client created successfully
#> ℹ Service: "https://geo.vliz.be/geoserver/Dataportal/wfs"
#> ℹ Version: "2.0.0"
#> Warning: Download of layer "abiotic_observations" failed: Error in gsub("<!--.*?-->",
#> "", text): 'R_Calloc' could not allocate memory (18446744071562067968 of 1
#> bytes)
#> Coordinate Reference System: NA

Created on 2024-03-21 with reprex v2.1.0

@maelle maelle added the bug Something isn't working label Mar 21, 2024
@maelle
Collaborator Author

maelle commented Mar 21, 2024

Oh, actually this gsub() happens in ows4R, from what I can tell. @eblondel have you encountered this problem before? Could it be solved with the switch to xml2? I am wondering whether there's another way to remove the XML comments. Now, if the XML is gigantic, there would probably be other issues further down the road. 😅

@maelle
Collaborator Author

maelle commented Mar 21, 2024

@eblondel
Contributor

@maelle The switch from XML to xml2 will take time for ows4R and other packages with complex XML models. I've made the transition for some other packages: it's done in geosapi and I've started tackling it in atom4R, but there is not always a clear equivalent of XML methods in xml2, and some questions about xml2 still remain without an answer. From the exercises I've run, I can say XML still has more flexibility in writing XML than xml2.

The issue you face here is the size of the response, rather than the gsub() operation itself. I'm not sure that having an xml2 object will solve the issue.

Let me try to remember why I introduced this line to remove XML comments. It was surely not meant for a WFS GetFeatures response, for which we never get XML comments (at least in my experience). In the worst case, we could make this check optional depending on the service we target, which could solve the issue. I'll let you know when I find something.

@eblondel
Contributor

By the way, you can still bypass this by using another outputFormat to retrieve the features, CSV or JSON for example. Given the amount of data, I would give CSV a try.
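
As a rough illustration of that workaround, here is a minimal sketch that builds a WFS 2.0.0 GetFeature URL asking for CSV rather than GML. It uses base R only and assumes the GeoServer instance has the CSV output format enabled; `build_getfeature_url()` is a hypothetical helper, not a function from ows4R or EMODnetWFS.

```r
# Hypothetical helper (not part of ows4R or EMODnetWFS): assemble a
# WFS 2.0.0 GetFeature request URL with an explicit outputFormat.
build_getfeature_url <- function(base_url, type_name, output_format = "csv",
                                 version = "2.0.0") {
  params <- c(
    service = "WFS",
    version = version,
    request = "GetFeature",
    typeNames = type_name,
    outputFormat = output_format
  )
  # Percent-encode each value, then join as key=value pairs.
  values <- vapply(params, utils::URLencode, character(1), reserved = TRUE)
  query <- paste0(names(params), "=", values, collapse = "&")
  paste0(base_url, "?", query)
}

url <- build_getfeature_url(
  "https://geo.vliz.be/geoserver/Dataportal/wfs",
  "abiotic_observations"
)
print(url)

# The download itself is left commented out because the layer is very large.
# abiotic <- utils::read.csv(url)
```

Reading the CSV avoids parsing one huge XML document, but the response can still be large, so this only sidesteps the gsub() step and not the data volume.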

@eblondel
Contributor

Please see the small enhancement made to ows4R above.
I can run this example:

require(ows4R)
WFS = WFSClient$new(url = "https://geo.vliz.be/geoserver/Dataportal/wfs", serviceVersion = "2.0.0", logger = "DEBUG")
ft = WFS$capabilities$findFeatureTypeByName("abiotic_observations", FALSE)
sf = ft$getFeatures()

This example works, although I get the message "Memory allocation failed : growing input buffer".
Please note that you can here the maximum nb of features as declared in your Geoserver WFS Settings (ie 1,000,000 features).

Let me know

Cheers
Emmanuel

@maelle
Collaborator Author

maelle commented Mar 22, 2024

Thanks so much!!

Please note that you can here the maximum nb of features as declared in your Geoserver WFS Settings (ie 1,000,000 features).

I don't understand, could you please clarify?

@eblondel
Contributor

In the GeoServer admin interface, there is a WFS setting that controls the maximum number of features returned in a response. If a request exceeds this number, only that maximum number of features will be retrieved. Depending on your data, this can be adapted. In this example, you reach the default threshold, which is 1 million features.
See https://docs.geoserver.org/main/en/user/services/wfs/webadmin.html#features
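
On the client side, one simple way to notice that this cap was hit is to compare the number of features returned with the limit. The sketch below assumes the limit is the 1,000,000 mentioned here; `may_be_truncated()` is a hypothetical helper for illustration only.

```r
# Hypothetical helper: flag a result whose size equals (or exceeds) the
# server-side maximum, which suggests the response was cut off.
# `max_features` is an assumption mirroring the 1,000,000 default above.
may_be_truncated <- function(n_returned, max_features = 1e6) {
  n_returned >= max_features
}

at_limit <- may_be_truncated(1e6)
below_limit <- may_be_truncated(42318)
print(c(at_limit = at_limit, below_limit = below_limit))
```

With a real result one would pass `nrow(sf)` as `n_returned`; a result exactly at the limit is not proof of truncation, only a hint that more features may exist.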

@maelle
Collaborator Author

maelle commented Mar 22, 2024

@eblondel Merci beaucoup !

@bart-v @salvafern see above, should the setting be tweaked for the biology occurrence data service and if so, who would be in charge?

@maelle maelle added this to the rOpenSci submission milestone Mar 28, 2024
@maelle
Collaborator Author

maelle commented Jul 1, 2024

@bart-v @salvafern friendly reminder 🙏

@bart-v

bart-v commented Jul 1, 2024

I don't understand why this is a problem at the WFS server.
You are clearly asking for too much data: "could not allocate memory".
Just ask for less and paginate...
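
For readers unfamiliar with WFS paging, here is a minimal sketch of what "paginate" could look like. WFS 2.0.0 defines the `count` and `startIndex` request parameters; the helper `page_queries()`, the total of 25,000 features and the page size are made up for illustration, and a `sortBy` parameter may additionally be needed for stable pages.

```r
# Hypothetical helper: produce the count/startIndex query fragments needed
# to walk through `total` features in pages of `page_size`.
page_queries <- function(total, page_size) {
  stopifnot(total > 0, page_size > 0)
  starts <- seq(0, total - 1, by = page_size)
  # The last page may be shorter than page_size.
  counts <- pmin(page_size, total - starts)
  # "%.0f" avoids scientific notation such as 1e+05 in the URL.
  sprintf("count=%.0f&startIndex=%.0f", counts, starts)
}

queries <- page_queries(total = 25000, page_size = 10000)
print(queries)
```

Each fragment would be appended to a GetFeature request, and the pages combined afterwards, so that no single response has to hold the whole layer.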

@maelle
Collaborator Author

maelle commented Jul 1, 2024

Ok, thanks, I was asking because of this comment: #164 (comment)

@bart-v

bart-v commented Jul 1, 2024

There are 40M+ records, so I'm sure it would crash even worse on that...
The default 1M limit is fine and can stay.

@maelle maelle added documentation Improvements or additions to documentation and removed bug Something isn't working labels Jul 1, 2024
@salvafern
Collaborator

As Bart said, I think the 1M limit is reasonable. @eblondel was looking into adding pagination support to ows4R (see eblondel/ows4R#70), but I can imagine this is not an easy task. So for now, maybe we can close this issue?


4 participants