Feature request: esriIndex function to list all folders and services on ArcGIS server #39
Happy new year, @jacpete. This isn't an urgent issue, but I figured you wouldn't mind if I tagged you to check whether you have had a chance to consider this feature request. Suggestions or feedback are welcome!
Hey @elipousson, thanks for the ping. The end of the year got pretty crazy and I had forgotten about this issue. I will try to get you a response with my thoughts tonight after I re-establish where I left off. I'm happy you're around to discuss this stuff with.
The new function looks great. It was something I had written down to add and I am glad someone beat me to it. I tested it out on a couple of URLs I use regularly and it worked as expected. My only note on the function in its current form is the inclusion of the new base pipe (`|>`). In response to the other comments:

1. I see what you mean here and I will work on a pull request to fix this.
2. I would also love to see this functionality combined in. Would you send an example of where it works and, better yet, where it doesn't, so we can try to root out a fix and get this functionality added in?
3. This has definitely been influenced by my personal taste. I tend to mix camelCase with underscores separating the parts of a name, as in the `esriUrl_*` functions.
Here is an updated version of `esriIndex`:

```r
esriIndex <- function(url, parent = NULL, recurse = TRUE, layers = FALSE) {
urlServer <-
stringr::str_extract(url, ".+rest/services")
urlInfo <-
jsonlite::read_json(paste0(urlServer, "?f=json"), simplifyVector = TRUE)
folders <-
tibble::tibble(
name = as.character(urlInfo[["folders"]]),
type = "Folder",
url = paste0(urlServer, "/", urlInfo[["folders"]]),
parent = parent
)
services <-
tibble::tibble(
name = as.character(urlInfo[["services"]][["name"]]),
type = as.character(urlInfo[["services"]][["type"]]),
url = paste0(urlServer, "/", urlInfo[["services"]][["name"]], "/", urlInfo[["services"]][["type"]], recycle0 = TRUE),
parent = parent
)
urlIndex <-
dplyr::bind_rows(
folders,
services
)
if (layers) {
layers <- purrr::map2_dfr(
urlIndex$url, urlIndex$name,
~ layerIndex(url = .x, parent = .y)
)
urlIndex <-
dplyr::bind_rows(
urlIndex,
layers
)
}
if (recurse) {
if (length(folders[["name"]]) > 0) {
urlIndex <-
dplyr::bind_rows(
urlIndex,
purrr::pmap_dfr(
folders,
~ esriIndex(url = ..3, parent = ..1)
)
)
}
}
urlIndex
}
```

Here is the `layerIndex` function:

```r
layerIndex <- function(url, parent = NULL) {
layerInfo <- jsonlite::read_json(paste0(url, "/layers?f=json"), simplifyVector = TRUE)
if (!is.null(layerInfo) && (class(layerInfo[["layers"]]) == "data.frame")) {
layers <- dplyr::select(
layerInfo[["layers"]],
id, name, type, geometryType
)
dplyr::mutate(
layers,
url = paste0(url,"/", id),
parent = parent
)
} else {
NULL
}
}
```

I could set up a pull request for the `esriIndex` function without the `layers` parameter now, or just wait until we have a chance to work out the problems.
I am doing a lot of work right now to rewrite and generalize the `esriUrl_*` functions, and I hope to have it done tonight or tomorrow. Please wait on your pull request until I open a pull request for those changes; I just want to make sure we aren't stepping on each other in the process. Once I get that fixed, I will take a deeper look into the `esriLayers` function and see if I can spot what's going on.
@elipousson Would you be able to give more details about this comment:
What kind of performance issues did you get? Were there unexpected errors in the R code, or was the server returning error codes? I was trying to test out what the issue could be, but I have no examples of the issue you were having. Just as a note, the `/layers` subpage is only available for MapServer and FeatureServer URLs. If a feature URL for one of these services is provided, it will truncate the URL to the service type.

```r
# All valid entries
esriLayers("https://sampleserver1.arcgisonline.com/ArcGIS/rest/services/Demographics/ESRI_Census_USA/MapServer/3")
esriLayers("https://sampleserver1.arcgisonline.com/ArcGIS/rest/services/TaxParcel/AssessorsBasemap/MapServer")
esriLayers("https://carto.nationalmap.gov/arcgis/rest/services/contours/MapServer")
# Not valid (current return shown below); better error handling is needed (an explicit check for '/(FeatureServer|MapServer)/?$' should be added)
esriLayers("https://sampleserver1.arcgisonline.com/ArcGIS/rest/services/Elevation/ESRI_Elevation_World/GPServer")
# $error
# $error$code
# [1] 400
#
# $error$message
# [1] "Unable to complete operation."
#
# $error$details
# [1] "GPTask 'layers' does not exist or is inaccessible."
#
```
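
For reference, here is a minimal sketch of the explicit service-type check mentioned in the comment above; the helper name `urlHasLayersResource` is illustrative and not part of the package:

```r
# Sketch: test whether a URL ends in a service type that exposes the /layers resource.
# The regex is the one suggested in the comment above; the function name is hypothetical.
urlHasLayersResource <- function(url) {
  grepl("/(FeatureServer|MapServer)/?$", url)
}

urlHasLayersResource("https://sampleserver1.arcgisonline.com/ArcGIS/rest/services/TaxParcel/AssessorsBasemap/MapServer")
#> [1] TRUE
urlHasLayersResource("https://sampleserver1.arcgisonline.com/ArcGIS/rest/services/Elevation/ESRI_Elevation_World/GPServer")
#> [1] FALSE
```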
Hello @jacpete! I expect you've been too busy to return to this project (I know I've had a hectic winter and spring), but I wanted to check if you need any help with issue #41, #40, or #43. I can't recall if all three need to be resolved in order to make it even possible to create an `esriIndex` function.
Hey @elipousson. Thanks again for the prod. I finally got time to finish #40, and I believe it fixes the URL encoding and the token issues mentioned here. I know this was just the tip of the iceberg for your feature request, but I am happy to start tackling it piece by piece now. I think #39 should focus on the `esriIndex` function you proposed. Can you attach a latest draft of the function after updating the package to include the changes in #40? I'd love to get this added into the package. I am going to close #41 since I think the latest updates should have addressed some of the issues raised there.
That all sounds great! Thanks so much for making time for these updates. I should have time next week or the following weekend to rewrite the `esriIndex` draft. I'd also love to add support for buffer distances (since the FeatureServer API has native support for them), although I think that should be tracked in a separate issue and come in a separate pull request.
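
For context on the buffer idea, the Feature Service layer query operation accepts `distance` and `units` parameters, so a buffered query could eventually be built up along these lines. This is only a sketch: the sample layer URL and the raw {httr2} plumbing are assumptions for illustration, not part of the package's current API.

```r
library(httr2)

# Sketch: request features within 500 meters of a point using the ArcGIS REST
# query operation's distance/units parameters (sample layer URL for illustration only).
layer_url <- "https://sampleserver1.arcgisonline.com/ArcGIS/rest/services/Demographics/ESRI_Census_USA/MapServer/3"

req <- request(layer_url)
req <- req_url_path_append(req, "query")
req <- req_url_query(
  req,
  geometry = "-76.61,39.28",
  geometryType = "esriGeometryPoint",
  inSR = 4326,
  distance = 500,
  units = "esriSRUnit_Meter",
  outFields = "*",
  f = "json"
)
resp <- req_perform(req)
str(resp_body_json(resp), max.level = 1)
```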
I should have waited until next week, but instead I stayed up late to update the draft.

Updated (2022-05-12): This is a good start, but it needs some work on how the URLs are being built and around error handling if the server doesn't have the type of resource included in the `return` parameter. I need to avoid the temptation to keep working on this, but I do plan to return to it in the next week or so.

```r
library(esri2sf)
esriLayerIndex <- function(url, parent = NULL, return = c("layers", "tables")) {
return <- match.arg(return, c("layers", "tables"), several.ok = TRUE)
layerIndex <- NULL
if (grepl(pattern = "(MapServer|FeatureServer)$", url)) {
serviceInfo <- esriLayers(url)
if (!is.null(serviceInfo)) {
if ("layers" %in% return) {
layerIndex <-
dplyr::select(
serviceInfo$layers,
id, name, type, geometryType
)
}
if ("tables" %in% return) {
layerIndex <-
dplyr::bind_rows(
layerIndex,
dplyr::select(
serviceInfo$tables,
id, name, type
)
)
}
layerIndex <-
dplyr::mutate(
layerIndex,
url = paste0(url, "/", id)
)
}
}
return(layerIndex)
}
esriIndex <- function(url, parent = NULL, return = c("services", "folders", "layers", "tables"), recurse = TRUE, layers = FALSE) {
return <- match.arg(return, c("services", "folders", "layers", "tables"), several.ok = TRUE)
if (esriUrl_isValidRoot(url) | esriUrl_isValidFolder(url)) {
rootUrl <- url
} else {
parsed <-
esriUrl_parseUrl(url)
rootUrl <-
paste0(parsed$scheme, paste0(c(parsed$host, parsed$instance, parsed$restIndicator), collapse = "/"))
}
urlInfo <-
esrimeta(rootUrl)
folders <- NULL
if ("folders" %in% return) {
folders <- urlInfo$folders
folders <-
dplyr::bind_cols(
name = as.character(folders),
type = "Folder",
url = paste0(rootUrl, folders),
parent = parent
)
}
services <- NULL
if ("services" %in% return) {
services <- urlInfo$services
if (is.null(parent)) {
services_url <- paste0(rootUrl, services$name, "/", services$type, recycle0 = TRUE)
} else {
services_url <- paste0(rootUrl, "/", services$type, recycle0 = TRUE)
}
services <-
dplyr::bind_cols(
services,
url = services_url,
parent = parent
)
}
urlIndex <-
dplyr::bind_rows(
folders,
services
)
if (any(c("layers", "tables") %in% return) && !is.null(services) && (nrow(services) > 0)) {
ValidServices <- vapply(services$url, esriUrl_isValidService, TRUE)
layerIndex <-
purrr::map2(
services$url[ValidServices], services$name[ValidServices],
~ esriLayerIndex(url = .x, parent = .y, return = return)
)
urlIndex <-
dplyr::bind_rows(
urlIndex,
layerIndex
)
}
if (recurse && !is.null(folders) && (nrow(folders) > 0)) {
urlIndex <-
dplyr::bind_rows(
urlIndex,
purrr::map2_dfr(
folders$url, folders$name,
~ esriIndex(url = .x, parent = .y, return = return)
)
)
}
return(urlIndex)
}
testIndex <- esriIndex(url = "https://sampleserver1.arcgisonline.com/ArcGIS/rest/services/")
dplyr::glimpse(testIndex)
#> Rows: 43
#> Columns: 4
#> $ name <chr> "Demographics", "Elevation", "Locators", "Louisville", "Network…
#> $ type <chr> "Folder", "Folder", "Folder", "Folder", "Folder", "Folder", "Fo…
#> $ url <chr> "https://sampleserver1.arcgisonline.com/ArcGIS/rest/services/De…
#> $ parent <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "Demographics", "De…
testIndex <- esriIndex("https://geodata.md.gov/imap/rest/services", return = "folders")
#> Error in value[[3L]](cond): Url is not a valid ESRI Service Url.
#> Error code: 403
#> Message: Access to this resource is not allowed
dplyr::glimpse(testIndex)
#> Rows: 43
#> Columns: 4
#> $ name <chr> "Demographics", "Elevation", "Locators", "Louisville", "Network…
#> $ type <chr> "Folder", "Folder", "Folder", "Folder", "Folder", "Folder", "Fo…
#> $ url <chr> "https://sampleserver1.arcgisonline.com/ArcGIS/rest/services/De…
#> $ parent <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "Demographics", "De…
testIndex <- esriIndex("https://geodata.md.gov/imap/rest/services/Transportation", return = c("services", "layers"))
dplyr::glimpse(testIndex)
#> Rows: 33
#> Columns: 3
#> $ name <chr> "Transportation/MD_AlternativeFuel", "Transportation/MD_Alternati…
#> $ type <chr> "FeatureServer", "MapServer", "FeatureServer", "MapServer", "Feat…
#> $ url <chr> "https://geodata.md.gov/imap/rest/services/TransportationTranspor…
```

Created on 2022-05-12 by the reprex package (v2.0.1)
This is taking things in a bit of a different direction, but I fixed the issues with the prior draft of the index function and effectively rebuilt `esriLayers` and `esrimeta` using the {httr2} package. I started on this side track because I wanted to try out the "sitemap" parameter of the Catalog API service. Obviously, switching to {httr2} would require a separate issue and likely a dedicated development branch, but it may be worth exploring.

```r
# Create an index of folders, services, layers, and tables for an ArcGIS Server
esriIndex <- function(url, parent = NULL, recurse = FALSE, ...) {
esriResp <- esriCatalog(url, ...)
index <- NULL
urlIndex <- url
if (!!length(esriResp[["folders"]])) {
folders <-
dplyr::bind_cols(
"name" = unlist(esriResp$folders),
"index" = "folder",
"type" = NA
)
index <-
dplyr::bind_rows(
index,
folders
)
}
if (!!length(esriResp[["services"]])) {
services <-
dplyr::bind_rows(esriResp$services)
services <-
dplyr::bind_cols(
services,
"index" = "service"
)
index <-
dplyr::bind_rows(
index,
services
)
}
if (is.null(index)) {
return(index)
}
urlbase <-
regmatches(
urlIndex,
regexpr(pattern = ".+(?=/)", text = urlIndex, perl = TRUE)
)
index <-
dplyr::mutate(
index,
url = NULL,
url = dplyr::if_else(
grepl(pattern = "/", x = name),
urlbase,
urlIndex
),
url = dplyr::case_when(
(index == "folder") ~ paste0(url, "/", name),
TRUE ~ paste0(url, "/", name, "/", type)
)
)
if (!is.null(parent)) {
index <-
dplyr::bind_cols(
index,
"parent" = parent
)
}
if (recurse) {
folderIndex <- subset(index, index == "folder")
if (nrow(folderIndex) > 0) {
folderIndex <-
purrr::map2_dfr(
folderIndex$url,
folderIndex$name,
~ esriIndex(url = .x, parent = .y, recurse = TRUE)
)
index <-
dplyr::bind_rows(
index,
folderIndex
)
}
layerIndex <- subset(index, type %in% c("MapServer", "FeatureServer"))
if (nrow(layerIndex) > 0) {
layerIndex <-
purrr::map2_dfr(
layerIndex$url,
layerIndex$name,
~ esriLayers(url = .x, parent = .y)
)
index <-
dplyr::bind_rows(
index,
layerIndex
)
}
}
return(index)
}
# Create an index of layers and tables from an ArcGIS Service
esriLayers <- function(url, parent = NULL, ...) {
esriResp <- esriCatalog(url, ...)
index <- NULL
if (!!length(esriResp[["layers"]])) {
layers <-
dplyr::bind_cols(
dplyr::bind_rows(esriResp$layers),
"index" = "layer"
)
index <-
dplyr::bind_rows(
index,
layers
)
}
if (!!length(esriResp[["tables"]])) {
tables <-
dplyr::bind_cols(
dplyr::bind_rows(esriResp$tables),
"index" = "table"
)
index <-
dplyr::bind_rows(
index,
tables
)
}
if (is.null(index)) {
return(index)
}
index <-
dplyr::bind_cols(
index,
"parent" = parent,
"url" = paste0(url, "/", index$id)
)
return(index)
}
# Get a catalog of folders, services, tables, and layers
esriCatalog <- function(url, format = "json", option = NULL, outSR = NULL, ...) {
format <- match.arg(format, c("json", "html", "kmz", "sitemap", "geositemap"))
req <- httr2::request(url)
req <- httr2::req_url_query(req, f = format)
# Add the optional footprints/outSR query parameters before performing the request
if (format == "json" && !is.null(option) && (option == "footprints")) {
req <- httr2::req_url_query(req, option = option)
if (!is.null(outSR)) {
req <- httr2::req_url_query(req, outSR = outSR)
}
}
resp <- httr2::req_perform(req = req)
if (format == "json") {
json <- httr2::resp_body_json(resp = resp, check_type = FALSE, ...)
return(json)
} else if (format %in% c("sitemap", "geositemap")) {
sitemap <- httr2::resp_body_xml(resp, ...)
sitemap <- xml2::as_list(sitemap)
sitemap <- dplyr::bind_rows("url" = unlist(sitemap, use.names = FALSE))
return(sitemap)
}
}
```
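
A quick usage sketch for the draft above. The sample server URL is the one used earlier in this thread; the expected catalog keys are the usual Catalog JSON fields, but I haven't verified this exact output.

```r
# Fetch the JSON catalog for a public server root and inspect the top-level keys
catalog <- esriCatalog("https://sampleserver1.arcgisonline.com/ArcGIS/rest/services")
names(catalog)
# The catalog typically lists currentVersion, folders, and services

# Build a recursive folder/service/layer index from the same root
index <- esriIndex("https://sampleserver1.arcgisonline.com/ArcGIS/rest/services", recurse = TRUE)
dplyr::glimpse(index)
```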
Got a chance to start looking through this. I started with `esriCatalog`:

```r
# Get a catalog of folders, services, tables, and layers
esriCatalog <- function(url, format = "json", token = "", option = NULL, outSR = NULL, ...) {
format <- match.arg(format, c("json", "html", "kmz", "sitemap", "geositemap"))
req <- httr2::request(url)
req <- httr2::req_url_query(req, f = format, token = token)
# Add the optional footprints/outSR query parameters before performing the request
if (format == "json" && !is.null(option) && (option == "footprints")) {
req <- httr2::req_url_query(req, option = option)
if (!is.null(outSR)) {
req <- httr2::req_url_query(req, outSR = outSR)
}
}
resp <- httr2::req_perform(req = req)
if (format == "json") {
json <- httr2::resp_body_json(resp = resp, check_type = FALSE, ...)
return(json)
} else if (format %in% c("sitemap", "geositemap")) {
sitemap <- httr2::resp_body_xml(resp, ...)
sitemap <- xml2::as_list(sitemap)
sitemap <- dplyr::bind_rows("url" = unlist(sitemap, use.names = FALSE))
return(sitemap)
}
}
```

I have access to a secure server through work that hosts some MapServers privately behind authorization, and I can confirm that the token handling works there.
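
For anyone following along, a call against a secured server would look something like the sketch below; the URL and token values are placeholders.

```r
# Sketch: pass a token (e.g. one issued by the server's generateToken endpoint)
# to read a catalog that is not publicly accessible. Placeholder values only.
secure_url <- "https://example.com/arcgis/rest/services"
my_token <- "<token string>"

catalog <- esriCatalog(secure_url, token = my_token)
names(catalog)
```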
That all sounds great. Thanks for taking a look at this so quickly. Would it be helpful for me to open a draft pull request for this (and incorporate the token parameter throughout) so we can test the functions as part of the package and incorporate the URL validity check functions where necessary? Or should I share updated code here and wait until you merge #48 before opening a new branch?
I'd say go ahead and start a pull request and we can troubleshoot and check things together there. I did merge #48 so go ahead and rebase from master before you do your pull request.
Sounds great! I'm excited to dig into this but may need to chip away at it a little bit at a time over the next couple of weeks. I think the difference between the layers endpoint and the Catalog API service could also explain the slight performance difference (if it actually exists and isn't just my imagination).
I'll fix up the documentation and add an example to the README so this will be ready to close whenever #50 can be merged.
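
A README example could look something like the sketch below; it is based on the drafts in this thread, and details such as the `type` labels in the output depend on the final implementation.

```r
library(esri2sf)

# Index every folder, service, layer, and table below an ArcGIS Server root
server_url <- "https://sampleserver1.arcgisonline.com/ArcGIS/rest/services"
index <- esriIndex(server_url, recurse = TRUE)
dplyr::glimpse(index)

# Any FeatureServer or MapServer layer URL from the index can then be passed to esri2sf()
# (the "Feature Layer" label is an assumption about the indexed type column):
# layer_url <- index$url[index$type == "Feature Layer"][1]
# feature_sf <- esri2sf(layer_url)
```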
I think it may be useful to allow users to return a list of all folders and services hosted by an ArcGIS server. I came up with a function (tentatively named `esriIndex`, although `esriStructure` could also work) that does this:
Created on 2021-11-28 by the reprex package (v2.0.1)
A couple of notes:

- `esriUrl_isValid` only validates MapServer and FeatureServer URLs, so the URL cannot be passed to `esrimeta()` without returning an error, and the base URL can't be extracted by the `esri2sf::esriUrl_parseUrl()` function for the same reason. It may be helpful to allow these functions to accept server or folder URLs.
- I also tried using the `esriLayers()` function but ran into some performance issues when testing it out. Suggestions on how to incorporate this are welcome.
- The original `esri2sf` and `esri2df` functions used the convention of all-lowercase function names. I noticed that more recent additions use a camelCase convention that matches the conventions of ArcGIS web services. I stuck with the latter in this draft but wasn't sure if all lowercase may be preferred.