[Analysis page] Query for datasets via collection metadata? #658

j08lue · 2023-09-19T08:10:52Z

Currently, we query for datasets available for the user-defined area and date range of interest by asking STAC for all items and then finding all collections.

This approach has several issues: it is costly, it transfers a lot of data to the client that is not needed, and it currently does not return all collections that should be returned, probably because not all items are loaded due to some limit or missing pagination.
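To illustrate the suspected pagination gap: STAC API item search responses are paged, and further pages are advertised through a link with `rel` set to `next`. The sketch below is not the dashboard's code. `iter_items`, `collections_seen`, and the injected `fetch_json` callable are hypothetical names, and it assumes GET-style `next` links that carry a complete `href`. It shows how reading only the first page can miss collections.

```python
from typing import Callable, Iterable, Iterator, Optional


def iter_items(
    first_url: str,
    fetch_json: Callable[[str], dict],
    max_pages: int = 100,
) -> Iterator[dict]:
    """Yield items from every page of a search, following rel="next" links.

    fetch_json is a hypothetical callable that returns the parsed JSON body
    for a URL. max_pages is a safety cap so a bad link cannot loop forever.
    """
    url: Optional[str] = first_url
    pages = 0
    while url is not None and pages < max_pages:
        page = fetch_json(url)
        yield from page.get("features", [])
        # Assumption: the next page is a plain href (GET-style pagination).
        url = next(
            (link["href"] for link in page.get("links", []) if link.get("rel") == "next"),
            None,
        )
        pages += 1


def collections_seen(items: Iterable[dict]) -> set:
    """Collect the distinct collection ids referenced by the items."""
    return {item["collection"] for item in items if "collection" in item}
```

Even with pagination handled, this still transfers every matching item to the client, so it only addresses the missing-collections symptom and not the cost.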

We have been discussing adding an aggregation endpoint to the STAC API / pgSTAC that could perform these queries in the database. However, even there, these queries remain very costly, and pgSTAC (unlike Elasticsearch) is not particularly fast at them.

An alternative solution is to make use of the total bounding box and date range information on the STAC collection level: STAC collection metadata already contains this information and we would just need to do the intersection in the client. While this approach is less accurate than the item query for edge cases where data coverage is sparse with large gaps, it is a lot faster and could at least limit the number of collections to query.
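A minimal sketch of that client-side intersection, assuming collections follow the STAC layout where `extent.spatial.bbox[0]` is the overall bounding box and `extent.temporal.interval[0]` is the overall `[start, end]` range with `null` for an open end. The function names are hypothetical, the query dates must be timezone-aware, and bounding boxes that cross the antimeridian are not handled.

```python
from datetime import datetime, timezone
from typing import Optional, Sequence


def parse_dt(value: Optional[str]) -> Optional[datetime]:
    """Parse an RFC 3339 timestamp; None stands for an open-ended interval."""
    if value is None:
        return None
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def bbox_2d(bbox: Sequence[float]) -> tuple:
    """Reduce a 4- or 6-element bbox to (west, south, east, north)."""
    if len(bbox) == 6:  # [west, south, min_z, east, north, max_z]
        return bbox[0], bbox[1], bbox[3], bbox[4]
    return tuple(bbox[:4])


def bboxes_intersect(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if the two boxes share any area or edge (no antimeridian handling)."""
    a_w, a_s, a_e, a_n = bbox_2d(a)
    b_w, b_s, b_e, b_n = bbox_2d(b)
    return a_w <= b_e and b_w <= a_e and a_s <= b_n and b_s <= a_n


def intervals_overlap(interval: Sequence[Optional[str]], start: datetime, end: datetime) -> bool:
    """True if the collection interval overlaps the query range [start, end]."""
    c_start, c_end = parse_dt(interval[0]), parse_dt(interval[1])
    starts_in_time = c_start is None or c_start <= end
    ends_in_time = c_end is None or c_end >= start
    return starts_in_time and ends_in_time


def collection_may_cover(collection: dict, aoi_bbox: Sequence[float],
                         start: datetime, end: datetime) -> bool:
    """Coarse check: the collection *might* have data in the AOI and date range."""
    extent = collection["extent"]
    return (
        bboxes_intersect(extent["spatial"]["bbox"][0], aoi_bbox)
        and intervals_overlap(extent["temporal"]["interval"][0], start, end)
    )
```

A `True` result only means the collection is a candidate. A sparse collection can pass this check and still return no data for the area.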

We will push for developing an aggregation function on our STAC backend, but that will take a while to develop. In the meantime, replacing the current approach with the fast collection metadata method would be great.

Acceptance criteria

Tested whether collection metadata can be used to query for collections that cover area/time of interest

anayeaye · 2023-09-19T15:16:55Z

A few quick thoughts here about high level full catalog searches without collection filters:

stac-api/collection/items/search|aggregation (answer specific questions)

When we implement some aggregation functionality, we will have lots of opportunity for innovation and will be able to support investigations like:

- A disaster happened in this AOI, and I want to know which VEDA collections have recent spectral data
- I am doing a historical study and I want to know which VEDA collections have measured precipitation in an AOI-TOI
- I am starting a project and want to see how much data is available, as a per-collection count of items that match specified item metadata filters

stac-api/collections (provide a little spatial temporal info about all collections)

The collections endpoint gives us coarse information about where and when collections have coverage. There is a lot of flexibility in the descriptive metadata we can add to collection records, including more precise geometry.

RE:

> An alternative solution is to make use of the total bounding box and date range information on the STAC collection level: STAC collection metadata already contains this information and we would just need to do the intersection in the client. While this approach is less accurate than the item query for edge cases where data coverage is sparse with large gaps, it is a lot faster and could at least limit the number of collections to query.

Relevant properties for using only the stac-api/collections response and one suggestion

- `extent.temporal`
- `dashboard:is_periodic`
- `dashboard:time_density`
- `extent.spatial`
- New: `dashboard:continuous_spatial_distribution` or `:is_spotlight`, ... or some other indicator for data that do not have the same spatial coverage at all times. This property could be used to trigger different behavior or to tell the user not to expect global coverage over the entire timespan of the dataset.
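As a rough sketch of how such an indicator might be consumed on the client: the property name `dashboard:continuous_spatial_distribution` is only the proposal made above and does not exist in the catalog as far as this thread shows. The `coverage_hint` helper, the message text, and the choice to treat a missing flag as "continuous" are all assumptions for illustration.

```python
from typing import Optional

# Proposed property name from this thread; not an existing catalog field.
SPARSE_FLAG = "dashboard:continuous_spatial_distribution"


def coverage_hint(collection: dict) -> Optional[str]:
    """Return a user-facing note for collections flagged as spatially sparse.

    Assumption: a missing flag means coverage is continuous, so no note is shown.
    """
    if collection.get(SPARSE_FLAG, True) is False:
        return (
            "This dataset does not cover the same area at all times; "
            "an empty result for your area is possible."
        )
    return None
```

The same check could instead route flagged collections to a more precise item-level query, which is the "trigger different behavior" option mentioned in the list.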

Smallsat data explorer

For the case in which a user arrives at an explore interface and simply wants to know which collections have any data within a time and area of interest, we should look into how the smallsat explorer supports completely open-ended searches with a sampling grid. Is this something we can do? I think the backend is very similar.
https://github.com/NASA-IMPACT/csdap-frontend/
https://csdap.earthdata.nasa.gov/explore/

hanbyul-here · 2023-09-20T19:10:31Z

I used the collections endpoint in #666. I think the main concern with this approach is that we can filter datasets only by their bbox, so spatially sparse datasets can pass the filter and still show empty results. Check the preview and let me know what you think, or whether the filter can be fine-tuned further.

j08lue · 2023-09-20T19:27:18Z

Wow, that turnaround was quick.

I am sure we will hit the challenge with spatially (or temporally) sparse datasets eventually, but this solution is better than the current situation, at least for the GHG datasets. Better to show a few too many datasets (and then have empty plots) than too few.

We need to make a few random tests and validate that the results are as expected. All datasets that (possibly) have any data within the query should be listed.

To address the spatial case in the future, maybe we could compute the real coverage upon ingest (union(existing_geom, new_geom)) and store that in addition to the max bbox. 🤷
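One way to picture the `union(existing_geom, new_geom)` idea without a geometry library is to approximate coverage as a set of coarse lat/lon grid cells, so the union becomes a plain set union. This is a swapped-in simplification, not a true polygon union and not pgSTAC functionality. The function names and the 1-degree cell size are assumptions.

```python
import math
from typing import Sequence, Set, Tuple

Cell = Tuple[int, int]


def cells_for_bbox(bbox: Sequence[float], cell_deg: float = 1.0) -> Set[Cell]:
    """Grid cells (column, row) touched by a [west, south, east, north] bbox."""
    west, south, east, north = bbox
    c0, r0 = math.floor(west / cell_deg), math.floor(south / cell_deg)
    # Always cover at least one cell, even for a point-like bbox.
    c1 = max(math.ceil(east / cell_deg), c0 + 1)
    r1 = max(math.ceil(north / cell_deg), r0 + 1)
    return {(c, r) for c in range(c0, c1) for r in range(r0, r1)}


def add_item(coverage: Set[Cell], item_bbox: Sequence[float], cell_deg: float = 1.0) -> Set[Cell]:
    """Approximate union(existing_geom, new_geom) as a union of cell sets."""
    return coverage | cells_for_bbox(item_bbox, cell_deg)


def max_bbox(coverage: Set[Cell], cell_deg: float = 1.0) -> tuple:
    """Overall bounding box of the covered cells (what the collection extent stores today)."""
    cols = [c for c, _ in coverage]
    rows = [r for _, r in coverage]
    return (min(cols) * cell_deg, min(rows) * cell_deg,
            (max(cols) + 1) * cell_deg, (max(rows) + 1) * cell_deg)


def covers(coverage: Set[Cell], aoi_bbox: Sequence[float], cell_deg: float = 1.0) -> bool:
    """True if any covered cell overlaps the area of interest."""
    return not coverage.isdisjoint(cells_for_bbox(aoi_bbox, cell_deg))
```

With two far-apart items, an area between them falls inside the max bbox but outside the stored coverage, which is exactly the sparse case the bbox filter cannot detect.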

@anayeaye

This PR uses the `collections` endpoint to get all the collections and filters them on the client side based on the AOI and date range that the user inputs. I followed the guidance provided in this issue:

- #658

A few things to note:

- I am not sure how item search works. Currently, the code catches all the datasets whose bbox intersects the AOI and whose date domain overlaps the selected date range.
- The `collections` endpoint doesn't offer a detailed spatial extent. The bbox is a bounding rectangle that encloses all the data points. If a dataset is sparse, like nightlight and plume, a user can see an empty chart like the screenshot below. @anayeaye and I talked, and it might be helpful to have a flag to signal that a dataset is spatially sparse (something similar to `is_periodic`, but for spatial extent).

![Screen Shot 2023-09-20 at 2 35 39 PM](https://github.com/NASA-IMPACT/veda-ui/assets/4583806/7f87f3d7-9ad2-4fdd-a206-fd7fcab86e6b)

@anayeaye thanks for your help 🙇 and let me know if you see anything unexpected!

## Related issues

Supersedes / temporarily replaces #534

j08lue · 2023-09-28T13:31:01Z

Done!

hanbyul-here self-assigned this Sep 19, 2023

hanbyul-here mentioned this issue Sep 20, 2023

Use collections endpoint instead of search for analysis #666

Merged

j08lue closed this as completed Sep 28, 2023

j08lue mentioned this issue Oct 9, 2023

An analysis graph is generated by selecting a part of the map that does not have a layer, after selecting one that does have a layer. #396

Closed
