Query parquet database data for selected area on click #41
…onsole, and only in proof of concept form, later we could load this via AWS-CDK; towards #41
Have addressed the permissions issues and prototyped some example queries (e.g. basic queries of 3 million records apparently take about a second), and am about to attempt to integrate a basic query into the app. I'll write up the full approach later; just marking this milestone, as using Lambda to run an Athena query on a Parquet file is now functional and appears promising. Next step is to invoke the Lambda from TypeScript and create a display using the retrieved data. For example, the following query, when run in Test of Lambda within the AWS console:
returns,
In the above
The data will be interpreted in JSON more or less as:
which is a bit idiosyncratic (i.e. it shows the variable names, then the values corresponding to these), but totally usable. Ideally, we'll reframe it so it returns
But that's just a detail. For now, the priorities are:
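On the reshaping detail mentioned in the comment above: a minimal sketch of one way to turn a "names first, then values" result into one object per row. It assumes the rows arrive as arrays of strings with the column names in the first row; that shape, the function name `rowsToRecords` and the example values are all hypothetical and not taken from the actual Lambda output.

```typescript
// Assumed input shape: first row = column names, remaining rows = values.
type Row = string[];

// Convert header-plus-values rows into an array of keyed records.
function rowsToRecords(rows: Row[]): Record<string, string>[] {
  const header = rows[0] ?? [];
  const values = rows.slice(1);
  return values.map((row) => {
    const record: Record<string, string> = {};
    header.forEach((name, i) => {
      // Missing cells become empty strings rather than undefined.
      record[name] = row[i] ?? "";
    });
    return record;
  });
}

// Hypothetical example values, not real query output.
const exampleRows: Row[] = [
  ["gender", "mean_age"],
  ["female", "41.2"],
  ["male", "39.8"],
];
console.log(rowsToRecords(exampleRows));
```

Doing this in the Lambda rather than in the browser would keep the client code simpler, but either place should work.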
To reduce the data bundled with a map and provide additional flexibility, attributes need not be stored as columns of the spatial features (e.g. of administrative areas); instead, the linkage ID of a selected area could be used to retrieve, summarise and display relevant data on click.
A possible application for this workflow relates to presenting socio-demographic characteristics of areas based on analysis of highly granular synthetic population data. Rather than pre-aggregate statistics as averages for areas (which can bloat the spatial data retrieved), the distribution could be analysed on click and represented, for example, as a box plot of age, stratified by gender or other categorical aspects.
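To make the box plot idea concrete, here is a rough sketch of computing the five numbers a box plot needs (min, quartiles, max) for age, grouped by gender. The `Person` fields and function names are assumptions for illustration, and the quartiles use simple linear interpolation, which is only one of several common conventions. The same summary could equally be computed inside the query itself.

```typescript
// Hypothetical record shape for one synthetic person.
interface Person {
  age: number;
  gender: string;
}

interface BoxStats {
  n: number;
  min: number;
  q1: number;
  median: number;
  q3: number;
  max: number;
}

// Quantile of an ascending-sorted array, with linear interpolation.
function quantile(sorted: number[], p: number): number {
  const pos = (sorted.length - 1) * p;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  const a = sorted[lo] ?? NaN;
  const b = sorted[hi] ?? NaN;
  return a + (b - a) * (pos - lo);
}

// Five-number summary; returns null when there are no values.
function boxStats(values: number[]): BoxStats | null {
  if (values.length === 0) return null;
  const sorted = [...values].sort((x, y) => x - y);
  return {
    n: sorted.length,
    min: quantile(sorted, 0),
    q1: quantile(sorted, 0.25),
    median: quantile(sorted, 0.5),
    q3: quantile(sorted, 0.75),
    max: quantile(sorted, 1),
  };
}

// Group ages by gender, then summarise each group.
function ageStatsByGender(people: Person[]): Map<string, BoxStats> {
  const groups = new Map<string, number[]>();
  for (const person of people) {
    const ages = groups.get(person.gender) ?? [];
    ages.push(person.age);
    groups.set(person.gender, ages);
  }
  const result = new Map<string, BoxStats>();
  for (const [gender, ages] of groups) {
    const stats = boxStats(ages);
    if (stats !== null) result.set(gender, stats);
  }
  return result;
}
```

Returning only these few numbers per group, rather than the individual records, keeps the payload small regardless of how many people live in the selected area.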
One approach for this is to create a Parquet file that contains the additional attributes with linkage codes. Parquet is designed as an efficient format for querying. One way of doing this in an interactive web app is using a query engine like Amazon Athena.
The basic workflow is
User clicks on area >>> area ID is passed to a Lambda function >>> Lambda function runs Athena query of Parquet data in S3 bucket >>> Lambda returns summary >>> TypeScript is used to format summary for display to user
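The client side of the workflow above might look roughly like the following sketch. Everything about the contract is assumed: the request body `{ areaId }`, the response shape `AreaSummary`, and the idea that the Lambda is reachable over HTTP at some endpoint. The transport is passed in as a `postJson` function (hypothetical) so the sketch does not commit to a particular way of invoking the Lambda.

```typescript
// Hypothetical response shape; the real Lambda contract is not defined here.
interface AreaSummary {
  areaId: string;
  rows: Record<string, string>[];
}

// Transport abstraction: posts a JSON string and resolves with the response text.
type PostJson = (url: string, body: string) => Promise<string>;

// Build the request body, rejecting IDs with unexpected characters so that
// nothing odd can reach the query that the Lambda runs.
function buildRequestBody(areaId: string): string {
  if (!/^[A-Za-z0-9_-]+$/.test(areaId)) {
    throw new Error(`Unexpected area id: ${areaId}`);
  }
  return JSON.stringify({ areaId });
}

// Turn the summary into plain text suitable for a popup or side panel.
function formatSummary(summary: AreaSummary): string {
  const lines = summary.rows.map((row) =>
    Object.entries(row)
      .map(([key, value]) => `${key}: ${value}`)
      .join(", ")
  );
  return [`Area ${summary.areaId}`, ...lines].join("\n");
}

// Click handler: area ID in, display text out.
async function onAreaClick(
  areaId: string,
  endpoint: string,
  postJson: PostJson
): Promise<string> {
  const responseText = await postJson(endpoint, buildRequestBody(areaId));
  const summary = JSON.parse(responseText) as AreaSummary;
  return formatSummary(summary);
}
```

In a browser, `postJson` could be a thin wrapper around `fetch` with `method: "POST"`; validating the area ID on the client is a convenience only, and the Lambda would still need to validate or parameterise it before querying.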
A sketch has been made of this workflow, but I have some lingering permissions issues impeding data retrieval needed to prototype this full workflow.