This module provides direct, read-only access to the Cumulus AWS RDS database. It can query records much more quickly than the Cumulus API, and it stores the retrieved results in S3.
Once deployed, the lambda can be used on its own or integrated into a Cumulus workflow.
The following is an example Terraform module configuration block for the lambda:
```terraform
module "ghrc_rds_lambda" {
  source                              = "https://github.com/ghrcdaac/ghrc_rds_lambda/releases/download/v0.2.0/ghrc_rds_lambda.zip"
  stack_prefix                        = var.prefix
  region                              = var.region
  layers                              = [aws_lambda_layer_version.cma-python.arn]
  memory_size                         = 2048
  timeout                             = 900
  aws_decrypt_key_arn                 = module.cumulus.provider_kms_key_id
  cumulus_lambda_role_arn             = module.cumulus.lambda_processing_role_arn
  cumulus_lambda_role_name            = module.cumulus.lambda_processing_role_name
  cumulus_message_adapter_dir         = local.CUMULUS_MESSAGE_ADAPTER_DIR
  cumulus_user_credentials_secret_arn = data.terraform_remote_state.data_persistence.outputs.user_credentials_secret_arn
  s3_bucket_name                      = lookup(var.buckets.internal, "name", null)
  subnet_ids                          = module.ngap.ngap_subnets_ids
  security_group_ids = [
    aws_security_group.no_ingress_all_egress.id,
    data.terraform_remote_state.data_persistence.outputs.rds_security_group
  ]
}
```
The lambda can also be deployed without Terraform, although it may then be harder to ensure that the subnet IDs and security groups are set up correctly. To build the zip file passed to the AWS CLI `--zip-file` argument (for example, with `aws lambda create-function`), clone the repository and run `python create_package.py`.
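For a deployment without Terraform, the call below is a minimal sketch using boto3's Lambda `create_function`, assuming boto3 and AWS credentials are available. The function name, handler path, and runtime are hypothetical placeholders, since the real values are not given here. The sketch also omits the CMA layer and the environment settings the Terraform module presumably wires up (such as the bucket name and credentials secret), because their names are not documented in this section.

```python
"""Hypothetical sketch: create the lambda from the packaged zip with boto3."""
from __future__ import annotations

from pathlib import Path


def build_create_function_kwargs(
    zip_bytes: bytes,
    role_arn: str,
    subnet_ids: list[str],
    security_group_ids: list[str],
    function_name: str = "ghrc_rds_lambda",  # hypothetical function name
    handler: str = "main.handler",  # hypothetical handler path
    runtime: str = "python3.10",  # assumed runtime
) -> dict:
    """Assemble keyword arguments for the Lambda CreateFunction call."""
    return {
        "FunctionName": function_name,
        "Runtime": runtime,
        "Role": role_arn,
        "Handler": handler,
        "Code": {"ZipFile": zip_bytes},
        # Memory (MB) and timeout (s) mirror the Terraform example.
        "MemorySize": 2048,
        "Timeout": 900,
        # The function must run in subnets/security groups that can reach RDS.
        "VpcConfig": {
            "SubnetIds": subnet_ids,
            "SecurityGroupIds": security_group_ids,
        },
    }


def deploy(zip_path: str, **kwargs) -> dict:
    import boto3  # imported lazily so the builder above works without boto3

    zip_bytes = Path(zip_path).read_bytes()
    client = boto3.client("lambda")
    return client.create_function(**build_create_function_kwargs(zip_bytes, **kwargs))
```

Separating the pure argument builder from the network call makes it easy to inspect the VPC settings, which are the part most likely to be wrong, before anything is sent to AWS.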
The code restricts the kinds of queries that can be built and run by the lambda. First, the database cursor is used read-only. Second, queries are expressed in a simplified DSL that limits what can be passed to the query builder.
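To illustrate the two restrictions, here is a hypothetical sketch of how the DSL keys could map onto a single `SELECT` and be run on a read-only session. It is not the module's actual query builder: the identifier check and quoting are assumptions, and `run_read_only` assumes a psycopg2 connection, using psycopg2's `set_session(readonly=True)`.

```python
"""Hypothetical sketch of mapping the simplified DSL onto PostgreSQL."""
from __future__ import annotations

import re

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_identifier(name: str) -> str:
    # Accept only plain identifiers, then double-quote them for PostgreSQL.
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"invalid identifier: {name!r}")
    return f'"{name}"'


def build_query(records: str, columns: list[str], where: str, limit: int) -> str:
    if not columns or columns == ["*"]:
        cols = "*"
    else:
        cols = ", ".join(quote_identifier(c) for c in columns)
    sql = f"SELECT {cols} FROM {quote_identifier(records)}"
    if where:
        # The where clause is inserted verbatim, so this sketch relies on
        # the read-only session below to reject writes.
        sql += f" WHERE {where}"
    if limit > 0:  # 0 means "all matching records": no LIMIT clause
        sql += f" LIMIT {int(limit)}"
    return sql


def run_read_only(conn, sql: str) -> list[tuple]:
    # psycopg2: subsequent transactions on this connection are read-only.
    conn.set_session(readonly=True)
    with conn.cursor() as cur:
        cur.execute(sql)
        return cur.fetchall()
```

For example, `build_query("granules", ["granule_id", "status"], "status = 'completed'", 10)` produces a quoted column list, the where clause as given, and `LIMIT 10`.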
The following example AWS Lambda test event shows the expected event format:
```json
{
  "is_test": true,
  "rds_config": {
    "records": "",
    "columns": [],
    "where": "",
    "limit": 0
  }
}
```
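The same event can also be built and sent programmatically. This is a minimal sketch, assuming boto3 and permission to invoke the function; `make_test_event` and `invoke` are hypothetical helpers, and the function name must be whatever name your deployment actually uses.

```python
import json


def make_test_event(records: str, columns=None, where: str = "",
                    limit: int = 0, is_test: bool = True) -> dict:
    """Build an event with the same shape as the test event above."""
    return {
        "is_test": is_test,
        "rds_config": {
            "records": records,
            "columns": list(columns or []),
            "where": where,
            "limit": limit,
        },
    }


def invoke(function_name: str, event: dict) -> dict:
    import boto3  # imported lazily; requires AWS credentials at call time

    response = boto3.client("lambda").invoke(
        FunctionName=function_name,
        Payload=json.dumps(event).encode("utf-8"),
    )
    # Synchronous invocations return the handler's result as a JSON stream.
    return json.loads(response["Payload"].read())
```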
rds_config
: Required block that contains the query items.

records
: The Cumulus database table to get records from.

columns
: The columns to request from the database. Defaults to `*` if nothing is provided.

where
: A PostgreSQL-compliant `WHERE` clause.

limit
: The number of records to return. `0` returns all records that match the query. Defaults to `100` if not provided.

is_test
: If `true`, the code is not run as a `cumulus_task` and the input event does not go through the CMA.
The `columns`, `where`, and `limit` keys are optional.
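The defaults described above can be summarized as a small normalization step. This is a hypothetical sketch, not the lambda's own code: it treats `records` as required (since only `columns`, `where`, and `limit` are listed as optional) and assumes `is_test` defaults to `false` when absent, which this section does not state.

```python
_DEFAULT_LIMIT = 100


def normalize_event(event: dict) -> dict:
    """Hypothetical validation step applying the documented defaults."""
    config = event.get("rds_config")
    if not isinstance(config, dict):
        raise ValueError("event must contain an 'rds_config' block")
    if not config.get("records"):
        raise ValueError("rds_config.records (the table name) is required")
    return {
        "records": config["records"],
        # An empty or missing column list means every column.
        "columns": config.get("columns") or ["*"],
        "where": config.get("where", ""),
        # A missing limit means 100; an explicit 0 means "no limit".
        "limit": config.get("limit", _DEFAULT_LIMIT),
        # Assumed default: run through the CMA unless is_test is set.
        "is_test": bool(event.get("is_test", False)),
    }
```

Note the distinction between a missing `limit` and an explicit `0`: `dict.get` with a default only applies when the key is absent, so `0` is preserved.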
The lambda returns a dictionary with the following format:
```json
{
  "bucket": "prefix-name",
  "key": "rds_lambda/query_results_1694108903180410167.json",
  "count": 113192
}
```
bucket
: The bucket where the results are stored.

key
: The S3 key of the results file. The numeric string is an epoch timestamp in nanoseconds, which prevents query results from being overwritten.

count
: The number of records stored in the results file.
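Because the return value only points at the results, a caller still has to download them. Below is a minimal sketch, assuming boto3 and read access to the bucket, that fetches and decodes the file and recovers the write time encoded in the key. The helper names are hypothetical, `make_results_key` only imitates the documented key shape, and the sketch assumes the file is plain JSON (as the `.json` extension suggests); its internal structure is not documented here.

```python
import json
import re
import time
from datetime import datetime, timedelta, timezone

_KEY_PATTERN = re.compile(r"query_results_(\d+)\.json$")
_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def make_results_key(prefix: str = "rds_lambda") -> str:
    # Imitates the documented key shape; time.time_ns() gives epoch nanoseconds.
    return f"{prefix}/query_results_{time.time_ns()}.json"


def key_timestamp(key: str) -> datetime:
    """Recover the UTC write time encoded in a results key."""
    match = _KEY_PATTERN.search(key)
    if match is None:
        raise ValueError(f"unexpected results key: {key!r}")
    nanoseconds = int(match.group(1))
    # datetime only has microsecond resolution, so the last 3 digits are dropped.
    return _EPOCH + timedelta(microseconds=nanoseconds // 1_000)


def fetch_results(result: dict):
    """Download and decode the results file named in the lambda's return value."""
    import boto3  # imported lazily; requires AWS credentials at call time

    obj = boto3.client("s3").get_object(Bucket=result["bucket"], Key=result["key"])
    return json.loads(obj["Body"].read())
```

For the example key above, `key_timestamp` yields a UTC time on 7 September 2023.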