Data Visibility MVP Tech spec
Currently, VA.gov activity data, including disability benefits claim submission data, is functionally inaccessible to Benefits Portfolio product teams. The only exception is a handful of engineers who have command-line access to query the production Postgres database in vets-api. OCTO wants to develop a safer, more accessible, and more user-friendly way for teams to access this data.
Thus, as an MVP, VRO as a platform will be responsible for safely and securely providing VRO partner teams within the Benefits Portfolio with the claims data submitted via 526EZ forms through VA.gov.
To make this happen, the VRO team is responsible for coordinating this effort through collaboration across the Benefits Portfolio, in particular with the Disability Benefits Experience team(s), who are familiar with the VA.gov Postgres database and the needs of engineers working on VA.gov benefits products.
- Disability benefits claim submission data is only available via the Rails console in production.
- Teams cannot use any BI/dashboarding tools to view metrics.
- Focused on the 526EZ form benefits claims submission data.
- A data dump from the production vets.gov Postgres database happens daily into an S3 bucket through another process.
- Data at rest is decrypted before being dumped into the bucket
- The S3 bucket is already set up, encrypted, and secured via SSE-KMS or other AWS-provided options
- Benefits claims data is initially available via SQL from the VRO Postgres database
- Utilize a Kubernetes cron job to run a Python script daily.
- Use Pandas or another Python dataframe library to read the CSV file, sanitize the data, filter out any unwanted data, and standardize datetimes if necessary.
- Keep track of the S3 file names of processed CSV files in a database.
- Store the processed claims data in the database using transactions.
- Provide a retry mechanism for errors that occur during the cron job.
- Slack notification when a dump has been processed or failed.
- Create a Datadog dashboard to monitor the cron jobs
- Generate a fake-data CSV file without any PII to simulate daily dumps
- To emulate S3 bucket functionality locally, use LocalStack
- Rather than using docker-compose.yaml files for the container, use the Kubernetes deployment files used for the LHDI env and leverage minikube to run the container locally; this can be integrated into the existing Gradle tasks. The one added step is that VRO will need to install minikube.
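A PII-free fake dump for local testing could be generated along the following lines. This is only a sketch: the column names (`claim_id`, `form_type`, `status`, `submitted_at`) and the status values are hypothetical placeholders, since the real dump schema is not described here.

```python
import csv
import io
import random
import uuid
from datetime import datetime, timedelta, timezone

# Hypothetical columns and values; the real dump schema is not specified.
COLUMNS = ["claim_id", "form_type", "status", "submitted_at"]
STATUSES = ["pending", "submitted", "failed"]


def fake_claim_rows(count, seed=0):
    """Yield `count` synthetic claim rows containing no real user data."""
    rng = random.Random(seed)  # seeded so test dumps are reproducible
    base = datetime(2023, 1, 1, tzinfo=timezone.utc)
    for _ in range(count):
        yield {
            "claim_id": str(uuid.UUID(int=rng.getrandbits(128))),
            "form_type": "526EZ",
            "status": rng.choice(STATUSES),
            "submitted_at": (
                base + timedelta(minutes=rng.randrange(60 * 24))
            ).isoformat(),
        }


def write_fake_dump(handle, count, seed=0):
    """Write a header plus `count` fake rows to an open text file handle."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    for row in fake_claim_rows(count, seed):
        writer.writerow(row)
```

To produce a file, open it with `open("fake_dump.csv", "w", newline="")` and pass the handle to `write_fake_dump`; the resulting file can then be uploaded to the locally emulated bucket.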
graph TD
subgraph LHDI AWS
subgraph S3 Bucket
csv-files
end
DB[("(Platform)\nDB")]
end
csv-files <-.-> cron-job
local-stack-or-minio <-.-> cron-job
subgraph VRO
subgraph local-env[Local Env]
local-stack-or-minio[localstack to emulate S3]
local-db[("(Local)\nDB")]
end
subgraph Kubernetes
cron-job[Cron Job written in Python] -.->|Benefits Claims data| DB
subgraph cron-job
pandas-with-python[Python with Pandas]
end
end
end
pandas-with-python -.->|Errors and logs| DataDog
pandas-with-python -.->|Errors and Success Messages| Slack
pandas-with-python -.->|Store processed cron-job transaction history | DB
DB <-.-> data-visualization[Data Visualization tool]
style DB fill:#aea,stroke-width:4px
style cron-job fill:#AAF,stroke-width:2px,stroke:#777
style local-stack-or-minio fill:#AAA,stroke-width:2px,stroke:#777
style data-visualization fill:#BFD,stroke-width:2px,stroke:#777
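The core of the cron job in the diagram (clean a CSV dump with Pandas, then store the rows and record the processed file name in one transaction) might look roughly like this. It is a minimal sketch under several assumptions: `sqlite3` stands in for the VRO Postgres database so the example is self-contained, and the table and column names (`claims`, `processed_files`, `claim_id`, `form_type`, `submitted_at`) are hypothetical.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical schema; the real tables and dump columns are not specified.
REQUIRED = ["claim_id", "form_type", "submitted_at"]
SCHEMA = """
CREATE TABLE IF NOT EXISTS claims (
    claim_id TEXT PRIMARY KEY,
    form_type TEXT NOT NULL,
    submitted_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS processed_files (file_name TEXT PRIMARY KEY);
"""


def clean_dump(csv_source):
    """Read a dump and keep only well-formed 526EZ rows."""
    df = pd.read_csv(csv_source, dtype=str)
    df = df.dropna(subset=REQUIRED)
    df = df[df["form_type"].str.strip() == "526EZ"].copy()
    # Standardize timestamps to UTC; unparseable values become NaT and are dropped.
    df["submitted_at"] = pd.to_datetime(df["submitted_at"], utc=True, errors="coerce")
    df = df.dropna(subset=["submitted_at"]).drop_duplicates(subset=["claim_id"])
    df["submitted_at"] = df["submitted_at"].dt.strftime("%Y-%m-%dT%H:%M:%SZ")
    return df[REQUIRED]


def store_dump(conn, file_name, df):
    """Insert one dump's rows and record its file name in a single transaction.

    Returns False, writing nothing, if the file was already processed.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        seen = conn.execute(
            "SELECT 1 FROM processed_files WHERE file_name = ?", (file_name,)
        ).fetchone()
        if seen:
            return False
        conn.executemany(
            "INSERT INTO claims (claim_id, form_type, submitted_at) VALUES (?, ?, ?)",
            df.itertuples(index=False, name=None),
        )
        conn.execute(
            "INSERT INTO processed_files (file_name) VALUES (?)", (file_name,)
        )
    return True
```

Because the file name is written in the same transaction as the claim rows, a failure partway through leaves neither behind, so a later retry can safely process the same file again.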
- How do we handle storage of PII data, given possible ATO restrictions?
- What exactly does the current data look like? Knowing this can help us design exception handling and job-retry mechanisms.
- Have a backup mechanism in place for the data in case of any failures or data loss
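As a starting point for the job-retry question, one common option is an in-process retry with exponential backoff around each processing step. The helper below is a hypothetical sketch, not a settled design: the name `run_with_retries` and its defaults are assumptions, and Kubernetes Jobs also have their own retry setting (`backoffLimit`) that could be used instead of, or alongside, this.

```python
import time


def run_with_retries(task, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call task() until it succeeds or `attempts` is exhausted.

    Waits base_delay, then 2 * base_delay, and so on between attempts.
    The last error is re-raised so the caller can log it or send a
    failure notification.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == attempts:
                raise
            # Exponential backoff: 1x, 2x, 4x, ... the base delay.
            sleep(base_delay * 2 ** (attempt - 1))
```

The `sleep` parameter exists only so the delay can be replaced in tests; in the cron job the default `time.sleep` would be used.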