VRO Data Visibility Initiative
In 2023, the VRO Team identified significant challenges in accessing downstream Veteran claim data, which is critical for understanding the impact of changes made by various teams within the Benefits Portfolio. These teams need insights to assess whether their improvements are delivering the intended value to the claims process. The key challenges are as follows:
- All VA.gov activity data is stored in a Postgres database, accessible only through a production Rails console, making access risky and limited to a few individuals within the Benefits Portfolio.
- Visualizing and analyzing the data is difficult due to the lack of integration with tools that facilitate these tasks.
- Teams often rely on external resources to request custom data queries, but these external teams are frequently overwhelmed, leading to long delays or unfulfilled requests.

The VRO platform was designed to streamline the VA's internal claims process, which makes the VRO Team well placed to address this issue. The original goal of the project was to provide benefits claims data via SQL from the VRO Postgres database, which could be queried by partner teams. However, this initiative stalled due to competing priorities.
As of today, these challenges remain unresolved. If this problem is not addressed, BIE contention data on the VRO platform will continue to go unused, and the Contention Classification Team will remain in long queues, waiting to access data they may or may not receive. In light of this, the VRO Team has decided to revisit the Data Visibility initiative with a narrower focus.
The VRO Team is currently tracking five contention events through the BIE Kafka API in their RDS database. The events are: contention associated with a claim, contention classified, contention updated, contention completed, and contention deleted. Each event contains metadata such as Veteran participant ID, action details, and timestamps. The goal of the MVP is to enable the Contention Classification Team to easily query these datasets and analyze the impact of accurate classification on claim processing times. Existing VRO data access methods, such as RabbitMQ, are not suited for historical data analysis. Likewise, custom query requests are time-consuming. To address this, the VRO Team has chosen Streamlit, a Python framework for building interactive data applications. Streamlit will interact directly with the VRO platform's database, providing secure and user-friendly data access for the Contention Classification Team.
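To make the kind of analysis described above concrete, here is a minimal sketch of measuring the time from a contention being associated with a claim to its completion, split by whether it was classified. The table name, column names, and sample rows are all hypothetical (the real VRO schema is not given in this document), and an in-memory SQLite database stands in for the RDS Postgres instance so the example can run anywhere.

```python
import sqlite3
from datetime import datetime

# Hypothetical schema and synthetic rows: (contention_id, event_type, occurred_at).
SAMPLE_EVENTS = [
    (10, "associated", "2024-01-01T09:00:00"),
    (10, "classified", "2024-01-01T09:05:00"),
    (10, "completed", "2024-01-11T09:00:00"),
    (20, "associated", "2024-01-02T09:00:00"),
    (20, "completed", "2024-01-22T09:00:00"),
    (30, "associated", "2024-01-03T09:00:00"),  # not completed yet
]


def build_demo_db():
    """Create an in-memory stand-in for the contention events table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contention_events ("
        "contention_id INTEGER, event_type TEXT, occurred_at TEXT)"
    )
    conn.executemany("INSERT INTO contention_events VALUES (?, ?, ?)", SAMPLE_EVENTS)
    conn.commit()
    return conn


def days_to_complete(conn):
    """Return {contention_id: (was_classified, whole days from associated to completed)}.

    Contentions missing either endpoint are skipped. If an event type occurs
    more than once for a contention, the latest timestamp is kept.
    """
    rows = conn.execute(
        "SELECT contention_id, event_type, occurred_at "
        "FROM contention_events ORDER BY occurred_at"
    ).fetchall()

    events = {}
    for contention_id, event_type, occurred_at in rows:
        events.setdefault(contention_id, {})[event_type] = datetime.fromisoformat(occurred_at)

    result = {}
    for contention_id, by_type in events.items():
        if "associated" in by_type and "completed" in by_type:
            elapsed = by_type["completed"] - by_type["associated"]
            result[contention_id] = ("classified" in by_type, elapsed.days)
    return result


if __name__ == "__main__":
    for contention_id, (classified, days) in days_to_complete(build_demo_db()).items():
        print(contention_id, "classified" if classified else "unclassified", days, "days")
```

In a Streamlit page, the dictionary returned by `days_to_complete` could be turned into a table or chart; that wiring is left out here because it depends on how the application is eventually structured.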
The VRO Team expects that enabling the Contention Classification Team to use Streamlit will significantly streamline their ability to access and analyze contention data. Previously reliant on external support for data access, the team will benefit from an efficient, customizable solution tailored to their specific needs. In the long term, the VRO Team anticipates broader adoption of Streamlit across other teams within the Benefits Portfolio. They plan to connect Streamlit to additional data services and expand its use with the Employee Experience Team. The VRO Team also aims to enhance the tool's capabilities to allow users to independently download and analyze large datasets, eliminating the need for direct connections to data sources managed by VRO. To measure success, the VRO Team will regularly survey Partner Teams after new features or capabilities are released. For the MVP, they will specifically survey the Contention Classification Team to assess whether Streamlit delivers the expected improvements in data accessibility and analysis.
- Set up a base Python project in GitHub.
- Configure the project for the Streamlit application.
- Establish role-based access controls to ensure appropriate data visibility.
- Deploy the shell project to production for demonstration purposes.
- Integrate contention event data stored in RDS into the Streamlit application and determine any logging restrictions.
- Run performance tests on the datasets.
- Develop a homepage and a contention classification page for user interaction.
- Conduct research into the queries used by the Contention Classification Team to align outcomes with their needs.
- Collaborate with the Contention Classification Team to define roles and responsibilities within the tool.
- Develop a user guide and train developers on the Contention Classification Team to use Streamlit effectively.
- Assist developers in maximizing Streamlit's potential for analyzing contention events.
- Develop documentation for onboarding new partners and ensuring smooth adoption of Streamlit.
- Generate a post-launch survey to assess whether the tool has met the anticipated objectives for the Contention Classification Team.
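One of the steps above is running performance tests on the datasets. A baseline test could look roughly like the sketch below, which times a representative query several times and reports the spread. Everything here is an assumption for illustration: the table layout is hypothetical, the rows are synthetic, and SQLite stands in for RDS, so the absolute numbers say nothing about the real system.

```python
import sqlite3
import statistics
import time

EVENT_TYPES = ["associated", "classified", "updated", "completed", "deleted"]


def build_synthetic_db(row_count):
    """Create an in-memory table filled with synthetic contention events."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE contention_events (contention_id INTEGER, event_type TEXT)")
    conn.executemany(
        "INSERT INTO contention_events VALUES (?, ?)",
        ((i, EVENT_TYPES[i % len(EVENT_TYPES)]) for i in range(row_count)),
    )
    conn.commit()
    return conn


def time_query(conn, sql, repeats=5):
    """Run the query `repeats` times and return the elapsed seconds for each run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        conn.execute(sql).fetchall()  # fetch everything so the full result is materialized
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    conn = build_synthetic_db(100_000)
    sql = "SELECT event_type, COUNT(*) FROM contention_events GROUP BY event_type"
    timings = time_query(conn, sql)
    print(f"median: {statistics.median(timings):.4f}s, max: {max(timings):.4f}s")
```

Repeating the same measurement at a few row counts close to what the Contention Classification Team actually handles would give a baseline to compare against once the Streamlit application is connected to the real database.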
All EP merge jobs are stored in RDS, allowing the Employee Experience Team to leverage Streamlit as an alternative to Datadog if necessary. The Benefits Disability Team requires assistance in analyzing large sets of call center data, and Streamlit could provide an efficient solution for rapid data analysis. The Employee Experience Team plans to use Streamlit to visualize claim lifecycle data once the VRO Team completes integration with the Claims API via the BIE Kafka service. The VRO Team has the potential to integrate with additional data services, offering significant benefits to other teams across the Benefits Portfolio.
- Training Risk: The VRO Team expects Streamlit to be user-friendly, but this assumption may change after collaboration begins with the Contention Classification Team on their specific use cases. Mitigation Plan: Collaborate with the Contention Classification Team early to identify training needs and ensure adequate support is provided.
- Roles and Responsibilities Risk: The VRO Team will develop the initial framework for the Streamlit application but intends to delegate ongoing customization to Partner Team engineers. Unclear roles and responsibilities could cause delays or miscommunication. Mitigation Plan: Engage with the Contention Classification Team early to define clear roles, responsibilities, and develop a user guide for future partner teams.
- Performance Risk: The Contention Classification Team has raised concerns about working with large datasets, so the VRO Team must proactively address performance testing. Mitigation Plan: After establishing a connection between Streamlit and RDS, the VRO Team will conduct baseline performance tests. Additionally, they will research the typical size of the datasets the Contention Classification Team handles to ensure the solution meets performance requirements.
- Maintenance and Scalability Risk: While the VRO Team anticipates Streamlit will require minimal maintenance and be easy to scale, this assumption can only be confirmed once the application is fully operational on the VRO platform. Mitigation Plan: Monitor the application closely during initial deployment to assess maintenance and scalability requirements and adjust plans as necessary to ensure long-term stability.
- PII and Data Restrictions Risk: The VRO Team may need to hide or obscure data classified as Personally Identifiable Information (PII). Discussions with app accessors about this concern are ongoing. Mitigation Plan: Continue working closely with the VRO app accessors to establish clear data handling protocols that comply with PII regulations, and ensure that any sensitive data is anonymized or access-restricted.
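As one illustration of what "hide or obscure" could mean for the PII risk above, the sketch below replaces a participant ID with a keyed hash before rows are shown to users. This is an assumption, not the approach the VRO Team has chosen: the field name `participant_id` and the key handling are hypothetical, and whether keyed hashing satisfies the applicable PII requirements would need to be confirmed with the app accessors.

```python
import hashlib
import hmac


def pseudonymize(participant_id, secret_key):
    """Return a keyed SHA-256 digest (hex string) in place of the participant ID.

    The same ID and key always give the same digest, so rows for one Veteran can
    still be grouped together without exposing the underlying identifier.
    """
    message = str(participant_id).encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


def mask_rows(rows, secret_key, pii_field="participant_id"):
    """Return copies of the rows with the PII field replaced by its pseudonym."""
    return [{**row, pii_field: pseudonymize(row[pii_field], secret_key)} for row in rows]


if __name__ == "__main__":
    demo_key = b"demo-key-not-for-production"  # hypothetical; a real key would come from a secret store
    rows = [
        {"participant_id": 123, "event_type": "classified"},
        {"participant_id": 123, "event_type": "completed"},
        {"participant_id": 456, "event_type": "classified"},
    ]
    for row in mask_rows(rows, demo_key):
        print(row)
```

The input rows are left unmodified, so masking can be applied at the last step before display while any internal joins still use the original values.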
- VA.gov activity data, including disability benefits claim submission data, is functionally inaccessible to Benefits Portfolio product teams, except for a handful of engineers with command-line access to query the production database in vets-api.
- OCTO wants to develop a safer, more accessible, and more user-friendly way for teams to access this data.
- The VRO team is responsible for coordinating this effort via collaboration across the Benefits Portfolio, in particular with the Disability Benefits Experience team(s) who are familiar with the va.gov Postgres database and the needs of engineers working on va.gov benefits products.
Teams working to improve the end-to-end experience of digitally submitted disability benefit claims need access to va.gov activity data (including claim submission data) to learn about problems, validate ideas, troubleshoot issues, measure experiments, and iterate on solutions. However, teams can't easily access this data from the database where it's currently stored.
How is va.gov claim submission data trapped in this Postgres DB?
- VA.gov stores all activity in a Postgres database that can only be queried via the prod Rails console, so BI tools and similar software cannot connect to it. Because some of the information is PII, it often cannot be logged either.
- Only a handful of Benefits portfolio team members have access to query the production database via the prod Rails console.
Why not just get the data from the VBA side?
- While va.gov claim submissions end up in the Enterprise Data Warehouse (EDW, which sits one layer above the source-of-truth Corporate Data Warehouse, CDW) after claim establishment, all of EDW's claims that come from va.gov are slightly mislabeled when it comes to identifying va.gov as their source. This mislabeling problem is currently being investigated in hopes of resolving it going forward (outside the scope of this effort), but it will still apply to historical claims data (since Nov 2021) in EDW.
- Even disregarding that EDW data is slightly mislabeled, no one in OCTO has access to EDW; queries must go through VBA's PA&I team.
What about getting the data via Kafka streams?
- The current thinking is that, in the long term, the ideal would be for the VES Event Bus Kafka service to act as the source for this data. However, OCTO would like to implement an interim solution without waiting for this option to solidify.
VA.gov activity data in its current state is functionally inaccessible for the majority of Benefits Portfolio teams' needs.
What's wrong with getting this data from the Postgres DB?
- Having engineers go into the prod Rails console to query data risks affecting production and end users, or altering real production data.
- It's hard to look at large amounts of this data, because large queries risk overloading the production system.
- While only a small number of Benefits Portfolio engineers can access this data (Yang, Luke, Steve, and Kyle Soskin are the ones we're aware of), increasing that number would violate the principle of least privilege
- The folks who have access to query the database are not in support roles tasked with fielding requests for data, so there's no official way to ask for a data pull
What's wrong with requesting this data from PA&I?
- Given that PA&I fields requests from all across VBA and OCTO, it can take weeks or months for data requests to be fulfilled.
- Furthermore, because all of VBA's reporting is slightly mislabeled when it comes to va.gov as a claim source, it's not possible for PA&I to report on claims from va.gov with full confidence of accuracy.
OCTO needs a way for Benefits Portfolio teams to retrieve va.gov activity data in a safer, more accessible, and more user-friendly way:
- without using a prod command line to access the va.gov Postgres database,
- without lengthy turnaround times on the scale of weeks or months,
- and with confidence that the data we're seeing covers all va.gov 526 claims.
Ideally, non-engineers who need visibility into the data will be able to retrieve it for themselves without having to go through an engineer. Building upon that ideal scenario, we can imagine enabling the configuration of data dashboards to meet teams' specific and ever-present needs for data analysis and insights. And in a perfect world, this va.gov activity data would be matched up to "downstream" claim lifecycle data from EDW (available via Kafka event topics) so that teams could follow claims from submission on va.gov through to claim completion in VBMS.
Note that the VA has a larger effort underway related to reducing/eliminating va.gov engineers' dependencies on interacting with the prod Rails console (as shared by Bill Chapman in the July Benefits Portfolio engineering all-hands) – our focus on data visibility represents just one aspect of this overall effort.
By December 31, 2023, Benefits Portfolio teams will have visibility to all disability benefit form data submitted on VA.gov.
More context:
- The whole benefits portfolio should be part of the discovery of needs even if we choose to implement early solutions that focus on a particular team or crew
- At minimum, "visibility" = Benefits portfolio engineers can pull data via a secure solution (e.g. read-only credentials, scoped only to claim data) that doesn't require them to have prod console access to the va.gov Postgres DB.
- Currently, "all disability benefit form data" is a hypothesis about what will be valuable to the portfolio teams. For now, we can assume that we're talking about 526EZ form submission data (including historical submission data), but we will refine expectations of what data to include based on further discovery that defines and prioritizes data visibility needs across Benefits Portfolio teams. Other types of data that may be prioritized include va.gov activity and error data.
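The minimum definition of "visibility" above mentions read-only credentials. The sketch below shows the behavior such credentials are meant to guarantee: reads succeed and writes are rejected. It uses a temporary SQLite file opened in read-only mode purely as a stand-in; the actual solution would rely on database roles in Postgres, and the table and column names here are hypothetical.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def create_demo_db(path):
    """Create a small database file with one synthetic form submission."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE form_submissions (id INTEGER PRIMARY KEY, form_type TEXT)")
    conn.execute("INSERT INTO form_submissions (form_type) VALUES ('526EZ')")
    conn.commit()
    conn.close()


def open_read_only(path):
    """Open the database so that reads succeed and writes are rejected."""
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def try_write(conn):
    """Attempt an insert; return True if it succeeded, False if it was rejected."""
    try:
        conn.execute("INSERT INTO form_submissions (form_type) VALUES ('test')")
        return True
    except sqlite3.OperationalError:
        return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "demo.db")
        create_demo_db(db_path)
        conn = open_read_only(db_path)
        print("rows:", conn.execute("SELECT form_type FROM form_submissions").fetchall())
        print("write allowed:", try_write(conn))
        conn.close()
```

The point of the design is that safety comes from the connection itself rather than from trusting each query: an engineer cannot alter production data by accident, which is one of the risks of the prod Rails console that this effort is trying to remove.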
Given VRO's mission to make it easy to build software to improve the VA's internal claims process, with particular emphasis on our vision of allowing teams to quickly build and validate product ideas, the VRO team is well positioned to lead the effort of identifying a pathway to deliver value in this problem space.
Our VA partners are asking the VRO team to:
- Be responsible for coordinating this work. If VRO's research or roadmap requires work or input from other teams (such as DBEx), that's totally fine.
  - Expectations:
    - VRO should make it as easy as possible for other teams to stay informed and complete relevant tasks.
    - This work should (as always!) follow OCTO's principle of working in the open. All chatter about this project should be in open channels for folks across the portfolio (i.e. #benefits-vro-support #benefits-portfolio #benefits-cft or in a place that's new / opt-in / not 1:1 DMs).
- Collaborate with teams across the Benefits Portfolio to define and prioritize the needs related to the visibility of va.gov data. There are a variety of needs related to claim data across the portfolio, both at submission and beyond. Some are related to monitoring in the moment, and some are more driven by product/design, and researching historical data. The Disability Benefits Experience (DBEx) teams, given their knowledge of the va.gov database, should be primary collaborators on this effort.
  - Required output:
    - Recommend a prioritized set of needs as initial and subsequent areas of focus for the teams' efforts between now and the end of the year
  - Expectations:
    - Set up a touchpoint/meeting between VRO and DBEx team (or teams, but likely starting with DBEx Team 2) by first week of August
    - Build out a comprehensive list of needs (and their relative priority/frequency) through end of August and use that list to define our roadmap.
- Collaborate with DBEx on shaping and scoping solutions to the portfolio's prioritized needs.
  - Required output:
    - Recommend a roadmap to deliver on the priority needs, including determining which team will implement which portions of which solutions (assuming the solutions include elements that span va.gov and VRO).
  - Expectations:
    - VRO should aim to implement the solution as far as possible, reducing dependency on DBEx or any other teams if possible.
- Implement against the agreed upon roadmap.
  - Expectations:
    - An MVP solution should be started by ~ end of August
Our product owners recognize that a "productized" version of available data via dashboards and other tooling is a product in itself, somewhat separate from VRO as a platform. We are empowered to recommend the best structure for long-term maintenance and expansion of this work stream. However, the expectation is that VRO will lead the initial shaping and roadmapping of solutions to this set of problems, identify a path to quick value, and deliver on it.
- Q: Who are the ultimate decision-makers about what can/can't be done with the va.gov data and what can/can't be built on the va.gov side? (We know it's all OCTO; is there anyone we don't know yet who's a key decision-maker?)
  - A: Need to loop in/keep aligned with the ATO team for Lighthouse and va.gov (Jesse House has been the person Steve has talked to; #platform-security-review is the team's Slack channel). Do the same with VRO's cATO contacts.
- Q: Is it accurate to think of this effort as addressing one slice of the overall "Steve and Bill idea" (ie. Steve Albers & Bill Chapman's exploration of internal APIs and/or other solutions to reduce/eliminate dependencies on accessing the va.gov database via prod console)?
  - A: Short answer, no. They're related problems but we don't want to create dependencies between these efforts.
- Q: Technical question: is there an existing backup of the va.gov database?
  - A: No. It's hard. System written in almost NoSQL fashion... it's not trivial to extract data from the DB; have to decrypt. Would be potentially a performance impact because it needs to run in the same space as production.
- Q: For Steve & Cory: How baked are your ideas on where to start (e.g. replicate data to an S3 bucket)? We don't want to go back to square one if you're already feeling confident that there's an obvious first step we should take.
  - A: Kyle Soskin is writing up a best practices guide on using Sidekiq for backend queries and we should wait for that before making decisions. There's a backlog ticket for CE teams (DBEx 2) to do this monthly data extraction -- but maybe this is a short-term thing, we might only want it once! It makes sense for them to own it, but in terms of capacity it might make sense for VRO to build it, if we can get to it sooner.
- Q: When we talk about access as a problem, is [technical] skill part of the problem? For example, could we assume that everyone who needs access to this data can use SQL?
  - A: Maybe for an MVP but since part of the problem is making the data accessible beyond engineers, probably can't assume e.g. all PMs are SQL fluent.
- Q: What would overthinking this look like?
  - A: Don't over-engineer in the beginning. For example, overthinking would be deciding that a JSON file or something similar is insufficient and assuming we need complex data viz to meet the need. We CAN start small!
    - We don't need all data in real-time. Figuring out which data is part of the question for us! Picturing a list that lays out, "We need this, this often, and this is the person who needs it" -- then prioritize this list.
    - There's also underthinking it! Doing a data pull, dropping it on Sharepoint and calling it done is not taking a holistic enough view of the problem. There are many needs!
    - A good deliverable would be laying out what we can/should do now with Sidekiq and which things should wait until data is available from EventBus.
    - We're excited about the potential linkage between claim submission data and downstream claim lifecycle data (but don't start there!)