LEAF-4487 - large queries local server and code #2546

Open · wants to merge 34 commits into master
Conversation

@shaneodd (Contributor) commented Sep 13, 2024

When a Large Query is detected, we send the user's request to another, larger container for processing. This allows for longer-running tasks and for tasks that need more resources.

sequenceDiagram
    actor User
    participant Proxy
    participant API Container
    participant Large Query Container
    User->>Proxy: Send Request
    Proxy->>API Container: Forward Request
    API Container->>User: If Standard Query return results with header:LEAF_Is_Large_Query=FALSE
    API Container->>Large Query Container: If Large Query turn request over to Large Query Container
    Large Query Container->>User: Return results to user after process is done header:LEAF_Is_Large_Query=TRUE 


What is a Large Query?

A request is treated as a Large Query when any of the following is true (a sketch of this check follows the list):

  • No limit parameter is set
  • The limit is > 10,000 records
  • The limit is > 1,000 and <= 10,000 records AND more than 10 indicators are requested
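A minimal sketch of that check, assuming the thresholds exactly as listed above (the function name and signature are illustrative, not the PR's actual code):

    <?php
    // Decide whether a request should be routed to the Large Query Container.
    // Thresholds mirror the criteria listed above; names are illustrative only.
    function isLargeQuery(?int $limit, int $indicatorCount): bool
    {
        if ($limit === null) {
            return true;    // no limit parameter set
        }
        if ($limit > 10000) {
            return true;    // more than 10,000 records requested
        }
        if ($limit > 1000 && $limit <= 10000 && $indicatorCount > 10) {
            return true;    // mid-size limit combined with many indicators
        }
        return false;
    }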

Container Setup

NGINX

The NGINX setup has a header called LEAF_Is_Large_Query, which lets the code know we want to run the request on another container if it meets the Large Query criteria.

NGINX API

The NGINX container has some reverse proxies that marshal where a request ends up. If the return code from the API is 306, the request is redirected to the Large Query Container (a rough sketch follows the timeout settings below). Headers are sent out to help people looking into issues see whether or not the process ran as a Large Query.

    fastcgi_read_timeout 3000;
    fastcgi_send_timeout 3000;
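A rough sketch of how that 306 hand-off could be wired up; the upstream and location names here are assumptions for illustration, not the PR's exact config:

    # Sketch only: upstream names and paths are illustrative.
    location /api/ {
        fastcgi_pass             php-api:9000;
        # Let error_page act on the 306 status returned by the API code.
        fastcgi_intercept_errors on;
        # Hand large queries over to the named location below.
        error_page 306 = @large_query;
    }

    location @large_query {
        # Replay the request against the Large Query Container.
        proxy_pass http://large-query-container;
    }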

PHP-API

The php-api container is the PHP service that does the processing. The memory limit is set to 2 GB and max requests to 1, which helps with the memory-leak issues described below. On staging I had the limit set to 14 GB, but I am now wondering if I was running into a memory issue: locally it used 8 GB of memory to return 33 MB of data, and after restarting the containers the same request used less than 2 GB.

www.conf

pm.max_requests = 1

php.ini

memory_limit = 2048M

Memory Limit issues

Set max_requests for the API and the normal PHP-FPM pool to 100 and pull a Large Query: the Large Query Container will eventually stop garbage collecting properly and start eating up memory, going beyond what PHP is configured to use. I have not been able to figure out how to reliably reproduce this; it just happens. If I set max_requests to 1 or 2 the memory is cleared. Setting it to 0 needs more testing; I was hoping to find a reliable trigger before going down the path of working through all of these options. A sketch of an explicit collection pass follows the links below.
https://www.php.net/manual/en/features.gc.collecting-cycles.php
https://tideways.com/profiler/blog/what-is-garbage-collection-in-php-and-how-do-you-make-the-most-of-it
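For reference, this is roughly what forcing collection between batches would look like if we keep workers alive longer (pm.max_requests > 1). It is an illustrative sketch, not code from this PR; the batch loop is a stand-in for the real query processing:

    <?php
    // Illustrative only: explicitly collect cycles between batches of a
    // long-running query so cyclic references are reclaimed before memory piles up.
    gc_enable();

    $batches = range(1, 5);              // stand-in for batched query results
    foreach ($batches as $batch) {
        // ... process one batch of records here ...
        $collected = gc_collect_cycles();
        error_log(sprintf(
            'batch %d: collected %d cycles, %.1f MB in use',
            $batch,
            $collected,
            memory_get_usage(true) / 1048576
        ));
    }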

(Screenshot: output of watch free -mh while a Large Query ran.) Used memory climbed toward the total and available memory dropped into the MB range before the system locked up.

Additional options

The Large Query Container must be monitored as a separate service so that the response time metrics don't interfere with typical queries.

Testing

When testing you will see a header called "LEAF_Is_Large_Query": it will be false if the request did not run on the Large Query Container and true if it did. An example check is shown below.
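One way to inspect the header from a shell, assuming the portal is reachable from there (authentication flags omitted; substitute one of the full URLs below for the elided query):

    # Dump only the response headers and look for the flag.
    curl -sk -D - -o /dev/null "https://host.docker.internal/Test_Request_Portal/api/form/query/?q=..." \
        | grep -i LEAF_Is_Large_Query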

This query would be hit since there is no limit on the amount of data:
https://host.docker.internal/LEAF_Request_Portal/api/form/query/?q={%22terms%22:[{%22id%22:%22stepID%22,%22operator%22:%22!=%22,%22match%22:%22resolved%22,%22gate%22:%22AND%22},{%22id%22:%22deleted%22,%22operator%22:%22=%22,%22match%22:0,%22gate%22:%22AND%22}],%22joins%22:[%22status%22,%22initiatorName%22],%22sort%22:{},%22getData%22:[%229%22,%228%22,%2210%22,%224%22,%225%22,%227%22,%223%22,%226%22,%222%22]}&x-filterData=recordID,title,stepTitle,lastStatus,lastName,firstName
This query would be hit since 10 indicators are selected and the limit is 10,000:
https://host.docker.internal/Test_Request_Portal/api/form/query/?q={%22terms%22:[{%22id%22:%22stepID%22,%22operator%22:%22!=%22,%22match%22:%22resolved%22,%22gate%22:%22AND%22},{%22id%22:%22deleted%22,%22operator%22:%22=%22,%22match%22:0,%22gate%22:%22AND%22}],%22joins%22:[%22status%22,%22initiatorName%22],%22sort%22:{},%22limit%22:10000,%22getData%22:[%229%22,%228%22,%2210%22,%224%22,%225%22,%227%22,%223%22,%226%22,%222%22,%22-7%22]}&x-filterData=recordID,title,stepTitle,lastStatus,lastName,firstName
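For readability, the q parameter of the first URL decodes to the following (whitespace added, content unchanged). Note the absence of a limit key, which is what makes it a Large Query; the second URL is the same query plus "limit":10000 and one extra indicator ("-7") in getData:

    {
      "terms": [
        {"id": "stepID",  "operator": "!=", "match": "resolved", "gate": "AND"},
        {"id": "deleted", "operator": "=",  "match": 0,          "gate": "AND"}
      ],
      "joins": ["status", "initiatorName"],
      "sort": {},
      "getData": ["9", "8", "10", "4", "5", "7", "3", "6", "2"]
    }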

shane added 19 commits September 6, 2024 15:21
…I had forgotten I was in the middle of something
…you need to make sure you point it to that config.
…nding on comments, clear up configs we do not need, some will just be what is in the php or nginx and do not need to be replicated.
…f, get a test in place to test the headers.
jampaul3 previously approved these changes Sep 19, 2024
@shaneodd changed the title from "LEAF-4487 - large queries local server" to "LEAF-4487 - large queries local server and code" Sep 23, 2024
aerinkayne previously approved these changes Nov 19, 2024
Pelentan previously approved these changes Nov 19, 2024
@Pelentan added the "With QA" label (Ticket is to QA. No changes unless pulled back to in progress) Nov 19, 2024
shane added 2 commits November 19, 2024 09:04
…tus code). I would have thought this could be done without the proxy pass but I am thinking there may be some mechanism that allows it to easily slide the user to the correct spot.
@mgaoVA (Collaborator) left a comment

  1. Please include a brief summary of the change in the PR's comment as the first line. This will help in the future because all of these PRs are incorporated into reports and changelogs. E.g. "This routes large queries to a specialized high-resource container"

  2. The flowchart needs to be updated to clarify how the headers work, and what they specifically are. See https://info.aiim.org/aiim-blog/flowcharting-in-business-process-management for common flowchart conventions. Switching to a sequence diagram might be better overall since there's some back-and-forth: https://mermaid.js.org/syntax/sequenceDiagram.html

  3. The headers should be simplified

    This is unclear:
    LEAF_Large_Queries

    • pass_onto_large_query_server
    • process_ran_on_large_query_server

    Simplified:
    LEAF_Is_Large_Query

    • true
    • false
  4. Regarding slow rollout: I think we should rely on our autoscaling infrastructure to handle load.

  5. Additional considerations involving monitoring/deployment: The Large Query Container must be monitored as a separate service so that the response time metrics don't interfere with typical queries.

@shaneodd (Contributor, Author) commented Dec 3, 2024

  1. I agree. I have updated that; let me know how it looks to you.
  2. I have this updated. I am not entirely happy with it, but that is what I have.
  3. I was thinking of changing pass_onto_large_query_server -> process_ran_on_api_server. When looking from the front end you will know where the process ran, so if it is spitting out other errors you can pinpoint it. For me, trying to translate true/false into the location where it ran is a bit harder. This is something I have been trying to get some thoughts from Pete on.
  4. Yeah, this was when the server landscape was different, so I replaced/removed it. #5 is what I ended up putting there since I think that fits perfectly.

@shaneodd dismissed stale reviews from Pelentan, aerinkayne, and jampaul3 via 926f8e0 December 4, 2024 20:31
…r where a process ran on. This will help with troubleshooting issues to give clues to where something had run on.
@shaneodd requested a review from mgaoVA December 9, 2024 15:27
jampaul3 previously approved these changes Dec 10, 2024
Pelentan previously approved these changes Dec 10, 2024
@shaneodd dismissed stale reviews from Pelentan and jampaul3 via 2560331 December 10, 2024 16:13
@shaneodd removed the request for review from KCN8 December 10, 2024 20:24