LEAF-4487 - large queries local server and code #2546

Open · wants to merge 34 commits into master
Conversation

@shaneodd (Contributor) commented Sep 13, 2024

When a Large Query is detected, we send the user's request to another, larger container for processing. This allows for longer-running tasks and for tasks that need more resources.

sequenceDiagram
    actor User
    participant Proxy
    participant API Container
    participant Large Query Container
    User->>Proxy: Send Request
    Proxy->>API Container: Forward Request
    API Container->>User: If Standard Query return results with header:LEAF_Is_Large_Query=FALSE
    API Container->>Large Query Container: If Large Query turn request over to Large Query Container
    Large Query Container->>User: Return results to user after process is done header:LEAF_Is_Large_Query=TRUE 


What is a Large Query?

A request is treated as a Large Query when any of the following is true (a sketch of this check follows the list):

  • No limit parameter is set
  • The limit is > 10,000 records
  • The limit is > 1,000 and <= 10,000 records AND more than 10 indicators are requested
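A minimal sketch of that check, assuming the thresholds exactly as listed above (the function name and signature are illustrative, not the PR's actual code):

    <?php
    // Decide whether a request should be routed to the Large Query Container.
    // Thresholds mirror the criteria listed above; names are illustrative only.
    function isLargeQuery(?int $limit, int $indicatorCount): bool
    {
        if ($limit === null) {
            return true;    // no limit parameter set
        }
        if ($limit > 10000) {
            return true;    // more than 10,000 records requested
        }
        if ($limit > 1000 && $limit <= 10000 && $indicatorCount > 10) {
            return true;    // mid-size limit combined with many indicators
        }
        return false;
    }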

Container Setup

NGINX

The NGINX setup has a header called LEAF_Is_Large_Query, which lets the code know we want to run the request on another container if it meets the Large Query criteria.

NGINX API

The NGINX container has some reverse proxies that marshal where a request ends up. If the return code from the API is 306, the request is redirected to the Large Query Container (a rough sketch follows the timeout settings below). Headers are sent out to help people looking into issues see whether or not the process ran as a Large Query.

    fastcgi_read_timeout 3000;
    fastcgi_send_timeout 3000;
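A rough sketch of how that 306 hand-off could be wired up; the upstream and location names here are assumptions for illustration, not the PR's exact config:

    # Sketch only: upstream names and paths are illustrative.
    location /api/ {
        fastcgi_pass             php-api:9000;
        # Let error_page act on the 306 status returned by the API code.
        fastcgi_intercept_errors on;
        # Hand large queries over to the named location below.
        error_page 306 = @large_query;
    }

    location @large_query {
        # Replay the request against the Large Query Container.
        proxy_pass http://large-query-container;
    }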

PHP-API

The php-api container is the PHP service that does the processing. The memory limit is set to 2 GB and max requests to 1, which helps with the memory-leak issues described below. On staging I had the limit set to 14 GB, but I am now wondering if I was running into a memory issue: locally it used 8 GB of memory to return 33 MB of data, and after restarting the containers the same request used less than 2 GB.

www.conf

pm.max_requests = 1

php.ini

memory_limit = 2048M

Memory Limit issues

Set max_requests for the API and the normal PHP-FPM pool to 100 and pull a Large Query: the Large Query Container will eventually stop garbage collecting properly and start eating up memory, going beyond what PHP is configured to use. I have not been able to figure out how to reliably reproduce this; it just happens. If I set max_requests to 1 or 2 the memory is cleared. Setting it to 0 needs more testing; I was hoping to find a reliable trigger before going down the path of working through all of these options. A sketch of an explicit collection pass follows the links below.
https://www.php.net/manual/en/features.gc.collecting-cycles.php
https://tideways.com/profiler/blog/what-is-garbage-collection-in-php-and-how-do-you-make-the-most-of-it
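For reference, this is roughly what forcing collection between batches would look like if we keep workers alive longer (pm.max_requests > 1). It is an illustrative sketch, not code from this PR; the batch loop is a stand-in for the real query processing:

    <?php
    // Illustrative only: explicitly collect cycles between batches of a
    // long-running query so cyclic references are reclaimed before memory piles up.
    gc_enable();

    $batches = range(1, 5);              // stand-in for batched query results
    foreach ($batches as $batch) {
        // ... process one batch of records here ...
        $collected = gc_collect_cycles();
        error_log(sprintf(
            'batch %d: collected %d cycles, %.1f MB in use',
            $batch,
            $collected,
            memory_get_usage(true) / 1048576
        ));
    }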

(Screenshot: output of watch free -mh while a Large Query ran.) Used memory climbed toward the total and available memory dropped into the MB range before the system locked up.

Additional options

The Large Query Container must be monitored as a separate service so that the response time metrics don't interfere with typical queries.

Testing

When testing you will see a header called "LEAF_Is_Large_Query": it will be false if the request did not run on the Large Query Container and true if it did. An example check is shown below.
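One way to inspect the header from a shell, assuming the portal is reachable from there (authentication flags omitted; substitute one of the full URLs below for the elided query):

    # Dump only the response headers and look for the flag.
    curl -sk -D - -o /dev/null "https://host.docker.internal/Test_Request_Portal/api/form/query/?q=..." \
        | grep -i LEAF_Is_Large_Query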

This query would be hit since there is no limit on the amount of data:
https://host.docker.internal/LEAF_Request_Portal/api/form/query/?q={%22terms%22:[{%22id%22:%22stepID%22,%22operator%22:%22!=%22,%22match%22:%22resolved%22,%22gate%22:%22AND%22},{%22id%22:%22deleted%22,%22operator%22:%22=%22,%22match%22:0,%22gate%22:%22AND%22}],%22joins%22:[%22status%22,%22initiatorName%22],%22sort%22:{},%22getData%22:[%229%22,%228%22,%2210%22,%224%22,%225%22,%227%22,%223%22,%226%22,%222%22]}&x-filterData=recordID,title,stepTitle,lastStatus,lastName,firstName
This query would be hit since 10 indicators are selected and the limit is 10,000:
https://host.docker.internal/Test_Request_Portal/api/form/query/?q={%22terms%22:[{%22id%22:%22stepID%22,%22operator%22:%22!=%22,%22match%22:%22resolved%22,%22gate%22:%22AND%22},{%22id%22:%22deleted%22,%22operator%22:%22=%22,%22match%22:0,%22gate%22:%22AND%22}],%22joins%22:[%22status%22,%22initiatorName%22],%22sort%22:{},%22limit%22:10000,%22getData%22:[%229%22,%228%22,%2210%22,%224%22,%225%22,%227%22,%223%22,%226%22,%222%22,%22-7%22]}&x-filterData=recordID,title,stepTitle,lastStatus,lastName,firstName
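For readability, the q parameter of the first URL decodes to the following (whitespace added, content unchanged). Note the absence of a limit key, which is what makes it a Large Query; the second URL is the same query plus "limit":10000 and one extra indicator ("-7") in getData:

    {
      "terms": [
        {"id": "stepID",  "operator": "!=", "match": "resolved", "gate": "AND"},
        {"id": "deleted", "operator": "=",  "match": 0,          "gate": "AND"}
      ],
      "joins": ["status", "initiatorName"],
      "sort": {},
      "getData": ["9", "8", "10", "4", "5", "7", "3", "6", "2"]
    }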

shane added 19 commits September 6, 2024 15:21
…I had forgotten I was in the middle of something
…you need to make sure you point it to that config.
…nding on comments, clear up configs we do not need, some will just be what is in the php or nginx and do not need to be replicated.
…f, get a test in place to test the headers.
jampaul3 previously approved these changes Sep 19, 2024
@shaneodd changed the title from "LEAF-4487 - large queries local server" to "LEAF-4487 - large queries local server and code" Sep 23, 2024
aerinkayne previously approved these changes Nov 19, 2024
Pelentan previously approved these changes Nov 19, 2024
@Pelentan added the "With QA" label (Ticket is to QA. No changes unless pulled back to in progress) Nov 19, 2024
shane added 2 commits November 19, 2024 09:04
…tus code). I would have thought this could be done without the proxy pass but I am thinking there may be some mechanism that allows it to easily slide the user to the correct spot.
@mgaoVA (Collaborator) left a comment

  1. Please include a brief summary of the change in the PR's comment as the first line. This will help in the future because all of these PRs are incorporated into reports and changelogs. E.g. "This routes large queries to a specialized high-resource container"

  2. The flowchart needs to be updated to clarify how the headers work, and what they specifically are. See https://info.aiim.org/aiim-blog/flowcharting-in-business-process-management for common flowchart conventions. Switching to a sequence diagram might be better overall since there's some back-and-forth: https://mermaid.js.org/syntax/sequenceDiagram.html

  3. The headers should be simplified

    This is unclear:
    LEAF_Large_Queries

    • pass_onto_large_query_server
    • process_ran_on_large_query_server

    Simplified:
    LEAF_Is_Large_Query

    • true
    • false
  4. Regarding slow rollout: I think we should rely on our autoscaling infrastructure to handle load.

  5. Additional considerations involving monitoring/deployment: The Large Query Container must be monitored as a separate service so that the response time metrics don't interfere with typical queries.

@shaneodd (Contributor, Author) commented Dec 3, 2024

  1. I agree. I have updated that; let me know how it looks to you.
  2. I have this updated. I am not entirely happy with it, but that is what I have.
  3. I was thinking of changing pass_onto_large_query_server -> process_ran_on_api_server. When looking from the front end you will know where the process ran, so if it is spitting out other errors you can pinpoint it. For me, trying to translate true/false into the location where it ran is a bit harder. This is something I have been trying to get some thoughts from Pete on.
  4. Yeah, this was when the server landscape was different, so I replaced/removed it. #5 is what I ended up putting there since I think that fits perfectly.

@shaneodd dismissed stale reviews from Pelentan, aerinkayne, and jampaul3 via 926f8e0 December 4, 2024 20:31
…r where a process ran on. This will help with troubleshooting issues to give clues to where something had run on.
@shaneodd requested a review from mgaoVA December 9, 2024 15:27
jampaul3 previously approved these changes Dec 10, 2024
Pelentan previously approved these changes Dec 10, 2024
@shaneodd dismissed stale reviews from Pelentan and jampaul3 via 2560331 December 10, 2024 16:13
@shaneodd removed the request for review from KCN8 December 10, 2024 20:24