Degraded full text search performance on larger datasets w/ partial matching #172
Thanks for posting @jpcamara. I'm wondering, at 25mm records vs. 10mm, if it's exceeding what can be stored in the shared buffers. The query plan may also be "tipping," meaning it's doing something different, like moving from an index scan to a sequential scan. We'd want an execution plan from a query with both amounts of row data, if you're able to set that up.

Also, I presume you're talking about "warmed up" query performance (meaning, repeatedly running the same query with the same parameters) vs. cold performance. When buffers/pages are accessed (PostgreSQL storage) via queries, results are placed into shared buffers. When we run the query again with the same parameters, it will be a "warm" query when it reads from the cache, and usually performance is dramatically improved when the result is fully from memory. Check this out for more info: https://pganalyze.com/blog/5mins-explain-analyze-buffers-nested-loops

To see what's happening, can you run the query with `EXPLAIN (ANALYZE, BUFFERS)`? Keep in mind that for any bulk loading operations, whether you're loading 10mm or 25mm records, run `ANALYZE` on the table afterward so the planner has up-to-date statistics.
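Something like this is what I have in mind; the table and column names below are hypothetical stand-ins, not Rideshare's actual schema:

```sql
-- Refresh planner statistics after a bulk load (hypothetical table name):
ANALYZE users;

-- Capture an execution plan that includes buffer/cache activity:
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM users
WHERE searchable @@ to_tsquery('english', 'chris:*');

-- Run the statement twice with the same parameters: the first run shows
-- "cold" performance (look for "Buffers: shared read" in the plan), the
-- second shows "warm" performance ("Buffers: shared hit").

-- Check how much memory is allocated to the buffer cache:
SHOW shared_buffers;
```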
@andyatkinson all great questions, many of which I should have provided in the initial issue 🤦🏼‍♂️! I will follow up with some more detailed analysis soon, thanks!
@jpcamara Following up on this. Any thoughts?
I was experimenting with table size from the "Implementing Full Text Search (FTS)" chapter, and I found that at 25 million rows plus `prefix: true`, the queries start to take a big perf hit (3-5 seconds). So for instance: change `db/scripts/bulk_load.sh` to use `FROM GENERATE_SERIES(1, 25000000) seq;` instead of `FROM GENERATE_SERIES(1, 10000000) seq;`.
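For context, that line drives an `INSERT ... SELECT`; a rough sketch of the shape (the table and column names here are hypothetical, not the script's actual ones):

```sql
-- Hypothetical sketch of the bulk load shape; only the
-- GENERATE_SERIES range is the actual change being described.
INSERT INTO users (first_name, last_name)
SELECT
  'First' || seq,
  'Last' || seq
FROM GENERATE_SERIES(1, 25000000) seq;
```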
On this ☝🏼, the performance is pretty good. I see anywhere from 80ms to 150ms.

But if I modify it to use `prefix: true`, so I can match on partial words, the performance takes a nosedive. At 10 million records, even with `prefix: true`, the performance stays pretty good. But as I ramp it up to around 25 million rows, things get pretty bad, and I'm curious if others have ideas on how to tune this for Rideshare?