Load test the options for retrieving counts over time #1

lucyb · 2022-04-07T13:56:54Z

We don't currently have an efficient way of calculating counts over time, so we should understand how much load a query will put on the database server and how long a user may have to wait for the results.

- Test the original version with a single wide CSV output, with one month/week per column
- Test `--index-date-range`, producing multiple CSVs and more queries
- Test both against codelists with 100 and 1,000 codes
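As a rough illustration of the two query shapes being compared, here is a minimal sketch using an in-memory SQLite database. Everything in it is an assumption made for illustration: the `events` table, its columns, the synthetic data, and the helper names are hypothetical and are not the real backend schema or the cohort-extractor implementation. It counts matching events (not distinct patients) per month, once with a single wide query and once with one query per month, and times each.

```python
import sqlite3
import time

# Hypothetical month list; the real date range would come from the study definition.
MONTHS = [f"2021-{m:02d}" for m in range(1, 13)]


def build_db(n_patients=200, codes=("A1", "B2", "C3")):
    """Create a toy events table: one coded event per synthetic patient."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (patient_id INTEGER, code TEXT, event_date TEXT)"
    )
    rows = [
        (pid, codes[pid % len(codes)], f"2021-{pid % 12 + 1:02d}-15")
        for pid in range(n_patients)
    ]
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    return conn


def wide_counts(conn, codelist):
    """One query returning a single row with one count column per month."""
    placeholders = ",".join("?" for _ in codelist)
    columns = ", ".join(
        f"SUM(CASE WHEN substr(event_date, 1, 7) = '{m}' THEN 1 ELSE 0 END)"
        for m in MONTHS
    )
    row = conn.execute(
        f"SELECT {columns} FROM events WHERE code IN ({placeholders})",
        list(codelist),
    ).fetchone()
    # SUM over zero matching rows yields NULL, so normalise to 0.
    return {m: (v or 0) for m, v in zip(MONTHS, row)}


def per_period_counts(conn, codelist):
    """One query per month, mimicking a run per index date."""
    placeholders = ",".join("?" for _ in codelist)
    counts = {}
    for m in MONTHS:
        (n,) = conn.execute(
            f"SELECT COUNT(*) FROM events WHERE code IN ({placeholders}) "
            "AND substr(event_date, 1, 7) = ?",
            [*codelist, m],
        ).fetchone()
        counts[m] = n
    return counts


def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    conn = build_db()
    for fn in (wide_counts, per_period_counts):
        result, seconds = timed(fn, conn, ["A1", "B2"])
        print(fn.__name__, sum(result.values()), f"{seconds:.4f}s")
```

Timings from a toy database like this say nothing about the production server; the sketch is only meant to make the comparison concrete. Both functions should return the same counts, which is a useful sanity check before comparing their cost.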

Links to opensafely-core/research-action#42 and opensafely-core/cohort-extractor#777, but is not dependent on them

sebbacon · 2022-04-08T07:10:07Z

Was just dreaming about this last night (!) and came here to say, the two aspects that may affect performance are (a) the size of the code list, and (b) the size of the patient population it matches.

So you will probably want to include codelists that cover common things, like Full Blood Count pathology tests or blood pressure monitoring codes, in your tests. @HelenCEBM and others will be able to advise.

lucyb · 2022-04-11T13:30:29Z

From Helen:

- this ethnicity codelist has ~500 codes and should be present for the majority of the population (but not recorded particularly often per person)
- this blood pressure codelist has ~100 codes and should be common and frequently repeated
- this aortic aneurysm codelist has 18 codes and will be relatively uncommon
- this has just 1 code and should be relatively rare
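One way to keep track of the combinations to run is to cross these codelists with the two approaches from the issue description. The sketch below is only a planning aid: the names, the dictionary layout, and the `build_test_matrix` helper are hypothetical, and the code counts are the approximate figures quoted above.

```python
from itertools import product

# (label, approximate number of codes, expected match pattern), taken from the list above.
CODELISTS = [
    ("ethnicity", 500, "most of the population, infrequent per person"),
    ("blood pressure", 100, "common and frequently repeated"),
    ("aortic aneurysm", 18, "relatively uncommon"),
    ("single code", 1, "relatively rare"),
]

# The two options named in the issue description.
APPROACHES = ["single wide CSV", "--index-date-range"]


def build_test_matrix(codelists=CODELISTS, approaches=APPROACHES):
    """Return one test case per (approach, codelist) combination."""
    return [
        {
            "approach": approach,
            "codelist": label,
            "approx_codes": n_codes,
            "expected_matches": pattern,
        }
        for approach, (label, n_codes, pattern) in product(approaches, codelists)
    ]


if __name__ == "__main__":
    for case in build_test_matrix():
        print(f"{case['approach']:<20} {case['codelist']:<16} ~{case['approx_codes']} codes")
```

This gives eight runs in total, so the effect of codelist size and match frequency can be compared across both approaches.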

HelenCEBM · 2022-04-11T13:35:43Z

Not sure if it's more effort than it's worth, but we could test the same codelist in two different subpopulations (e.g. PSA test in men vs women) to see how much difference the number of matches makes.

HelenCEBM · 2022-04-11T13:42:37Z

Could also refer to this data to find single codes that are rare vs common

lucyb added this to the MVP milestone Apr 7, 2022

lucyb changed the title ~~Load test the query~~ Load test the options for retrieving counts over time Apr 7, 2022

lucyb mentioned this issue Apr 21, 2022

Optimise SQL query to reduce runtime #42

Closed

lucyb closed this as completed Apr 27, 2022
