This repository has been archived by the owner on Nov 24, 2023. It is now read-only.

Load test the options for retrieving counts over time #1

Closed
lucyb opened this issue Apr 7, 2022 · 4 comments

@lucyb
Contributor

lucyb commented Apr 7, 2022

We don't currently have an efficient way of calculating counts over time. So we should understand how much load a query will put on the database server and how long a user may have to wait until the results are available.

  • Test original version with a single wide CSV output, with month/week per column

  • Test --index-date-range, producing multiple CSVs and more queries

  • Test both against codelists with 100 and 1,000 codes
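The test matrix above (two output strategies, two codelist sizes) could be driven by a small timing harness. This is only a sketch: the strategy labels and the `make_run` factory are hypothetical placeholders for however the extraction is actually invoked, and it measures wall-clock wait time only. Load on the database server itself would need to be observed separately with server-side monitoring.

```python
import itertools
import statistics
import time

# Hypothetical labels for the two approaches and the codelist sizes listed above.
STRATEGIES = ("single_wide_csv", "index_date_range")
CODELIST_SIZES = (100, 1000)


def time_run(run, repeats=3):
    """Call run() `repeats` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        durations.append(time.perf_counter() - start)
    return durations


def load_test(make_run, repeats=3):
    """Time every strategy/codelist-size combination.

    make_run(strategy, n_codes) must return a zero-argument callable that
    performs one extraction; it is a placeholder for the real invocation.
    """
    results = {}
    for strategy, n_codes in itertools.product(STRATEGIES, CODELIST_SIZES):
        durations = time_run(make_run(strategy, n_codes), repeats)
        results[(strategy, n_codes)] = {
            "median_s": statistics.median(durations),
            "max_s": max(durations),
        }
    return results
```

Reporting the median alongside the maximum gives a typical wait and a worst case for each combination, which is roughly what a user would experience.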

Links to opensafely-core/research-action#42 and opensafely-core/cohort-extractor#777, but is not dependent on them

@lucyb lucyb added this to the MVP milestone Apr 7, 2022
@lucyb lucyb changed the title Load test the query Load test the options for retrieving counts over time Apr 7, 2022
@sebbacon

sebbacon commented Apr 8, 2022

Was just dreaming about this last night (!) and came here to say, the two aspects that may affect performance are (a) the size of the code list, and (b) the size of the patient population it matches.

So you will probably want to include codelists that cover common things, like Full Blood Count pathology tests or blood pressure monitoring codes, in your tests. @HelenCEBM and others will be able to advise.
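To make the two factors concrete, here is a toy counts-over-time query against an in-memory SQLite database. It is an illustration only: the `events` table, its columns, and the codes are all made up, and the real backend and schema will differ. The length of the `IN (...)` list stands in for codelist size, and the number of matching rows stands in for the matched population.

```python
import sqlite3


def monthly_counts(conn, codes):
    """Count events per calendar month whose code is in `codes`."""
    placeholders = ",".join("?" for _ in codes)
    query = (
        "SELECT strftime('%Y-%m', event_date) AS month, COUNT(*) "
        "FROM events "
        f"WHERE code IN ({placeholders}) "
        "GROUP BY month ORDER BY month"
    )
    return conn.execute(query, list(codes)).fetchall()


# Hypothetical schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (patient_id INTEGER, code TEXT, event_date TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        (1, "A1", "2022-01-05"),
        (2, "A1", "2022-01-20"),
        (1, "B2", "2022-02-01"),
        (3, "C3", "2022-02-14"),
    ],
)

print(monthly_counts(conn, ["A1", "B2"]))
```

With these rows, the two-code list matches two events in January and one in February; the `C3` event is excluded because it is not in the list.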

@lucyb
Contributor Author

lucyb commented Apr 11, 2022

From Helen:

  • this ethnicity codelist has ~500 codes and should be present for the majority of the population (but not recorded particularly often per person)
  • this blood pressure codelist has ~100 codes and should be common and frequently repeated
  • this aortic aneurysm codelist has 18 codes and will be relatively uncommon
  • this codelist has just 1 code and should be relatively rare
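One way to carry these four suggestions into a test run is to record them as data. This is a sketch: the labels are hypothetical short names rather than the real codelist identifiers, the code counts are the approximate figures quoted above, and "not stated" marks anything the list does not say.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodelistCase:
    label: str         # hypothetical short name, not the real codelist slug
    approx_codes: int  # approximate code count from the list above
    coverage: str      # how much of the population is expected to match
    repeats: str       # how often codes recur per person, where stated


CASES = [
    CodelistCase("ethnicity", 500, "majority", "infrequent"),
    CodelistCase("blood_pressure", 100, "common", "frequent"),
    CodelistCase("aortic_aneurysm", 18, "uncommon", "not stated"),
    CodelistCase("single_code", 1, "rare", "not stated"),
]


def by_code_count(cases):
    """Order cases from the smallest to the largest codelist."""
    return sorted(cases, key=lambda case: case.approx_codes)
```

Keeping codelist size and expected coverage as separate fields makes it easier to see which of the two is driving any difference in timings.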

@HelenCEBM
Contributor

Not sure if it's more effort than it's worth, but we could test the same codelist in 2 different subpopulations (e.g. PSA test in men vs women) to see how much difference the number of matches makes.

@HelenCEBM
Contributor

Could also refer to this data to find single codes that are rare vs common.


3 participants