Include basic stats about study definition in standard log output #777

Open
evansd opened this issue Apr 6, 2022 · 1 comment

@evansd
Contributor

evansd commented Apr 6, 2022

In the first instance, this can be as simple as a count of the number of variables in the study definition. As long as it gets written to stdout it will end up in the logs.

The aim is for these to be machine readable, so the output should use an easily greppable prefix and a simple, easily parsable syntax. Something like (though this is just an initial suggestion):

cohortextractor-stats: variable-count=123

The idea is to make it easier to debug and prevent potential performance problems by surfacing this information in a machine readable fashion.
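
A minimal sketch of how emitting and reading back such a line could work. The `format_stats` and `parse_stats` helpers and the `variables` dict are hypothetical, invented for illustration; nothing here is existing cohortextractor code, and the prefix is just the one suggested above.

```python
STATS_PREFIX = "cohortextractor-stats:"


def format_stats(**stats):
    """Render stats as one greppable line of space-separated key=value pairs."""
    # Underscores in keyword names become hyphens: variable_count -> variable-count
    pairs = " ".join(
        f"{key.replace('_', '-')}={value}" for key, value in stats.items()
    )
    return f"{STATS_PREFIX} {pairs}"


def parse_stats(line):
    """Parse a line produced by format_stats; return None for any other line."""
    if not line.startswith(STATS_PREFIX):
        return None
    fields = line[len(STATS_PREFIX):].split()
    # Values come back as strings; callers convert as needed
    return dict(field.split("=", 1) for field in fields)


# Hypothetical stand-in for a study definition with three variables
variables = {"age": ..., "sex": ..., "bmi": ...}
line = format_stats(variable_count=len(variables))
print(line)  # cohortextractor-stats: variable-count=3
```

Anything written with `print` goes to stdout and so would end up in the logs. Note that this simple syntax breaks if a value contains spaces, so values would need to be restricted or quoted.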

@sebbacon
Contributor

sebbacon commented Apr 7, 2022

Just to record a couple of ideas about where to log, which I noted down last week (there may be better places):

  • len(output_columns) from here
  • min and max dates from here

We will also want to extract total running time stats, which we already log (e.g. the start time is here; I'm not sure where the end time is). There are also other bits we could time, including temporary table downloads, total cohort generation time, time per index_date, etc. We could log either durations or start and end times. We should probably add the same grep prefix to those log entries.
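
As a rough sketch of the duration option, assuming a hypothetical `log_duration` context manager (not an existing cohortextractor function) and the prefix suggested earlier in this issue:

```python
import time
from contextlib import contextmanager

STATS_PREFIX = "cohortextractor-stats:"


@contextmanager
def log_duration(name, out=print):
    """Time the wrapped block and emit its duration with the stats prefix."""
    start = time.monotonic()
    try:
        yield
    finally:
        # Runs even if the block raises, so failed steps are still timed
        elapsed = time.monotonic() - start
        out(f"{STATS_PREFIX} {name}-seconds={elapsed:.3f}")


# Hypothetical usage: wrap any step we want timed
with log_duration("cohort-generation"):
    time.sleep(0.01)  # stand-in for the real work
```

Logging a duration keeps each entry self-contained; logging start and end times instead would need two lines per step that have to be paired up when parsing.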


3 participants