Fix and test remaining dataset definitions in ehrQL examples #1697
StevenMaude added a commit referencing this issue on Nov 2, 2023:

This code has been subject to considerable work to get it into this form. However, it did not seem useful to retain the various approaches and versions of the code before this state. A quick guide to this code:

* It finds any Markdown files in `docs/`.
* It uses the SuperFences extension, as we do in the MkDocs configuration, to extract Markdown code blocks labelled with `ehrql` syntax. These are assumed to be self-contained dataset definitions.
* The code blocks that will be tested should appear as code blocks in the documentation by default (provided the CSS isn't changed to modify the appearance of code blocks somehow, which shouldn't be the case, because why would you?). They are identified in the parametrized tests by their ordinal fence number in the source file.
* It finds any Python modules indicated by a `.py` extension. Python modules are assumed to be self-contained dataset definitions.
* The found dataset definitions are run to generate a dataset, and the output is checked to see if it's a CSV.

There is some monkeypatching necessary to make this work:

* `codelist_from_csv()` relies on having CSV data available, so the checks on valid codelist codes are patched out. Without further work, we don't have any direct way of including data for inline dataset definitions in Markdown source, or of specifying which mock CSV data to use, without an established convention for examples to follow. #1697 proposes ideas to remove this monkeypatching.
* The sandboxing code is monkeypatched out to use "unsafe" loading of dataset definitions. Without doing so, it is not possible to monkeypatch any other ehrQL code: the ehrQL is otherwise run in a subprocess.

For more details and discussion, see the related PR for this code (#1648) and the previous PR (#1475) which this approach replaces.
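The fenced-block extraction step described above can be sketched without the SuperFences machinery. This is a simplified approximation (the actual tests reuse the MkDocs SuperFences configuration, not a regex):

```python
import re

FENCE = "`" * 3  # built at runtime to avoid a literal triple backtick in this snippet

# Matches fenced code blocks labelled with `ehrql` syntax in Markdown source.
EHRQL_FENCE = re.compile(FENCE + r"ehrql\n(.*?)" + FENCE, re.DOTALL)

def extract_ehrql_blocks(markdown_text):
    """Return the ehrQL code blocks in a Markdown document, in source order."""
    return [block.strip("\n") for block in EHRQL_FENCE.findall(markdown_text)]

doc = "Some prose.\n\n" + FENCE + "ehrql\ndataset = ...\n" + FENCE + "\n\nMore prose.\n"
blocks = extract_ehrql_blocks(doc)
```

Each extracted block's position in the returned list corresponds to its ordinal fence number, which is how the parametrized tests identify blocks in a source file.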
This doesn't cover examples in the surrounding OpenSAFELY documentation, for which there's another issue: opensafely/documentation#1373.
I opened a separate issue for measure definitions not yet being tested: #1726.
StevenMaude added a commit referencing this issue on Dec 13, 2023:

Fixes #1697. These were left in, but not all of these were tested. We had a case where an ehrQL user actually tried to follow one of these broken examples, which led to some confusion.
#1648 adds tests for most of the dataset definitions in the examples page. A couple were left out, as it wasn't immediately clear how best to fix them. We should fix and test these. Specifically, they are the two sections below.
"What is the earliest/latest hospitalisation event matching some criteria?"

ehrql/docs/how-to/examples.md, lines 493 to 527 in 260caf1

Issues:

* Uses `apcs.snomedct_code`, which doesn't exist. This could be switched to `primary_diagnosis`, which is a single ICD-10 code, or `all_diagnoses`, which is a string of diagnosis codes separated by semi-colons.
* `dmd_code`
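As a plain-Python illustration of the `all_diagnoses` option mentioned above (the helper name and sample data are made up for this sketch; this is not the ehrQL API), matching a codelist against a semicolon-separated diagnosis string requires splitting it first:

```python
def diagnoses_match(all_diagnoses, codelist):
    """Return True if any code in the semicolon-separated diagnosis string
    appears in the codelist. Hypothetical helper for illustration only."""
    codes = {code.strip() for code in all_diagnoses.split(";")}
    return not codes.isdisjoint(codelist)

# An `all_diagnoses`-style string and a small, made-up ICD-10 codelist.
matched = diagnoses_match("I21.9 ;J18.1; E11.9", ["I21.9", "I22.0"])
```

In ehrQL itself, whichever approach is taken for the fixed example would need to account for this semicolon-separated format when matching against `all_diagnoses`.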
"Finding the observed value of clinical events matching some criteria expressed relative to another value"

ehrql/docs/how-to/examples.md, lines 661 to 684 in 260caf1

Issues:

* Calculates `mean_hba1c`, but does this by calculating the `maximum_for_patient`.
* Refers to `max_hba1c`, which doesn't exist.
* Selects events matching `hba1c_codelist` where the value is the same as `max_hba1c`, finds the mean (which will be the same value) and then subtracts this from `max_hba1c`.

What this might have been intended as:

Maybe something like the following?
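The issue with the documented calculation can be demonstrated in plain Python (the values here are made-up sample data, and this is a sketch of the arithmetic, not the lost ehrQL suggestion): the mean of only those values equal to the per-patient maximum is just the maximum again, so the subtraction always yields zero.

```python
hba1c_values = [48.0, 52.0, 61.0]  # one patient's HbA1c measurements (made-up data)

max_hba1c = max(hba1c_values)

# What the broken example effectively computed: the mean of only those values
# equal to the per-patient maximum -- which is just the maximum again.
values_at_max = [v for v in hba1c_values if v == max_hba1c]
broken_difference = max_hba1c - (sum(values_at_max) / len(values_at_max))  # always 0.0

# What was presumably intended: the maximum minus the mean of all the
# patient's values, giving a meaningful difference.
mean_hba1c = sum(hba1c_values) / len(hba1c_values)
intended_difference = max_hba1c - mean_hba1c
```

An ehrQL version of the fix would compute the mean and maximum as separate per-patient aggregations over the same filtered events, rather than filtering the events to those equal to the maximum first.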