-
Hi there,

Just wondering how I can use ehrQL to categorise someone as having died of a certain cause, where the ICD-10 codes are in a codelist CSV? I've tried to use `.isin()` but clearly that is not working. Is there a way to use `case` for a list of conditions?

Thanks

P.S. ehrQL is amazing and much better than the cohort extractor. Really phenomenal work, guys!!
-
Ah sorry, this is our fault! There were some typos in some of our documentation examples, which used `isin`, but the actual method name is `is_in`: https://docs.opensafely.org/ehrql/reference/language/#CodePatientSeries.is_in

So if you just add underscores in two places to your code then it should all work.

You also have the option of writing the condition without `~`, but use whichever seems clearest to you.

And thanks for the encouraging words!
-
Thanks, that has now worked. I have two follow-up questions:

QUESTION 1: When I run the code now, it seems that it only generates patients who have died of COVID or of an unknown cause (i.e. it only uses those two values rather than also generating some patients with non-COVID deaths).

Then when I look at the data:

I'm not sure what I am doing wrong here?

QUESTION 2: I only want a cohort of participants who have died so far (I'm looking at causes of death).

#DEFINE POPULATION - all patients who have died (NB end_date is "today")

However, when I generate the date of death later on:

last_ons_death = ons_deaths.sort_by(ons_deaths.date).last_for_patient()

it seems that I have patients with no death date. I assume this is to mimic "missing data", which would appear in the dataset even though there is knowledge that the patient has died (somehow?).
-
What's happening here is that, in both cases, you're bumping up against the limits of the dummy data generator, which is currently still quite simplistic. It's an area that we still want to do a lot of work on, but it's a fundamentally quite difficult problem, so we're trying to understand how far we can get with the current system before we develop new solutions.

In the first case, the problem is that the dummy data generator doesn't "know" about any codes other than the ones it sees in your dataset definition. So when it's picking random codes to populate your data, it's only picking from the ones in the COVID list.

In the second case, it doesn't "know" that the ONS death certificate table should match up with the date of death in the primary care record.

You have a few options here:

1. Live with it

Dummy data only exists to allow you to check that your code works before running it against real data. So, in a sense, it doesn't matter if it's wildly unrealistic as long as it exercises your analysis code correctly. If it's possible to tweak your code to cope with the oddities of the dummy data without doing too much violence to it, then one option is just to do that and live with the dummy data being weird.

2. Supply your own dummy dataset

You can supply your own dummy dataset and bypass the dummy data generator entirely. The downside here is that you'll be responsible for updating this file to match any changes you might make to your dataset definition. ehrQL should warn you if the dummy dataset file no longer matches the format it expects, but it won't be able to fix it for you.

3. Supply your own dummy tables

You can generate a set of dummy tables, edit this data however you wish, and then run your dataset definition against the edited tables. The advantage of this approach is that you have more flexibility to change your dataset definition without necessarily having to make any manual changes to your dummy tables. As long as your dummy tables contain enough data to work with, ehrQL will just recompute your new query against the old dummy tables. The downside is that it can be more fiddly to edit the dummy tables in the first place, because changing the data for a single dummy patient may require edits across multiple different tables.

Hope that's enough to unblock you. We know that the dummy data system needs work, and we need better how-to documentation for the workarounds above, but we're trying to focus on fixing specific blockers for our researchers at the moment.
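
As a rough illustration of option 2, here is a minimal sketch of building a dummy dataset file by hand using only the Python standard library. It is not part of ehrQL. The column names, the output file name and the ICD-10 codes are all assumptions made for the example, so swap in whatever columns your own dataset definition produces. It only shows how to get a file in which every patient has a death date and the causes of death are a mix of COVID and non-COVID codes; how you point ehrQL at the file is not covered here.

```python
import csv
import random
import tempfile
from datetime import date, timedelta
from pathlib import Path

# Hypothetical column names: replace with the columns your dataset definition outputs.
COLUMNS = ["patient_id", "date_of_death", "underlying_cause_of_death"]

# Illustrative ICD-10 codes only (written without dots).
COVID_CODES = ["U071", "U072"]
OTHER_CODES = ["I219", "C349", "J449", "F03"]


def make_dummy_rows(n_patients, seed=0):
    """Build rows where every patient has a death date and causes are mixed."""
    rng = random.Random(seed)
    start = date(2020, 3, 1)
    rows = []
    for patient_id in range(1, n_patients + 1):
        # Alternate between the two lists so both kinds of death always appear.
        codes = COVID_CODES if patient_id % 2 else OTHER_CODES
        death_date = start + timedelta(days=rng.randrange(365))
        rows.append(
            {
                "patient_id": patient_id,
                "date_of_death": death_date.isoformat(),
                "underlying_cause_of_death": rng.choice(codes),
            }
        )
    return rows


def write_dummy_dataset(path, rows):
    """Write the rows to a CSV file with a header line."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)


rows = make_dummy_rows(20)
# Written to the temp directory here; in a real project you would keep it alongside your code.
out_path = Path(tempfile.gettempdir()) / "dummy_dataset.csv"
write_dummy_dataset(out_path, rows)
```

Because the file is written by a script rather than edited by hand, regenerating it after a change to the dataset definition is mostly a matter of updating `COLUMNS` and the row-building function.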