Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cohort Tracker #658

Merged
merged 50 commits into from
Mar 13, 2024
Merged

Cohort Tracker #658

merged 50 commits into from
Mar 13, 2024

Conversation

eroell
Copy link
Collaborator

@eroell eroell commented Feb 14, 2024

PR Checklist

  • This comment contains a description of changes (with reason)
  • Referenced issue is linked
  • If you've fixed a bug or added code that should be tested, add tests!
  • Documentation in docs is updated

Description of changes
Resolves #646.
This PR suggests a new feature, the ep.tl.CohortTracker, which offers tracking and visualization of the cohort during data filtering.
It is inspired by the ideas outlined by Ellen et al., offering automation.

Technical details
The implementation is carried out by suggestion a new object, ep.tl.CohortTracker, which features methods

  • __call__ to track an AnnData or DataFrame object whenever filtering steps have been made
  • plot_cohort_barplot to plot a visual summarization of the cohort considered. This is a boiled down visual of a tableone summarization.
  • plot_flowchart to plot a flowchart of the filtering steps
    and has attributes
  • .tracked_steps (int), how many trackings have happened
  • .tracked_tables (list), tracked tableones at each step

Additional context
Example:

import ehrapy as ep

# load dataset
adata = ep.dt.diabetes_130(columns_obs_only=["gender", "race", "time_in_hospital_days"])

# instantiate the cohort tracker
pop_track = ep.tl.CohortTracker(adata, categorical=["gender", "race"])

# track the initial state of the dataset
pop_track(adata, label="Initial cohort")

# do a filtering step
adata = adata[:1000]

# track the filtered dataset
pop_track(adata, label="Cohort 1", operations_done="filtered to first 1000 entries")

# plot the change of the cohort
pop_track.plot_cohort_barplot()

# more settings allowed
pop_track.plot_cohort_barplot(
    legend_labels={
         0.0: "Female",
         1.0: "Male",
         "time_in_hospital_days": "Time in hospital (days)",
     },
     yticks_labels={"time_in_hospital_days": "Time in hospital (days)", "race": "Race"},
)

# plot a flowchart
pop_track.plot_flowchart()

(basic plot)
image
(more customization)
image

image

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

Copy link
Member

@Zethson Zethson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much!

Just a few general comments and then some random code comments.

  1. What do you think about actually combining both plots into one? That's what I thought we'd do. Having them separate is also a good option!
  2. I think that we'll have to discuss the naming of the functions and parameters a bit more but that's easily changed
  3. The colormap looks a bit weird. The colors are overlapping a bit too much I think?

ehrapy/tools/cohort_tracking/_cohort_tracker.py Outdated Show resolved Hide resolved
ehrapy/tools/cohort_tracking/_cohort_tracker.py Outdated Show resolved Hide resolved
ehrapy/tools/cohort_tracking/_cohort_tracker.py Outdated Show resolved Hide resolved
ehrapy/tools/cohort_tracking/_cohort_tracker.py Outdated Show resolved Hide resolved
plt.tight_layout()
plt.show()

def plot_flowchart(self, save: str = None, return_plot: bool = True):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way to style the graphviz plot? It's pretty ugly^^

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my limited experience here, going to matplotlib in our case seems to make more sense and certainly saves some pain. Not amazingly pretty, still

tests/tools/cohort_tracking/test_cohort_tracking.py Outdated Show resolved Hide resolved
tests/tools/cohort_tracking/test_cohort_tracking.py Outdated Show resolved Hide resolved
tests/tools/cohort_tracking/test_cohort_tracking.py Outdated Show resolved Hide resolved
tests/tools/cohort_tracking/test_cohort_tracking.py Outdated Show resolved Hide resolved
@eroell
Copy link
Collaborator Author

eroell commented Mar 6, 2024

Fairly big overhaul, now including some test for the plots, moving from graphviz to matplotlib, enhanced coloring, better documentation.

To your questions

  1. What do you think about actually combining both plots into one? That's what I thought we'd do. Having them separate is also a good option!

Yes, I thought to end up there as well. Started off with the two individual plots with the idea of combining them in the end. By now, I am not sure this is helpful anymore; The arguments for two very different plots would be mixed somehow. Making this requires quite some engineering and is open to Errors and additional maintenance, while the added value of having them beneath instead of underneath in a Notebook is marginal I think?

  1. I think that we'll have to discuss the naming of the functions and parameters a bit more but that's easily changed

More than happy for any inputs

  1. The colormap looks a bit weird. The colors are overlapping a bit too much I think?

Oh yes, I think the coloring now is a step forward at least giving some consistency of different colors vs different shadings.
I think one way or the other, at one point there can be too many different categories and types to track reasonably with such a plot based on colors: then, one does not get around to look at the logged tableone.

New questions

  • Make tableone a dependency of ehrapy?

tests/conftest.py Show resolved Hide resolved
ehrapy/tools/cohort_tracking/_cohort_tracker.py Outdated Show resolved Hide resolved
ehrapy/tools/cohort_tracking/_cohort_tracker.py Outdated Show resolved Hide resolved
ehrapy/tools/cohort_tracking/_cohort_tracker.py Outdated Show resolved Hide resolved
ehrapy/tools/cohort_tracking/_cohort_tracker.py Outdated Show resolved Hide resolved
ehrapy/tools/cohort_tracking/_cohort_tracker.py Outdated Show resolved Hide resolved
ehrapy/tools/cohort_tracking/_cohort_tracker.py Outdated Show resolved Hide resolved
ehrapy/tools/cohort_tracking/_cohort_tracker.py Outdated Show resolved Hide resolved
tests/tools/cohort_tracking/test_cohort_tracking.py Outdated Show resolved Hide resolved
@eroell
Copy link
Collaborator Author

eroell commented Mar 12, 2024

TODO's

  • future import for having | in typehints in python 3.9
  • 3.11 violin plots seem to caus issues in CI test suite (Lukas)
  • check grey bars in plots in notebook sometimes
  • track_t1 -> tracked_tables refactor
  • no color shading but need distinct colors (1. color shading illegible with >3 colors, 2. for categorical variable shading can be misleading)
  • plot_cohort_change -> plot_cohort_barplot
  • dict for legend and yaxis labels

@Zethson Zethson marked this pull request as ready for review March 13, 2024 11:11
Copy link
Member

@Zethson Zethson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments! Let's merge it afterwards.

You may want to delete the cohort_tracking.ipynb notebook from this PR

ehrapy/tools/cohort_tracking/_cohort_tracker.py Outdated Show resolved Hide resolved
ehrapy/tools/cohort_tracking/_cohort_tracker.py Outdated Show resolved Hide resolved
ehrapy/tools/cohort_tracking/_cohort_tracker.py Outdated Show resolved Hide resolved
ehrapy/tools/cohort_tracking/_cohort_tracker.py Outdated Show resolved Hide resolved
ehrapy/tools/cohort_tracking/_cohort_tracker.py Outdated Show resolved Hide resolved
ehrapy/tools/cohort_tracking/_cohort_tracker.py Outdated Show resolved Hide resolved
ehrapy/tools/cohort_tracking/_cohort_tracker.py Outdated Show resolved Hide resolved
ehrapy/tools/cohort_tracking/_cohort_tracker.py Outdated Show resolved Hide resolved
ehrapy/tools/cohort_tracking/_cohort_tracker.py Outdated Show resolved Hide resolved
@eroell eroell merged commit a11887b into theislab:main Mar 13, 2024
4 of 13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tracking population changes throughout analysis
2 participants