Cohort Tracker #658

Merged
merged 50 commits into from
Mar 13, 2024
Changes from 16 commits
Commits
50 commits
decd34a  nb with stuff that at least does not just fail  (eroell, Feb 12, 2024)
6407623  population logging and tracking first support  (eroell, Feb 13, 2024)
cd7e113  refined and first-line debugged population tracking  (eroell, Feb 14, 2024)
5ee6b5d  population to cohort  (eroell, Feb 14, 2024)
2e28f2b  cohort logging with tests  (eroell, Feb 14, 2024)
3673d28  toy notebook cleaned  (eroell, Feb 15, 2024)
9687e4d  small comments included  (eroell, Feb 29, 2024)
c6f5955  documentation working somewhat  (eroell, Feb 29, 2024)
dcc3841  remove class in tests  (eroell, Mar 1, 2024)
b844096  move read_csv to fixture  (eroell, Mar 1, 2024)
75451bd  remove tracking dict, use tableones for tracking instead  (eroell, Mar 5, 2024)
c42d847  remove DataFrame as accepted input  (eroell, Mar 5, 2024)
232c2b1  legend label order matching bar order  (eroell, Mar 5, 2024)
4090f6f  prepare type detection for alignment, added test  (eroell, Mar 5, 2024)
4606155  add ax and remove unused args, return not solved yet  (eroell, Mar 6, 2024)
2728ba5  tests for plots, move to pyplot for flowchart  (eroell, Mar 6, 2024)
c8ff205  Update ehrapy/tools/cohort_tracking/_cohort_tracker.py  (eroell, Mar 9, 2024)
3c7cdc9  Update ehrapy/tools/cohort_tracking/_cohort_tracker.py  (eroell, Mar 9, 2024)
088983c  Update ehrapy/tools/cohort_tracking/_cohort_tracker.py  (eroell, Mar 9, 2024)
e9dedbb  Update ehrapy/tools/cohort_tracking/_cohort_tracker.py  (eroell, Mar 9, 2024)
711c710  Update ehrapy/tools/cohort_tracking/_cohort_tracker.py  (eroell, Mar 9, 2024)
7f25d66  Update ehrapy/tools/cohort_tracking/_cohort_tracker.py  (eroell, Mar 9, 2024)
331c095  Update ehrapy/tools/cohort_tracking/_cohort_tracker.py  (eroell, Mar 9, 2024)
ee0b68f  Update ehrapy/tools/cohort_tracking/_cohort_tracker.py  (eroell, Mar 9, 2024)
7472eaa  remove reset, add updated notebook for quick check  (eroell, Mar 9, 2024)
89e5d5b  typehints and review comments  (eroell, Mar 9, 2024)
ced853e  remove comment in test  (eroell, Mar 9, 2024)
0d44608  tableone to requirements?  (eroell, Mar 9, 2024)
482d0cb  allow typehint union  (eroell, Mar 12, 2024)
da4f922  [pre-commit.ci] auto fixes from pre-commit.com hooks  (pre-commit-ci[bot], Mar 12, 2024)
83c0cca  Fix scanpy pre-release compat  (Zethson, Mar 12, 2024)
6327638  Remove anndata warning ignore  (Zethson, Mar 12, 2024)
3e58315  future import fixed in test conf  (eroell, Mar 12, 2024)
92f9112  [pre-commit.ci] auto fixes from pre-commit.com hooks  (pre-commit-ci[bot], Mar 12, 2024)
057dd8f  track_t1 -> tracked_tables  (eroell, Mar 12, 2024)
5589a22  updates with better names, label-dicts, better colors, more tests  (eroell, Mar 13, 2024)
a7646a0  Merge branch 'pop-log' of github.com:eroell/ehrapy into pop-log  (eroell, Mar 13, 2024)
e79c031  remove grid lines, add notebook for testimages generation  (eroell, Mar 13, 2024)
d24f677  prettier docstring demo  (eroell, Mar 13, 2024)
a4c021b  Update ehrapy/tools/cohort_tracking/_cohort_tracker.py  (eroell, Mar 13, 2024)
b75cdbb  Update ehrapy/tools/cohort_tracking/_cohort_tracker.py  (eroell, Mar 13, 2024)
d5cfa23  Update ehrapy/tools/cohort_tracking/_cohort_tracker.py  (eroell, Mar 13, 2024)
5261de3  Update ehrapy/tools/cohort_tracking/_cohort_tracker.py  (eroell, Mar 13, 2024)
61683f5  Update ehrapy/tools/cohort_tracking/_cohort_tracker.py  (eroell, Mar 13, 2024)
9663cfb  remove old comments, better variable names, simplify adata check  (eroell, Mar 13, 2024)
0bd391a  Merge branch 'pop-log' of github.com:eroell/ehrapy into pop-log  (eroell, Mar 13, 2024)
d638a88  fix two doc typos  (eroell, Mar 13, 2024)
50118d3  Merge branch 'main' of github.com:eroell/ehrapy  (eroell, Mar 13, 2024)
2b78475  Merge branch 'main' into pop-log  (eroell, Mar 13, 2024)
3c92f26  identical Returns field  (eroell, Mar 13, 2024)
460 changes: 460 additions & 0 deletions cohort_tracking.ipynb

Large diffs are not rendered by default.

Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/_static/docstring_previews/flowchart.png
10 changes: 10 additions & 0 deletions docs/usage/usage.md
@@ -253,6 +253,16 @@ In contrast to a preprocessing function, a tool usually adds an easily interpret
tools.causal_inference
```

### Cohort Tracking

```{eval-rst}
.. autosummary::
    :toctree: tools
    :nosignatures:

    tools.CohortTracker
```

## Plotting

The plotting module `ehrapy.pl.*` largely parallels the `tl.*` and a few of the `pp.*` functions.
1 change: 1 addition & 0 deletions ehrapy/tools/__init__.py
@@ -1,6 +1,7 @@
from ehrapy.tools._sa import anova_glm, cox_ph, glm, kmf, ols, test_kmf_logrank, test_nested_f_statistic
from ehrapy.tools._scanpy_tl_api import * # noqa: F403
from ehrapy.tools.causal._dowhy import causal_inference
from ehrapy.tools.cohort_tracking._cohort_tracker import CohortTracker
from ehrapy.tools.feature_ranking._rank_features_groups import filter_rank_features_groups, rank_features_groups

try: # pragma: no cover
Empty file.
393 changes: 393 additions & 0 deletions ehrapy/tools/cohort_tracking/_cohort_tracker.py

Large diffs are not rendered by default.

31 changes: 31 additions & 0 deletions tests/conftest.py
@@ -1,8 +1,39 @@
import os
from pathlib import Path

import pytest
from matplotlib import pyplot as plt
from matplotlib.figure import Figure
from matplotlib.testing.compare import compare_images


@pytest.fixture
def root_dir():
    return Path(__file__).resolve().parent


# simplified from https://github.com/scverse/scanpy/blob/main/scanpy/tests/conftest.py
@pytest.fixture
def check_same_image(tmp_path):
    def check_same_image(
        fig: Figure,
        base_path: Path | os.PathLike,
        *,
        tol: float,
    ) -> None:
        expected = Path(base_path).parent / (Path(base_path).name + "_expected.png")
        if not Path(expected).is_file():
            raise OSError(f"No expected output found at {expected}.")
        actual = tmp_path / "actual.png"

        fig.tight_layout()
        fig.savefig(actual, dpi=80)

        result = compare_images(expected, actual, tol=tol, in_decorator=True)

        if result is None:
            return None

        raise AssertionError(result)

    return check_same_image
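`check_same_image` delegates the comparison to matplotlib's `compare_images`, which returns `None` when the root-mean-square (RMS) difference between the pixel values (0-255) of the expected and actual images is at most `tol`. The stdlib-only sketch below illustrates that criterion on small, hypothetical flat pixel lists instead of PNG files; `rms_difference` and `images_match` are illustrative names, not part of matplotlib or ehrapy. It suggests why `tol=1e-1` is strict enough for the sensitivity tests: changing a single channel value by 6 already yields an RMS of 3.0.

```python
import math
from collections.abc import Sequence


def rms_difference(expected: Sequence[int], actual: Sequence[int]) -> float:
    """Root-mean-square difference of two equally sized flat pixel buffers (values 0-255)."""
    if len(expected) != len(actual) or not expected:
        raise ValueError("images must be non-empty and have the same size")
    squared = [(e - a) ** 2 for e, a in zip(expected, actual)]
    return math.sqrt(sum(squared) / len(squared))


def images_match(expected: Sequence[int], actual: Sequence[int], tol: float) -> bool:
    # Same pass criterion as the fixture relies on: the RMS difference must not exceed tol.
    return rms_difference(expected, actual) <= tol


reference = [0, 128, 255, 64]
recolored = [0, 128, 255, 70]  # one channel value shifted by 6

print(images_match(reference, reference, tol=1e-1))  # True
print(rms_difference(reference, recolored))  # 3.0
print(images_match(reference, recolored, tol=1e-1))  # False
```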
145 changes: 145 additions & 0 deletions tests/tools/cohort_tracking/test_cohort_tracking.py
@@ -0,0 +1,145 @@
from pathlib import Path

import pytest

import ehrapy as ep
from ehrapy.io._read import read_csv

CURRENT_DIR = Path(__file__).parent
_TEST_DATA_PATH = f"{CURRENT_DIR.parent}/test_data_features_ranking"
_TEST_IMAGE_PATH = f"{CURRENT_DIR.parent}/_images"


@pytest.fixture
def adata_mini():
    return read_csv(f"{_TEST_DATA_PATH}/dataset1.csv", columns_obs_only=["glucose", "weight", "disease", "station"])


@pytest.mark.parametrize("columns", [None, ["glucose", "weight", "disease", "station"]])
def test_CohortTracker_init_vanilla(columns, adata_mini):
    ct = ep.tl.CohortTracker(adata_mini, columns)
    assert ct._tracked_steps == 0
    assert ct.tracked_steps == 0
    assert ct._tracked_text == []
    assert ct._tracked_operations == []


def test_CohortTracker_type_detection(adata_mini):
    ct = ep.tl.CohortTracker(adata_mini, ["glucose", "weight", "disease", "station"])
    assert set(ct.categorical) == {"disease", "station"}


def test_CohortTracker_init_set_columns(adata_mini):
    # limit columns
    ep.tl.CohortTracker(adata_mini, columns=["glucose", "disease"])

    # invalid column
    with pytest.raises(ValueError):
        ep.tl.CohortTracker(
            adata_mini,
            columns=["glucose", "disease", "non_existing_column"],
        )

    # force columns to be treated as categorical
    ep.tl.CohortTracker(adata_mini, columns=["glucose", "disease"], categorical=["glucose", "disease"])

    # categorical column that is not among the tracked columns
    with pytest.raises(ValueError):
        ep.tl.CohortTracker(
            adata_mini,
            columns=["glucose", "disease"],
            categorical=["station"],
        )


def test_CohortTracker_call(adata_mini):
    ct = ep.tl.CohortTracker(adata_mini)

    ct(adata_mini)
    assert ct.tracked_steps == 1
    assert ct._tracked_text == ["Cohort 0\n (n=12)"]

    ct(adata_mini)
    assert ct.tracked_steps == 2
    assert ct._tracked_text == ["Cohort 0\n (n=12)", "Cohort 1\n (n=12)"]


def test_CohortTracker_reset(adata_mini):
    ct = ep.tl.CohortTracker(adata_mini)

    ct(adata_mini)
    ct(adata_mini)

    ct.reset()
    assert ct.tracked_steps == 0
    assert ct._tracked_text == []
    assert ct._tracked_operations == []


def test_CohortTracker_plot_cohort_change_test_sensitivity(adata_mini, check_same_image):
    ct = ep.tl.CohortTracker(adata_mini)

    # check that a changed setting (here, the color palette) makes the image comparison fail
    ct(adata_mini, label="First step", operations_done="Some operations")
    fig1, _ = ct.plot_cohort_change(show=False, color_palette="husl")

    with pytest.raises(AssertionError):
        check_same_image(
            fig=fig1,
            base_path=f"{_TEST_IMAGE_PATH}/cohorttracker_adata_mini_step1",
            tol=1e-1,
        )


def test_CohortTracker_plot_cohort_change(adata_mini, check_same_image):
    ct = ep.tl.CohortTracker(adata_mini)

    ct(adata_mini, label="First step", operations_done="Some operations")
    fig1, _ = ct.plot_cohort_change(show=False)

    check_same_image(
        fig=fig1,
        base_path=f"{_TEST_IMAGE_PATH}/cohorttracker_adata_mini_step1",
        tol=1e-1,
    )

    ct(adata_mini, label="Second step", operations_done="Some other operations")
    fig2, _ = ct.plot_cohort_change(show=False)

    check_same_image(
        fig=fig2,
        base_path=f"{_TEST_IMAGE_PATH}/cohorttracker_adata_mini_step2",
        tol=1e-1,
    )


def test_CohortTracker_flowchart_sensitivity(adata_mini, check_same_image):
    ct = ep.tl.CohortTracker(adata_mini)

    ct(adata_mini, label="Base Cohort")
    ct(adata_mini, operations_done="Some processing")

    # check that a changed setting (here, the arrow size) makes the image comparison fail
    fig, _ = ct.plot_flowchart(show=False, arrow_size=0.5)

    with pytest.raises(AssertionError):
        check_same_image(
            fig=fig,
            base_path=f"{_TEST_IMAGE_PATH}/cohorttracker_adata_mini_flowchart",
            tol=1e-1,
        )


def test_CohortTracker_flowchart(adata_mini, check_same_image):
    ct = ep.tl.CohortTracker(adata_mini)

    ct(adata_mini, label="Base Cohort")
    ct(adata_mini, operations_done="Some processing")

    fig, _ = ct.plot_flowchart(show=False)

    check_same_image(
        fig=fig,
        base_path=f"{_TEST_IMAGE_PATH}/cohorttracker_adata_mini_flowchart",
        tol=1e-1,
    )
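The tests above pin down the tracker's bookkeeping: every call adds one step, and a step recorded without a label is rendered as `"Cohort <i>\n (n=<n_obs>)"`. The sketch below models only that bookkeeping, assuming a simplified interface that takes the observation count instead of an `AnnData` object. `MiniCohortTracker` is a hypothetical name, not the `CohortTracker` implementation in this PR; in particular, the text format used when a `label` is passed and the way operations are stored are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MiniCohortTracker:
    """Hypothetical stand-in for the step bookkeeping the tests inspect."""

    tracked_text: list[str] = field(default_factory=list)
    tracked_operations: list[str | None] = field(default_factory=list)

    @property
    def tracked_steps(self) -> int:
        # One text entry is appended per call, so the step count is the list length.
        return len(self.tracked_text)

    def __call__(self, n_obs: int, label: str | None = None, operations_done: str | None = None) -> None:
        # Without a label, fall back to "Cohort <step>" as test_CohortTracker_call expects;
        # using the label verbatim otherwise is an assumption of this sketch.
        name = f"Cohort {self.tracked_steps}" if label is None else label
        self.tracked_text.append(f"{name}\n (n={n_obs})")
        self.tracked_operations.append(operations_done)


ct = MiniCohortTracker()
ct(12)  # the 12-row test dataset
ct(12, operations_done="Some processing")
print(ct.tracked_steps)  # 2
print(ct.tracked_text)  # ['Cohort 0\n (n=12)', 'Cohort 1\n (n=12)']
```

Keeping the step count as a property derived from the recorded text avoids a separate counter that could drift out of sync with the recorded steps.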
13 changes: 13 additions & 0 deletions tests/tools/ehrapy_data/dataset1.csv
@@ -0,0 +1,13 @@
idx,sys_bp_entry,dia_bp_entry,glucose,weight,disease,station
1,138,78,80,77,A,ICU
2,139,79,90,76,A,ICU
3,140,80,120,60,A,MICU
4,141,81,130,90,A,MICU
5,148,77,80,110,B,ICU
6,149,78,135,78,B,ICU
7,150,79,125,56,B,MICU
8,151,80,95,76,B,MICU
9,158,55,70,67,C,ICU
10,159,56,85,82,C,ICU
11,160,57,125,59,C,MICU
12,161,58,125,81,C,MICU
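`test_CohortTracker_type_detection` expects `disease` and `station` to be detected as categorical, and `test_CohortTracker_init_set_columns` expects a `ValueError` for unknown columns and for categorical columns outside the tracked set. The stdlib-only sketch below reproduces those expectations on the rows of `dataset1.csv`, assuming a simple rule (a column is categorical if any value fails to parse as a number). The helpers `is_numeric` and `resolve_columns` are hypothetical and not necessarily how `CohortTracker` validates or detects types; the sketch also counts categories per column, the kind of per-step summary a cohort-change plot can draw on.

```python
from __future__ import annotations

import csv
import io
from collections import Counter

# Contents of tests/tools/ehrapy_data/dataset1.csv, inlined so the sketch is self-contained.
DATASET1 = """\
idx,sys_bp_entry,dia_bp_entry,glucose,weight,disease,station
1,138,78,80,77,A,ICU
2,139,79,90,76,A,ICU
3,140,80,120,60,A,MICU
4,141,81,130,90,A,MICU
5,148,77,80,110,B,ICU
6,149,78,135,78,B,ICU
7,150,79,125,56,B,MICU
8,151,80,95,76,B,MICU
9,158,55,70,67,C,ICU
10,159,56,85,82,C,ICU
11,160,57,125,59,C,MICU
12,161,58,125,81,C,MICU
"""


def is_numeric(values: list[str]) -> bool:
    """Assumed rule: a column is numeric if every value parses as a float."""
    try:
        for value in values:
            float(value)
    except ValueError:
        return False
    return True


def resolve_columns(
    available: list[str],
    columns: list[str] | None = None,
    categorical: list[str] | None = None,
) -> list[str]:
    """Hypothetical validation mirroring the ValueErrors the tests expect."""
    tracked = list(available) if columns is None else list(columns)
    missing = [c for c in tracked if c not in available]
    if missing:
        raise ValueError(f"Columns not found: {missing}")
    not_tracked = [c for c in (categorical or []) if c not in tracked]
    if not_tracked:
        raise ValueError(f"Categorical columns must be among the tracked columns: {not_tracked}")
    return tracked


rows = list(csv.DictReader(io.StringIO(DATASET1)))
obs_columns = ["glucose", "weight", "disease", "station"]  # columns_obs_only of the test fixture
tracked = resolve_columns(obs_columns)
categorical = {c for c in tracked if not is_numeric([row[c] for row in rows])}
category_counts = {c: Counter(row[c] for row in rows) for c in sorted(categorical)}

print(len(rows))  # 12
print(sorted(categorical))  # ['disease', 'station']
print(dict(category_counts["station"]))  # {'ICU': 6, 'MICU': 6}
```

The 12 parsed rows match the `n=12` that `test_CohortTracker_call` expects for the first tracked cohort.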