-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rank features groups obs #622
Conversation
@eroell this is exactly what I had in mind! Really like the None / all / some_list options too :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! A few minor comments.
Generally, I have the impression that you could leave out a few code comments. I myself comment a lot and am actually learning to reduce it a bit. Hence, I gave a few suggestions.
Perfectly fine.
Maybe we should check the dtype of the obs columns and shout at least if we saw dates/times?
Only if we're handling date and time. Otherwise LGTM |
Co-authored-by: Lukas Heumos <[email protected]>
Co-authored-by: Lukas Heumos <[email protected]>
Co-authored-by: Lukas Heumos <[email protected]>
Co-authored-by: Lukas Heumos <[email protected]>
Co-authored-by: Lukas Heumos <[email protected]>
Co-authored-by: Lukas Heumos <[email protected]>
Co-authored-by: Lukas Heumos <[email protected]>
New version now can consider highly variable features in This introduces two new arguments:
But it might be that this has become overengineered, with more pitfalls of how it could raise Errors in a confusing way @Zethson what do you think? Example import ehrapy as ep
import numpy as np
import pandas as pd
import scanpy as sc
def create_dummy_dataset_numerical_in_obs():
"""
Create a dummy dataset with numerical and non-numerical variables in obs.
Also, has numerical variables in .X.
"""
dummy_obs = {"disease":['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C'],
"station": ['ICU', 'ICU', 'MICU', 'MICU', 'ICU', 'ICU', 'MICU', 'MICU', 'ICU', 'ICU', 'MICU', 'MICU'],
"syst_bp_entry": [138, 139, 140, 141, 148, 149, 150, 151, 158, 159, 160, 161],
"diast_bp_entry": [78, 79, 80, 81, 77, 78, 79, 80, 55, 56, 57, 58]}
dummy_var = pd.DataFrame({"Unit": ["mg/dl", "kg"]})
dummy_var.index = ['glucose', "weight"]
dummy_X = np.array([[80, 90, 120, 130, 80, 130, 120, 90, 80, 90, 120, 130],
[77, 76, 60, 90, 110, 78, 56, 76, 67, 82, 59, 81]]).T
adata_dummy = sc.AnnData(X=dummy_X, obs=dummy_obs, var=dummy_var)
adata_dummy.uns['numerical_columns'] = ['glucose', 'weight']
adata_dummy.uns['non_numerical_columns'] = []
adata_dummy.uns['encoded_non_numerical_columns'] = []
return adata_dummy
# classical way of using .X
adata_dummy = create_dummy_dataset_numerical_in_obs()
ep.tl.rank_features_groups(adata_dummy, groupby='disease')
ep.pl.rank_features_groups(adata_dummy) # added way of using .obs
adata_dummy = create_dummy_dataset_numerical_in_obs()
ep.tl.rank_features_groups(adata_dummy, groupby='disease', field_to_rank="obs")
ep.pl.rank_features_groups(adata_dummy) # added way of using both
adata_dummy = create_dummy_dataset_numerical_in_obs()
ep.tl.rank_features_groups(adata_dummy, groupby='disease', field_to_rank="layer_and_obs")
ep.pl.rank_features_groups(adata_dummy) # added way of using e.g. both, with selecting only a some variables
adata_dummy = create_dummy_dataset_numerical_in_obs()
ep.tl.rank_features_groups(adata_dummy, groupby='disease',
field_to_rank="layer_and_obs",
columns_to_rank={"var_names": ["glucose"],
"obs_names": ["syst_bp_entry", "station"]
}
) |
Bonus - pre-commit now starts to report ruff errors it so far did only locally for me - but did not yesterday, and enforcing a style I think is not kept in other files, too? Don't mind adapting to what ruff requests for this file, but might get a bit out of sync with style in other files I think |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! Let's ignore ruff for now. Maybe it sorts itself out. If not, I'll have a proper look
PR Checklist
docs
is updated (via docstrings)Description of changes
Resolves #593.
As outlined in this issue, this PR adds an additional functionality for the user to use
rank_features_groups
: for the.obs
field only (as opposed to using.X
/.layers
).It does so by allowing the user to specify an additional argument
rank_obs_columns
inrank_features_groups
. It is by defaultNone
, and takes the values"all"
or a list of keys if only a selection of observation columns is wished/meaningful to be ranked.Technical details
Uses
move_to_x
andencode
(autodetect=True
,encodings="label"
). Makes no changes to the passed adata except for adding the respective entries inadata.uns
.Questions
Additional context
Example: