-
Notifications
You must be signed in to change notification settings - Fork 145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Peppy #8
base: main
Are you sure you want to change the base?
Peppy #8
Conversation
…variant-calling into peppy_units
that is a thing of beauty. 🍦 |
validate(samples, schema="../schemas/samples.schema.yaml") | ||
|
||
units = pd.read_table(config["units"], dtype=str).set_index(["sample", "unit"], drop=False) | ||
units = p.subsample_table |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really nice! What about always converting the subsample_table index into string inside peppy (in order to get rid of the line below)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi johannes, it looks like this has been done (the line is now gone).
hey @johanneskoester what do you think? What's the next step? |
So, as I have seen from your repo, there is now a SnakeProject subclass? Can you tell me what is the difference to the main Project class? If possible I would love to not have a special case, but maybe there is some reason I do not see. |
It handles naming differences you requested removed from the workflow. |
Hey @johanneskoester -- it would be possible to not use the What do you think? Since we used different terminology to refer to these things, we have to either have an adapter, or we have to change one of the standards -- in this case we implemented an adapter. |
Changing from unit to subsample and the other colname changes would be fine for me. Regarding the rest:
|
OK
I'm fine with setting the index. Identifier column name(s) shouldn't be configurable, though. The solution in place bubbles knowledge of the schema with which standard data (like sample name and subsample name/index) may be referenced up to the level of the types themselves ( |
For your first point: we are also planning to adopt the snakemake (or similar) validation for PEPs as well. So, sounds good. we should address that soon. For your second point: I agree we should set the index in peppy and also I think that @vreuter is right that making the column name parameterizable could lead to some downstream trouble... one solution, maybe: could we take whatever column name the user provides and internally just standardize it to 'sample_name'? would solve your concern I think |
I think we have a misunderstanding here. With the parameter to |
Indeed! Can't speak for @nsheff , but I'd misinterpreted your suggestion. This sounds fine, but to minimally burden Snakemake workflow authors, would you want the default for the second component (besides name) of indexing for |
@johanneskoester, @vreuter -- where do we stand on this? I agree we can easily make this parameterizable. I think we should use |
The selection of columns to index, or certain names? Indexing flexibility is fine, but naming flexibility of particularly meaningful columns isn’t, at least in terms of how they’re referenced once the object is built. So yeah we could accept, for example, unit and subsample_name but have them referenced the same |
indexes are parameterize, not names.
I guess you can make it |
I am fine with renaming unit to subsample_name in our pipelines here. I would like to not have a snakeproject at all. If I am not wrong, once the indexing can be configured upon instantiating |
With the new release, can you give me a quick pointer how to adapt this PR to the new capabilities? Or do you want to do it yourself? |
@johanneskoester it looks like the latest release of Snakemake on PyPI targets Python 3.5 or at least supports it, but that this workflow uses string formatting introduced in PEP 498 that only targets Python 3.6? |
Good catch, that was not intended. Fixed in master. |
prjcfg.yaml
Outdated
sample_annotation: samples.tsv | ||
sample_subannotation: units.tsv | ||
sample_table: samples.tsv | ||
sample_subtable: units.tsv |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be subsample_table, not sample_subtable, right?
@johanneskoester @nsheff
Note that
units_peppy.tsv
is included not for necessity, but to show how apeppy
user may encode a subsample/units table, and have it still function with the workflow.