metadata naming #281

nsheff · 2019-03-21T14:25:03Z

Originally posted by @johanneskoester in snakemake-workflows/dna-seq-gatk-variant-calling#6

Another thing. Shouldn't the project.yaml read like this for consistency:

metadata:
    sample_table: path/to/samples.tsv
    subsample_table: path/to/subsamples.tsv

Given our conclusions in #280 this seems logical. We originally had merge_table, which changed to sample_subannotation.

The text was updated successfully, but these errors were encountered:

nsheff · 2019-03-21T14:28:36Z

I'm not really sure why we used sample_annotation originally as opposed to sample_table, but we have been doing it that way since the beginning. To my ear right now, I actually prefer sample_table...

@vreuter -- maybe you won't like it from the dev perspective? What do you think of another major name change like this? @afrendeiro ?

vreuter · 2019-03-21T14:33:32Z

I'm pro big name changes, anti backwards compatibility, so what to do about sample_annotation determines my opinion.

nsheff · 2019-03-21T14:39:22Z

This is one where we'd definitely need to keep backwards compatibility, because sample_annotation is widespread.

vreuter · 2019-03-21T14:43:28Z

Right, the point was to preemptively clarify the reason for the dislike.

vreuter · 2019-03-21T14:45:21Z

The great part is that with an impending 1.0, we have a major point at which we can remove all past support, right? ;D

nsheff · 2019-03-21T14:52:36Z

remove all past support, right

I'm getting cold feet on that 😄 but yeah. I'm still on board for now

But the more specific question here is: which do you prefer: sample_annotation or sample_table, regardless of other issues?

vreuter · 2019-03-21T15:32:41Z

I'm getting cold feet on that smile but yeah. I'm still on board for now

My preference is clear, but it's your call obviously. Totally agree that breadth of usage is a big deal with sample_annotation. OTOH, it should be almost or entirely superficial or in our control. In other words, I guess that most if not all usages in places besides our own packages/libraries are in end-use context (individual analysis projects), where a change wouldn't propagate a requirement to update beyond that end use itself.

The place that comes to mind as potentially not that way is @afrendeiro 's toolkit, through which impact could extend to another layer(s) of users. But like for anything with caravel, pepr, etc. we could change relatively inconsequentially (since probably no/few external projects depend on those, right?)

As a separate benefit to converging on a single key/name for the metadata sheet, I could also see someone wanting to use the sample_annotation key to (more commonly?) denote genomic annotation rather than sample metadata, though probably in that case something like refann or genome_annotation makes more sense.

But the more specific question here is: which do you prefer: sample_annotation or sample_table, regardless of other issues?

I agree with @johanneskoester (and @nsheff ?) that it'd be ideal to align the end user's encoding and the API; my preference is for that parallelism, regardless of which one it is.

afrendeiro · 2019-03-27T19:52:47Z

I honestly don't have a strong preference for either but would argue that changes like this should be decided as soon as possible.
I'm also working towards a release of my library with a stable API and would appreciate if upstream things were stable sooner rather than later, but I know this just takes time to figure out what's best :)

I was reading this article recently on the development of pandas towards 1.0 and they do something releases where the main goal is to warn of all deprecations with backward compatibility although with already the new API in place and release 1.0 is exactly the same just with backward breaking changes removed.
We could also do something like that...

nsheff · 2019-03-27T20:21:39Z

We could also do something like that...

Yes, that's exactly what we're planning to do.

nsheff · 2020-02-18T13:24:27Z

In the end we went with sample_table and subsample_table, right?

vreuter added a commit to vreuter/peppy that referenced this issue Mar 28, 2019

use sample_table for annotations sheet reference; pepkit#281

db00139

vreuter added a commit to vreuter/peppy that referenced this issue Mar 28, 2019

amid testing metadata overhaul; pepkit#281

1a3d763

vreuter self-assigned this Apr 30, 2019

vreuter added the likely-solved label Apr 30, 2019

vreuter closed this as completed May 2, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

metadata naming #281

metadata naming #281

nsheff commented Mar 21, 2019

nsheff commented Mar 21, 2019

vreuter commented Mar 21, 2019

nsheff commented Mar 21, 2019

vreuter commented Mar 21, 2019

vreuter commented Mar 21, 2019

nsheff commented Mar 21, 2019

vreuter commented Mar 21, 2019 •

edited

Loading

afrendeiro commented Mar 27, 2019 •

edited

Loading

nsheff commented Mar 27, 2019

nsheff commented Feb 18, 2020

metadata naming #281

metadata naming #281

Comments

nsheff commented Mar 21, 2019

nsheff commented Mar 21, 2019

vreuter commented Mar 21, 2019

nsheff commented Mar 21, 2019

vreuter commented Mar 21, 2019

vreuter commented Mar 21, 2019

nsheff commented Mar 21, 2019

vreuter commented Mar 21, 2019 • edited Loading

afrendeiro commented Mar 27, 2019 • edited Loading

nsheff commented Mar 27, 2019

nsheff commented Feb 18, 2020

vreuter commented Mar 21, 2019 •

edited

Loading

afrendeiro commented Mar 27, 2019 •

edited

Loading