Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

metadata naming #281

Closed
nsheff opened this issue Mar 21, 2019 · 10 comments
Closed

metadata naming #281

nsheff opened this issue Mar 21, 2019 · 10 comments
Assignees

Comments

@nsheff
Copy link
Contributor

nsheff commented Mar 21, 2019

Originally posted by @johanneskoester in snakemake-workflows/dna-seq-gatk-variant-calling#6

Another thing. Shouldn't the project.yaml read like this for consistency:

metadata:
    sample_table: path/to/samples.tsv
    subsample_table: path/to/subsamples.tsv

Given our conclusions in #280 this seems logical. We originally had merge_table, which changed to sample_subannotation.

@nsheff
Copy link
Contributor Author

nsheff commented Mar 21, 2019

I'm not really sure why we used sample_annotation originally as opposed to sample_table, but we have been doing it that way since the beginning. To my ear right now, I actually prefer sample_table...

@vreuter -- maybe you won't like it from the dev perspective? What do you think of another major name change like this? @afrendeiro ?

@vreuter
Copy link
Member

vreuter commented Mar 21, 2019

I'm pro big name changes, anti backwards compatibility, so what to do about sample_annotation determines my opinion.

@nsheff
Copy link
Contributor Author

nsheff commented Mar 21, 2019

This is one where we'd definitely need to keep backwards compatibility, because sample_annotation is widespread.

@vreuter
Copy link
Member

vreuter commented Mar 21, 2019

Right, the point was to preemptively clarify the reason for the dislike.

@vreuter
Copy link
Member

vreuter commented Mar 21, 2019

The great part is that with an impending 1.0, we have a major point at which we can remove all past support, right? ;D

@nsheff
Copy link
Contributor Author

nsheff commented Mar 21, 2019

remove all past support, right

I'm getting cold feet on that 😄 but yeah. I'm still on board for now

But the more specific question here is: which do you prefer: sample_annotation or sample_table, regardless of other issues?

@vreuter
Copy link
Member

vreuter commented Mar 21, 2019

I'm getting cold feet on that smile but yeah. I'm still on board for now

My preference is clear, but it's your call obviously. Totally agree that breadth of usage is a big deal with sample_annotation. OTOH, it should be almost or entirely superficial or in our control. In other words, I guess that most if not all usages in places besides our own packages/libraries are in end-use context (individual analysis projects), where a change wouldn't propagate a requirement to update beyond that end use itself.

The place that comes to mind as potentially not that way is @afrendeiro 's toolkit, through which impact could extend to another layer(s) of users. But like for anything with caravel, pepr, etc. we could change relatively inconsequentially (since probably no/few external projects depend on those, right?)

As a separate benefit to converging on a single key/name for the metadata sheet, I could also see someone wanting to use the sample_annotation key to (more commonly?) denote genomic annotation rather than sample metadata, though probably in that case something like refann or genome_annotation makes more sense.

But the more specific question here is: which do you prefer: sample_annotation or sample_table, regardless of other issues?

I agree with @johanneskoester (and @nsheff ?) that it'd be ideal to align the end user's encoding and the API; my preference is for that parallelism, regardless of which one it is.

@afrendeiro
Copy link
Contributor

afrendeiro commented Mar 27, 2019

I honestly don't have a strong preference for either but would argue that changes like this should be decided as soon as possible.
I'm also working towards a release of my library with a stable API and would appreciate if upstream things were stable sooner rather than later, but I know this just takes time to figure out what's best :)

I was reading this article recently on the development of pandas towards 1.0 and they do something releases where the main goal is to warn of all deprecations with backward compatibility although with already the new API in place and release 1.0 is exactly the same just with backward breaking changes removed.
We could also do something like that...

@nsheff
Copy link
Contributor Author

nsheff commented Mar 27, 2019

We could also do something like that...

Yes, that's exactly what we're planning to do.

vreuter added a commit to vreuter/peppy that referenced this issue Mar 28, 2019
vreuter added a commit to vreuter/peppy that referenced this issue Mar 28, 2019
@vreuter vreuter self-assigned this Apr 30, 2019
@vreuter vreuter closed this as completed May 2, 2019
@nsheff
Copy link
Contributor Author

nsheff commented Feb 18, 2020

In the end we went with sample_table and subsample_table, right?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants