Support for unstranded data #32

dominikburri · 2021-11-05T12:32:47Z

This issue was transferred from the previous issue on GitLab (number 89).

Original description

It seems that the v0.1 milestone will support only stranded data.
We need to make sure that in later versions we support unstranded data as well.

Comments

@fgypas
It seems that we might not support it in the end, so closing for now.

@uniqueg
I disagree strongly and think it should be part of the published version. The very first test with external real-world data (SARS-related samples) demonstrated that there are still plenty of relevant unstranded libraries around. Besides, whether a library is stranded or not is (unfortunately) not something that is typically reported when uploading samples to SRA, so whenever anyone wants to run a sample from there, it's a gamble whether we will accept it or not (and they only find out after checking first with another pipeline). That's a really poor user experience...

@mzavolan
Well... I think generating unstranded data makes no sense, especially now. When analyzing such data, people make choices that are not warranted and that introduce errors for sure (e.g. cumulating reads from the plus and minus strands and, of course, discarding regions where there is transcription in both directions). I don't want to spend time implementing and testing choices like this. How do you want to proceed?

@uniqueg
I understand the sentiment, but then that argument applies more or less to any data that were obtained with inferior protocols or outdated technology. And to my taste that depends a little bit too much on personal opinion and on what kind of resources you have access to. Unstranded data have proven useful in the past, so I can't really see how analyzing them makes no sense, despite all the obvious drawbacks.
Anyway, how about we leave this issue open and defer the discussion until we know how to proceed with Rhea? If we want to allow our users to analyze samples straight from SRA, I think Rhea should handle the vast majority of samples on there, and that would probably mean it should handle unstranded libraries. If we don't care too much about that particular use case and mostly concern ourselves with how Rhea can serve ourselves, then we should probably drop it for the reasons mentioned.

@mkatsanto
This feature will not be implemented in the near future according to our scope. If the public requires it we can implement it in later versions.

@uniqueg
Do it if reviewers ask for it; otherwise wait until users ask for it.

dominikburri · 2021-11-05T12:39:56Z

I created a new branch, support-unstranded, and committed a first fix (6f7f52c) to support unstranded paired-end libraries.

The fix includes only the appropriate keywords for salmon and kallisto. The results for the samples I'm using seem good so far, the multiqc report seems fine in that the majority of reads map in STAR, salmon and kallisto.
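
For illustration, the kind of keyword selection described above could look like the sketch below. This is not the code from commit 6f7f52c: the function name `quant_strand_args` and the `"unstranded"` / `"forward"` / `"reverse"` labels are hypothetical. The values themselves are the documented salmon `--libType` strings for inward-facing paired-end reads and the kallisto strand flags; kallisto treats a library as unstranded when no strand flag is given.

```python
from typing import Dict, List

# Hypothetical labels for library strandedness (not taken from the pipeline).
# salmon: paired-end, inward-facing library types.
SALMON_LIBTYPE: Dict[str, str] = {
    "unstranded": "IU",
    "forward": "ISF",   # read 1 comes from the forward strand
    "reverse": "ISR",   # read 1 comes from the reverse strand
}

# kallisto: no strand flag means the library is treated as unstranded.
KALLISTO_FLAGS: Dict[str, List[str]] = {
    "unstranded": [],
    "forward": ["--fr-stranded"],
    "reverse": ["--rf-stranded"],
}


def quant_strand_args(strandedness: str) -> Dict[str, List[str]]:
    """Return the strand-related CLI arguments for salmon and kallisto."""
    if strandedness not in SALMON_LIBTYPE:
        raise ValueError(f"unknown strandedness: {strandedness!r}")
    return {
        "salmon": ["--libType", SALMON_LIBTYPE[strandedness]],
        # Copy the list so callers cannot mutate the lookup table.
        "kallisto": list(KALLISTO_FLAGS[strandedness]),
    }
```

In a Snakemake workflow, a helper like this would typically be called from a rule's `params` section, with the strandedness read from the sample table.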

ALFA is not yet corrected; it needs a new rule to work properly. Right now, ALFA runs and reports the biotypes, but as expected, half of the reads map to "opposite strand".
It needs a new rule because when running unstranded data, only one bedgraph file can be supplied.
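
A minimal sketch of why a separate rule is needed: the ALFA arguments differ in shape between the two cases. The helper `alfa_bedgraph_args` is hypothetical, and the `--bedgraph` and `-s` usage reflects my reading of ALFA's command line (two bedgraph files plus a label when stranded, one file plus a label when unstranded); it should be checked against `alfa --help` before use.

```python
from typing import List, Sequence


def alfa_bedgraph_args(
    bedgraphs: Sequence[str], label: str, strandness: str
) -> List[str]:
    """Build the bedgraph-related ALFA arguments (assumed CLI shape).

    Stranded runs are assumed to take a plus and a minus bedgraph,
    unstranded runs a single bedgraph.
    """
    if strandness not in ("unstranded", "forward", "reverse"):
        raise ValueError(f"unknown strandness: {strandness!r}")
    expected = 1 if strandness == "unstranded" else 2
    if len(bedgraphs) != expected:
        raise ValueError(
            f"{strandness} run expects {expected} bedgraph file(s), "
            f"got {len(bedgraphs)}"
        )
    return ["--bedgraph", *bedgraphs, label, "-s", strandness]
```

Because the number of input files changes, an unstranded rule needs a different `input` section from the stranded one, not just a different parameter value.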

Other tools and output files are not tested for correctness.

mkatsanto · 2022-11-15T14:20:35Z

Revisiting this issue:

Tasks to apply this feature

star_rpm rule: there is an option for unstranded data
alfa can support unstranded data
build an appropriate test
an unstranded-specific subworkflow is needed

dominikburri added the enhancement New feature or request label Nov 5, 2021

ninsch3000 added the future will not be fixed for NOW label Apr 20, 2022

mkatsanto self-assigned this Oct 28, 2022

mkatsanto removed the future will not be fixed for NOW label Oct 28, 2022

mkatsanto added this to the submission_related_updates milestone Oct 28, 2022

mkatsanto removed this from the submission_related_updates milestone Nov 28, 2022

mkatsanto added the future will not be fixed for NOW label Nov 28, 2022

mkatsanto removed their assignment Nov 28, 2022
