This issue is a transferal from the previous issue on gitlab (number 89).
Original description
It seems that the v0.1 milestone will support only stranded data.
We need to make sure that in later versions we support unstranded data as well.
Comments
@fgypas
It seems that we might not support it in the end, so closing for now.
@uniqueg
I disagree strongly and think it should be part of the published version. The very first test with external real-world data (SARS-related samples) demonstrated that there are still plenty of relevant unstranded libraries around. Besides, whether a library is stranded or not is (unfortunately) not typically reported when samples are uploaded to SRA, so whenever anyone wants to run a sample from there, it's a gamble whether we will accept it (after first checking it with another pipeline) or not. That's a really poor user experience...
@mzavolan
Well... I think generating unstranded data makes no sense, especially now. When analyzing such data, people make choices that are not warranted and that certainly introduce errors (e.g., combining reads from the plus and minus strands and, of course, discarding regions where there is transcription in both directions). I don't want to spend time implementing and testing choices like this. How do you want to proceed?
@uniqueg
I understand the sentiment, but that argument applies more or less to any data obtained with inferior protocols or outdated technology. And for my taste, that depends a little too much on personal opinion and on what kind of resources you have access to. Unstranded data have proven useful in the past, so I can't really see how analyzing them makes no sense, despite all the obvious drawbacks.
Anyway, how about we leave this issue open and defer the discussion until we know how to proceed with Rhea? If we want to allow our users to analyze samples straight from SRA, I think Rhea should handle the vast majority of samples in there, and that would probably mean it should handle unstranded libraries. If we don't care much about that particular use case and mostly concern ourselves with how Rhea can serve us, then we should probably drop it for the reasons mentioned.
@mkatsanto
This feature will not be implemented in the near future according to our scope. If the public requires it we can implement it in later versions.
@uniqueg
Do it if reviewers ask for it; otherwise, wait until users ask for it.
I created a new branch, support-unstranded, and committed 6f7f52c, a first fix to support unstranded paired-end libraries.
The fix only sets the appropriate library-type keywords for salmon and kallisto. The results for the samples I'm using look good so far: the MultiQC report shows that the majority of reads map in STAR, salmon and kallisto.
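As a rough illustration of what "the appropriate keywords" means here, the sketch below maps a library orientation to a salmon `--libType` value and to extra `kallisto quant` flags. The function name `strand_options` and the orientation labels `unstranded`/`forward`/`reverse` are hypothetical and not taken from commit 6f7f52c; the salmon codes (`IU`, `ISF`, `ISR`, `U`, `SF`, `SR`) and the kallisto flags (`--fr-stranded`, `--rf-stranded`, none for unstranded) are the tools' documented options.

```python
from typing import List, Tuple

# Hypothetical orientation labels used only in this sketch.
ORIENTATIONS = ("unstranded", "forward", "reverse")


def strand_options(orientation: str, paired_end: bool) -> Tuple[str, List[str]]:
    """Return (salmon --libType value, extra kallisto quant flags)."""
    if orientation not in ORIENTATIONS:
        raise ValueError(f"unknown orientation: {orientation!r}")
    # Paired-end mates face inward ("I"); single-end reads have no relative orientation.
    prefix = "I" if paired_end else ""
    if orientation == "unstranded":
        # salmon: "IU" (paired) or "U" (single); kallisto: default, no strand flag.
        return prefix + "U", []
    if orientation == "forward":
        # Read 1 comes from the transcript strand.
        return prefix + "SF", ["--fr-stranded"]
    # Read 1 comes from the opposite strand (e.g. dUTP protocols).
    return prefix + "SR", ["--rf-stranded"]
```

For the unstranded paired-end samples discussed here, `strand_options("unstranded", True)` yields `("IU", [])`, i.e. salmon gets `--libType IU` and kallisto runs without a strand flag.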
ALFA is not yet fixed; it needs a new rule to work properly. Right now, ALFA runs and reports the biotypes but, as expected, half of the reads are assigned to the "opposite strand".
It needs a new rule because, for unstranded data, only one bedgraph file can be supplied.
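The "half on the opposite strand" result is what an unstranded library should produce: each read is equally likely to come from either strand, so about 50% land antisense to the annotated gene. The same fraction can also hint at the orientation of SRA samples whose strandedness isn't reported, roughly in the spirit of RSeQC's `infer_experiment.py`. The sketch below is hypothetical: the function name and the `margin` threshold of 0.2 around 0.5 are assumptions, not values used by Rhea, ALFA or RSeQC.

```python
def classify_strandedness(sense: int, antisense: int, margin: float = 0.2) -> str:
    """Guess library orientation from strand-assigned read counts.

    sense: reads (or read 1 of each pair) on the same strand as the overlapping gene.
    antisense: reads on the opposite strand.
    margin: assumed tolerance around 0.5 within which a library is called unstranded.
    """
    total = sense + antisense
    if total == 0:
        raise ValueError("no reads overlap annotated genes")
    sense_fraction = sense / total
    # Unstranded libraries split roughly 50/50 between the two strands.
    if abs(sense_fraction - 0.5) <= margin:
        return "unstranded"
    # Stranded libraries put most reads on one strand.
    return "forward" if sense_fraction > 0.5 else "reverse"
```

For example, a sample with 5,000 sense and 4,800 antisense reads would be called `"unstranded"`, whereas one with 9,500 and 500 would be called `"forward"`.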
Other tools and output files are not tested for correctness.