Different length in bedgraph output #12
Hi Laura, Do you mean different row lengths rather than column lengths (i.e. is the output of "wc -l" different for each file)? If so, yes: all files should have the same number of GATC fragments and thus the same number of rows. Can you tell me how extensive the difference is? Also, were there any errors or issues noted in the log files? Thanks,
Hi Owen, I have tried both with a GATC file I built myself and with the Dmel_BDGP6.GATC.gff.gz file that you provide. Thank you very much!
Hi Laura, That's very strange -- this shouldn't happen, and I can't see anything that's obviously wrong. If you have all your bedgraphs in the same directory, can you run the following one-liner in the directory and send me the output? (This should output any GATC fragments that are not fully shared between the files, which will hopefully help me understand what's going on here a bit more):
Thanks,
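The one-liner referred to above is not reproduced in this thread. As a rough illustration of the same idea only, and not the maintainer's command, the following Python sketch lists every fragment that is missing from at least one bedgraph file. The function names and the `*.bedgraph` glob pattern are assumptions made for this example.

```python
from collections import Counter
from pathlib import Path


def read_fragments(path):
    """Return the set of (chrom, start, end) fragments in one bedgraph file."""
    fragments = set()
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            # Skip blank lines, comments, and bedgraph header lines.
            if len(fields) < 3 or fields[0] in ("track", "browser") or fields[0].startswith("#"):
                continue
            fragments.add((fields[0], int(fields[1]), int(fields[2])))
    return fragments


def unshared_fragments(paths):
    """Map each fragment missing from at least one file to the files that do contain it."""
    per_file = {str(p): read_fragments(p) for p in paths}
    counts = Counter(frag for frags in per_file.values() for frag in frags)
    partial = {frag for frag, n in counts.items() if n < len(per_file)}
    return {
        frag: sorted(name for name, frags in per_file.items() if frag in frags)
        for frag in sorted(partial)
    }


if __name__ == "__main__":
    # Hypothetical usage: compare every *.bedgraph file in the current directory.
    files = sorted(Path(".").glob("*.bedgraph"))
    for frag, present_in in unshared_fragments(files).items():
        print(*frag, ",".join(present_in), sep="\t")
```

Each output row gives the chromosome, start and end of a fragment, followed by the files in which it appears, so fragments confined to a few samples stand out.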
Hi Owen,
Ok, that really helps a lot. As you probably noticed, all of those non-matching fragments come from unmapped scaffolds in the genome assembly. My pre-built GATC files exclude these, so in theory if you run the pipeline on the BAMs using the GATC file from the damidseq_pipeline repository you should get files with matching lengths. That should, at least, fix the problem for now (and please let me know if it doesn't). Can I also confirm that the GATC file you built included the unmapped scaffolds? As to why this has happened in the first place -- is there any chance that either the bowtie2 alignment indices or the GATC file were not the same for all samples? If this isn't the case, then I should check how very small scaffolds are handled when there is no mapping data at all. It's possible that scaffolds that fail to map any reads are excluded, which could also explain what has happened.
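One way to check whether a GATC file includes the unmapped scaffolds is to list the sequence names it contains. The sketch below is only an illustration: it assumes a standard tab-separated GFF in which the first column is the sequence name, and the function name is hypothetical.

```python
import gzip
from collections import Counter


def count_records_per_sequence(gff_path):
    """Count GFF records per sequence name (first column), for plain or gzipped files."""
    opener = gzip.open if str(gff_path).endswith(".gz") else open
    counts = Counter()
    with opener(gff_path, "rt") as handle:
        for line in handle:
            # Skip comment/pragma lines and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            counts[line.split("\t", 1)[0]] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python count_gatc.py Dmel_BDGP6.GATC.gff.gz
    for name, n in sorted(count_records_per_sequence(sys.argv[1]).items()):
        print(name, n, sep="\t")
```

Comparing this listing for a self-built GATC file against the pre-built one should show whether scaffold names appear in one but not the other.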
Hi Owen, As to the second part: everything was run with the same bowtie2 index and GATC files. I am sure, because it was all done at the same time and I have the commands stored. Thank you very much for your attention, I really appreciate it! I hope everything goes well from now on.
That's great to hear, Laura. It's not that you're new to this, though -- it would appear that there's a bug in my code in dealing with chromosomes without any mapping data (a highly unlikely occurrence with the main chromosomes, but not at all impossible when including the small, often heterochromatic, unmapped scaffolds). It should be fairly straightforward to ensure any chromosomes without data are still included in the output file, so I'll look into it further and update once fixed.
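The general idea of keeping fragments without data in the output can be sketched as follows. This is not the pipeline's actual fix: the function name, the in-memory data layout, and the placeholder `fill_value` of 0.0 are all assumptions made for illustration.

```python
def fill_missing_fragments(all_fragments, scores, fill_value=0.0):
    """Return one (chrom, start, end, score) row per GATC fragment.

    all_fragments: iterable of (chrom, start, end) taken from the GATC reference.
    scores: dict mapping (chrom, start, end) to a score, for fragments with data.
    fill_value: hypothetical placeholder for fragments with no mapping data.
    """
    return [
        (chrom, start, end, scores.get((chrom, start, end), fill_value))
        for chrom, start, end in all_fragments
    ]


# Toy data: the scaffold has no mapped reads, so it has no entry in `scores`.
gatc = [("2L", 0, 100), ("2L", 100, 250), ("scaffold_1", 0, 80)]
scores = {("2L", 0, 100): 0.42, ("2L", 100, 250): -0.13}

rows = fill_missing_fragments(gatc, scores)
for row in rows:
    print(*row, sep="\t")
```

Because the rows are driven by the reference fragment list rather than by the fragments that happen to have data, every sample produces the same number of rows.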
Hi,
I have run the pipeline with paired-end data, and the resulting bedgraphs have different column lengths for each sample. Shouldn't they be equal, since I used the same reference GATC.gff for all of them?
Thank you very much,
Laura