chip-seq_preprocess not compatible with new samtools #9

Open
ericakorb opened this issue Feb 26, 2019 · 0 comments


Hi, I've been using your awesome chip-seq_preprocess pipeline for several years. I just tried to run it on a computer with the newest version of samtools (version 1.9) and am getting errors that I've traced to changes in the new samtools. I've fixed one easier problem in fast2bam_by_bowtie2.sh: line 76 needs -o added before the output file name, because the sort command in the new samtools requires it. With that change I can align all my fastq files and run the subsequent steps (such as fastqc) up to the rmdup step.
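A minimal sketch of the shape of that change. The function name sort_bam, its arguments, and the SAMTOOLS override are placeholders for illustration; the actual contents of line 76 in fast2bam_by_bowtie2.sh are not reproduced here.

```shell
#!/usr/bin/env bash
# Sketch only: sort_bam and its argument names are hypothetical and are
# not taken from fast2bam_by_bowtie2.sh.
#
# Older samtools releases (0.1.x) took an output prefix and appended ".bam":
#   samtools sort in.bam out_prefix
# samtools 1.x expects the full output file name after -o:
#   samtools sort -o out_prefix.bam in.bam

sort_bam() {
    local in_bam="$1" out_prefix="$2"
    # SAMTOOLS may be set to point at a specific samtools binary.
    "${SAMTOOLS:-samtools}" sort -o "${out_prefix}.bam" "$in_bam"
}
```

Called as `sort_bam sample.bam sample.sorted`, this would write sample.sorted.bam, which matches the file name the old prefix-style call produced.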
The rmdup step has me a little stuck. rmdup.bam.sh uses rmdup on line 7. However, this command has been superseded by markdup in the new samtools. markdup requires a few preparatory steps, and I'm not sure how to incorporate them here, because I don't know how to name the output files so as not to break the subsequent steps of the pipeline. I'm also not clear on how the input file is sorted at this stage in the pipeline, which matters for markdup. I think it would only take a few additional lines in rmdup.bam.sh to replace the rmdup command, but I'm having trouble working out how to do this. Is this something you could help with? Here is the link to the new samtools manual, and below is the relevant info on markdup.
I'd love to be able to use the newer version of samtools if possible so any suggestions would be very much appreciated! Thank you!

http://www.htslib.org/doc/samtools.html

markdup
samtools markdup [-l length] [-r] [-s] [-T] [-S] in.algsort.bam out.bam

Mark duplicate alignments from a coordinate sorted file that has been run through fixmate with the -m option. This program relies on the MC and ms tags that fixmate provides.

-l INT
Expected maximum read length of INT bases. [300]

-r
Remove duplicate reads.

-s
Print some basic stats.

-T PREFIX
Write temporary files to PREFIX.samtools.nnnn.mmmm.tmp

-S
Mark supplementary reads of duplicates as duplicates.

EXAMPLE

The first sort can be omitted if the file is already name ordered

samtools sort -n -o namesort.bam example.bam

Add ms and MC tags for markdup to use later

samtools fixmate -m namesort.bam fixmate.bam

Markdup needs position order

samtools sort -o positionsort.bam fixmate.bam

Finally mark duplicates

samtools markdup positionsort.bam markdup.bam
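
As a rough sketch, the four manual steps above could be wrapped in one shell function that takes an input BAM and writes a deduplicated BAM under a caller-chosen name, so downstream steps still find the file they expect. The function name dedup_bam, the intermediate file names, and the SAMTOOLS override are all hypothetical and are not taken from rmdup.bam.sh. Because it is not established here how the input is sorted at this stage, the sketch name-sorts unconditionally. It passes -r to markdup so that duplicates are removed rather than only marked.

```shell
#!/usr/bin/env bash
# Sketch only: dedup_bam, its arguments, and the intermediate file names
# are hypothetical; they are not taken from rmdup.bam.sh.

dedup_bam() {
    local in_bam="$1" out_bam="$2"
    local st="${SAMTOOLS:-samtools}"   # override to use a specific binary
    local tmp="${out_bam%.bam}"        # prefix for intermediate files

    # 1. Name-sort (can be skipped if the input is already name ordered).
    "$st" sort -n -o "${tmp}.namesort.bam" "$in_bam" || return 1
    # 2. Add the ms and MC tags that markdup relies on.
    "$st" fixmate -m "${tmp}.namesort.bam" "${tmp}.fixmate.bam" || return 1
    # 3. markdup needs coordinate (position) order.
    "$st" sort -o "${tmp}.possort.bam" "${tmp}.fixmate.bam" || return 1
    # 4. Mark duplicates; -r removes them from the output.
    "$st" markdup -r "${tmp}.possort.bam" "$out_bam" || return 1

    # Remove intermediates only after every step has succeeded.
    rm -f "${tmp}.namesort.bam" "${tmp}.fixmate.bam" "${tmp}.possort.bam"
}
```

For example, `dedup_bam sample.sorted.bam sample.rmdup.bam` would leave only sample.rmdup.bam behind; the output name should be whatever the later pipeline steps currently read.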
