IlluminaBasecallsToSam skipping barcodes #1774
Comments
An update: I have tried running with a simpler read structure (50T99S8B50T) and this did not change the results. I have also tried deleting the barcodes files and running just IlluminaBasecallsToSam with MATCH_BARCODES_INLINE=true and this also did not change the results.
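For anyone checking which cycles a read structure such as 50T99S8B50T assigns to the barcode, here is a minimal sketch of a parser. It is not Picard's code: the function names are hypothetical, and it assumes only the operators T (template), B (sample barcode), S (skip) and M (molecular barcode).

```python
import re

# Assumed operator set: T, B, S, M. Other operators are rejected by this sketch.
_SEGMENT = re.compile(r"(\d+)([TBSM])")


def parse_read_structure(structure):
    """Split a read structure string into (length, operator) pairs."""
    segments = []
    pos = 0
    for match in _SEGMENT.finditer(structure):
        if match.start() != pos:
            raise ValueError(f"unexpected text at offset {pos} in {structure!r}")
        segments.append((int(match.group(1)), match.group(2)))
        pos = match.end()
    if not segments or pos != len(structure):
        raise ValueError(f"could not parse read structure {structure!r}")
    return segments


def barcode_cycles(structure):
    """Return 1-based inclusive (start, end) cycle ranges for each B segment."""
    ranges = []
    cycle = 1
    for length, operator in parse_read_structure(structure):
        if operator == "B":
            ranges.append((cycle, cycle + length - 1))
        cycle += length
    return ranges
```

With this sketch, `barcode_cycles("50T99S8B50T")` gives the cycle range of the 8-base barcode, which can be compared against the cycles bcl2fastq treats as the index read.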
Hi @nchernia, have you tried running this on a newer version of Picard (the latest release is 2.26.10)? We fixed a number of errors in EIB and IBCTo[Sam|Fastq] last year (although this does not sound like one of those). Also, is it possible that the missing barcodes are those that do not exactly match the input barcodes list (have more than one error)? That would be indicated in the metrics file generated by EIB. If possible, can you upload that file too? Thanks. George
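The "more than one error" question can also be checked offline against the index reads. Below is a minimal sketch of mismatch-tolerant assignment using plain Hamming distance. It is an illustration only, not Picard's implementation: the parameters `max_mismatches` and `min_delta` are hypothetical names, and no-calls and base qualities are ignored.

```python
def hamming(a, b):
    """Count positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(observed, expected, max_mismatches=1, min_delta=1):
    """Return the expected barcode closest to `observed`, or None.

    A barcode is assigned only if its distance is at most `max_mismatches`
    and the runner-up is at least `min_delta` mismatches further away.
    """
    if not expected:
        return None
    scored = sorted((hamming(observed, barcode), barcode) for barcode in expected)
    best_distance, best_barcode = scored[0]
    if best_distance > max_mismatches:
        return None
    if len(scored) > 1 and scored[1][0] - best_distance < min_delta:
        return None
    return best_barcode
```

A read whose observed barcode has distance 0 to an expected barcode should always be assigned under a scheme like this, which is why exact matches going missing is surprising.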
Hi George, Thanks for getting back to me. I've rerun with the latest jar and the results are the same. Here is the head of the metrics file.
This run is with IBtoS directly. When I run EIB and then IBtoS the counts are the same, except that the metrics file produced via the flag METRICS_FILE (and INLINE_BARCODES) doesn't contain the library name. Also, as another issue reported, the NNNN row is reported as all 0s when running IBtoS directly, so the percentages are different.
Looking at the metrics file when running EIB and IBtoS, I can sum the numbers in the READS column. This total, which includes both reads that matched one of the passed-in barcodes and reads that didn't, equals the total number of reads in the Undetermined_I2 fastq file produced by bcl2fastq. The number of reads reported per library in the READS column also matches the number of reads in the corresponding BAM file.
It seems that Picard is reporting these reads as unmatched when they look like perfect matches that should be assigned to a library. Is there some other filter that Picard applies that isn't obvious from the options? Here is an example of a read from the above. You can see it matches the first barcode perfectly, yet it is not in the resulting BAM file.
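The READS total described above can be computed with a few lines of Python. This is a minimal sketch with a hypothetical function name. It assumes the usual Picard metrics layout: comment lines starting with `#`, then a tab-separated header row followed by data rows, ending at the next blank line.

```python
def sum_metric_column(metrics_text, column="READS"):
    """Sum an integer column of the first table in a Picard-style metrics file."""
    header = None
    total = 0
    for line in metrics_text.splitlines():
        if line.startswith("#"):
            continue  # comment or section marker
        if not line.strip():
            if header is not None:
                break  # a blank line after the rows ends the table
            continue
        fields = line.split("\t")
        if header is None:
            header = fields
            if column not in header:
                raise ValueError(f"column {column!r} not found in header")
        else:
            total += int(fields[header.index(column)])
    if header is None:
        raise ValueError("no metrics table found")
    return total
```

The result can then be compared with the record count of the Undetermined_I2 fastq file (line count divided by four).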
Hello, I've confirmed this bug on a third BCL folder, from a different experiment (this time ChIP-seq, with a simple read structure, 76T8B). I am at the Broad and happy to share results with you so you can continue to debug. It's a small but consistent error, affecting a small percentage of reads. For something like ChIP-seq, losing a small number of reads may not matter, but for single-cell experiments we don't want to lose any reads, and this bug would force us to use bcl2fastq (which we'd prefer not to do, since we'd then have to reimplement Picard's mismatch tolerance and BAM writing).
Please do let me know if you'd like me to share an example BCL together with the other files. They are stored on a Google bucket. My Broad username is neva if you'd like to email instead.
Bug Report
Affected tool(s)
IlluminaBasecallsToSam (possibly ExtractIlluminaBarcodes)
Affected version(s)
Version: 2.26.3
Description
I am using ExtractIlluminaBarcodes and IlluminaBasecallsToSam as a first demultiplexing step to separate reads by library barcode. I am comparing to doing two passes via bcl2fastq and then directly examining the index reads to demultiplex by library.
The BAM files produced by IlluminaBasecallsToSam are missing reads whose barcodes are present in the bcl2fastq output. These are exact matches with high-quality basecall strings, e.g. FFFFFFFF. I have tested this on two different bcl files so far, and in both cases approximately 0.1% of reads are missing.
Since ExtractIlluminaBarcodes is called before IlluminaBasecallsToSam and reports the number of matches per library (which is equal to the number of reads in the corresponding library bam file), this might be an error in ExtractIlluminaBarcodes instead of IlluminaBasecallsToSam.
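The bcl2fastq side of this comparison can be sketched as follows: scan the index-read FASTQ and collect the names of reads whose sequence is an exact match to an expected library barcode. The function name is hypothetical, and the sketch assumes standard 4-line FASTQ records with the barcode as the full sequence of each index read.

```python
def exact_match_reads(fastq_lines, expected_barcodes):
    """Map each expected barcode to the names of index reads that match it exactly."""
    expected = set(expected_barcodes)
    matches = {barcode: [] for barcode in expected}
    record = []
    for line in fastq_lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            header, sequence, _plus, _quality = record
            record = []
            if not header.startswith("@"):
                raise ValueError(f"malformed FASTQ header: {header!r}")
            if sequence in expected:
                # Read name is the header up to the first whitespace, without '@'.
                matches[sequence].append(header[1:].split()[0])
    return matches
```

The returned read names can then be checked against the read names in the per-library BAM file to list exactly which reads are missing.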
Steps to reproduce
barcode_params.tsv
library_params1.tsv
Expected behavior
All reads with a matching library barcode are in the appropriate BAM file.
Actual behavior
0.1% of reads with exact match barcodes are missing.