Add check for sample name collision when using the same sample for multiple antibodies #440

Open
pontushojer opened this issue Dec 16, 2024 · 0 comments

I ran chipseq v2.1.0 with a samplesheet in which the same sample name was shared across multiple antibody entries.

Here is a snippet of the samplesheet:

sample,fastq_1,fastq_2,replicate,antibody,control,control_replicate
10k_55min_ChIP,P33703_1014.R1.fastq.gz,P33703_1014.R2.fastq.gz,1,Ab1,10k_55min_IC,1
10k_55min_ChIP,P33703_1015_S15_L006_R1_001.fastq.gz,P33703_1015_S15_L006_R2_001.fastq.gz,2,Ab1,10k_55min_IC,1
10k_55min_ChIP,P33703_1016.R1.fastq.gz,P33703_1016.R2.fastq.gz,3,Ab1,10k_55min_IC,1
10k_55min_ChIP,P33703_1036.R1.fastq.gz,P33703_1036.R2.fastq.gz,1,Ab2,10k_55min_IC,1
10k_55min_ChIP,P33703_1037.R1.fastq.gz,P33703_1037.R2.fastq.gz,2,Ab2,10k_55min_IC,1
10k_55min_ChIP,P33703_1038.R1.fastq.gz,P33703_1038.R2.fastq.gz,3,Ab2,10k_55min_IC,1
10k_55min_IC,P33703_1013.R1.fastq.gz,P33703_1013.R2.fastq.gz,1,,,

Everything started smoothly, and the samplesheet even passed the check_samplesheet.py script, until NFCORE_CHIPSEQ:CHIPSEQ:BAM_PEAKS_CALL_QC_ANNOTATE_MACS3_HOMER:PLOT_MACS3_QC, where I ran into this error:

Error executing process > 'NFCORE_CHIPSEQ:CHIPSEQ:BAM_PEAKS_CALL_QC_ANNOTATE_MACS3_HOMER:PLOT_MACS3_QC'

Caused by:
  Process `NFCORE_CHIPSEQ:CHIPSEQ:BAM_PEAKS_CALL_QC_ANNOTATE_MACS3_HOMER:PLOT_MACS3_QC` input file name collision -- There are multiple input files for each of the following file names: 50k_50min_ChIP_REP1_peaks.narrowPeak, 50k_60min_ChIP_REP2_peaks.narrowPeak, 10k_60min_ChIP_REP2_peaks.narrowPeak, 10k_50min_ChIP_REP1_peaks.narrowPeak, 50k_55min_ChIP_REP1_peaks.narrowPeak, 50k_50min_ChIP_REP3_peaks.narrowPeak, 10k_55min_ChIP_REP2_peaks.narrowPeak, 10k_50min_ChIP_REP3_peaks.narrowPeak, 10k_50min_ChIP_REP2_peaks.narrowPeak, 50k_50min_ChIP_REP2_peaks.narrowPeak, 10k_60min_ChIP_REP1_peaks.narrowPeak, 50k_60min_ChIP_REP1_peaks.narrowPeak, 50k_60min_ChIP_REP3_peaks.narrowPeak, 50k_55min_ChIP_REP3_peaks.narrowPeak, 10k_55min_ChIP_REP3_peaks.narrowPeak, 50k_55min_ChIP_REP2_peaks.narrowPeak, 10k_55min_ChIP_REP1_peaks.narrowPeak, 10k_60min_ChIP_REP3_peaks.narrowPeak


Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

I naively thought that the antibody column would be appended to the sample name to make it unique, but this is seemingly not the case.

It would be great if the documentation could be improved here so that it is clear that the sample column entries need to be unique across antibodies. Ideally, the check_samplesheet.py script would also catch this formatting error.
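
To illustrate the kind of check I have in mind, here is a rough sketch. It is not taken from the actual check_samplesheet.py; the function name `find_antibody_collisions` is hypothetical, and it only assumes the column names from the samplesheet header above. It reports every sample name that appears with more than one antibody, skipping control rows where the antibody column is empty.

```python
import csv
import io
from collections import defaultdict


def find_antibody_collisions(samplesheet_text):
    """Return {sample: sorted antibodies} for samples listed under more than one antibody.

    Hypothetical helper, not part of nf-core/chipseq.
    """
    antibodies_by_sample = defaultdict(set)
    for row in csv.DictReader(io.StringIO(samplesheet_text)):
        antibody = (row.get("antibody") or "").strip()
        if antibody:  # control rows leave the antibody column empty
            antibodies_by_sample[row["sample"].strip()].add(antibody)
    return {
        sample: sorted(antibodies)
        for sample, antibodies in antibodies_by_sample.items()
        if len(antibodies) > 1
    }


# Shortened version of the samplesheet from this issue.
SAMPLESHEET = """\
sample,fastq_1,fastq_2,replicate,antibody,control,control_replicate
10k_55min_ChIP,P33703_1014.R1.fastq.gz,P33703_1014.R2.fastq.gz,1,Ab1,10k_55min_IC,1
10k_55min_ChIP,P33703_1036.R1.fastq.gz,P33703_1036.R2.fastq.gz,1,Ab2,10k_55min_IC,1
10k_55min_IC,P33703_1013.R1.fastq.gz,P33703_1013.R2.fastq.gz,1,,,
"""

collisions = find_antibody_collisions(SAMPLESHEET)
for sample, antibodies in collisions.items():
    print(f"Sample '{sample}' is used with multiple antibodies: {', '.join(antibodies)}")
```

A check along these lines could fail early with a clear message instead of letting the run reach the file name collision in PLOT_MACS3_QC.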
