Samplesheet info discrepancy #224

Closed
maltesemike opened this issue Mar 3, 2023 · 5 comments

@maltesemike

Description of the bug

There is a discrepancy in the information regarding the format for the design samplesheet.

The documentation states "there is a strict requirement for the first 3 columns to match those defined in the table below", yet the table shown is an example of a 2-column CSV file with "sample,fastq_1" headers. The pipeline currently throws an error with this 2-column file.


The documentation should be fixed to avoid confusion, since the pipeline only works with a 3-column CSV file with "sample,fastq_1,fastq_2" as headers, even though the pipeline only accepts single-end FASTQ files.

Command used and terminal output

No response

Relevant files

No response

System information

No response

@maltesemike maltesemike added the bug Something isn't working label Mar 3, 2023
@apeltzer apeltzer added this to the 2.2.1 milestone Mar 3, 2023
@apeltzer apeltzer self-assigned this Mar 3, 2023
@apeltzer
Member

apeltzer commented Mar 3, 2023

Yeah I think we can simply update the python tool to ignore cases where there is no fastq2 header present - I'll do this 👍
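
A minimal sketch of what such a header check could look like, assuming the column names quoted in this issue ("sample", "fastq_1", and an optional "fastq_2"). The function name `check_header` and the constants are hypothetical and are not taken from check_samplesheet.py:

```python
import csv
import io

# Assumed column names, taken from the headers quoted in this issue.
REQUIRED_COLUMNS = ["sample", "fastq_1"]
OPTIONAL_COLUMNS = ["fastq_2"]


def check_header(header_line):
    """Return the header as a list if it is acceptable, else raise ValueError."""
    # An empty header line yields no CSV row, so fall back to an empty list.
    parsed = next(csv.reader(io.StringIO(header_line)), [])
    header = [name.strip() for name in parsed]

    required = header[: len(REQUIRED_COLUMNS)]
    extra = header[len(REQUIRED_COLUMNS):]

    # Accept exactly the required columns, optionally followed by fastq_2.
    if required != REQUIRED_COLUMNS or extra not in ([], OPTIONAL_COLUMNS):
        raise ValueError(
            f"Invalid header {header}: expected {REQUIRED_COLUMNS} "
            f"optionally followed by {OPTIONAL_COLUMNS}"
        )
    return header
```

With this shape, both "sample,fastq_1" and "sample,fastq_1,fastq_2" pass, and anything else fails with a message that prints the header that was actually found.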

@Adrian-Zet

Adrian-Zet commented Jun 12, 2023

Dear Nextflow smrnaseq team,

Is this solved in the stable conda version of nextflow?

I did try to use it with only 3 columns, placing nothing in the 3rd column like this:

sample,fastq_1,fastq_2
Cancer_SRR5230552,/home/adrz/test-nextflow/SRR5230552.fastq.gz
Cancer_SRR5230553,/home/adrz/test-nextflow/SRR5230553.fastq.gz
Control_SRR5230634,/home/adrz/test-nextflow/SRR5230634.fastq.gz
Control_SRR5230685,/home/adrz/test-nextflow/SRR5230685.fastq.gz

But I received the following errors:
Command executed:

check_samplesheet.py input_nf.csv samplesheet.valid.csv

cat <<-END_VERSIONS > versions.yml
"NFCORE_SMRNASEQ:SMRNASEQ:INPUT_CHECK:SAMPLESHEET_CHECK":
    python: $(python --version | sed 's/Python //g')
END_VERSIONS

Command exit status:
1

Command output:
ERROR: Please check samplesheet -> Invalid number of columns: found 0 columns (header has 2)
Line #2: 'Cancer_SRR5230552,/home/adrz/test-nextflow/SRR5230552.fastq.gz'

Command error:
ERROR: Please check samplesheet -> Invalid number of columns: found 0 columns (header has 2)
Line #2: 'Cancer_SRR5230552,/home/adrz/test-nextflow/SRR5230552.fastq.gz'

Work dir:
/home/adrz/test-nextflow/work/82/0b7258bc39f2d6377e4f4be957a86e

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

-- Check '.nextflow.log' file for details

Edit2:

I also tried the following structure for the CSV file:
sample,fastq_1,fastq_2
Cancer_SRR5230552,/home/adrz/test-nextflow/SRR5230552.fastq.gz,/home/adrz/test-nextflow/SRR5230552.fastq.gz
Cancer_SRR5230553,/home/adrz/test-nextflow/SRR5230553.fastq.gz,/home/adrz/test-nextflow/SRR5230553.fastq.gz
Control_SRR5230634,/home/adrz/test-nextflow/SRR5230634.fastq.gz,/home/adrz/test-nextflow/SRR5230634.fastq.gz
Control_SRR5230685,/home/adrz/test-nextflow/SRR5230685.fastq.gz,/home/adrz/test-nextflow/SRR5230685.fastq.gz

and received the following error:

Command output:
ERROR: Please check samplesheet -> Invalid number of columns: found 4 columns (header has 1)
Line #6: ''

Command error:
ERROR: Please check samplesheet -> Invalid number of columns: found 4 columns (header has 1)
Line #6: ''

Thanks in advance,
Adrian
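
For illustration only: the first samplesheet above has a 3-column header but 2 fields per row. One way a validator could tolerate that is to pad short rows with empty values up to the header width. The sketch below assumes that behaviour is wanted; `pad_rows` is a hypothetical helper and not how check_samplesheet.py actually works:

```python
import csv
import io


def pad_rows(samplesheet_text):
    """Parse samplesheet text and pad short rows to the header width."""
    rows = list(csv.reader(io.StringIO(samplesheet_text)))
    if not rows:
        raise ValueError("Samplesheet is empty")
    header, body = rows[0], rows[1:]

    padded = []
    # Data rows start on line 2 because the header is line 1.
    for line_no, fields in enumerate(body, start=2):
        if not fields:
            # csv.reader returns an empty list for a blank line; skip it.
            continue
        if len(fields) > len(header):
            raise ValueError(
                f"Line {line_no}: found {len(fields)} fields, "
                f"but the header has {len(header)} columns"
            )
        # A missing trailing column (e.g. fastq_2) becomes an empty string.
        padded.append(fields + [""] * (len(header) - len(fields)))
    return header, padded


sheet = (
    "sample,fastq_1,fastq_2\n"
    "Cancer_SRR5230552,/home/adrz/test-nextflow/SRR5230552.fastq.gz\n"
)
header, rows = pad_rows(sheet)
print(rows)
```

Each returned row then has one entry per header column, with an empty string where fastq_2 was left out.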

@apeltzer
Member

apeltzer commented Sep 1, 2023

This is fixed in dev and should work fine in 2.2.2, which is about to be released 👍🏻 Thanks for reporting, both :)

@apeltzer apeltzer closed this as completed Sep 1, 2023
@mdozmorov

I'm using 2.2.3 and still encountered this error. After long debugging and staring at check_samplesheet.py, the solution was to remove the empty line at the end of the sample sheet.

I suggest checking for that and giving an informative error. Currently, the message is very misleading. For the 3-sample sheet (with an empty line), the code gives:

ERROR: Please check samplesheet -> Invalid number of columns: found 3 columns (header has 1)
Line #5: ''

Everything is wrong here - the header has 2 columns, the number of rows, not columns, is 3. Here's the sample sheet, samplesheet_test.csv, and to reproduce: python .nextflow/assets/nf-core/smrnaseq/bin/check_samplesheet.py samplesheet_test.csv samplesheet.test.valid.csv.

I'm not familiar enough with Python to make a PR, even with the help of ChatGPT. Reporting here, it should be a simple but very valuable fix.
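
A rough sketch of the kind of check suggested here, assuming a plain comma-separated sheet without quoted commas. The function `find_problems`, its messages, and the sample names in the example are hypothetical; this is not the code in check_samplesheet.py:

```python
def find_problems(samplesheet_text):
    """Return a list of readable messages, one per problematic line."""
    lines = samplesheet_text.splitlines()
    if not lines or not lines[0].strip():
        return ["Line 1 is empty: the samplesheet needs a header line."]

    # Assumption: fields are separated by plain commas, with no quoting.
    n_header_cols = len(lines[0].strip().split(","))

    problems = []
    # Data rows start on line 2 because the header is line 1.
    for line_no, line in enumerate(lines[1:], start=2):
        stripped = line.strip()
        if not stripped:
            problems.append(
                f"Line {line_no} is empty: remove blank lines from the samplesheet."
            )
            continue
        n_fields = len(stripped.split(","))
        if n_fields != n_header_cols:
            problems.append(
                f"Line {line_no} has {n_fields} fields, "
                f"but the header has {n_header_cols} columns."
            )
    return problems


# Hypothetical 3-sample sheet with a blank line at the end.
sheet = (
    "sample,fastq_1\n"
    "s1,a.fastq.gz\n"
    "s2,b.fastq.gz\n"
    "s3,c.fastq.gz\n"
    "\n"
)
for message in find_problems(sheet):
    print(message)
```

Reporting the blank line by its line number, and keeping the field count separate from the header column count, avoids the mix-up of rows and columns in the message quoted above.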

@apeltzer
Member

apeltzer commented Sep 11, 2023

Thanks @mdozmorov - I have added this to the open PR for 2.2.4 now. #281, see here: 21b10e9

nschcolnicov pushed a commit that referenced this issue Oct 10, 2024