Issue running concoct with crossMapParallel pathway #57

Closed
jorgeln0 opened this issue Jun 22, 2021 · 7 comments
Labels
question Further information is requested

Comments

@jorgeln0

I am trying to run concoct in metaGEM after running crossMapParallel, using a large dataset that has already been quality-filtered and assembled into contigs. I successfully ran my files through megahit and then ran crossMapParallel, since it is recommended for large datasets; it wrote the expected files into the kallisto folder. I then ran concoct as the next job in the workflow. It calls kallisto2concoct.py but fails with the following error. Do you know how I can get around this so I can continue the workflow? Thank you!

P.S. Line 598 of the Snakefile had its output file commented out, so I removed the "#".

Traceback (most recent call last):
  File "/projectnb2/talbot-lab-data/jlopezna/metaGEM/scripts/kallisto2concoct.py", line 41, in <module>
    main(args)
  File "/projectnb2/talbot-lab-data/jlopezna/metaGEM/scripts/kallisto2concoct.py", line 22, in main
    samplename = samplenames[i]
IndexError: list index out of range
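The traceback shows that `samplenames` holds fewer entries than the loop at line 22 of `kallisto2concoct.py` expects. The sketch below is a hypothetical reconstruction, assuming the script pairs the i-th abundance file with the i-th sample name; the function names and file names are illustrative, not the script's actual code. It reproduces the `IndexError` and shows how an explicit length check would report the mismatch more clearly:

```python
# Hypothetical reconstruction: only `samplenames[i]` comes from the traceback;
# the real kallisto2concoct.py may be structured differently.


def unchecked_pairs(quant_files, samplenames):
    """Pair the i-th abundance file with the i-th sample name, with no guard."""
    pairs = []
    for i, quant_file in enumerate(quant_files):
        samplename = samplenames[i]  # raises IndexError if samplenames is too short
        pairs.append((samplename, quant_file))
    return pairs


def checked_pairs(quant_files, samplenames):
    """Same pairing, but fail early with a message that names the mismatch."""
    if len(samplenames) != len(quant_files):
        raise ValueError(
            f"{len(quant_files)} abundance files but {len(samplenames)} "
            f"sample names: {samplenames!r}"
        )
    return unchecked_pairs(quant_files, samplenames)


if __name__ == "__main__":
    files = ["sample1.tsv", "sample2.tsv"]  # illustrative file names
    try:
        unchecked_pairs(files, ["kallisto"])  # one name for two files
    except IndexError as err:
        print("IndexError:", err)  # IndexError: list index out of range
    try:
        checked_pairs(files, ["kallisto"])
    except ValueError as err:
        print("ValueError:", err)
```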
@franciscozorrilla
Owner

franciscozorrilla commented Jun 23, 2021

Hi Jorge,

Thanks for your interest in metaGEM, and glad you were able to run most of the crossMapParallel subworkflow.

Based on the error message, I suspect that the culprit may be line 18 in the Snakefile:

focal = get_ids_from_path_pattern('dataset/*')

Does your dataset folder contain sample-specific subdirectories? metaGEM looks in this folder to determine the sample IDs, so the subdirectories need to be there even if they are empty.
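A minimal sketch of how sample IDs can be derived from the entries under `dataset/`, assuming `get_ids_from_path_pattern` behaves roughly like "glob the pattern, keep the basenames". The helper `sample_ids_from_pattern` is hypothetical and the real metaGEM function may differ. It shows that an empty `dataset/` folder yields no IDs, while empty per-sample subdirectories are enough:

```python
import glob
import os
import tempfile


def sample_ids_from_pattern(pattern):
    """Hypothetical stand-in for get_ids_from_path_pattern: sorted basenames of glob matches."""
    return sorted(os.path.basename(path) for path in glob.glob(pattern))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        dataset = os.path.join(root, "dataset")
        os.makedirs(dataset)
        print(sample_ids_from_pattern(os.path.join(dataset, "*")))  # []
        for sample in ("sampleA", "sampleB"):
            os.makedirs(os.path.join(dataset, sample))  # empty subdirectories still count
        print(sample_ids_from_pattern(os.path.join(dataset, "*")))  # ['sampleA', 'sampleB']
```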

Regarding line 598, did you run into any rule dependency resolution errors after uncommenting it? Some outputs in the Snakefile are commented out because alternative rules can generate the same output, in which case Snakemake would be unable to determine which rule to run. Commenting out the output of the unused rule resolves this ambiguity and keeps Snakemake happy.

Just curious: what type of samples are these (e.g. human gut), how many samples do you have, and how big are they (e.g. size in GB or number of reads per sample)? Even if you have many samples (e.g. >250), you may be able to use the crossMapSeries subworkflow if the samples are small.

Best wishes,
Francisco

@franciscozorrilla franciscozorrilla self-assigned this Jun 23, 2021
@franciscozorrilla franciscozorrilla added the question Further information is requested label Jun 23, 2021
@franciscozorrilla
Owner

Closing this issue due to inactivity; please reopen if issues arise.

@Xentrics
Contributor

Xentrics commented Feb 3, 2022

I encountered the same issue, but I found the solution. There is a bug in the shell command that lists the sample subdirectories.
In my case {input} did not end with a trailing /, so instead of listing the subdirectories, the glob {input}* matched only the main directory itself.

--samplenames <(for s in {input}*; do echo $s|sed 's|^.*/||'; done) \
Should be
--samplenames <(for s in {input}/*; do echo $s|sed 's|^.*/||'; done) \
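The difference between `{input}*` and `{input}/*` can be reproduced outside Snakemake. The sketch below uses Python's `glob` module, which follows shell-style wildcard rules closely enough for this case, and `os.path.basename` in place of `sed 's|^.*/||'`; the directory name `kallisto` is an illustrative assumption for whatever `{input}` expands to, and `list_samples` is a hypothetical helper:

```python
import glob
import os
import tempfile


def list_samples(pattern):
    """Approximate `for s in PATTERN; do echo $s | sed 's|^.*/||'; done`."""
    return [os.path.basename(path) for path in sorted(glob.glob(pattern))]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        input_dir = os.path.join(root, "kallisto")  # {input} without a trailing slash
        for sample in ("sample1", "sample2"):
            os.makedirs(os.path.join(input_dir, sample))
        print(list_samples(input_dir + "*"))   # ['kallisto']: the directory itself
        print(list_samples(input_dir + "/*"))  # ['sample1', 'sample2']
        # The original pattern also works once {input} ends with a separator.
        print(list_samples(os.path.join(input_dir, "") + "*"))  # ['sample1', 'sample2']
```

A one-element `samplenames` list paired with several abundance files is what triggers the `IndexError` reported at the top of this thread.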

@franciscozorrilla
Owner

Thanks @Xentrics! Hopefully I will get around to fixing this soon. In case it's fresh in your mind, feel free to submit a PR with the fix; it would be greatly appreciated 🥲

@kunaljaani

Hi @franciscozorrilla,

I am trying to run concoct on the toy dataset and am getting a "TypeError" (please see the screenshot). Surprisingly, I could run both metabat and maxbin, but not concoct. Could you please suggest a fix?

Thanks a lot.
Kunal

[Screenshot of the TypeError; image not preserved]

@franciscozorrilla
Owner

Hey @kunaljaani, this error seems to be associated with recent CONCOCT installations, in particular with its dependency scikit-learn (sklearn). The following three issues in the CONCOCT repo describe how to resolve this recent problem:

BinPro/CONCOCT#321
BinPro/CONCOCT#322
BinPro/CONCOCT#323

It sounds like replacing scikit-learn 1.2 with scikit-learn 1.1 resolves this issue.
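Before launching the concoct rule, one could check which scikit-learn version the environment actually has. This is a minimal sketch, assuming (as reported in this thread) that scikit-learn 1.2 breaks CONCOCT while 1.1 works; treating every release from 1.2 onward as incompatible is an extrapolation, and later CONCOCT releases may have fixed the problem. The helper names are hypothetical:

```python
import re
from importlib import metadata


def parse_major_minor(version):
    """Return (major, minor) from a version string such as '1.1.3' or '1.2rc1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def sklearn_ok_for_concoct(version):
    """Assumption from this thread: treat scikit-learn >= 1.2 as incompatible."""
    return parse_major_minor(version) < (1, 2)


if __name__ == "__main__":
    try:
        installed = metadata.version("scikit-learn")
    except metadata.PackageNotFoundError:
        print("scikit-learn is not installed in this environment")
    else:
        status = "ok" if sklearn_ok_for_concoct(installed) else "consider downgrading to 1.1.x"
        print(f"scikit-learn {installed}: {status}")
```

If the check fails, a pip requirement such as `"scikit-learn<1.2"` in the CONCOCT environment is one way to apply the downgrade.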

@kunaljaani

Hi Francisco,

Thanks a lot for your rapid reply. Yes, it was the scikit-learn issue; downgrading it to 1.1 worked:
pip install scikit-learn==1.1.0

Thanks a lot.
Kunal
