How to simulate amplicon-seq data? #221

Open
capoony opened this issue Sep 8, 2024 · 0 comments

Hi all,

apologies for yet another request! I want to simulate ONT amplicon-seq reads with NanoSim, but the simulation step never finishes (or at least not within several hours).

I have a reference sequence based on Sanger sequencing of the amplicon (Stor1_cox1.fa). In addition, I have ONT data of the same amplicon (COX1.fastq), which I could use for model training.
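It may be worth comparing the read lengths in COX1.fastq with the length of the Sanger reference. Reads simulated from a short reference are presumably bounded by its length, but that is an assumption I have not verified in the NanoSim code. The sketch below is plain Python and independent of NanoSim. The helper names and the script name `compare_lengths.py` are hypothetical, and it assumes an unwrapped FASTQ with four lines per record:

```python
import sys
from statistics import median


def fasta_lengths(path):
    """Return {record_id: sequence_length} for a (multi-)FASTA file."""
    lengths = {}
    current = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # Record ID = first whitespace-separated token after '>'
                current = line[1:].split()[0]
                lengths[current] = 0
            elif current is not None:
                # Sequence lines may be wrapped, so sum their lengths
                lengths[current] += len(line)
    return lengths


def fastq_read_lengths(path):
    """Return read lengths, assuming unwrapped 4-line FASTQ records."""
    lengths = []
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of each record = sequence
                lengths.append(len(line.strip()))
    return lengths


def summarize(read_lengths, ref_length):
    """Summarize read lengths relative to one reference length."""
    return {
        "reads": len(read_lengths),
        "min": min(read_lengths),
        "median": median(read_lengths),
        "max": max(read_lengths),
        "ref_length": ref_length,
        "longer_than_ref": sum(1 for n in read_lengths if n > ref_length),
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_lengths.py Stor1_cox1.fa COX1.fastq
    reads = fastq_read_lengths(sys.argv[2])
    for ref_id, ref_len in fasta_lengths(sys.argv[1]).items():
        print(ref_id, summarize(reads, ref_len))
```

A large `longer_than_ref` count would show that many training reads extend past the Sanger reference, for example because of adapters or primers that are not part of it.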

Following your suggestion in issue 112, I am using the "transcriptome" method.

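```bash
conda activate nanosim

# Train the model on the ONT amplicon reads, using the Sanger sequence
# as both the reference genome (-rg) and the reference transcriptome (-rt)
read_analysis.py transcriptome \
    -i ${wd}Syrphid/results/demo_ext/data/demultiplexed/Stor-1/COX1.fastq \
    -rg ${wd}simulations/data/Stor1_cox1.fa \
    -rt ${wd}simulations/data/Stor1_cox1.fa \
    -o ${wd}simulations/data/COX1_training \
    --no_intron_retention \
    -t 100
```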

This finishes without error. However, when I use the model for simulation, simulator.py gets stuck, even when simulating only 100 reads.

```bash
# Expression profile: tab-separated columns target_id, est_counts, tpm
printf 'target_id\test_counts\ttpm\nENSStor-1\t1000\t1000\n' > ${wd}simulations/data/Stor1_cox1.exp

# Simulate 100 reads from the trained model
simulator.py transcriptome \
    -rt ${wd}simulations/data/Stor1_cox1.fa \
    -c ${wd}simulations/data/COX1_training \
    -o ${wd}simulations/data/Stor1_cox1_sim \
    -e ${wd}simulations/data/Stor1_cox1.exp \
    -n 100 \
    --no_model_ir \
    -t 4
```
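The target_id in the expression file presumably has to match the record ID in the FASTA passed via -rt exactly. The commands above do not show whether ENSStor-1 matches the header in Stor1_cox1.fa. One way to rule out a mismatch is to generate the file from the FASTA headers. The minimal sketch below does this. The helper names and the script name `make_exp.py` are hypothetical and not part of NanoSim. It writes the three tab-separated columns target_id, est_counts and tpm, reusing the value 1000 from the command above for every record:

```python
import sys


def fasta_ids(path):
    """Return record IDs (first token after '>') in file order."""
    ids = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                ids.append(line[1:].split()[0])
    return ids


def expression_table(ids, est_counts=1000, tpm=1000):
    """Build a tab-separated expression table with one row per ID."""
    rows = ["target_id\test_counts\ttpm"]  # \t is a tab in Python strings
    for rid in ids:
        rows.append(f"{rid}\t{est_counts}\t{tpm}")
    return "\n".join(rows) + "\n"


def write_expression_file(fasta_path, out_path, est_counts=1000, tpm=1000):
    """Write the expression file for all records in fasta_path."""
    ids = fasta_ids(fasta_path)
    if not ids:
        raise ValueError(f"no FASTA records found in {fasta_path}")
    with open(out_path, "w") as out:
        out.write(expression_table(ids, est_counts, tpm))
    return ids


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python make_exp.py Stor1_cox1.fa Stor1_cox1.exp
    written = write_expression_file(sys.argv[1], sys.argv[2])
    print("wrote rows for:", ", ".join(written))
```

Because the IDs come straight from the reference, the same script also works unchanged for other amplicon FASTA files.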

Can you help me with this?

Moreover, I am wondering whether this model can also be used for other amplicons that yield longer reads. If I understand the logic correctly, I fear not. What should I do in that case, i.e., when no amplicon-specific training data are available?

Thanks a lot,

Testdata.zip

Martin
