Generate cDNA copies of transcripts, allowing for the priming of DNA synthesis at transcript-internal sites.
Input:
fasta-formatted file of transcript sequences
gtf-formatted file with potential priming sites for individual transcripts, with associated probabilities
file with the copy number of each unique transcript subjected to the cDNA synthesis
Output:
fasta-formatted file with DNA copies of the transcripts, each ending at one of the possible priming sites of its transcript. Priming sites are sampled in proportion to their probability of being used within a transcript. Each copy of a unique transcript is sampled independently, but only unique DNA sequences are saved to the output file.
CSV-formatted file with the copy number of each unique DNA copy.
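The FASTA and GTF inputs could be loaded along the following lines. This is a minimal sketch, not the project's parser. It assumes (hypothetically) that the first GTF column holds the transcript ID used in the FASTA headers, that the priming probability sits in the GTF score column (6th field), and that the 1-based end coordinate of each site (5th field) marks where the cDNA copy ends. Reading the copy-number file is not covered.

```python
from collections import defaultdict


def read_fasta(lines):
    """Return a dict mapping sequence ID (first word of the header) to sequence."""
    sequences = {}
    seq_id = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if seq_id is not None:
                sequences[seq_id] = "".join(chunks)
            seq_id = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if seq_id is not None:
        sequences[seq_id] = "".join(chunks)
    return sequences


def read_priming_sites(lines):
    """Return {transcript_id: [(end, probability), ...]} from GTF lines.

    Hypothetical layout: transcript ID in column 1, site end in column 5,
    priming probability in the score column (column 6).
    """
    sites = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        transcript_id = fields[0]
        end = int(fields[4])
        probability = float(fields[5])
        sites[transcript_id].append((end, probability))
    return dict(sites)
```

Both functions take any iterable of lines, so an open file handle can be passed directly.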
Simulating cDNA synthesis
cDNA synthesis is simulated as reverse transcription starting from the primer. For each transcript we know the sequence and the copy number, so for each copy of the transcript we sample a priming site in proportion to its probability, as calculated in the previous step. The cDNAs are then all the sequences generated from the initial pool of transcripts by copying each transcript sequence up to its chosen priming site.
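The per-copy sampling described above can be sketched with the standard library. This is an illustration under assumptions, not the module's implementation: inputs are plain dicts, each priming site is given as a hypothetical `(end, probability)` pair with a 1-based end, and a copy is taken to span the transcript from its 5' end up to the chosen site. Whether the saved sequence is the transcript-sense copy or its reverse complement is not specified here; the sketch keeps the transcript sense.

```python
import random
from collections import Counter


def simulate_cdna(sequences, priming_sites, copy_numbers, rng=None):
    """Sample one priming site per transcript copy and count unique cDNAs.

    sequences:     {transcript_id: sequence}
    priming_sites: {transcript_id: [(end, probability), ...]}, end 1-based
    copy_numbers:  {transcript_id: number of copies}
    Returns a Counter mapping each unique cDNA sequence to its copy number.
    """
    if rng is None:
        rng = random.Random()
    counts = Counter()
    for transcript_id, n_copies in copy_numbers.items():
        sites = priming_sites.get(transcript_id)
        if not sites:
            continue  # no priming site: this transcript yields no cDNA
        ends = [end for end, _ in sites]
        weights = [prob for _, prob in sites]
        # Each copy independently picks a site, weighted by its probability.
        chosen_ends = rng.choices(ends, weights=weights, k=n_copies)
        seq = sequences[transcript_id]
        for end in chosen_ends:
            # Copy the transcript from its 5' end up to the priming site.
            counts[seq[:end]] += 1
    return counts
```

Using a `Counter` keyed by sequence means identical copies collapse automatically, which matches keeping only unique DNA sequences together with their copy numbers.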
cDNA Generator Design
1. Extract transcript_sequences, transcript_copy_number, priming_sites and priming_probabilities from the input files.
2. Generate a list of unique_transcripts (mRNA -> cDNA) from transcript_sequences and priming_sites, and add the list to the FASTA output file, e.g.:
   TTTACGGT…
   CCATACGG…
   CGGGGCG…
3. Generate a list of copy numbers for each unique transcript from priming_probabilities and transcript_copy_number, e.g.:
   TTTACGGT… 33
   CCATACGG… 27
   CGGGGCG… 40
4. Iterate steps 1-3 and extend the lists.
5. Write the unique_transcripts output FASTA file and the copy_number_transcripts output CSV file.
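The final writing step could look roughly like this. The function name `write_outputs`, the cDNA IDs (`cDNA_1`, `cDNA_2`, ...) and the CSV header are hypothetical; sorting by sequence is only there to make the output deterministic.

```python
import csv
import io


def write_outputs(cdna_counts, fasta_handle, csv_handle, prefix="cDNA"):
    """Write unique cDNAs to FASTA and their copy numbers to CSV.

    cdna_counts: {sequence: copy_number}
    IDs and CSV column names are illustrative placeholders.
    """
    writer = csv.writer(csv_handle)
    writer.writerow(["cdna_id", "copy_number"])
    # Sort by sequence so repeated runs give identical files.
    for index, (sequence, copy_number) in enumerate(sorted(cdna_counts.items()), start=1):
        cdna_id = f"{prefix}_{index}"
        fasta_handle.write(f">{cdna_id}\n{sequence}\n")
        writer.writerow([cdna_id, copy_number])


if __name__ == "__main__":
    # In-memory handles stand in for real output files.
    fasta_out, csv_out = io.StringIO(), io.StringIO()
    write_outputs({"CCG": 2, "AAT": 3}, fasta_out, csv_out)
    print(fasta_out.getvalue(), end="")
    print(csv_out.getvalue(), end="")
```

When writing to real files, open the CSV file with `newline=""`, as the `csv` module documentation recommends.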
Open questions:
What if the reverse transcriptase dissociates before reaching the 5' end of the transcript? The current design only considers potential start sites (priming sites).
README description
cDNA Generator module
Generate cDNA based on mRNA transcript sequences and the corresponding priming probabilities.
Example usage
A simple example can be run from the test_files directory:
Installation
Docker
A Docker image is available. To fetch this image:
To run a simple example using this image:
License
MIT license, Copyright (c) 2022 Zavolan Lab, Biozentrum, University of Basel
Contributors
Eric Boittier, Bastian Wagner, Quentin Badolle
More info:
Input files
transcript_copies (csv-formatted) containing:
transcript_sequences (fasta-formatted) containing:
priming_sites (gtf-formatted) containing:
Output files
cDNA_sequences (fasta-formatted) containing:
cDNA_counts (csv-formatted) containing:
Original issue description
https://git.scicore.unibas.ch/zavolan_group/pipelines/scrna-seq-simulation/-/issues/5
Generate cDNAs
Pipeline overview description
https://git.scicore.unibas.ch/zavolan_group/pipelines/scrna-seq-simulation
The possible priming sites are sampled with the probabilities computed at the previous step, to pick a site for generating the complementary DNA.
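Because the copies of one transcript are sampled independently with the same site probabilities, the number of copies landing on each site follows a multinomial distribution, so all copies can be assigned in a single draw. The sketch below uses NumPy's `Generator.multinomial` for this; it is an alternative formulation under the same hypothetical `(end, probability)` site representation, not necessarily what the project does.

```python
import numpy as np


def sample_site_counts(copy_number, probabilities, rng):
    """Split copy_number transcript copies across priming sites in one draw.

    Equivalent in distribution to picking a site for each copy independently.
    Probabilities are renormalised in case they do not sum to exactly 1.
    """
    p = np.asarray(probabilities, dtype=float)
    p = p / p.sum()
    return rng.multinomial(copy_number, p)


def cdna_counts_for_transcript(sequence, sites, copy_number, rng):
    """Return {cdna_sequence: copies} for one transcript.

    sites: hypothetical list of (end, probability), end 1-based; each cDNA
    spans the transcript from its 5' end up to the chosen site.
    """
    ends = [end for end, _ in sites]
    counts = sample_site_counts(copy_number, [p for _, p in sites], rng)
    result = {}
    for end, n in zip(ends, counts):
        if n:
            cdna = sequence[:end]
            # Sites with the same end yield the same cDNA, so merge them.
            result[cdna] = result.get(cdna, 0) + int(n)
    return result
```

For transcripts with large copy numbers, one multinomial draw avoids looping over every copy.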
Project design description
https://git.scicore.unibas.ch/zavolan_group/tools/cdna-generator/-/wikis/Project-Design