-
Notifications
You must be signed in to change notification settings - Fork 0
/
Step3_FromContigToChromosomeAssembly
41 lines (36 loc) · 2.13 KB
/
Step3_FromContigToChromosomeAssembly
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
# Step3
# As the output is gfa format which is not supported by RagTag(a sofeware maps contigs to reference genome to get right path)
# for get the fasta output
awk '/^S/{print ">"$2; printf "%s", $3 | "fold -w 80"; close("fold -w 80"); print ""}' in.gfa > out.fa
# If you want to include GFA tags in fasta header line
awk '/^S/{header=">"$2; for(i=4; i<=NF; i++) {header=header" "$i}; print header; printf "%s", $3 | "fold -w 80"; close("fold -w 80"); print ""}' in.gfa > out.fa
#close() is required to prevent the headers and sequences from getting out of sync or being skipped
#when sequences are very long.
# Sometimes, one may need join contigs/segments together before exporting to fasta you should use a scaffolding tool such as:
# # #GraphUnzip (Detangles graph, can be used before scaffolding)
# # #DENTIST
# # #SAMBA
# # #ntLINK
# # #SLR
# # #SGTK
#However these is not necessary when you already have a reference genome of the same speices.
# A very good choice is too amplify a crossing population, especially in plants, construction of linkage maps will help you know the relative position of each contig
#If karyotype is known, better using FISH-or GISH to verify it by karyotype. Then the Unknow chromosome will be assign into right chromosome.
# some time the hifi read is not enough to generated high quality whole genome sequences that we can only get long contigs
# The RagTag could help, if the some other reference genomes are avaliable within the same species.
# RagTag is a conda package, make sure you have following depencies:
# Minimap2, Unimap or Nucmer and Python 3
#Installation
conda install -c bioconda ragtag
# correct a query assembly
ragtag.py correct ref.fasta query.fasta
# scaffold a query assembly
ragtag.py scaffold ref.fasta query.fasta
# scaffold with multiple references/maps
ragtag.py scaffold -o out_1 ref1.fasta query.fasta
ragtag.py scaffold -o out_2 ref2.fasta query.fasta
ragtag.py merge query.fasta out_*/*.agp other.map.agp
# use Hi-C to resolve conflicts
ragtag.py merge -b hic.bam query.fasta out_*/*.agp other.map.agp
# make joins and fill gaps in target.fa using sequences from query.fa
ragtag.py patch target.fa query.fa