RepeatAfterMe

A package for the extension of repetitive DNA cores

The RepeatAfterMe RAMExtend tool automatically extends the transposable element (TE) family fragments that are often generated by de novo repeat identification tools. Given the genomic start/end points of these fragmented instances, either from an existing multiple sequence alignment (MSA) or from other tools, RAMExtend performs an aligned extension of the flanking sequences. The consensus sequence of both extensions is generated, and the full set of extended sequences can optionally be output in FASTA format.

The extension algorithm is an enhanced version of the RepeatScout approach developed by Alkes Price, Neil Jones and Pavel Pevzner (see History below). The new algorithm supports multiple scoring schemes and affine gap penalties. In addition, the tool attempts to detect satellites and avoid extending these sequences beyond one unit.

Robert Hubley 2022 Institute for Systems Biology

RAMExtend

The tool requires at least two input files. The first describes the ranges of the core alignment to extend from. Ranges are supplied in a modified BED-6 format:

  BED-6 field name  : RAMExtend use
  field-1:chrom     : sequence identifier
  field-2:chromStart: lower aligned position ( 0 based )
  field-3:chromEnd  : upper aligned position ( 0 based, half open )
  field-4:name      : left extendable flag ( 0 = no, 1 = yes )
  field-5:score     : right extendable flag ( 0 = no, 1 = yes )
  field-6:strand    : strand ( '+' = forward, '-' = reverse )

The fields are tab separated. Coordinates are zero-based, half-open. The 'extendable?' flags are used to limit the use of individual sequences in the left or right extension phases. This is useful when sequences in the core MSA are not of uniform length.
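
As a minimal sketch of the format above, the following Python helper writes one ranges line per core sequence. The names `format_range` and `write_ranges`, and the example identifiers and coordinates, are hypothetical and not part of RepeatAfterMe; only the six-field, tab-separated layout comes from the table above.

```python
def format_range(seq_id, start, end, left_ok, right_ok, strand):
    """Return one tab-separated ranges line (hypothetical helper).

    start/end are zero-based, half-open; left_ok/right_ok become 0/1 flags.
    """
    if not (0 <= start < end):
        raise ValueError("expected 0 <= start < end (zero-based, half-open)")
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    fields = [
        seq_id,
        str(start),
        str(end),
        str(int(bool(left_ok))),   # field-4: left extendable flag
        str(int(bool(right_ok))),  # field-5: right extendable flag
        strand,
    ]
    return "\t".join(fields)


def write_ranges(path, ranges):
    """Write an iterable of 6-tuples to a ranges file, one line each."""
    with open(path, "w") as handle:
        for r in ranges:
            handle.write(format_range(*r) + "\n")


# Made-up example: the second sequence is excluded from the right extension.
example = [
    ("chr1", 100, 250, True, True, "+"),
    ("chr2", 5000, 5140, True, False, "-"),
]
```

Calling `write_ranges("core.tsv", example)` would produce a two-line file that could then be passed to the `-ranges` option, assuming the identifiers match sequence names in the 2bit file.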

The second input file is the genome itself in 2bit format. The tool extracts the flanking regions from this file using the sequence identifiers provided in the BED file.
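
To illustrate what "flanking regions" means for zero-based, half-open coordinates, here is a small Python sketch. It is not RAMExtend's implementation: the function `flank_windows`, the `flank` size, and the assumption that the left flank of a reverse-strand range lies at higher genomic coordinates are all illustrative choices, since the tool's own flank-selection logic is not described here.

```python
def flank_windows(start, end, strand, seq_len, flank=100):
    """Return (left, right) flank windows as half-open genomic intervals.

    Hypothetical helper. 'left' and 'right' are relative to the orientation
    of the core sequence, so they swap sides for a '-' strand range
    (an assumption made for this sketch).
    """
    if not (0 <= start < end <= seq_len):
        raise ValueError("range must satisfy 0 <= start < end <= seq_len")
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # Clip both windows to the bounds of the sequence.
    lower = (max(0, start - flank), start)
    upper = (end, min(seq_len, end + flank))
    if strand == "+":
        return lower, upper
    return upper, lower


# A core at [100, 250) on the forward strand with 50 bp flanks.
print(flank_windows(100, 250, "+", seq_len=1000, flank=50))
```

Because the intervals are half-open, each flank touches the core range without overlapping it: the lower window ends at `start` and the upper window begins at `end`.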

For example:

./RAMExtend -ranges test/extension-test2.tsv -twobit test/extension-test2.2bit

Additional options are available; running RAMExtend with no options displays them.

History

This project grew out of the great work of Alkes Price, Neil Jones and Pavel Pevzner. They developed a method for automatically extending aligned sequences (abundant exact words), allowing for the development of a dynamic multiple alignment. The details of their method are described in the following paper:

Price A.L., Jones N.C. and Pevzner P.A. 2005. De novo identification of repeat families in large genomes. To appear in Proceedings of the 13th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB-05). Detroit, Michigan.

One of the drawbacks of RepeatScout is the simple scoring system used by the program (Match/Mismatch/Gap). Our initial goal was simply to augment the RepeatScout package with custom matrix support and full affine gap penalties. As work progressed and the possibilities for further enhancing the code became clear, it was decided that a new project should be created for this effort. RepeatAfterMe is designed as an experimental workbench for applying this powerful extension algorithm to various types of aligned cores (kmers, alignment fragments, etc.).
