Arachis

Introduction

Arachis is a Python library for analyzing genome rearrangements. It allows users to reconstruct ancestral genome gene orders and infer pairwise genome differences or events.

Algorithms & Features

The algorithm for reconstructing ancestral genome gene orders implemented in the script file run_pypmag.py is derived from the ancestral gene order reconstruction module of PMAG+, with modifications:

  1. Circular and gap-containing genomes are allowed as inputs. See the modifications to the GRIMM format below.
  2. Supports Python multiprocessing.
  3. More flexible in input data format (both tree and GRIMM).

This library defines a new version of the classic GRIMM format with the following modifications:

  1. Block names may use any of the characters in "-.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz|~[]". A "-" as the first character still denotes the reverse direction.
  2. The "*" block in a sequence marks a gap. 1 2 3 * 4 5 * 6 $ is equivalent to 1 2 3 * -5 -4 * 6 $.
  3. A sequence line that does not end with the "$" block stands for a circular chromosome. A circular sequence a b c d e is equivalent to b c d e a, and also to -e -d -c -b -a. In this case only one chromosome per sample is allowed: multiple lines without "$" are treated as one single chromosome written across several lines. This restriction comes from the limitations of the TSP solver.
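The circular equivalence described in item 3 can be sketched in plain Python. This is only an illustration and not how Arachis implements it: the helper names `parse_blocks`, `flip`, `canonical_circular`, and `same_circular` are hypothetical, and the gap semantics of "*" (free orientation of the segment between gaps) are not modeled.

```python
def parse_blocks(line):
    """Split a sequence line into (blocks, is_circular)."""
    tokens = line.split()
    if tokens and tokens[-1] == "$":
        return tokens[:-1], False  # a trailing "$" ends a linear chromosome
    return tokens, True            # no "$": circular chromosome


def flip(block):
    """Reverse the direction of one block name ("*" gaps are left alone)."""
    if block == "*":
        return block
    return block[1:] if block.startswith("-") else "-" + block


def canonical_circular(blocks):
    """Pick one representative among all rotations of both strands."""
    if not blocks:
        return ()
    reverse = [flip(b) for b in reversed(blocks)]
    candidates = []
    for strand in (list(blocks), reverse):
        for i in range(len(strand)):
            candidates.append(tuple(strand[i:] + strand[:i]))
    return min(candidates)


def same_circular(line_a, line_b):
    """True if two circular sequence lines describe the same chromosome."""
    blocks_a, circular_a = parse_blocks(line_a)
    blocks_b, circular_b = parse_blocks(line_b)
    if not (circular_a and circular_b):
        raise ValueError("both lines must be circular (no trailing '$')")
    return canonical_circular(blocks_a) == canonical_circular(blocks_b)


print(same_circular("a b c d e", "b c d e a"))       # True
print(same_circular("a b c d e", "-e -d -c -b -a"))  # True
print(same_circular("a b c d e", "a c b d e"))       # False
```

Comparing a canonical representative keeps the check simple at the cost of enumerating 2n rotations, which is fine for gene-order sized inputs.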

The functions in Arachis for inferring pairwise genome differences or events are still at an early stage. If you find a bug or something to improve, please contact [email protected]. New contributors are welcome! Avoid testing data with too many breakpoints (roughly 10 or more): at this stage, the function Chromosome.inversion_event_from uses an exhaustive search for one best solution. Currently, I am using it to explore highly rearranged plastome data of legumes. It is also worth trying on other small permutation datasets, such as some plant mitochondrial data.
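To illustrate why the breakpoint count matters for an exhaustive search, here is a minimal sketch that counts breakpoints between two linear signed permutations and finds the minimum number of inversions by breadth-first search. It is not the scheme used by Chromosome.inversion_event_from; the function names are hypothetical, and the sketch assumes blocks are the integers 1..n on a linear chromosome with fixed orientation.

```python
from collections import deque


def adjacencies(perm):
    """Canonical adjacencies of a linear signed permutation, with end caps."""
    capped = (0,) + tuple(perm) + (len(perm) + 1,)
    # (a, b) read forward is the same adjacency as (-b, -a) read backward.
    return {min((a, b), (-b, -a)) for a, b in zip(capped, capped[1:])}


def breakpoints(source, target):
    """Number of adjacencies in source that are absent from target."""
    return len(adjacencies(source) - adjacencies(target))


def min_inversions(source, target):
    """Brute-force BFS over all inversions; only practical for tiny inputs."""
    source, target = tuple(source), tuple(target)
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        perm, dist = queue.popleft()
        if perm == target:
            return dist
        n = len(perm)
        for i in range(n):
            for j in range(i, n):
                # Invert the segment i..j: reverse its order and flip signs.
                flipped = tuple(-x for x in reversed(perm[i:j + 1]))
                nxt = perm[:i] + flipped + perm[j + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    return None  # unreachable when both inputs use the same blocks


ancestor = (1, 2, 3, 4, 5)
descendant = (1, -3, -2, 4, 5)
print(breakpoints(ancestor, descendant))     # 2
print(min_inversions(ancestor, descendant))  # 1
```

The search space of this naive version covers every signed arrangement of the blocks (2^n * n! states), which is why brute-force approaches become impractical as rearrangements accumulate.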

Installation

Download and install Arachis with:

$ git clone "https://github.com/Kinggerm/Arachis"
$ cd Arachis
$ python setup.py install

To use run_pypmag.py to reconstruct ancestral genome gene orders, you also have to install the following dependencies:

  • DendroPy: the tree parser in Arachis.
  • RAxML: the reconstruction engine in the algorithm of PMAG+. The single-thread version is preferred.
  • Concorde: the TSP (Traveling Salesman Problem) solver in the algorithm of PMAG+.
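Before running run_pypmag.py, a quick check like the following can tell you which dependencies are not yet visible to Python or on your PATH. This is a hedged sketch: `missing_dependencies` is a hypothetical helper, and the executable names `raxmlHPC` and `concorde` are assumptions that may differ on your system or from what run_pypmag.py expects.

```python
import importlib.util
import shutil


def missing_dependencies(modules, executables):
    """Return the names of modules and executables that cannot be found."""
    # find_spec returns None when a top-level module is not importable.
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    # shutil.which returns None when an executable is not on PATH.
    missing += [e for e in executables if shutil.which(e) is None]
    return missing


# "dendropy" is the import name of DendroPy; the executable names below
# are assumptions and may need adjusting for your installation.
print(missing_dependencies(["dendropy"], ["raxmlHPC", "concorde"]))
```

An empty list means everything was found under the names given.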

Example

  • To check whether two circular permutations, -e -d -c -b -a and a b c d e, are equivalent:

        # open python shell
        >>> from arachis.genomeClass import Chromosome
        >>> seq1 = Chromosome("-e -d -c -b -a")
        >>> seq2 = Chromosome("a b c d e")
        >>> seq1 == seq2
        True
  • To see how many flip-flop configurations (isomers) can be induced by several groups of inverted repeats, or, similarly, how many reasonable paths exist in a complicated assembly graph whose repeats cannot be resolved by a short-read library, try this:

       # open python shell
       >>> from arachis.genomeClass import Chromosome
       >>> Picea = Chromosome("1 2 12 14 13 2 3 4 10 8 15 14 11 4 5 6 7 8 9 -6")
       >>> isomers, changes = Picea.get_isomers()
       >>> print(len(isomers))
       14
    
       >>> for isomer in isomers:
       ...     print(isomer)
       1 2 12 14 13 2 3 4 10 8 15 14 11 4 5 6 -9 -8 -7 -6
       1 2 12 14 13 2 3 4 10 8 9 -6 -5 -4 -11 -14 -15 -8 -7 -6
       1 2 12 14 11 4 5 6 -9 -8 -10 -4 -3 -2 -13 -14 -15 -8 -7 -6
       1 2 12 14 13 2 3 4 5 6 -9 -8 -10 -4 -11 -14 -15 -8 -7 -6
       1 2 3 4 10 8 9 -6 -5 -4 -11 -14 -12 -2 -13 -14 -15 -8 -7 -6
       1 2 12 14 11 4 10 8 9 -6 -5 -4 -3 -2 -13 -14 -15 -8 -7 -6
       1 2 12 14 11 4 5 6 7 8 15 14 13 2 3 4 10 8 9 -6
       1 2 12 14 13 2 3 4 5 6 7 8 15 14 11 4 10 8 9 -6
       1 2 3 4 5 6 -9 -8 -10 -4 -11 -14 -12 -2 -13 -14 -15 -8 -7 -6
       1 2 3 4 10 8 15 14 13 2 12 14 11 4 5 6 -9 -8 -7 -6
       1 2 12 14 11 4 10 8 15 14 13 2 3 4 5 6 -9 -8 -7 -6
       1 2 3 4 5 6 7 8 15 14 13 2 12 14 11 4 10 8 9 -6
       1 2 3 4 10 8 15 14 13 2 12 14 11 4 5 6 7 8 9 -6
       1 2 12 14 11 4 10 8 15 14 13 2 3 4 5 6 7 8 9 -6
    
  • Run run_pypmag.py to reconstruct the ancestral genome gene orders of the test data:

      run_pypmag.py -d test/test_1_grimm.txt -t test/test_1_rooted.tre -o test/test_1_output --seed 12345
    
  • To see the parsimonious events along the branch from A1 to sp2 in the test_1 results produced above:

       # open python shell
       >>> from arachis.genomeClass import GenomeList
       >>> extant_samples = GenomeList("test/test_1_grimm.txt")
       >>> sp2 = extant_samples["sp2"].chromosomes()[0]
       >>> ancestors = GenomeList("test/test_1_output/OutputGeneOrder")
       >>> A1 = ancestors["A1"].chromosomes()[0]
       >>> events = sp2.event_from(A1)
               Breakpoints: 2
                   Round 1: inherited combinations: 1; inversion sites:  2; time: 0.0002s; memory: 0.01G
               Inversions: 1 + 0(iso)
               Total inversion time: 0.0006s
    

Citation

If you use Arachis in your research, please cite Arachis as:

If you use run_pypmag.py, please cite the following papers:

Acknowledgement

I thank Stephen Smith, Joseph Brown, and Caroline Parins-Fukuchi for discussions.

License

GNU General Public License, version 3