Motif discovery for DNA sequences using multiobjective optimization and genetic programming.
mbelmadani/motifgp
===============
= MotifGP 0.2 =
===============

MotifGP is a de novo motif discovery tool for discriminatory network expression identification in ChIP-seq datasets.

Original author: Manuel Belmadani [email protected]

The project is documented by the following publications.

Manuel Belmadani and Marcel Turcotte. MotifGP: Using multi-objective evolutionary computing for mining network expressions in DNA sequences. In IEEE International Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB 2016), Chiang Mai, Thailand, October 5-7, 2016. https://doi.org/10.1109/CIBCB.2016.7758133

Manuel Belmadani. MotifGP: DNA motif discovery using multiobjective evolution. Master of computer science, University of Ottawa, School of Electrical Engineering and Computer Science, 2016. Available from University of Ottawa Research under: http://www.ruor.uottawa.ca/handle/10393/34213

Acknowledgements:
MotifGP uses source code from these tools:
-hypergeometric.py from the MEME Suite (License and copyright in source file).
-altschulEriksonDinuclShuffle.py from Peter Clote - CLOTE Computational Biology LAB, http://clavius.bc.edu/~clotelab/RNAdinucleotideShuffle/

This software was also built using DEAP:
Fortin, F.-A., De Rainville, F.-M., Gardner, M.-A. G., Parizeau, M. & Gagné, C. DEAP: Evolutionary Algorithms Made Easy. J. Mach. Learn. Res. 13, 2171–2175 (2012).
=======================================================================================
License: (see LICENSE.txt)
=======================================================================================
Installation: (see INSTALL.txt)
=======================================================================================
Examples: (see EXAMPLES.txt)
=======================================================================================
Usage: motifgp.py [options]

Options:
  -h, --help            show this help message and exit
  -p TRAINING_PATH, --training=TRAINING_PATH
                        Fasta file to use for training (input) sequence data
  -b BACKGROUND_PATH, --background=BACKGROUND_PATH
                        [Optional] Fasta file to use for background (control)
                        sequence data. If not provided, the generated control
                        sequences will be written to runtime_tmp/
  -m MOO, --moo=MOO     Multi-objective optimization [SPEA2, NSGA2, NSGAR,
                        MOEAD]. NSGAR is the NSGA-II_R (NSGA-II Revised)
                        algorithm, an improvement of NSGA2.
  -f FITNESS, --fitness=FITNESS
                        Objective fitness function. Available objectives:
                        D=Discrimination,F=Fisher,I=ScipyFisher,O=OddsRatio,
                        Q=FalseDiscoveryRate,S=Support,R=ScipyOddsRatio.
                        Each single character in the string represents an
                        objective. Objectives are mapped by the configuration
                        file at config/objectives. Default is 'DF' for
                        [Discrimination,Fisher] (2-objectives).
  --cxpb=CXPB           Probability [0.0 to 1.0] for a crossover during
                        variation. Requires --mutpb to be set to (1.0-cxpb).
                        Default is 0.7.
  --mutpb=MUTPB         Probability [0.0 to 1.0] for a mutation during
                        variation. Requires --cxpb to be set to (1.0-mutpb).
                        Default is 0.3.
  --short=SHORT         Stops reading after <SHORT> input sequences.
  --popsize=POPSIZE     Size of the population.
  --revcomp             Compile regex with reverse complement
  --random-seed=RANDOM_SEED
                        Random seed value to set for execution
  -n NGEN, --num-gen=NGEN
                        Generation where runtime stops (even in the case of
                        resumed checkpoints)
  --timelimit=TIMELIMIT
                        Time limit on the GP loop execution.
  --matcher=MATCHER     Use a different matcher. Options: 'grep', 'python'.
                        'grep' is faster on large datasets, while 'python' is
                        a pure python version in case the system doesn't
                        support grep.
  -o OUTPUT_PATH, --output=OUTPUT_PATH
                        Output directory. Default is ./OUT/
  -t TAG, --tag=TAG     A tag for the output subdirectory. Use it to describe
                        the run; the run is saved in the tag's subdirectory
                        of the output directory. Default is 'default'.
  -i, --inspector       Don't print any files. Can be useful with python -i
                        (interactive mode).
  --hardmask            Replace tandem repeats (lower-case typed nucleotides)
                        with N
  -g GRAMMAR, --grammar=GRAMMAR
                        Grammar for the STGP [min, iupac, full, ne]. Default
                        is iupac. 'min' only uses nucleotides. 'iupac' is a
                        network expression grammar. 'full' is a network
                        expression grammar with additional regular expression
                        tokens. 'ne' is like iupac, but built with string
                        primitives instead of booleans.
  -e ERASE, --erase=ERASE
                        Input .nef(t) file to delete from the dataset prior
                        to execution. Used for sequential coverage.
  --backpad             Pads background sequences with consecutive
                        nucleotides (i.e. AAAAAAAA,CCCCCCCC,GGGGGGGG,TTTTTTTT)
                        of length 8 every set of 4 sequences.
  --bg-algo=BG_ALGO     Shuffling algorithm for background. Default is
                        'dinuclShuffle', if no background dataset is
                        provided. Currently, dinuclShuffle is the only
                        implemented method.
  --ncpu=NCPU           Number of CPUs to use when mapping evaluation of
                        solutions. Use an integer, or "auto" to automatically
                        determine the maximum number. Default is no
                        parallelism.
  --termination=TERMINATION
                        Use automatic termination algorithm. Use 'auto' to
                        use the automatic termination algorithm for MOEAs.
  --hamming             [Experimental] Generates statistics on the hamming
                        distance from a template regex and hof candidates.
  --seeded-population   [Experimental] Use population seeds
  -c CHECKPOINT_PATH, --checkpoint=CHECKPOINT_PATH
                        [Temporarily disabled] Load a checkpoint at path.
  -q, --quiet           [Unimplemented] don't print status messages to stdout

Also consider looking at EXAMPLES.txt for basic examples of MotifGP usage.
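
The options above can be combined into a command line programmatically. The following is a minimal sketch, not part of MotifGP: the helper name build_motifgp_command, the "python" interpreter name, and the file name peaks.fasta are assumptions made for illustration. It only uses flags listed in the usage text, and it checks the documented constraints (the --moo choices, the --fitness objective letters, and --cxpb plus --mutpb summing to 1.0) before anything is run.

```python
# Hypothetical helper (not shipped with MotifGP): assembles an argument list
# from the options documented in the usage text above.

VALID_MOO = {"SPEA2", "NSGA2", "NSGAR", "MOEAD"}   # choices listed for --moo
VALID_OBJECTIVES = set("DFIOQSR")                  # letters listed for --fitness


def build_motifgp_command(training_path, moo=None, fitness=None,
                          cxpb=None, mutpb=None, num_gen=None,
                          revcomp=False, script="motifgp.py"):
    """Return a list of arguments suitable for subprocess.run()."""
    if moo is not None and moo not in VALID_MOO:
        raise ValueError("unknown --moo value: %r" % moo)
    if fitness is not None and not (fitness and set(fitness) <= VALID_OBJECTIVES):
        raise ValueError("unknown objective letter in --fitness: %r" % fitness)
    # The usage text requires --cxpb and --mutpb to be complementary.
    if (cxpb is None) != (mutpb is None):
        raise ValueError("--cxpb and --mutpb must be set together")
    if cxpb is not None:
        if not (0.0 <= cxpb <= 1.0 and 0.0 <= mutpb <= 1.0):
            raise ValueError("--cxpb and --mutpb must lie in [0.0, 1.0]")
        if abs(cxpb + mutpb - 1.0) > 1e-9:
            raise ValueError("--cxpb and --mutpb must sum to 1.0")

    # "python" as the interpreter name is an assumption about the environment.
    args = ["python", script, "--training=%s" % training_path]
    if moo is not None:
        args.append("--moo=%s" % moo)
    if fitness is not None:
        args.append("--fitness=%s" % fitness)
    if cxpb is not None:
        args.append("--cxpb=%s" % cxpb)
        args.append("--mutpb=%s" % mutpb)
    if num_gen is not None:
        args.append("--num-gen=%d" % num_gen)
    if revcomp:
        args.append("--revcomp")
    return args


if __name__ == "__main__":
    # peaks.fasta is a placeholder input file name.
    cmd = build_motifgp_command("peaks.fasta", moo="NSGAR", fitness="DF",
                                cxpb=0.6, mutpb=0.4, num_gen=100, revcomp=True)
    print(" ".join(cmd))
```

The returned list can be passed to subprocess.run() from a directory containing motifgp.py. Validating before launching means a mismatched --cxpb/--mutpb pair is reported immediately, without depending on how the tool itself handles it.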