Skip to content

Sequence-based predictor of genomic DNA G-quadruplex formation propensity.

Notifications You must be signed in to change notification settings

aleksahak/Quadron

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

16 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

---
pdf_document: default
output: pdf_document
html_document:
  highlight: tango
word_document: default
---

# Quadron: predictor of sequence-driven genomic DNA quadruplex formation propensity

<img src="QuadronGUI/Quadron_logo.png" align="middle" height="180" width="160" margin="0 auto" />

The Quadron methodology, developed via gradient boosting machines trained on the experimental G4-seq data ([**Balasubramanian Laboratory**](http://www.ch.cam.ac.uk/group/shankar)), is described and validated in detail at <http://doi.org/10.1038/s41598-017-14017-4>.

## Installation

**1.** Install the latest version of R programming language or skip to the next step. Step-wise instructions on R installation and upgrade for Linux and MacOSX can be found through the following links: [1](http://alexonscience.blogspot.co.uk/2015/10/installing-r-on-linuxunix-machines-from.html), [2](http://alexonscience.blogspot.co.uk/2015/10/upgrade-r-environmentlibraries-mac-osx.html).

**2.** Launch R from the command line and install the following R packages from within R, including *shiny* (required for the graphical user interface), *doMC*, *foreach* and *itertools* (required for a parallel execution of the program).

```{bash, tidy=TRUE, echo=TRUE, eval=FALSE}
$ R
```
```{r, tidy=TRUE, echo=TRUE, eval=FALSE}
> install.packages("plyr")
> install.packages("data.table")
> install.packages("caret")
> install.packages("shiny")
> install.packages("doMC")
> install.packages("foreach")
> install.packages("itertools")
```

**3.** Download the Quadron source code from the [GitHub](http://github.com/aleksahak/Quadron) repository or its interfacing [home page](http://quadron.atgcdynamics.org). You can also do that via a Linux/Unix/OSX command line, given that git is installed, by typing the following:

```{bash, tidy=TRUE, echo=TRUE, eval=FALSE}
$ git clone https://github.com/aleksahak/Quadron
```

The downloaded folder has the following content:

- **lib/** - the subfolder containing all the source files,
- **QuadronGUI/** - the subfolder containing the graphical user interface,
- **Quadron.R** - the interfacing R script used to execute Quadron from command line.

**4.** Install the specific, 0.4-4, version of *xgboost* R library. This specific version is required since there has been a compatibility breach in mapping the model tree structures in subsequent versions of *xgboost*. The easiest way to install this version for R on Linux operating system, with all the necessary compilers available, is through the source code distributed through the native R CRAN archive. To do so, launch R and type the following command:

```{bash, tidy=TRUE, echo=TRUE, eval=FALSE}
$ R
```
```{r, tidy=TRUE, echo=TRUE, eval=FALSE}
# For Linux:
> install.packages(
    "http://cran.r-project.org/src/contrib/Archive/xgboost/xgboost_0.4-4.tar.gz",
    repos=NULL,
    type="source")
```

Theoretically, the above command should work with MacOSX as well, however, some versions of the clang compiler in MacOSX do not support the fopenmp compiler keyword, hence installation from binaries is a safer routh. The pre-compiled binaries of *xgboost_0.4-4* are distributed with Quadron (also accessible from Revolution Analytics MRAN repository). They are located in the **lib/xgboost_0.4_4** subdirectory. For the installation, execute the following commands for MacOSX and Windows operating systems respectively (from command prompt, while located in the **lib/xgboost_0.4_4** subdirectory of the downloaded Quadron folder):

```{bash, tidy=TRUE, echo=TRUE, eval=FALSE}
# For MacOSX:
$ cd lib/xgboost_0.4_4
$ R CMD INSTALL macosx_bin/xgboost_0.4-4.tgz

# For Windows, first launch R:
$ R
```

```{r, tidy=TRUE, echo=TRUE, eval=FALSE}
# Next, from within R (for Windows):
> install.packages(
    "windows_bin/xgboost_0.4-4.zip",
    repos=NULL,
    type="binary")
```

**5.** Finally, you need to bit compile the Quadron package by going into the **lib/** subfolder and executing **bitcompile.R** code from within R.

```{bash, tidy=TRUE, echo=TRUE, eval=FALSE}
$ cd lib/
$ R
```
```{r, tidy=TRUE, echo=TRUE, eval=FALSE}
> source("bitcompile.R")
```

This generates a single file, **Quadron.lib**, which encapsulates the main Quadron code and all its dependencies. At this stage, the subfolder **lib/** can be safely removed. You might, however, want to copy the **test.fasta** file from inside **lib/**, in order to test the Quadron installation.

At this point, the Quadron installation folder should contain:

- **Quadron.lib** - the bit-compiled Quadron program,
- **QuadronGUI/** - the subfolder containing the graphical user interface,
- **Quadron.R** - the interfacing R script used to execute Quadron from command line,

and, if the test sequence file is preserved,

- **test.fasta** - the example DNA sequence fasta file.

## Running Quadron from R

Quadron can be executed as an R program, either from within R, or from the Linux/Unix/OSX command line through R CMD BATCH or Rscript execution. The latter two options allow the usage of Quadron from the scripts written via programming languages other than R.

In R, as exemplified in the **Quadron.R** interfacing script, one should load the **Quadron.lib** bit-compiled file, then execute Quadron via the R function *Quadron()*. The latter accepts three arguments:

- *FastaFile* - the relative or absolute path to the fasta file to analyse,
- *OutFile* - the relative or absolute path to the output file to be saved,
- *nCPU* - the number of CPUs to be used for the calculation, where the larger values can markedly speed up the PQS mapping stage of Quadron predictions for large genomes.

Alternatively, the **Quadron.R** file can be edited to set the desired arguments, and executed from the command line via R CMD BATCH or Rscript.

## Running Quadron from GUI (Graphical User Interface)

Quadron features a browser-based graphical user interface (GUI), written with Shiny that can be executed locally on as many CPUs as desired. To launch the GUI, enter the **QuadronGUI/** subfolder and double click on **QuadronGUI** file. If the file fails to open a browser, make sure that the permissions are correctly set for the file (as executable):

```{bash, tidy=TRUE, echo=TRUE, eval=FALSE}
$ cd QuadronGUI/
$ chmod 700 QuadronGUI # For setting the permissions to open. May require sudo rights.
```

If double clicking on **QuadronGUI** does not launch your browser after the above step, then, most probably, your Rscript utility of R is not installed on the default */usr/bin/Rscript* path. To correct the QuadronGUI setup, first find out the installed path for Rscript by typing the following from the command line:

```{bash, tidy=TRUE, echo=TRUE, eval=FALSE}
$ which Rscript
```

then open the **QuadronGUI** executable file via a usual plain text editor and correct the Rscript path stated at the first line.

Now double click on **QuadronGUI**. As soon as the browser opens, the rest is self explanatory.

## License

You may redistribute the Quadron source code (or its components) and/or modify/translate it under the terms of the GNU General Public License as published by the Free Software Foundation; version 2 of the License (GPL2). You can find the details of GPL2 in the following link: <http://www.gnu.org/licenses/gpl-2.0.html>.

Questions and bug reports to Aleksandr B. Sahakyan via [aleksahak *at* cantab *dot* net].

About

Sequence-based predictor of genomic DNA G-quadruplex formation propensity.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages