CHARLIE (Circrnas in Host And viRuses anaLysis pIpEline)
diff --git a/dev/CHANGELOG/index.html b/dev/CHANGELOG/index.html
index 949be32..d49a721 100644
--- a/dev/CHANGELOG/index.html
+++ b/dev/CHANGELOG/index.html
@@ -1 +1 @@
- Changelog - CHARLIE (Circrnas in Host And viRuses anaLysis pIpEline)
Major updates to convert CHARLIE from a biowulf-specific to a platform-agnostic pipeline (#102, @kelly-sovacool):
All rules now use containers instead of envmodules.
Default config and cluster config files are provided for use on biowulf and FRCE.
New entry TEMPDIR in the config file sets the temporary directory location for rules that require transient storage.
New --singcache argument to provide a singularity cache dir location. The singularity cache dir is automatically set inside /data/$USER/ or $WORKDIR/ if --singcache is not provided.
create sense and anti-sense BSJ BAMs and BW for each reference (host+viruses)
find reads which contribute to CIRI BSJs but not on the STAR list of BSJ reads, see if they contribute to novel (not called by STAR) BSJs and append novel BSJs to customBSJ list
circ_quant replaces clear_quant in the CLEAR rule. In other words, we are reusing the STAR alignment file and the circExplorer2 output file for running CLEAR. No need to run HISAT2 and TopHat (fusion-search with Bowtie1). This is much quicker.
\ No newline at end of file
diff --git a/dev/contributing/index.html b/dev/contributing/index.html
index 596c420..63cbdb8 100644
--- a/dev/contributing/index.html
+++ b/dev/contributing/index.html
@@ -1,4 +1,4 @@
- How to contribute - CHARLIE (Circrnas in Host And viRuses anaLysis pIpEline)
If you are a member of CCBR, you can clone this repository to your computer or development environment. Otherwise, you will first need to fork the repo and clone your fork. You only need to do this step once.
git clone https://github.com/CCBR/CHARLIE
DISCLAIMER: This chart is for v0.8.x and may be slightly outdated.
\ No newline at end of file
diff --git a/dev/index.html b/dev/index.html
index ed4bdb2..9c47f57 100644
--- a/dev/index.html
+++ b/dev/index.html
@@ -1,4 +1,5 @@
- CHARLIE (Circrnas in Host And viRuses anaLysis pIpEline)
Reach out to Vishal Koparde for questions/comments/requests.
This circular RNA detection pipeline uses CIRCExplorer2, CIRI2 and many other tools in parallel to detect, quantify and annotate circRNAs. Here is a list of tools that can be run using CHARLIE:
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Test data (1 paired-end subsample and 1 single-end subsample) have been included under the /data/CCBR_Pipeliner/testdata/circRNA/human folder. After running -m=init, samples.tsv should be edited to point to copies of the above-mentioned samples, using the column headers:
sampleName
path_to_R1_fastq
path_to_R2_fastq
Column path_to_R2_fastq is left blank for single-end samples.
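For illustration only, a minimal samples.tsv (tab-separated) might look like the following; the sample names and fastq paths here are placeholders, not real files:
sampleName    path_to_R1_fastq                path_to_R2_fastq
sample1_PE    /path/to/sample1.R1.fastq.gz    /path/to/sample1.R2.fastq.gz
sample2_SE    /path/to/sample2.R1.fastq.gz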
After editing samples.tsv, a dry run should be performed:
bash <path to charlie> -w=<path to output dir> -m=dryrun
This will create the reference fasta and gtf file based on the selections made in the config.yaml.
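As a rough sketch, the selections in config.yaml that drive this step look something like the excerpt below; the key names and values are illustrative placeholders, so always start from the config.yaml generated by -m=init:
host: "hg38"              # hg38 or mm39
additives: "ERCC"         # comma-separated, e.g. ERCC,BAC16Insert
viruses: "NC_009333.1"    # comma-separated viral accessions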
If -m=dryrun was successful, then simply do -m=run. The output will look something like this
... ... skipping ~1000 lines
...
...
@@ -140,9 +141,10 @@
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.
Running...
14743440
-
Expected output from the sample data is stored under .tests/expected_output.
More details about running test data can be found here.
DISCLAIMER:
CHARLIE is built to be run only on BIOWULF. A newer HPC-agnostic version of CHARLIE is planned for 2024.
\ No newline at end of file
diff --git a/dev/platforms/index.html b/dev/platforms/index.html
index 482197e..d320fcd 100644
--- a/dev/platforms/index.html
+++ b/dev/platforms/index.html
@@ -1,3 +1,3 @@
- Platforms - CHARLIE (Circrnas in Host And viRuses anaLysis pIpEline)
CHARLIE was originally developed to run on biowulf, but it can run on other computing platforms too. There are a few additional steps to configure CHARLIE.
TODO
Clone CHARLIE.
git clone https://github.com/CCBR/charlie
Initialize your project working directory.
Create a directory of reference files.
Edit your project's config file.
If you are using a SLURM job scheduler, edit cluster.json and submit_script.sbatch.
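For orientation, a Snakemake-style cluster.json maps rule names to scheduler resources, roughly as sketched below; the rule name, partition, memory, and time values are placeholders, and the cluster.json shipped with CHARLIE should be used as the starting point:
{
  "__default__": {
    "partition": "norm",
    "threads": "4",
    "mem": "8g",
    "time": "4:00:00"
  },
  "star_align": {
    "threads": "16",
    "mem": "64g"
  }
}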
\ No newline at end of file
diff --git a/dev/references/index.html b/dev/references/index.html
index 0187b5d..162b4a0 100644
--- a/dev/references/index.html
+++ b/dev/references/index.html
@@ -1,4 +1,4 @@
- References - CHARLIE (Circrnas in Host And viRuses anaLysis pIpEline)
hg38 and mm39 genome builds are chosen to represent hosts. Ribosomal sequences (45S, 5S) are downloaded from NCBI. hg38 and mm39 were masked for rRNA sequence and 45S and 5S sequences from NCBI are appended as separate chromosomes. The following viral sequences were appended to the rRNA masked hg38 reference:
HOSTS:
+ CHARLIE
Circrnas in Host And viRuses anaLysis pIpEline
See the website for detailed information, documentation, and examples: https://ccbr.github.io/CHARLIE/
Snakemake workflow to detect, annotate and quantify (DAQ) host and viral circular RNAs.
Primarily developed to run on BIOWULF.
Reach out to Vishal Koparde for questions/comments/requests.
This circular RNA detection pipeline uses CIRCExplorer2, CIRI2 and many other tools in parallel to detect, quantify and annotate circRNAs. Here is a list of tools that can be run using CHARLIE:
circRNA Detection Tool    Aligner(s)    Run by default
CIRCExplorer2             STAR1         Yes
CIRI2                     BWA1          Yes
CIRCExplorer2             BWA1          Yes
CLEAR                     STAR1         Yes
DCC                       STAR2         Yes
circRNAFinder             STAR3         Yes
find_circ                 Bowtie2       Yes
MapSplice                 BWA2          No
NCLScan                   NovoAlign     No
Note: STAR1, STAR2, STAR3 denote 3 different sets of alignment parameters, etc.
Note: BWA1, BWA2 denote 2 different alignment parameters, etc.
charlie

##########################################################################################

Welcome to
[ASCII-art CHARLIE banner]

C_ircrnas in H_ost A_nd vi_R_uses ana_L_ysis p_I_p_E_line

##########################################################################################

This pipeline was built by CCBR (https://bioinformatics.ccr.cancer.gov/ccbr)
Please contact Vishal Koparde for comments/questions (vishal.koparde@nih.gov)

##########################################################################################

CHARLIE can be used to DAQ (Detect/Annotate/Quantify) circRNAs in hosts and viruses.

Here is the list of hosts and viruses that are currently supported:

HOSTS:
  * hg38 [Human]
  * mm39 [Mouse]

ADDITIVES:
  * ERCC [External RNA Control Consortium sequences]
  * BAC16Insert [insert from rKSHV.219-derived BAC clone of the full-length KSHV genome]

VIRUSES:
  * NC_007605.1 [Human gammaherpesvirus 4 (Epstein-Barr virus)]
  * NC_006273.2 [Human betaherpesvirus 5 (Cytomegalovirus)]
  * NC_001664.4 [Human betaherpesvirus 6A (HHV-6A)]
  * NC_000898.1 [Human betaherpesvirus 6B (HHV-6B)]
  * NC_001716.2 [Human betaherpesvirus 7 (HHV-7)]
  * NC_009333.1 [Human gammaherpesvirus 8 (KSHV)]
  * NC_045512.2 [Severe acute respiratory syndrome (SARS)-related coronavirus]
  * MN485971.1 [HIV from Belgium]
  * NC_001806.2 [Human alphaherpesvirus 1 (Herpes simplex virus type 1, strain 17) (HSV-1)]
  * KT899744.1 [HSV-1 strain KOS]
  * MH636806.1 [MHV68 (Murine herpesvirus 68 strain WUMS)]

##########################################################################################

USAGE:
  charlie -w/--workdir=<WORKDIR> -m/--runmode=<RUNMODE>

Required Arguments:
1. WORKDIR : [Type: String]: Absolute or relative path to the output folder with write permissions.
2. RUNMODE : [Type: String] Valid options:
   * init       : initialize workdir
   * dryrun     : dry run snakemake to generate DAG
   * run        : run with slurm
   * runlocal   : run without submitting to sbatch
   ADVANCED RUNMODES (use with caution!!)
   * unlock     : unlock WORKDIR if locked by snakemake. NEVER UNLOCK WORKDIR WHERE PIPELINE IS CURRENTLY RUNNING!
   * reconfig   : recreate config file in WORKDIR (debugging option). EDITS TO config.yaml WILL BE LOST!
   * reset      : DELETE workdir dir and re-init it (debugging option). EDITS TO ALL FILES IN WORKDIR WILL BE LOST!
   * printbinds : print singularity binds (paths)
   * local      : same as runlocal

Optional Arguments:
--singcache|-c : singularity cache directory. Default is `/data/${USER}/.singularity` if available, or falls back to `${WORKDIR}/.singularity`. Use this flag to specify a different singularity cache directory.
--host|-g      : supply host at command line. hg38 or mm39. (--runmode=init only)
--additives|-a : supply comma-separated list of additives at command line. ERCC or BAC16Insert or both (--runmode=init only)
--viruses|-v   : supply comma-separated list of viruses at command line (--runmode=init only)
--manifest|-s  : absolute path to samples.tsv. This will be copied to output folder (--runmode=init only)
--changegrp|-z : change group to "Ziegelbauer_lab" before running anything. Biowulf-only. Useful for correctly setting permissions.
--help|-h      : print this help

Example commands:
  charlie -w=/my/output/folder -m=init
  charlie -w=/my/output/folder -m=dryrun
  charlie -w=/my/output/folder -m=run

##########################################################################################

VersionInfo:
  python         : 3
  snakemake      : 7
  pipeline_home  : /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/CHARLIE/.v0.11.1
  git commit/tag : 613fb617f1ed426fb8900f98e599ca0497a67cc0 v0.11.0-49-g613fb61

##########################################################################################
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Test data (1 paired-end subsample and 1 single-end subsample) have been included under the /data/CCBR_Pipeliner/testdata/circRNA/human folder. After running -m=init, samples.tsv should be edited to point to copies of the above-mentioned samples, using the column headers:
sampleName
path_to_R1_fastq
path_to_R2_fastq
Column path_to_R2_fastq is left blank for single-end samples.
After editing samples.tsv, a dry run should be performed:
bash <path to charlie> -w=<path to output dir> -m=dryrun
This will create the reference fasta and gtf file based on the selections made in the config.yaml.
Run
If -m=dryrun was successful, then simply do -m=run. The output will look something like this
Expected output from the sample data is stored under .tests/expected_output.
More details about running test data can be found here.
DISCLAIMER:
CHARLIE is built to be run only on BIOWULF. A newer HPC-agnostic version of CHARLIE is planned for 2024.
Changelog
CHARLIE development version
Bug fixes
CHARLIE was falsely throwing a file permissions error for tempdir values containing bash variables. (#118, @kelly-sovacool)
Singularity bind paths were not being set properly. (#119, @kelly-sovacool)
Update docker containers to set $PYTHONPATH. (#119, #125, @kelly-sovacool)
Otherwise, this environment variable can be carried over and cause package conflicts when singularity is not run with -C.
Also use python -E to ensure the $PYTHONPATH is not carried over. (#129, @kelly-sovacool)
Fix reconfig to correctly replace variables in the config file. (#121, @kelly-sovacool)
Prevent using excessive memory when copying reference files. (#126, @kelly-sovacool)
Fix missing output files due to file system latency and use real (absolute) paths where possible. (#130, @kelly-sovacool)
Update documentation to reflect biowulf usage and improved test dataset. (#132, @kelly-sovacool)
Major updates to convert CHARLIE from a biowulf-specific to a platform-agnostic pipeline (#102, @kelly-sovacool):
All rules now use containers instead of envmodules.
Default config and cluster config files are provided for use on biowulf and FRCE.
New entry TEMPDIR in the config file sets the temporary directory location for rules that require transient storage.
New --singcache argument to provide a singularity cache dir location. The singularity cache dir is automatically set inside /data/$USER/ or $WORKDIR/ if --singcache is not provided.
Minor documentation improvements. (#114, @kelly-sovacool)
create sense and anti-sense BSJ BAMs and BW for each reference (host+viruses)
find reads which contribute to CIRI BSJs but not on the STAR list of BSJ reads, see if they contribute to novel (not called by STAR) BSJs and append novel BSJs to customBSJ list
circ_quant replaces clear_quant in the CLEAR rule. In other words, we are reusing the STAR alignment file and the circExplorer2 output file for running CLEAR. No need to run HISAT2 and TopHat (fusion-search with Bowtie1). This is much quicker.
Using picard to estimate duplicates using MarkDuplicates
Generating a per-run multiqc HTML report
Using eulerr R package to generate CIRI-CircExplorer circRNA Venn diagrams and include them in the multiqc report
Gather per job cluster metadata like queue time, run time, job state etc. Stats are compiled in HPC_summary file
CLEAR pipeline quant.txt file is annotated for known circRNAs
WORKDIR can now be a relative path
bam2bw conversion fix for BSJ and spliced_reads. Issue closed!
Contributing to CHARLIE
Proposing changes with issues
If you want to make a change, it's a good idea to first open an issue and make sure someone from the team agrees that it's needed.
If you've decided to work on an issue, assign yourself to the issue so others will know you're working on it.
We use GitHub Flow as our collaboration process. Follow the steps below for detailed instructions on contributing changes to CHARLIE.
Clone the repo
If you are a member of CCBR, you can clone this repository to your computer or development environment. Otherwise, you will first need to fork the repo and clone your fork. You only need to do this step once.
If this is your first time cloning the repo, you may need to install dependencies
Install snakemake and singularity or docker if needed (biowulf already has these available as modules).
Install the python dependencies with pip
pip install .\n
If you're developing on biowulf, you can use our shared conda environment which already has these dependencies installed
Install pre-commit if you don't already have it. Then from the repo's root directory, run
pre-commit install\n
This will install the repo's pre-commit hooks. You'll only need to do this step the first time you clone the repo.
Create a branch
Create a Git branch for your pull request (PR). Give the branch a descriptive name for the changes you will make, such as iss-10 if it is for a specific issue.
# create a new branch and switch to it
git branch iss-10
git switch iss-10
Switched to a new branch 'iss-10'
Make your changes
Edit the code, write and run tests, and update the documentation as needed.
Changes to the python package code will also need unit tests to demonstrate that the changes work as intended. We write unit tests with pytest and store them in the tests/ subdirectory. Run the tests with python -m pytest.
If you change the workflow, please run the workflow with the test profile and make sure your new feature or bug fix works as intended.
If you have added a new feature or changed the API of an existing feature, you will likely need to update the documentation in docs/.
Commit and push your changes
If you're not sure how often you should commit or what your commits should consist of, we recommend following the "atomic commits" principle where each commit contains one new feature, fix, or task. Learn more about atomic commits here: https://www.freshconsulting.com/insights/blog/atomic-commits/
First, add the files that you changed to the staging area:
git add path/to/changed/files/\n
Then make the commit. Your commit message should follow the Conventional Commits specification. Briefly, each commit should start with one of the approved types such as feat, fix, docs, etc. followed by a description of the commit. Take a look at the Conventional Commits specification for more detailed information about how to write commit messages.
git commit -m 'feat: create function for awesome feature'\n
pre-commit will enforce that your commit message and the code changes are styled correctly and will attempt to make corrections if needed.
Check for added large files..............................................Passed Fix End of Files.........................................................Passed Trim Trailing Whitespace.................................................Failed
hook id: trailing-whitespace
exit code: 1
files were modified by this hook > Fixing path/to/changed/files/file.txt > codespell................................................................Passed style-files..........................................(no files to check)Skipped readme-rmd-rendered..................................(no files to check)Skipped use-tidy-description.................................(no files to check)Skipped
In the example above, one of the hooks modified a file in the proposed commit, so the pre-commit check failed. You can run git diff to see the changes that pre-commit made and git status to see which files were modified. To proceed with the commit, re-add the modified file(s) and re-run the commit command:
git add path/to/changed/files/file.txt
git commit -m 'feat: create function for awesome feature'
This time, all the hooks either passed or were skipped (e.g. hooks that only run on R code will not run if no R files were committed). When the pre-commit check is successful, the usual commit success message will appear after the pre-commit messages showing that the commit was created.
Check for added large files..............................................Passed Fix End of Files.........................................................Passed Trim Trailing Whitespace.................................................Passed codespell................................................................Passed style-files..........................................(no files to check)Skipped readme-rmd-rendered..................................(no files to check)Skipped use-tidy-description.................................(no files to check)Skipped Conventional Commit......................................................Passed > [iss-10 9ff256e] feat: create function for awesome feature 1 file changed, 22 insertions(+), 3 deletions(-)
Finally, push your changes to GitHub:
git push\n
If this is the first time you are pushing this branch, you may have to explicitly set the upstream branch:
git push --set-upstream origin iss-10\n
Enumerating objects: 7, done. Counting objects: 100% (7/7), done. Delta compression using up to 10 threads Compressing objects: 100% (4/4), done. Writing objects: 100% (4/4), 648 bytes | 648.00 KiB/s, done. Total 4 (delta 3), reused 0 (delta 0), pack-reused 0 remote: Resolving deltas: 100% (3/3), completed with 3 local objects. remote: remote: Create a pull request for 'iss-10' on GitHub by visiting: remote: https://github.com/CCBR/CHARLIE/pull/new/iss-10 remote: To https://github.com/CCBR/CHARLIE > > [new branch] iss-10 -> iss-10 branch 'iss-10' set up to track 'origin/iss-10'.
We recommend pushing your commits often so they will be backed up on GitHub. You can view the files in your branch on GitHub at https://github.com/CCBR/CHARLIE/tree/<your-branch-name> (replace <your-branch-name> with the actual name of your branch).
Create the PR
Once your branch is ready, create a PR on GitHub: https://github.com/CCBR/CHARLIE/pull/new/
Select the branch you just pushed:
Edit the PR title and description. The title should briefly describe the change. Follow the comments in the template to fill out the body of the PR, and you can delete the comments (everything between <!-- and -->) as you go. Be sure to fill out the checklist, checking off items as you complete them or striking through any irrelevant items. When you're ready, click 'Create pull request' to open it.
Optionally, you can mark the PR as a draft if you're not yet ready for it to be reviewed, then change it later when you're ready.
Wait for a maintainer to review your PR
We will do our best to follow the tidyverse code review principles: https://code-review.tidyverse.org/. The reviewer may suggest that you make changes before accepting your PR in order to improve the code quality or style. If that's the case, continue to make changes in your branch and push them to GitHub, and they will appear in the PR.
Once the PR is approved, the maintainer will merge it and the issue(s) the PR links will close automatically. Congratulations and thank you for your contribution!
After your PR has been merged
After your PR has been merged, update your local clone of the repo by switching to the main branch and pulling the latest changes:
git checkout main
git pull
It's a good idea to run git pull before creating a new branch so it will start from the most recent commits in main.
Helpful links for more information
Platforms
CHARLIE was originally developed to run on biowulf, but it can run on other computing platforms too. There are a few additional steps to configure CHARLIE.
TODO
Clone CHARLIE.
git clone https://github.com/CCBR/charlie\n
Initialize your project working directory.
Create a directory of reference files.
Edit your project's config file.
If you are using a SLURM job scheduler, edit cluster.json and submit_script.sbatch.
hg38 and mm39 genome builds are chosen to represent hosts. Ribosomal sequences (45S, 5S) are downloaded from NCBI. hg38 and mm39 were masked for rRNA sequence and 45S and 5S sequences from NCBI are appended as separate chromosomes. The following viral sequences were appended to the rRNA masked hg38 reference:
Location: The entire resource bundle is available at /data/CCBR_Pipeliner/db/PipeDB/charlie/fastas_gtfs on BIOWULF. This location also has additional bash scripts required for aggregating annotations and building indices required by different aligners.
When -m=dryrun is run for the first time after initialization (-m=init), the appropriate host+additives+viruses fasta and gtf files are created on the fly, which are then used to build aligner reference indexes automatically.
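Conceptually, this on-the-fly step amounts to concatenating the per-genome files from the resource bundle into a single combined reference, roughly as in the sketch below (file names are illustrative; the pipeline's own rules do this for you):
# combine the rRNA-masked host, additive, and viral sequences and annotations
cat hg38.rRNA_masked.fa ERCC.fa NC_009333.1.fa > ref/host_plus_viruses.fa
cat hg38.gtf ERCC.gtf NC_009333.1.gtf > ref/host_plus_viruses.gtf
# index the combined fasta so the aligner reference indexes can be built from it
samtools faidx ref/host_plus_viruses.fa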
(charlie help output - identical to the usage text shown above)
NOTE: You can replace v0.10.0 in the above command with a more recent tag to use a newer version of the pipeline. run_circrna_daq.sh was called test.sh in versions older than v0.4.0.
To initialize the working directory, run:
charlie -w=<path to output dir> -m=init
This assumes that <path to output dir> does not exist before running the above command and is at a location where write permissions are available.
The above command creates the <path to output dir> folder with two subfolders, logs and stats, along with config.yaml and samples.tsv files.
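If you already know which references you need, the optional init flags from the help text above can set them at initialization time instead of editing config.yaml afterwards. A minimal sketch, with placeholder paths and selections drawn from the supported lists (the flag=value syntax below mirrors the -w=/-m= style shown in the usage):
charlie -w=/my/output/folder -m=init --host=hg38 --additives=ERCC --viruses=NC_009333.1 --manifest=/path/to/samples.tsv   # hg38 host + ERCC spike-ins + KSHV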
Once the samples.tsv file has been edited appropriately to include the desired samples, it is a good idea to dryrun the pipeline to ensure that everything will work as desired. Dryrun can be run as follows:
charlie -w=<path to output dir> -m=dryrun
This will create the reference fasta and gtf files based on the selections made in config.yaml, and hence can take a few minutes to run.
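Conceptually, the combined reference is just the selected host, additive, and virus sequences and annotations concatenated together before indexing. A rough sketch of the idea only, with placeholder file names; CHARLIE performs this step itself during the dryrun, so you never run it manually:
# illustration only -- not the pipeline's actual commands or file paths
cat hg38.fa ERCC.fa NC_009333.1.fa    > ref.fa    # host + additive + virus sequences
cat hg38.gtf ERCC.gtf NC_009333.1.gtf > ref.gtf   # matching combined annotation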
Upon verifying that the dryrun is successful, you can then submit the job to the cluster using the following command:
charlie -w=<path to output dir> -m=run
which will produce something like this:
...
... skipping ~1000 lines
...
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.
Running...
14743440
In this example, 14743440 is the jobid returned by the slurm job scheduler on biowulf. This means that the job was successfully submitted; it will spawn off subjobs, which in turn will be run, and outputs will be moved to the results folder created inside the working directory supplied at the command line. You can check the status of your queue of jobs on biowulf by running:
squeue -u `whoami`
output:
   JOBID  PARTITION     NAME     USER  ST  TIME  NODES  NODELIST(REASON)
14743440  ccr,norm   circRNA  kopardev  PD  0:00      1  (None)
ST in the above output stands for Status and PD means Pending. The status will change from pending (PD) to running (R) to completed as jobs are run on the cluster.
Next, just sit tight until the pipeline finishes. You can keep monitoring the queue as shown above. If there are no jobs running on biowulf, then your pipeline has finished (or errored out!).
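If you prefer not to re-run squeue by hand, standard slurm/coreutils tooling (nothing CHARLIE-specific) can keep the view fresh, for example:
watch -n 60 "squeue -u $USER"   # refresh the queue listing every 60 seconds; press Ctrl-C to stop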
Once completed, the output should look something like this:
tree <path to output dir>
The pipeline run also produces .err and .out log files, which can give further insight into the reasons for a failure and the changes required for a successful run.
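To dig into a failed run, a quick starting point is to look at the most recently written logs. A sketch, assuming the slurm .err/.out files end up under the workdir (check both the workdir root and the logs subfolder created at init; <jobname> is a placeholder):
ls -lt <path to output dir>/logs | head              # most recently modified log files first
tail -n 50 <path to output dir>/logs/<jobname>.err   # last lines of a failing job's error log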
The exact versions listed here may change as newer versions are added. Also, the dev version points to the most recent untagged version of the pipeline (use at your own risk!).
The main output file is results/circRNA_master_counts.tsv.gz. Here are the top 3 records from an example output, transposed so that each row of the table below describes one column of the file (see the quick-inspection sketch after the table):
Column_number  Column_title                                   Example_1   Example_2   Example_3
1              chrom                                          GL000220.1  GL000220.1  GL000220.1
2              start                                          107635      112482      118578
3              end                                            151634      156427      118759
4              circExplorer_strand                            -1          -1          -1
5              circExplorer_bwa_strand                        .           .           .
6              ciri_strand                                    -1          -1          -1
7              dcc_strand                                     -1          -1          -1
8              circrnafinder_strand                           -1          -1          -1
9              flankingsites+                                 CC##GC      GC##CC      CC##GC
10             flankingsites-                                 GG##GC      GC##GG      GG##GC
11             sample_name                                    GI1_N       GI1_N       GI1_N
12             ntools                                         1           1           1
13             HQ                                             N           N           N
14             circExplorer_read_count                        -1          -1          -1
15             circExplorer_found_BSJcounts                   -1          -1          -1
16             circExplorerfound_linear_BSJ+_counts           -1          -1          -1
17             circExplorerfound_linear_spliced_BSJ+_counts   -1          -1          -1
18             circExplorerfound_linear_BSJ-_counts           -1          -1          -1
19             circExplorerfound_linear_spliced_BSJ-_counts   -1          -1          -1
20             circExplorerfound_linear_BSJ._counts           -1          -1          -1
21             circExplorerfound_linear_spliced_BSJ._counts   -1          -1          -1
22             ciri_read_count                                -1          -1          -1
23             ciri_linear_read_count                         -1          -1          -1
24             circExplorer_bwa_read_count                    3           7           3
25             dcc_read_count                                 -1          -1          -1
26             dcc_linear_read_count                          -1          -1          -1
27             circrnafinder_read_count                       -1          -1          -1
28             hqcounts                                       1           1           1
29             nonhqcounts                                    0           0           0
30             circExplorer_annotation                        Unknown     Unknown     Unknown
31             ciri_annotation                                Unknown     Unknown     Unknown
32             circExplorer_bwa_annotation                    novel       novel       novel
33             dcc_gene                                       Unknown     Unknown     Unknown
34             dcc_junction_type                              Unknown     Unknown     Unknown
35             dcc_annotation                                 Unknown     Unknown     Unknown
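As mentioned above the table, a quick way to eyeball results/circRNA_master_counts.tsv.gz without decompressing it to disk is a standard zcat one-liner; column numbers refer to the table above:
zcat <path to output dir>/results/circRNA_master_counts.tsv.gz | head -n 4                  # the first few lines of the file
zcat <path to output dir>/results/circRNA_master_counts.tsv.gz | cut -f1-3,11,12 | head     # chrom, start, end, sample_name, ntools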
Expected output from the sample data is stored under .tests/expected_output.