<!doctype html>
<html lang="en">
<head>
<!-- Required meta tags -->
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<title> How-to: Set up a Bioinformatics Environment | Computational Approaches to Biological Problems
</title>
<link rel="canonical" href="./how-to-set-up-a-bioinformatics-environment.html">
<link rel="stylesheet" href="./theme/css/bootstrap.min.css">
<link rel="stylesheet" href="./theme/css/fontawesome.min.css">
<link rel="stylesheet" href="./theme/css/pygments/default.min.css">
<link rel="stylesheet" href="./theme/css/theme.css">
<meta name="description" content="Time and again I've set up a new Linux environment (for various reasons) and I've found that while there are innumerable ways to go about doing this, I have a few standard steps that I always follow. And the one program to rule them all is: conda !! While most probably …">
</head>
<body>
<header class="header">
<div class="container">
<div class="row">
<div class="col-sm-4">
<a href="./">
<img class="img-fluid rounded" src=./images/LucasBoatwrightHeadShot-161x215.jpeg alt="Computational Approaches to Biological Problems">
</a>
</div>
<div class="col-sm-8">
<h1 class="title"><a href="./">Computational Approaches to Biological Problems</a></h1>
<p class="text-muted">and other biological inklings</p>
<ul class="list-inline">
<li class="list-inline-item"><a href="./index.html">Home</a></li>
<li class="list-inline-item"><a href="./archives.html">posts</a></li>
<li class="list-inline-item"><a href="https://github.com/jlboat">GitHub</a></li>
<li class="list-inline-item"><a href="https://scholar.google.com/citations?hl=en&user=q5G2bJEAAAAJ&view_op=list_works&gmla=AJsN-F4gfKvkjBIT6Nv8Xy97IIUHuhFi9jF8x5ntZveyPGo_xO2ws-_5NjCw-_9z9CD2mWJnlbL1DeMd-0l-IUNYPsQ1NR4wu8s_l6YjZYlsb9mTQ1d8TX8">Pubs</a></li>
<li class="list-inline-item text-muted">|</li>
<li class="list-inline-item"><a href="./pages/about-me.html">About me</a></li>
</ul>
</div>
</div> </div>
</header>
<div class="main">
<div class="container">
<h1> How-to: Set up a Bioinformatics Environment
</h1>
<hr>
<article class="article">
<header>
<ul class="list-inline">
<li class="list-inline-item text-muted" title="2018-09-28T00:00:00-05:00">
<i class="fas fa-clock"></i>
Fri 28 September 2018
</li>
<li class="list-inline-item">
<i class="fas fa-folder-open"></i>
<a href="./category/how-to.html">How-to</a>
</li>
<li class="list-inline-item">
<i class="fas fa-user"></i>
<a href="./author/j-lucas-boatwright.html">J. Lucas Boatwright</a> </li>
<li class="list-inline-item">
<i class="fas fa-tag"></i>
<a href="./tag/bioinformatics.html">#bioinformatics</a>, <a href="./tag/education.html">#education</a>, <a href="./tag/bash.html">#bash</a>, <a href="./tag/anaconda.html">#anaconda</a>, <a href="./tag/r.html">#R</a> </li>
</ul>
</header>
<div class="content">
<p>Time and again I've set up a new Linux environment (for various reasons) and I've found that while there are innumerable ways to go about doing this, I have a few standard steps that I always follow. </p>
<p>And the one program to rule them all is: <a href="https://conda.io/docs/user-guide/install/index.html">conda</a> !!</p>
<p>While most probably associate conda with python package management, conda goes well beyond that and serves as a general environment/package manager for scientific libraries.</p>
<p>As it says on conda's site:</p>
<div class="highlight"><pre><span></span><code><span class="err">A conda environment is a directory that contains a specific collection of conda packages that you have installed. For example, you may have one environment with NumPy 1.7 and its dependencies, and another environment with NumPy 1.6 for legacy testing. If you change one environment, your other environments are not affected. You can easily activate or deactivate environments, which is how you switch between them. You can also share your environment with someone by giving them a copy of your environment.yaml file.</span>
</code></pre></div>
<p>First, download the <a href="https://conda.io/docs/user-guide/getting-started.html">Anaconda installer</a> for your system (or select <a href="https://conda.io/miniconda.html">Miniconda</a> if you need a smaller install) and follow your system-specific directions. Here, I will walk through the Linux install and setup.</p>
<div class="highlight"><pre><span></span><code>bash Anaconda-2.x.x-Linux-x86<span class="o">[</span>_64<span class="o">]</span>.sh
</code></pre></div>
<p>You'll want to add a few important channels, which are essentially package repositories. Order matters: each --add puts the channel at the top of the list, so the channel added last gets the highest priority (here, conda looks in conda-forge first). Do note, however, that while adding these channels increases the number of packages conda can find, it also increases the time it takes to search all of those packages for updates, dependency resolution, etc.</p>
<div class="highlight"><pre><span></span><code>conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
</code></pre></div>
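<p>If you want to confirm the resulting order, conda can print its channel list. This is just a quick check, assuming the three commands above ran against an otherwise default configuration; conda-forge should appear first.</p>
<div class="highlight"><pre><span></span><code><span class="c1"># Show the configured channels, highest priority first</span>
conda config --show channels
</code></pre></div>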
<p>From here, it's easy to install desired packages.</p>
<div class="highlight"><pre><span></span><code>conda install samtools <span class="c1"># Note: a C-based package</span>
conda install pysam <span class="c1"># a python package</span>
conda install gffutils <span class="c1"># another python package</span>
</code></pre></div>
<p>Now, I don't recommend you install all of your packages to a single environment. For instance, I like to have one environment for python3, one for python2, one for R, etc.</p>
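<p>A minimal sketch of that kind of split is below. The environment names py3_env and py2_env are placeholders I made up for illustration, and the -y flag simply skips the confirmation prompt.</p>
<div class="highlight"><pre><span></span><code><span class="c1"># One environment per interpreter (names are placeholders)</span>
conda create -y -n py3_env python=3
conda create -y -n py2_env python=2.7
</code></pre></div>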
<p>As an example, let's create an R environment. Now, my base environment is python3 (since it's my primary tool), but you may make R your primary and python a secondary. <em>Warning: Installing R in conda works great unless you already have R/R libraries installed the traditional way. The R libraries seem to be the biggest issue, and removing them allows the conda R to work properly.</em></p>
<div class="highlight"><pre><span></span><code>conda create -n mro_env r-essentials mro-base
</code></pre></div>
<p>You may want to check that you've properly set up the environment (or if you ever forget what you called it).</p>
<div class="highlight"><pre><span></span><code>conda env list
</code></pre></div>
<p>On my Linux system, I had to add one line to my .bashrc file so I could activate my environments. You can try activating without adding this line, and conda will tell you if you need it. Conda used to add a path export to your .bashrc, but that is no longer recommended.</p>
<div class="highlight"><pre><span></span><code><span class="nb">echo</span> <span class="s2">". /home/lboat/anaconda3/etc/profile.d/conda.sh"</span> >> ~/.bashrc
<span class="c1"># Now activate your new environment</span>
conda activate mro_env
</code></pre></div>
<p>This has changed the paths to all of the installed conda libraries such that you are now only looking at those programs/packages installed in this environment. This is very helpful if there are package conflicts or if a program requires an older version that you don't typically use.</p>
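<p>One way to see the path change for yourself, assuming mro_env is still active in the current shell: conda sets the CONDA_PREFIX variable on activation, and both lines below should point inside the environment's directory (your exact path will differ from mine).</p>
<div class="highlight"><pre><span></span><code><span class="c1"># Root directory of the currently active environment</span>
<span class="nb">echo</span> <span class="s2">"$CONDA_PREFIX"</span>
<span class="c1"># The R found on your PATH should live inside that directory</span>
which R
</code></pre></div>
<p>When you're finished, conda deactivate returns you to the previous environment.</p>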
<p>From here, you can install any library you like (even ones you installed in a different environment). Since this is an example of my R environment, I'm going to install some Bioconductor packages.</p>
<div class="highlight"><pre><span></span><code><span class="c1"># Install Bioconductor -- if you set the channels above, the channel parameter (-c) is optional</span>
<span class="c1"># NOTE: installing bioconductor this way actually caused me some problems</span>
<span class="c1"># read further to see a better way to install packages</span>
conda install -c bioconda bioconductor-biocinstaller
</code></pre></div>
<p>Now, install the R packages you want:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># conda install -c bioconda bioconductor-{package_name}</span>
conda install -c bioconda bioconductor-edger
</code></pre></div>
<p>As with other packages and programs, conda will identify dependencies and resolve versions to the best of its ability.</p>
<p>Now, in my case, the edgeR package had an outdated limma dependency that prevented me from loading edgeR. This provides a great opportunity to describe an alternate way to install packages.</p>
<p>Most people work with RStudio if they work in R. It can also be installed using conda.</p>
<div class="highlight"><pre><span></span><code>conda install rstudio
rstudio
</code></pre></div>
<p>Now, you can run RStudio and install any package you like as you normally would (the library location specific to your conda installation should automatically be the default install location).</p>
<div class="highlight"><pre><span></span><code><span class="c1"># Whether that be from CRAN</span>
<span class="nf">install.packages</span><span class="p">(</span><span class="s">"ggplot2"</span><span class="p">)</span>
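<span class="c1"># Or Bioconductor (biocLite() comes from the BiocInstaller package installed above)</span>
<span class="nf">library</span><span class="p">(</span><span class="n">BiocInstaller</span><span class="p">)</span>
<span class="c1"># lib path included as an example</span>
<span class="nf">biocLite</span><span class="p">(</span><span class="s">"edgeR"</span><span class="p">,</span> <span class="n">lib</span><span class="o">=</span><span class="s">"~/anaconda3/envs/mro_env/lib/R/library"</span><span class="p">)</span>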
</code></pre></div>
<p>Your environment path may be different, so be sure to check. In short, conda is capable of managing just about all of my major dependencies, from C libraries to python and R. As an added benefit, this can all be done on a computer where you don't have sudo/admin access. So, if you work at a university where you have to request package installation, conda can save you a ton of headache.</p>
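<p>The conda description quoted near the top mentions sharing an environment through an environment.yaml file. A minimal sketch of that round trip, assuming the mro_env environment from this post exists:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># Write the environment's packages and channels to a file</span>
conda env export -n mro_env > environment.yaml
<span class="c1"># On the other machine (or account), rebuild the environment from that file</span>
conda env create -f environment.yaml
</code></pre></div>
<p>The exported file pins exact builds, so it is most reliable between machines running the same operating system.</p>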
<p>One last tip: you can also put your entire conda environment into a Singularity container and run it anywhere! See my post on <a href="./how-to-containerize-an-application-with-singularity">Singularity containers</a>.</p>
</div>
</article>
</div>
</div>
<footer class="footer">
<div class="container">
<div class="row">
<ul class="col-sm-6 list-inline">
<li class="list-inline-item"><a href="./authors.html">Authors</a></li>
<li class="list-inline-item"><a href="./archives.html">Archives</a></li>
<li class="list-inline-item"><a href="./categories.html">Categories</a></li>
<li class="list-inline-item"><a href="./tags.html">Tags</a></li>
</ul>
<p class="col-sm-6 text-sm-right text-muted">
Generated by <a href="https://github.com/getpelican/pelican" target="_blank">Pelican</a>
/ <a href="https://github.com/nairobilug/pelican-alchemy" target="_blank">✨</a>
</p>
</div> </div>
</footer>
</body>
</html>