forked from duty-machine/duty-machine
Single-cell upstream analysis (4): optimizing cellranger runs with parallel batch processing using GNU parallel
Showing 1 changed file with 15 additions and 0 deletions.
docs/2023-10/单细胞上游分析_四__cellranger_分析优化__利用_parallel_的并行批量处理.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
--- | ||
title: "Single-cell upstream analysis (4): optimizing cellranger runs with parallel batch processing using GNU parallel" | ||
date: 2023-10-25T00:56:51Z | ||
draft: ["false"] | ||
tags: [ | ||
"fetched", | ||
"TOP生物信息" | ||
] | ||
categories: ["Acdemic"] | ||
--- | ||
Single-cell upstream analysis (4): optimizing cellranger runs with parallel batch processing using GNU parallel, by TOP生物信息 | ||
------ | ||
<div><section data-tool="mdnice编辑器" data-website="https://www.mdnice.com"><p data-tool="mdnice编辑器">Our public account previously published a post on upstream processing with cellranger:</p><p data-tool="mdnice编辑器"><a href="https://mp.weixin.qq.com/s?__biz=MzkzMzE5NTM4NA==&mid=2247483708&idx=2&sn=4a9176132c3aa9c5ea531505ef9e6f96&chksm=c2517a2bf526f33d918215c9fe213c8462bee45780f3800fc28ed512245ff5c1fb2b48e2b6c6&scene=21&cur_album_id=1773112625992646659#wechat_redirect" data-linktype="2">Single-cell analysis in practice (2): getting the expression matrix with Cell Ranger</a></p><p data-tool="mdnice编辑器">While working on my own project, however, I found that processing samples one at a time becomes slow and tedious when there are many of them.</p><p data-tool="mdnice编辑器">Many tutorials therefore suggest using a loop for batch processing. With a loop (for example, a cat + while construct), though, the samples still queue up and run one after another; they are not processed simultaneously.</p><p data-tool="mdnice编辑器">To address this, and building on our earlier posts, here is a method for batch-processing 10x sequencing output with cellranger in parallel, for your reference.</p><h1 data-tool="mdnice编辑器"><span></span><span>1. Data preparation</span><span></span></h1><ol data-tool="mdnice编辑器"><li><section>Collect the raw data provided by the sequencing company or public database into a single folder, organized by sample name (below I refer to this folder as <code>parent_folder</code>). You can use <code>ln -s</code> or simply <code>mv</code> the data into one directory. With the layout shown here, for example, the <code>Rawdata</code> folder is the <code>parent_folder</code>.</section></li></ol><figure data-tool="mdnice编辑器"><img data-ratio="0.3798767967145791" data-src="https://mmbiz.qpic.cn/mmbiz_png/WThoCmvVu2bZjNL5Pag1uX7mYxHeWuIhkibuqlzx4dxt6XKZH6sUmGMBKsZ2B1lm8v1icRF3JY55Kr7Igvy7ic7XA/640?wx_fmt=png" data-type="png" data-w="487" src="https://mmbiz.qpic.cn/mmbiz_png/WThoCmvVu2bZjNL5Pag1uX7mYxHeWuIhkibuqlzx4dxt6XKZH6sUmGMBKsZ2B1lm8v1icRF3JY55Kr7Igvy7ic7XA/640?wx_fmt=png"></figure><ol start="2" data-tool="mdnice编辑器"><li><section>Download cellranger and the reference dataset</section></li></ol><ul data-tool="mdnice编辑器"><li><section><p>cellranger downloads: https://www.10xgenomics.com/support/software/cell-ranger/downloads</p></section></li><li><section><p>Reference downloads: https://www.10xgenomics.com/support/software/cell-ranger/downloads#reference-downloads</p></section></li></ul><figure data-tool="mdnice编辑器"><img data-ratio="0.6445283018867924" 
data-src="https://mmbiz.qpic.cn/mmbiz_png/WThoCmvVu2bZjNL5Pag1uX7mYxHeWuIh7nOaI4qOKmYC9JluNWTmWmOeWEaBEC4YibUku33pOGsicHy3iaZybaOTg/640?wx_fmt=png" data-type="png" data-w="1325" src="https://mmbiz.qpic.cn/mmbiz_png/WThoCmvVu2bZjNL5Pag1uX7mYxHeWuIh7nOaI4qOKmYC9JluNWTmWmOeWEaBEC4YibUku33pOGsicHy3iaZybaOTg/640?wx_fmt=png"><figcaption>The cellranger download page on the official site</figcaption></figure><pre data-tool="mdnice编辑器"><span></span><code><span>#</span><span><span>### Download</span></span><br>wget -O cellranger-7.2.0.tar.gz "https://cf.10xgenomics.com/releases/cell-exp/cellranger-7.2.0.tar.gz?Expires=1698105429&Key-Pair-Id=APKAI7S6A5RYOXBWRPDA&Signature=GR~dMFWyxdKKFnkucEqmTzYmLLj9PjCPNSUfbOuOJTQyHp9UIF1PxIOiNmcl3ri2X54k5Bgz-k3WuZCu6JdxfsYttZjwTRO897G4MeDaIu53OjtLv390Oc4~dryNNFAPoww7Vf8~WjYiCeSe5Gmt6I8tWdbk569NCKWMt9Q3-Aic3lFbpxW7r2EkKq3xYXv9oPCzveBocpYTTHETJMKj4-XUKSuHII4l8Zj8e1HvfuInr2nRlQsYa4Gr1rcMKHFUqh41erWTmHwD1W8IK3m12mEpy9c9ZZIkbLCHuNkmNyGYS1wfMuR7yuI-Gvo6vejGNluSHLA1qioxTSq0kJZH1Q__"<br><span><br>#</span><span><span>### Extract</span></span><br>tar -zxvf cellranger-7.2.0.tar.gz<br></code></pre><p data-tool="mdnice编辑器">Download the reference dataset:</p><pre data-tool="mdnice编辑器"><span></span><code><span>#</span><span><span>### Download</span></span><br>wget "https://cf.10xgenomics.com/supp/cell-exp/refdata-gex-GRCh38-2020-A.tar.gz"<br><span><br>#</span><span><span>### Extract</span></span><br>tar -zxvf refdata-gex-GRCh38-2020-A.tar.gz<br></code></pre><ol start="3" data-tool="mdnice编辑器"><li><section>Set up a conda environment</section></li></ol><pre data-tool="mdnice编辑器"><span></span><code><span>#</span><span><span>### Create and activate the cellranger environment</span></span><br>conda create -n cellranger<br>conda activate cellranger<br></code></pre><p data-tool="mdnice编辑器">Add cellranger to the PATH environment variable:</p><pre data-tool="mdnice编辑器"><span></span><code><span>#</span><span> Option 1: enter the extracted folder, then activate with the bundled bash script </span><br>cd ~/biosoft/cellranger/cellranger-7.2.0<br>ls<br><span>#</span><span> LICENSE bin builtwith.json cellranger external lib mro sourceme.bash sourceme.csh 
target_panels</span><br><span>#</span><span> Activate the environment</span><br>source sourceme.bash<br><span><br>#</span><span> Option 2:</span><br><span>#</span><span> Add to PATH temporarily (must be repeated in every session; recommended, as it avoids software conflicts)</span><br><span>#</span><span> Note: use the absolute path here, starting from the root directory</span><br>pwd<br>export PATH=/home/mailab003/biosoft/cellranger-7.2.0:$PATH<br><span><br>#</span><span> Add to PATH permanently (set once and for all, but may cause conflicts later)</span><br>echo 'export PATH=/home/mailab003/biosoft/cellranger-7.2.0:$PATH' >> ~/.bashrc<br>source ~/.bashrc<br><span><br>#</span><span> Check that cellranger is available</span><br>cellranger -h<br></code></pre><ol start="4" data-tool="mdnice编辑器"><li><section>Install parallel</section></li></ol><p data-tool="mdnice编辑器">With administrator rights: <code>sudo apt-get install parallel</code></p><p data-tool="mdnice编辑器">Without administrator rights, see: https://www.jianshu.com/p/936d0bc52ad5</p><h1 data-tool="mdnice编辑器"><span></span><span>2. Writing the scripts</span><span></span></h1><h2 data-tool="mdnice编辑器"><span></span><span>2.1 Script for a single sample</span><span></span></h2><pre data-tool="mdnice编辑器"><span></span><code>vi run-cellranger.sh<br></code></pre><p data-tool="mdnice编辑器">Note: replace bin and refpath below with your own paths. The other braced values and $parent_folder are placeholders as well.</p><pre data-tool="mdnice编辑器"><span></span><code>bin={your_path}/cellranger-7.2.0/bin/cellranger<br>refpath={your_path}/customref-GRCh38-2020-A<br><span><br>#</span><span><span># Input path (the parent_folder); no space is allowed after "="</span></span><br>fq_dir=$parent_folder<br><span><br>#</span><span><span># Output path: the folder where the cellranger quantification results should go</span></span><br>outdir=${your_folder}<br><br>cd ${outdir}<br><span><br>$</span><span>bin count --id=<span>$1</span> \</span><br>--localcores=8 \<br>--transcriptome=$refpath \<br>--fastqs=$fq_dir/$1 \<br>--sample=$1 \<br>--nosecondary --no-bam <br><span>#</span><span> To speed things up, --no-bam skips writing the BAM file and saves a lot of disk space, but RNA velocity analysis is then not possible. My suggestion: run with --no-bam first to get the quantification, then rerun without it to get the BAM files</span><br></code></pre><p data-tool="mdnice编辑器">Grant execute <span>permission</span>:</p><pre data-tool="mdnice编辑器"><span></span><code>chmod +x run-cellranger.sh<br></code></pre><h2 data-tool="mdnice编辑器"><span></span><span>2.2 Generating the list of sample names</span><span></span></h2><pre 
data-tool="mdnice编辑器"><span></span><code><span>#</span><span><span>### Set the parent_folder path</span></span><br>parent_folder=<br><span><br>#</span><span><span>### Collect all sample names</span></span><br>ls $parent_folder > sample_name.txt<br><span><br>#</span><span><span>### Check the sample names with cat</span></span><br>cat sample_name.txt<br></code></pre><h2 data-tool="mdnice编辑器"><span></span><span>2.3 Writing the parallel script</span><span></span></h2><pre data-tool="mdnice编辑器"><span></span><code>vi run-multiple-cellranger.sh<br></code></pre><pre data-tool="mdnice编辑器"><span></span><code><span>#</span><span>!/bin/bash</span><br><span><br>#</span><span> Specify the number of concurrent runs</span><br>concurrent_runs=4<br><span><br>#</span><span> Read the sample names from sample_name.txt</span><br>mapfile -t sample_names < sample_name.txt<br><span><br>#</span><span> Function to run cellranger <span>for</span> a single sample</span><br>run_cellranger() {<br> sample=$1<br> ./run-cellranger.sh "$sample"<br>}<br><br>export -f run_cellranger # Export the function to make it available to GNU Parallel<br><span><br>#</span><span> Run cellranger <span>for</span> each sample <span>in</span> parallel</span><br>parallel -j $concurrent_runs run_cellranger ::: "${sample_names[@]}"<br></code></pre><p data-tool="mdnice编辑器">Grant execute permission:</p><pre data-tool="mdnice编辑器"><span></span><code>chmod +x run-multiple-cellranger.sh<br></code></pre><p data-tool="mdnice编辑器">In short, this script reads the file sample_name.txt in the current folder, runs the run-cellranger.sh quantification for each sample listed in it, and keeps 4 runs going at the same time.</p><p data-tool="mdnice编辑器">Note: the three files run-cellranger.sh, sample_name.txt, and run-multiple-cellranger.sh must be in the same folder.</p><h2 data-tool="mdnice编辑器"><span></span><span>2.4 Running the command</span><span></span></h2><pre data-tool="mdnice编辑器"><span></span><code>nohup bash run-multiple-cellranger.sh > run_cellranger.log &<br></code></pre><h1 data-tool="mdnice编辑器"><span></span><span>3. 
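Extensions</span><span></span></h1><p data-tool="mdnice编辑器">Although I use this batch script to run cellranger here, it works just as well for other tasks: only the function in the middle of the script needs to change.</p><p data-tool="mdnice编辑器">That is all for this installment. If you have questions, feel free to email me ([email protected]) or leave a message on the public account. See you next time!</p></section><p><br></p><p><mp-style-type data-value="3"></mp-style-type></p></div> | ||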
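The parallel script in section 2.3 keeps 4 cellranger jobs running at once, and run-cellranger.sh gives each job `--localcores=8`, so together they may ask for up to 32 cores. Below is a minimal sketch of one way to pick a `concurrent_runs` value that fits the machine. The helper name `max_concurrent_runs` is hypothetical and not part of the original scripts, and memory limits are not considered here.

```bash
#!/bin/bash
# Hypothetical helper (an assumption, not from the original post):
# how many jobs fit if each job is given cores_per_job cores.
max_concurrent_runs() {
    local total_cores=$1
    local cores_per_job=$2
    # Integer division: leftover cores are simply not used.
    local runs=$(( total_cores / cores_per_job ))
    # Always allow at least one job, even on a small machine.
    if (( runs < 1 )); then
        runs=1
    fi
    echo "$runs"
}

# Example: a 32-core machine with --localcores=8 per job.
max_concurrent_runs 32 8
```

On a real server, something like `concurrent_runs=$(max_concurrent_runs "$(nproc)" 8)` could replace the hard-coded `concurrent_runs=4`, provided `nproc` is available on that system.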
<hr> | ||
<a href="https://mp.weixin.qq.com/s/LVRqPdyHPBLD6mzHx9WNvA" target="_blank" rel="noopener noreferrer">Original article</a> |