forked from duty-machine/duty-machine
Showing 1 changed file with 15 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
--- | ||
title: "GPTCelltype for single-cell cluster annotation: one line of code that made it into a top bioinformatics journal" | ||
date: 2024-07-15T09:31:11Z | ||
draft: ["false"] | ||
tags: [ | ||
"fetched", | ||
"生信技能树" | ||
] | ||
categories: ["Acdemic"] | ||
--- | ||
GPTCelltype for single-cell cluster annotation: one line of code that made it into a top bioinformatics journal, by 生信技能树 | ||
------ | ||
<div><section data-tool="mdnice编辑器" data-website="https://www.mdnice.com"><blockquote data-tool="mdnice编辑器"><p>I came across a paper in my WeChat Moments (Published: 25 March 2024): "Assessing GPT-4 for cell type annotation in single-cell RNA-seq analysis". The title is short and to the point, there are only two authors, and, most importantly, it was published in Nature Methods, which counts as a top journal for bioinformatics.</p></blockquote><h3 data-tool="mdnice编辑器"><span>The GPTCelltype workflow for annotating single-cell clusters</span></h3><p data-tool="mdnice编辑器">In fact, <strong>the annotation procedure that the authors demonstrate with their GPTCelltype package is something we have already done ourselves in the ChatGPT web interface: once you have the marker genes of each cluster, it is just an ordinary conversation with ChatGPT</strong>:</p><p><img data-galleryid="" data-imgfileid="100045489" data-ratio="1.4487704918032787" data-s="300,640" data-src="https://mmbiz.qpic.cn/mmbiz_png/cZNhZQ6j4wxsyv7mDrVVicquqlgBMiaIPDxybmmShRSpuuicRTmKLU2EJwIOsYXzeL9XibVl5faeAOl6NZYlTjCdiag/640?wx_fmt=png&from=appmsg" data-type="png" data-w="976" src="https://mmbiz.qpic.cn/mmbiz_png/cZNhZQ6j4wxsyv7mDrVVicquqlgBMiaIPDxybmmShRSpuuicRTmKLU2EJwIOsYXzeL9XibVl5faeAOl6NZYlTjCdiag/640?wx_fmt=png&from=appmsg"></p><figure data-tool="mdnice编辑器"><figcaption>An ordinary conversation with ChatGPT</figcaption></figure><p data-tool="mdnice编辑器"><strong>Basically, if you have memorized enough marker genes, there is no need to rely on a web tool or database resource such as ChatGPT; you can name the clusters by hand.</strong> The paper therefore compares ChatGPT-assisted cluster annotation with the other mainstream approaches: manual annotation and automated annotation by software (for example ScType and SingleR):</p><p><img data-galleryid="" data-imgfileid="100045488" data-ratio="1.625287356321839" data-s="300,640" data-src="https://mmbiz.qpic.cn/mmbiz_png/cZNhZQ6j4wxsyv7mDrVVicquqlgBMiaIPDn7JNrbLLOd5B7O272xQqHalnVp2tn3QwicgrgtPcnDqF60QZOkVoIMg/640?wx_fmt=png&from=appmsg" data-type="png" data-w="870" src="https://mmbiz.qpic.cn/mmbiz_png/cZNhZQ6j4wxsyv7mDrVVicquqlgBMiaIPDn7JNrbLLOd5B7O272xQqHalnVp2tn3QwicgrgtPcnDqF60QZOkVoIMg/640?wx_fmt=png&from=appmsg"></p><figure data-tool="mdnice编辑器"><figcaption>Differences between the annotation strategies</figcaption></figure><p data-tool="mdnice编辑器">The whole paper describes how these 3 strategies differ, in order to highlight the strengths and weaknesses of ChatGPT-assisted cluster annotation. If you open the GPTCelltype repository that accompanies the paper:</p><ul data-tool="mdnice编辑器"><li><section>https://github.com/Winnie09/GPTCelltype</section></li></ul><p data-tool="mdnice编辑器">you can see clearly that its core is a single line of code: <code>res <- gptcelltype(markers, 
model = 'gpt-4')</code></p><p data-tool="mdnice编辑器">To run that line, you first need to install and load the package:</p><pre data-tool="mdnice编辑器"><code><span># "remotes" is needed for remotes::install_github() below</span><br>install.packages(c(<span>"openai"</span>, <span>"remotes"</span>))<br>remotes::install_github(<span>"Winnie09/GPTCelltype"</span>)<br><span># IMPORTANT! Assign your OpenAI API key. See Vignette for details</span><br>Sys.setenv(OPENAI_API_KEY = <span>'your_openai_API_key'</span>)<br><br><span># Load packages</span><br><span>library</span>(GPTCelltype)<br><span>library</span>(openai)<br><br></code></pre><p data-tool="mdnice编辑器">The one inconvenience is that you must set your own OPENAI_API_KEY in your R session before running the code. To get one, log in to the official ChatGPT (OpenAI) website, register an account, and obtain an OPENAI_API_KEY. Next, after dimensionality reduction and clustering of your own scRNA-seq data, run FindAllMarkers to get the gene list of each cluster. You can then run the one-line core call, <code>res <- gptcelltype(markers, model = 'gpt-4')</code>, and assign the resulting cluster names back to the Seurat object, as shown below:</p><pre data-tool="mdnice编辑器"><code><span># Assume you have already run the Seurat pipeline https://satijalab.org/seurat/</span><br><span># "obj" is the Seurat object; "markers" is the output from FindAllMarkers(obj)</span><br><span># Cell type annotation by GPT-4</span><br>res <- gptcelltype(markers, model = <span>'gpt-4'</span>)<br><br><span># Assign cell type annotation back to Seurat object</span><br>obj@meta.data$celltype <- as.factor(res[as.character(Idents(obj))])<br><br><span># Visualize cell type annotation on UMAP</span><br>DimPlot(obj,group.by=<span>'celltype'</span>)<br></code></pre><p data-tool="mdnice编辑器">If you open the GPTCelltype source code, you will find that it really is simple: it combines the genes of each cluster in R and sends them to ChatGPT through the API, so you no longer need to copy and paste gene lists into the ChatGPT website:</p><p><img data-galleryid="" data-imgfileid="100045487" data-ratio="0.25555555555555554" data-s="300,640" data-src="https://mmbiz.qpic.cn/mmbiz_png/cZNhZQ6j4wxsyv7mDrVVicquqlgBMiaIPDhJVSiaynWqgdqUCMOSFibZRDGwzEaRYBfTbybawvjzj9z0uOa6kjVjDw/640?wx_fmt=png&from=appmsg" data-type="png" data-w="1080" src="https://mmbiz.qpic.cn/mmbiz_png/cZNhZQ6j4wxsyv7mDrVVicquqlgBMiaIPDhJVSiaynWqgdqUCMOSFibZRDGwzEaRYBfTbybawvjzj9z0uOa6kjVjDw/640?wx_fmt=png&from=appmsg"></p><figure 
data-tool="mdnice编辑器"><figcaption><span>Through the API</span></figcaption></figure><h3 data-tool="mdnice编辑器"><span><span>If you would also like to try ChatGPT, have a look at: <a target="_blank" href="http://mp.weixin.qq.com/s?__biz=MzAxMDkxODM1Ng==&mid=2247526548&idx=1&sn=d5652f85c6dc53584380a909e37388b2&chksm=9b4b282fac3ca139f5b60a2aa83848bcfb029d3f6e1e11f143d3ece9db4658c86ed5ccfcfce6&scene=21#wechat_redirect" textvalue="让chatGPT做你的24小时生信教练" linktype="text" imgurl="" imgdata="null" data-itemshowtype="0" tab="innerlink" data-linktype="2"><span>Let ChatGPT be your 24-hour bioinformatics coach</span></a></span></span><span><br></span><span>An ordinary idea</span></h3><p data-tool="mdnice编辑器">If you look closely, the idea first appeared in a preprint: Hou, W. and Ji, Z., 2023. Assessing GPT-4 for cell type annotation in single-cell RNA-seq analysis. bioRxiv, pp.2023-04, doi: https://doi.org/10.1101/2023.04.16.537094, that is, on 2023.04.16. Yet a friend in my WeChat Moments had tested the same idea more than a month earlier:</p><p><img data-galleryid="" data-imgfileid="100045490" data-ratio="0.7814814814814814" data-s="300,640" data-src="https://mmbiz.qpic.cn/mmbiz_png/cZNhZQ6j4wxsyv7mDrVVicquqlgBMiaIPDa7lFhp38ZTq1eBmPaxccG6Dk3CT8HNZ0HzgFRiaO1SicVGxhYQfbnciaw/640?wx_fmt=png&from=appmsg" data-type="png" data-w="1080" src="https://mmbiz.qpic.cn/mmbiz_png/cZNhZQ6j4wxsyv7mDrVVicquqlgBMiaIPDa7lFhp38ZTq1eBmPaxccG6Dk3CT8HNZ0HzgFRiaO1SicVGxhYQfbnciaw/640?wx_fmt=png&from=appmsg"></p><figure data-tool="mdnice编辑器"><figcaption>The same idea, tested more than a month earlier</figcaption></figure><p data-tool="mdnice编辑器">In other words, the idea itself is not sophisticated. But the authors wrote it up as a paper, posted it as a preprint, and eventually got it published in Nature Methods, whereas all we did was post to WeChat Moments or write an official-account article.</p><p data-tool="mdnice编辑器">Annotating single-cell clusters is a must-have step. Standard code for scRNA-seq dimensionality reduction and clustering is available at this link: https://pan.baidu.com/s/1bIBG9RciAzDhkTKKA7hEfQ?pwd=y4eh . Basically, you only need to read the expression matrix into R and the Seurat package can run the entire workflow, but at first all you get is a plot like the one below:</p><p><img data-galleryid="" data-imgfileid="100045491" data-ratio="1.0046296296296295" data-s="300,640" data-src="https://mmbiz.qpic.cn/mmbiz_png/cZNhZQ6j4wxsyv7mDrVVicquqlgBMiaIPD77mTgxwqzQ0gSsc607fxzpTPBfXRmOEdcmK1DGCiaMWXAC150vFV72A/640?wx_fmt=png&from=appmsg" data-type="png" data-w="1080" 
src="https://mmbiz.qpic.cn/mmbiz_png/cZNhZQ6j4wxsyv7mDrVVicquqlgBMiaIPD77mTgxwqzQ0gSsc607fxzpTPBfXRmOEdcmK1DGCiaMWXAC150vFV72A/640?wx_fmt=png&from=appmsg"></p><figure data-tool="mdnice编辑器"><figcaption>scRNA-seq dimensionality reduction and clustering plot</figcaption></figure><p data-tool="mdnice编辑器"><strong>The algorithm only outputs sequentially numbered cluster names, so you still have to know the specific marker genes of each cell population before you can assign biological names. The set of genes anyone has memorized is limited, so some clusters will occasionally be missed</strong>. We can also list the top genes of each numbered cluster, not just the well-known specific markers, for example:</p><pre data-tool="mdnice编辑器"><code>cluster0:CD3D,CD3G,CD3E,IL7R,IL32,LTB<br> cluster1:FGFBP2,GNLY,GZMB,GPR56,CX3CR1,SPON2<br> cluster2:XCL1,KLRC1,XCL2,IL2RB,CD160,TXK<br> cluster3:RP11-290F20.3,LILRB2,LYPD2,CDKN1C,LILRA5,FCN1<br> cluster4:TNF,DNAJB1,JUN,SLC4A10,HSPA1B,IL7R<br> cluster5:VCAN,S100A12,RP11-1143G9.4,EREG,THBS1,C19orf59<br> cluster6:CD1C,CLEC10A,CLEC9A,CD1E,FCER1A,IDO1<br> cluster7:CD5L,VCAM1,SDC3,FOLR2,MARCO,CXCL12<br> cluster8:FCGR3B,CMTM2,PTGS2,CXCR2,S100P,G0S2<br> cluster9:AFM,RP11-548L20.1,TENM2,SLC28A1,ACSM2B,ACSM2A<br> cluster10:PTPRB,STAB2,OIT3,ELTD1,CRHBP,AKAP12<br> cluster11:MS4A1,CD79A,LINC00926,BANK1,VPREB3,RP11-693J15.5<br> cluster12:LILRA4,MAP1A,SCT,LAMP5,IL3RA,LRRC26<br> cluster13:TNFRSF17,MZB1,RP11-731F5.2,RP11-16E12.2,DERL3,IGJ<br> cluster14:PTH1R,LAMA2,CCBE1,PDE1A,PRKG1,PDZRN4<br> cluster15:DCDC2,CHST4,SLC5A1,BICC1,CTNND2,KCNJ16<br> cluster16:TPSAB1,CPA3,CTSG,HPGDS,HDC,CMA1<br> cluster17:CTAG2,CYP1B1,FCAR,TMEM176A,SAP30,VSTM1<br></code></pre><p data-tool="mdnice编辑器">We then look up the biological identity of each numbered cluster from its top genes! For a tumor-related scRNA-seq expression matrix, the first level of dimensionality reduction and clustering usually yields:</p><ul data-tool="mdnice编辑器"><li><section>immune (CD45+,PTPRC),</section></li><li><section>epithelial/cancer (EpCAM+,EPCAM),</section></li><li><section>stromal (CD10+,MME,fibro or CD31+,PECAM1,endo)</section></li></ul><p data-tool="mdnice编辑器">See my earlier post, <a href="https://mp.weixin.qq.com/s?__biz=MzI1Njk4ODE0MQ==&mid=2247488940&idx=1&sn=1cc8a8a74715087939b9721c0881775d&scene=21#wechat_redirect" 
data-linktype="2">Reproducing CNS figures 08: general rules for the first round of clustering of tumor single-cell data</a>. These 3 major cell populations make up the complexity of the tumor immune microenvironment. Most papers focus on the immune cells and subdivide them further, using the lymphoid lineage (T, B, and NK cells) and the myeloid lineage (monocytes, dendritic cells, macrophages, granulocytes) as the second round of sub-clustering. Quite a few papers instead subdivide the fibro and endo populations within the stromal cells and spin a biological story around them, as shown below:</p><p><img data-galleryid="" data-imgfileid="100045493" data-ratio="0.7240740740740741" data-s="300,640" data-src="https://mmbiz.qpic.cn/mmbiz_png/cZNhZQ6j4wxsyv7mDrVVicquqlgBMiaIPDe0aVHZKSicCEuLV13ewbEq3J0jKgR6lWs6xLV3QZ7es7SicDibWQTMsFA/640?wx_fmt=png&from=appmsg" data-type="png" data-w="1080" src="https://mmbiz.qpic.cn/mmbiz_png/cZNhZQ6j4wxsyv7mDrVVicquqlgBMiaIPDe0aVHZKSicCEuLV13ewbEq3J0jKgR6lWs6xLV3QZ7es7SicDibWQTMsFA/640?wx_fmt=png&from=appmsg"></p><figure data-tool="mdnice编辑器"><figcaption>Assigning biological names</figcaption></figure><p data-tool="mdnice编辑器">The harder part comes afterwards. The first level of dimensionality reduction and clustering is usually fairly easy to reproduce:</p><p><img data-galleryid="" data-imgfileid="100045492" data-ratio="0.4305555555555556" data-s="300,640" data-src="https://mmbiz.qpic.cn/mmbiz_png/cZNhZQ6j4wxsyv7mDrVVicquqlgBMiaIPD7YJn2YGLq8MqlyNy0tRCnLAm7MJX6gbHf6lwpH4sM8IVtuicoNkmsPg/640?wx_fmt=png&from=appmsg" data-type="png" data-w="1080" src="https://mmbiz.qpic.cn/mmbiz_png/cZNhZQ6j4wxsyv7mDrVVicquqlgBMiaIPD7YJn2YGLq8MqlyNy0tRCnLAm7MJX6gbHf6lwpH4sM8IVtuicoNkmsPg/640?wx_fmt=png&from=appmsg"></p><figure data-tool="mdnice编辑器"><figcaption>Fairly easy to reproduce</figcaption></figure><p data-tool="mdnice编辑器">For example, clusters 0, 1, 2, and 4 above are all T or NK cells, but they rarely have clear boundaries in the two-dimensional UMAP of the first-level clustering. In other words, to subdivide them you need to rerun dimensionality reduction and clustering within each subset of cells:</p><p><img data-galleryid="" data-imgfileid="100045494" data-ratio="0.5435185185185185" data-s="300,640" data-src="https://mmbiz.qpic.cn/mmbiz_png/cZNhZQ6j4wxsyv7mDrVVicquqlgBMiaIPD9rWhWc4uYMnOompIRw9jqn7W17fmr0sJticXgfSYPXP2bFiaib62cficGQ/640?wx_fmt=png&from=appmsg" data-type="png" data-w="1080" src="https://mmbiz.qpic.cn/mmbiz_png/cZNhZQ6j4wxsyv7mDrVVicquqlgBMiaIPD9rWhWc4uYMnOompIRw9jqn7W17fmr0sJticXgfSYPXP2bFiaib62cficGQ/640?wx_fmt=png&from=appmsg"></p><figure data-tool="mdnice编辑器"><figcaption>Rerunning dimensionality reduction and clustering within each subset</figcaption></figure><h3 
data-tool="mdnice编辑器"><span>A tiered overview of bioinformatics journals</span></h3><p data-tool="mdnice编辑器">This is based on a roundup of bioinformatics journals that I saw on Zhihu a long time ago. It is a very good summary, and indeed <strong>most of the papers I have read are in these 8 journals</strong>, as shown below:<img data-imgfileid="100045495" data-ratio="1.8907407407407408" data-src="https://mmbiz.qpic.cn/mmbiz_png/cZNhZQ6j4wzXs0bgNGDLbicyyURS5rkMBwAGylldSib3VbCJQUFdC4FENtpfaK89u76icV1YQxkpxIlmQWk1tF0wg/640?wx_fmt=png&wxfrom=13&tp=wxpic" data-w="1080" src="https://mmbiz.qpic.cn/mmbiz_png/cZNhZQ6j4wzXs0bgNGDLbicyyURS5rkMBwAGylldSib3VbCJQUFdC4FENtpfaK89u76icV1YQxkpxIlmQWk1tF0wg/640?wx_fmt=png&wxfrom=13&tp=wxpic"></p><p data-tool="mdnice编辑器">Because journals devoted purely to bioinformatics tend not to have very high impact factors, people submit bioinformatics papers to general-interest journals instead. The traditional bioinformatics journals are mainly:</p><ul data-tool="mdnice编辑器"><li><section><strong>Bioinformatics</strong></section></li><li><section><strong>Briefings in Bioinformatics</strong></section></li><li><section><strong>PLOS Computational Biology</strong></section></li><li><section><strong>BMC Bioinformatics</strong></section></li><li><section>Genome Biology</section></li></ul><p data-tool="mdnice编辑器">Ordinary bioinformatics data-mining papers mostly end up with the 3 big publishers, including <strong>Frontiers journals</strong> and MDPI journals </p></section><h4 data-tool="mdnice编辑器">A friendly plug to close</h4><p data-tool="mdnice编辑器">I strongly suggest recommending these to the <strong>postdocs and young biology PIs</strong> around you; a bit more data literacy will take their research up a level:</p><ul data-tool="mdnice编辑器"><li><section><a target="_blank" href="http://mp.weixin.qq.com/s?__biz=MzAxMDkxODM1Ng==&mid=2247529099&idx=1&sn=fe3be2d43a6284a36c15625c23dc9a3e&chksm=9b4b3230ac3cbb26b875bd0a294f24dfbd41a2b59996fbfe79087330d267c4ec70882683c3bd&scene=21#wechat_redirect" textvalue="生物信息学马拉松授课(买一得五)" linktype="text" imgurl="" imgdata="null" data-itemshowtype="0" tab="innerlink" data-linktype="2" hasload="1">Bioinformatics marathon course (buy one, get five)</a>, your introductory bioinformatics course</section></li><li><section><a target="_blank" href="http://mp.weixin.qq.com/s?__biz=MzAxMDkxODM1Ng==&mid=2247528924&idx=1&sn=d5d3e68e67b8000b322a4fef6b683bc2&chksm=9b4b3167ac3cb871527c6f2b2d141404fbe49021b54656cb3d45eeb0f7dfca2bdc6fa759601c&scene=21#wechat_redirect" textvalue="生信十周年分享会上海地区预热场" 
linktype="text" imgurl="" imgdata="null" data-itemshowtype="0" tab="innerlink" data-linktype="2" hasload="1">Shanghai warm-up session for the bioinformatics 10th-anniversary sharing event</a><br></section><section><br></section></li><li><section> <a target="_blank" href="http://mp.weixin.qq.com/s?__biz=MzAxMDkxODM1Ng==&mid=2247528363&idx=1&sn=5e02f3e9b2e148191e23ebc2c0d780e7&chksm=9b4b2f10ac3ca606c1c4bac8cf112bb9b0f18e3c4262f5f2b8c0dba3bfedf2ba201507247005&scene=21#wechat_redirect" textvalue="2024的共享服务器交个朋友福利价仍然是800" linktype="text" imgurl="" imgdata="null" data-itemshowtype="0" tab="innerlink" data-linktype="2" hasload="1">The 2024 shared server is still offered at the friendly price of 800</a></section></li><li><section><a target="_blank" href="http://mp.weixin.qq.com/s?__biz=MzAxMDkxODM1Ng==&mid=2247519765&idx=1&sn=ce5a8c8182f854c88043059f8c2cb9ff&chksm=9b4bceaeac3c47b88c19941d43dbb1401f3a92206481a0afc41159927868199643f795d62a7e&scene=21#wechat_redirect" textvalue="千呼万唤始出来的独享生物信息学云服务器" linktype="text" imgurl="" imgdata="null" data-itemshowtype="0" tab="innerlink" data-linktype="2" hasload="1">The long-awaited dedicated bioinformatics cloud server</a></section></li></ul><p><span><span> </span></span><span></span></p><p><mp-style-type data-value="3"></mp-style-type></p></div> | ||
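
As a rough illustration of the "top genes per cluster, then ask ChatGPT" idea described in this post, here is a minimal base-R sketch. It assumes a FindAllMarkers-style table with `cluster`, `gene`, and `avg_log2FC` columns. The toy data, the helper names `top_genes_by_cluster` and `build_prompt`, the tissue label, and the prompt wording are all made up for illustration; this is not GPTCelltype's actual implementation or prompt, and no API call is made.

```r
# Hypothetical stand-in for FindAllMarkers() output: one row per (cluster, gene).
markers <- data.frame(
  cluster = c("0", "0", "0", "11", "11", "11"),
  gene = c("CD3D", "CD3G", "CD3E", "MS4A1", "CD79A", "BANK1"),
  avg_log2FC = c(2.1, 1.9, 1.8, 3.0, 2.7, 2.2),
  stringsAsFactors = FALSE
)

# Keep the top n genes per cluster, ranked by avg_log2FC
# (the ranking rule is an assumption for this sketch).
top_genes_by_cluster <- function(markers, n = 10) {
  ordered <- markers[order(markers$cluster, -markers$avg_log2FC), ]
  lapply(split(ordered$gene, ordered$cluster), head, n = n)
}

# Build a single prompt with one "clusterN: gene,gene,..." row per cluster.
# The wording and the default tissue are illustrative only.
build_prompt <- function(top_genes, tissue = "human PBMC") {
  rows <- vapply(names(top_genes), function(cl) {
    paste0("cluster", cl, ": ", paste(top_genes[[cl]], collapse = ","))
  }, character(1))
  paste0(
    "Identify cell types of ", tissue, " cells using the following markers. ",
    "Give one cell type per line.\n",
    paste(rows, collapse = "\n")
  )
}

top <- top_genes_by_cluster(markers, n = 2)
prompt <- build_prompt(top)
cat(prompt, "\n")

# Map (hypothetical) returned labels back onto per-cell cluster identities
# with a named-vector lookup.
labels <- c("0" = "T cells", "11" = "B cells")
cell_clusters <- factor(c("0", "11", "0"))
celltype <- as.factor(labels[as.character(cell_clusters)])
```

In a real analysis the `markers` table would come from `FindAllMarkers(obj)` and the labels from the model's reply; the last three lines mirror the `res[as.character(Idents(obj))]` lookup used earlier in the post to write labels back to the Seurat object.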
<hr> | ||
<a href="https://mp.weixin.qq.com/s/fKZVW1DpmnwLRCFi_KMTVg" target="_blank" rel="noopener noreferrer">Original article link</a> |