Hi dear Zilu,
Thanks for developing a very useful tool!
When I follow the methods in the original paper to perform quality control on the REAP-PBMC dataset, I can't get the same data as in Supplementary Table 1. Could you give more detailed quality control guidelines for the REAP-PBMC dataset? Thank you!!!
Best,
Li
Sorry for the delay. I lost track of this GitHub repository for a while...
Here is the original Python analysis code:
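import re

import matplotlib.pyplot as plt
import pandas as pd

# Load the denoised mRNA counts (genes x cells).
# Gene names must end up as the row index of X; if they load as an ordinary column in your copy of the file, pass index_col=0.
X = pd.read_csv('../../reap-seq-data/GSM2685239_mRNA_3_PBMCs_denoised.csv',sep=',')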
y = pd.read_csv('../../reap-seq-data/GSM2685244_protein_3_PBMCs_matrix.txt',sep='\t')
gene=X.index.tolist()
protein = y['Protein'].tolist()
y=y.drop(columns=y.columns[0])
y.index=protein
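# Sanity check: after replacing '.' with '-' in the mRNA cell names, both tables should list the same cells in the same order
Xcellname=X.columns
X.columns=[x.replace('.','-') for x in Xcellname]
assert (X.columns==y.columns).all()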
# Remove cells whose mitochondrial fraction of counts is 20% or more
mitoGene=list(filter(lambda x:re.search(r'^MT-', x), gene))
mitoPercent=X.loc[mitoGene,:].sum(axis=0)/X.sum(axis=0)
mitoPercent.hist()
X=X.loc[:,mitoPercent<0.2]
y=y.loc[:,mitoPercent<0.2]
# Remove lowly expressed genes (total UMI count across the remaining cells <= 10)
X.sum(axis=1).plot.hist(bins=50)
plt.show()
X=X.loc[X.sum(axis=1)>10,:]
gene=X.index.tolist()
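# Remove unnecessary protein information (mouse/rat isotype controls and blanks)
mouseP=list(filter(lambda x:re.search(r'(Mouse|Rat|Blank)', x), protein))
# The next line sets the retained antibodies explicitly, which overrides this set
protein=set(protein) - set(mouseP)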
protein=['CD8_CAATCCCT', 'CD154_GGTAATGT', 'CD152_GTCCATTG', 'CD45RA_GTGATAGT', 'CD45_TCTCGACT', 'CD223_TGGACCCT', 'CD69_GTTGCATG', 'CD40_ATATGAGA', 'CD4_CACGATTC', 'CD28_TTGCGTCG', 'CD14_AATTGAAC', 'CD137_CCGTTATG', 'HLA-DRA_TAGACGAC', 'CD4_GTCCAGGC', 'CD73_GCTACTTC', 'CD11b_AGGGCGTT', 'CD197_ACGCTTGG', 'CD155_TAAATCGT', 'CD279_GAACCCGG', 'CD9_CACTCAAC', 'CD8a_ACCCGCAC', 'CD357_TCGTAGAT', 'CD27_GCTGTGTA', 'CD19_CTATACGC', 'CD335_GTTTGTGG', 'CD45RO_TGATATCG', 'CD66b_CGCTATCC', 'CD274_CTTGTACC', 'CD20_ACGCGGAA', 'CD272_ACCGAACA', 'CD56_TACATAAG', 'CD8_GTATCGAG', 'CD273_AGCAGTTA', 'CD3_AGGATCGA', 'CD278_CGACTCTT', 'CD127_TGGGAGCT', 'CD158E1_TAAGCCAC', 'CD68_CACCTCAG', 'CD25_GCTCGTCA', 'CD134_ATCCGGCA', 'CD45_ACGAGTAG', 'FOXP3_ATCGCCAT', 'TIGIT_CATGCGTA', 'CD33_CCAGTGGA']
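y=y.loc[protein,:]
# Strip the 9-character '_BARCODE' suffix, e.g. 'CD3_AGGATCGA' -> 'CD3'
# Note: some names repeat afterwards (CD4, CD8, CD45), so y.index is not unique
protein = [ x[:-9] for x in protein]
y.index=protein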
We applied some standard QC: filtering out cells with high mitochondrial gene expression (>20% of counts, likely not normal cells) and lowly expressed genes (total UMI count of 10 or less across cells, as in the code above).
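For anyone who wants to check the two filters in isolation, here is a minimal sketch that wraps them in a function and runs them on a tiny made-up table. The function name `qc_filter`, its parameters, and the toy data are assumptions for illustration only and are not part of the original analysis; the thresholds mirror the ones in the code above.

```python
import re

import pandas as pd


def qc_filter(rna, protein, mito_max=0.2, gene_min_umi=10):
    """Filter cells by mitochondrial fraction, then genes by total UMI count.

    rna:     genes x cells count table (gene names as the index)
    protein: proteins x cells count table with the same cell columns
    (hypothetical helper, not the author's function)
    """
    # Fraction of each cell's counts coming from MT- genes
    mito_genes = [g for g in rna.index if re.search(r"^MT-", g)]
    mito_frac = rna.loc[mito_genes].sum(axis=0) / rna.sum(axis=0)

    # Keep cells below the mitochondrial threshold, in both tables
    keep_cells = mito_frac < mito_max
    rna = rna.loc[:, keep_cells]
    protein = protein.loc[:, keep_cells]

    # Keep genes whose total count over the remaining cells exceeds the minimum
    keep_genes = rna.sum(axis=1) > gene_min_umi
    return rna.loc[keep_genes], protein


# Toy data, invented for illustration only
rna = pd.DataFrame(
    {
        "cell1": [1, 20, 30, 0],
        "cell2": [30, 5, 5, 0],
        "cell3": [2, 15, 25, 3],
    },
    index=["MT-CO1", "CD3E", "ACTB", "RARE1"],
)
protein = pd.DataFrame(
    {"cell1": [100, 5], "cell2": [3, 80], "cell3": [90, 10]},
    index=["CD3", "CD19"],
)

rna_qc, protein_qc = qc_filter(rna, protein)
print(rna_qc.shape, protein_qc.shape)
```

In this toy example cell2 is dropped because 75% of its counts are mitochondrial, and the genes are filtered only after the cells, so the gene totals are computed over the remaining cells. Applying the filters in a different order can give different numbers of genes and cells.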