I have some questions about REAP-PBMC dataset #25

CCNU-LCY · 2022-04-28T13:52:07Z

Hi dear Zilu,
Thanks for developing a very useful tool!

When I follow the methods in the original paper to perform quality control on the REAP-PBMC dataset, I can't reproduce the data in Supplementary Table 1. Could you give more detailed quality control guidelines for the REAP-PBMC dataset? Thank you!

Best,
Li

zhouzilu · 2022-07-12T05:39:04Z

Hi Li,

Sorry for the delay. I lost track of this GitHub repo for a while.

Here is the original Python analysis code:

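import re

import matplotlib.pyplot as plt
import pandas as pd

# X: mRNA counts (genes x cells); y: protein counts (proteins x cells)
X = pd.read_csv('../../reap-seq-data/GSM2685239_mRNA_3_PBMCs_denoised.csv',sep=',')
y = pd.read_csv('../../reap-seq-data/GSM2685244_protein_3_PBMCs_matrix.txt',sep='\t')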

gene=X.index.tolist()
protein = y['Protein'].tolist()
y=y.drop(columns=y.columns[0])
y.index=protein

# Sanity check: after normalizing the cell names, both matrices must list the same cells in the same order
Xcellname=X.columns
X.columns=[x.replace('.','-') for x in Xcellname]
assert (X.columns==y.columns).all()

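# Fraction of each cell's UMIs that come from mitochondrial (MT-) genes
mitoGene=[g for g in gene if re.search(r'^MT-', g)]
mitoPercent=X.loc[mitoGene,:].sum(axis=0)/X.sum(axis=0)

mitoPercent.hist()

# Keep cells with less than 20% mitochondrial UMIs, in both matrices
X=X.loc[:,mitoPercent<0.2]
y=y.loc[:,mitoPercent<0.2]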

# Remove lowly expressed genes (total UMI count of 10 or less across the remaining cells)
X.sum(axis=1).plot.hist(bins=50)
plt.show()
X=X.loc[X.sum(axis=1)>10,:]
gene=X.index.tolist()

# Remove unneeded protein tags (names containing Mouse, Rat, or Blank)
mouseP=[p for p in protein if re.search(r'(Mouse|Rat|Blank)', p)]

protein=set(protein) - set(mouseP)

# The explicit list below replaces the set above and fixes the row order
protein=['CD8_CAATCCCT', 'CD154_GGTAATGT', 'CD152_GTCCATTG', 'CD45RA_GTGATAGT', 'CD45_TCTCGACT', 'CD223_TGGACCCT', 'CD69_GTTGCATG', 'CD40_ATATGAGA', 'CD4_CACGATTC', 'CD28_TTGCGTCG', 'CD14_AATTGAAC', 'CD137_CCGTTATG', 'HLA-DRA_TAGACGAC', 'CD4_GTCCAGGC', 'CD73_GCTACTTC', 'CD11b_AGGGCGTT', 'CD197_ACGCTTGG', 'CD155_TAAATCGT', 'CD279_GAACCCGG', 'CD9_CACTCAAC', 'CD8a_ACCCGCAC', 'CD357_TCGTAGAT', 'CD27_GCTGTGTA', 'CD19_CTATACGC', 'CD335_GTTTGTGG', 'CD45RO_TGATATCG', 'CD66b_CGCTATCC', 'CD274_CTTGTACC', 'CD20_ACGCGGAA', 'CD272_ACCGAACA', 'CD56_TACATAAG', 'CD8_GTATCGAG', 'CD273_AGCAGTTA', 'CD3_AGGATCGA', 'CD278_CGACTCTT', 'CD127_TGGGAGCT', 'CD158E1_TAAGCCAC', 'CD68_CACCTCAG', 'CD25_GCTCGTCA', 'CD134_ATCCGGCA', 'CD45_ACGAGTAG', 'FOXP3_ATCGCCAT', 'TIGIT_CATGCGTA', 'CD33_CCAGTGGA']

y=y.loc[protein,:]

# Drop the trailing '_' plus 8-letter barcode from each name, e.g. 'CD3_AGGATCGA' -> 'CD3'
protein = [x[:-9] for x in protein]
y.index=protein

We applied some standard QC: we filtered out cells with high mitochondrial gene expression (>20% of UMIs, likely not normal cells) and removed lowly expressed genes (total UMI count of 10 or less).
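For anyone who wants to try the two filters without downloading the GEO files, here is a minimal sketch of the same steps wrapped in a function and run on a tiny made-up matrix. The function name `qc_filter`, its parameter names, and the toy gene, protein, and cell names are assumptions for illustration only; they are not part of the original analysis, and the sketch assumes genes are in rows and cells are in columns, as in the code above.

```python
import re

import pandas as pd


def qc_filter(X, y, mito_max=0.2, gene_min_umi=10):
    """Filter cells by mitochondrial fraction, then genes by total UMI count.

    X: genes x cells UMI counts; y: proteins x cells counts with the same columns.
    (Hypothetical helper; thresholds default to the values used in this thread.)
    """
    # Fraction of each cell's UMIs that come from MT- genes
    mito_genes = [g for g in X.index if re.search(r'^MT-', g)]
    mito_frac = X.loc[mito_genes, :].sum(axis=0) / X.sum(axis=0)

    # Drop the same cells from both matrices
    keep_cells = mito_frac < mito_max
    X = X.loc[:, keep_cells]
    y = y.loc[:, keep_cells]

    # Gene totals are computed after the cell filter, as in the code above
    keep_genes = X.sum(axis=1) > gene_min_umi
    return X.loc[keep_genes, :], y


# Toy data (made up): 4 genes x 3 cells and 2 proteins x 3 cells
X = pd.DataFrame(
    {'c1': [1, 20, 8, 0], 'c2': [30, 10, 5, 1], 'c3': [2, 15, 6, 0]},
    index=['MT-CO1', 'ACTB', 'CD3E', 'RARE1'],
)
y = pd.DataFrame({'c1': [7, 0], 'c2': [3, 2], 'c3': [5, 1]}, index=['CD3', 'CD19'])

X_qc, y_qc = qc_filter(X, y)
print(X_qc)
print(y_qc)
```

In the toy data, cell `c2` has about 65% mitochondrial UMIs and is dropped, and the genes `MT-CO1` and `RARE1` fall at or below the 10-UMI total and are removed. Because the gene filter runs after the cell filter, changing the order of the two steps can change which genes survive, which may be worth checking if your numbers differ from Supplementary Table 1.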
