I have some questions about REAP-PBMC dataset #25

Open
CCNU-LCY opened this issue Apr 28, 2022 · 1 comment

@CCNU-LCY

Hi dear Zilu,
Thanks for developing a very useful tool!

When I follow the methods in the original paper to perform quality control on the REAP-PBMC dataset, I can't reproduce the data in Supplementary Table 1. Could you give more detailed quality control guidelines for the REAP-PBMC dataset? Thank you!

Best,
Li

@zhouzilu
Owner

Hi Li,

Sorry for the delay. I lost track of this GitHub repo for a while...

Here is the original analysis Python code:

```python
import re

import matplotlib.pyplot as plt
import pandas as pd

# Load the REAP-seq PBMC data: denoised mRNA matrix (genes x cells)
# and protein matrix (a 'Protein' column plus one column per cell)
X = pd.read_csv('../../reap-seq-data/GSM2685239_mRNA_3_PBMCs_denoised.csv', sep=',')
y = pd.read_csv('../../reap-seq-data/GSM2685244_protein_3_PBMCs_matrix.txt', sep='\t')

gene = X.index.tolist()
protein = y['Protein'].tolist()
# Move the protein names from the first column into the row index
y = y.drop(columns=y.columns[0])
y.index = protein

# Sanity check: replace '.' with '-' in the mRNA cell names so they
# match the protein file, then confirm both matrices list the same cells
Xcellname = X.columns
X.columns = [x.replace('.', '-') for x in Xcellname]
assert (X.columns == y.columns).all()

# Fraction of each cell's counts coming from mitochondrial ('MT-') genes
mitoGene = [g for g in gene if re.search(r'^MT-', g)]
mitoPercent = X.loc[mitoGene, :].sum(axis=0) / X.sum(axis=0)

mitoPercent.hist()
plt.show()

# Keep cells with < 20% mitochondrial counts, in both matrices
X = X.loc[:, mitoPercent < 0.2]
y = y.loc[:, mitoPercent < 0.2]

# Remove low expressed genes: keep genes with a total count > 10
X.sum(axis=1).plot.hist(bins=50)
plt.show()
X = X.loc[X.sum(axis=1) > 10, :]
gene = X.index.tolist()

# Remove unnecessary protein information (tags named Mouse/Rat/Blank)
mouseP = [p for p in protein if re.search(r'(Mouse|Rat|Blank)', p)]
protein = set(protein) - set(mouseP)

# Explicit, ordered list of the tags to keep; this replaces the
# unordered set computed above
protein = [
    'CD8_CAATCCCT', 'CD154_GGTAATGT', 'CD152_GTCCATTG', 'CD45RA_GTGATAGT', 'CD45_TCTCGACT',
    'CD223_TGGACCCT', 'CD69_GTTGCATG', 'CD40_ATATGAGA', 'CD4_CACGATTC', 'CD28_TTGCGTCG',
    'CD14_AATTGAAC', 'CD137_CCGTTATG', 'HLA-DRA_TAGACGAC', 'CD4_GTCCAGGC', 'CD73_GCTACTTC',
    'CD11b_AGGGCGTT', 'CD197_ACGCTTGG', 'CD155_TAAATCGT', 'CD279_GAACCCGG', 'CD9_CACTCAAC',
    'CD8a_ACCCGCAC', 'CD357_TCGTAGAT', 'CD27_GCTGTGTA', 'CD19_CTATACGC', 'CD335_GTTTGTGG',
    'CD45RO_TGATATCG', 'CD66b_CGCTATCC', 'CD274_CTTGTACC', 'CD20_ACGCGGAA', 'CD272_ACCGAACA',
    'CD56_TACATAAG', 'CD8_GTATCGAG', 'CD273_AGCAGTTA', 'CD3_AGGATCGA', 'CD278_CGACTCTT',
    'CD127_TGGGAGCT', 'CD158E1_TAAGCCAC', 'CD68_CACCTCAG', 'CD25_GCTCGTCA', 'CD134_ATCCGGCA',
    'CD45_ACGAGTAG', 'FOXP3_ATCGCCAT', 'TIGIT_CATGCGTA', 'CD33_CCAGTGGA',
]

y = y.loc[protein, :]

# Strip the 9-character '_<barcode>' suffix, e.g. 'CD8_CAATCCCT' -> 'CD8'
protein = [x[:-9] for x in protein]
y.index = protein
```
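A side note on the protein step: the code above keeps a hard-coded list of antibody tags. As a minimal sketch (not the original analysis), the same kind of filtering can be derived from the tag names, assuming every tag is named `<marker>_<barcode>` and that control tags contain `Mouse`, `Rat` or `Blank`. `clean_protein_tags` and the toy tag names and counts are hypothetical. Unlike the hard-coded list, this keeps the file's row order.

```python
import re

import pandas as pd

# Assumed naming: control tags contain 'Mouse', 'Rat' or 'Blank'
CONTROL_PATTERN = re.compile(r'(Mouse|Rat|Blank)')


def clean_protein_tags(y: pd.DataFrame) -> pd.DataFrame:
    """Drop control tags and strip the '_<barcode>' suffix from row names.

    Hypothetical helper: assumes `y` is indexed by tag name (proteins x cells).
    """
    keep = [t for t in y.index if not CONTROL_PATTERN.search(t)]
    out = y.loc[keep, :].copy()
    # Split on the last underscore instead of slicing a fixed 9 characters,
    # so names like 'HLA-DRA_TAGACGAC' work without assuming barcode length
    out.index = [t.rsplit('_', 1)[0] for t in keep]
    return out


# Toy protein matrix (made-up counts and a made-up 'Blank' tag)
y_toy = pd.DataFrame(
    [[1, 2], [3, 4], [5, 6], [7, 8]],
    index=['CD8_CAATCCCT', 'Blank_AAAAAAAA', 'CD8_GTATCGAG', 'HLA-DRA_TAGACGAC'],
    columns=['cell-1', 'cell-2'],
)
cleaned = clean_protein_tags(y_toy)
print(cleaned.index.tolist())            # ['CD8', 'CD8', 'HLA-DRA']
print(cleaned.index.duplicated().any())  # True
```

In the hard-coded list, CD4, CD8 and CD45 each appear with two different barcodes, so after the suffix is stripped the protein index contains repeated labels and `y.loc['CD8']` returns two rows; checking `index.duplicated()` makes that visible.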

We applied some traditional QC, including filtering out cells with high mitochondrial gene expression (20% or more of counts, likely not normal cells) and low-expressed genes (a total UMI count of 10 or less).
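A minimal, self-contained sketch of those two filters on a toy matrix (not a drop-in replacement for the code above). `basic_qc`, its parameter names and the toy gene and cell names are hypothetical; the defaults mirror the cutoffs in the code (cells kept if the mitochondrial fraction is strictly below 0.2, genes kept if their total is strictly above 10), and the order (cells first, then genes) follows it too.

```python
import pandas as pd


def basic_qc(X, y, max_mito_frac=0.2, min_gene_total=10):
    """Cell- and gene-level QC mirroring the two filters described above.

    X: genes x cells expression matrix; y: proteins x cells matrix with the
    same cell columns. Hypothetical helper, not part of any package.
    """
    # Per-cell fraction of counts from mitochondrial genes ('MT-' prefix)
    is_mito = X.index.str.startswith('MT-')
    mito_frac = X.loc[is_mito].sum(axis=0) / X.sum(axis=0)
    # Keep cells below the threshold (a cell with zero total gives NaN and is dropped)
    keep_cells = mito_frac < max_mito_frac
    X = X.loc[:, keep_cells]
    y = y.loc[:, keep_cells]
    # Then keep genes whose total across the remaining cells exceeds the cutoff
    keep_genes = X.sum(axis=1) > min_gene_total
    return X.loc[keep_genes], y


# Toy data: 3 genes x 3 cells and 1 protein x 3 cells (made-up values)
X_toy = pd.DataFrame(
    {'c1': [8, 2, 5], 'c2': [4, 6, 1], 'c3': [9, 0, 3]},
    index=['GENE_A', 'MT-CO1', 'GENE_B'],
)
y_toy = pd.DataFrame({'c1': [1], 'c2': [2], 'c3': [3]}, index=['CD3'])

X_qc, y_qc = basic_qc(X_toy, y_toy)
print(X_qc.index.tolist())    # ['GENE_A']
print(X_qc.columns.tolist())  # ['c1', 'c3']
```

Because the gene filter runs after the cell filter, changing the mitochondrial cutoff also changes which genes pass the count cutoff, so both numbers move together when comparing against a published table.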
