Project (Soongsil University Graduate Study) and TIL on single cell

brian604/singlecellstudy

Singlecellstudy

Project (Soongsil University Graduate Study) and TIL(Today I Learned) on single cell

COVID project

  • Author: Jiwon Kim
  • For Soongsil University Single Cell sequencing analysis graduate course (under Prof. Junil Kim)
  • Paper/data reference:
    Single-cell landscape of bronchoalveolar immune cells in patients with COVID-19
  • Data ready for analysis are in data/ in h5 format.
    • TCR-seq is out of scope for this analysis report.
    • A normal subject (GSM3660650) has not been included in the analysis.
  • Metadata containing patient information (stratified into healthy/moderate/severe) and barcodes are in the data/ folder, in a .txt file named all.cell.annotation.meta.txt or meta.txt.
  • The image directory is broken down into the following folders:
    • pre_qc: pre-QC samples
    • post_qc: post-QC samples

Introduction of scRNA sequencing analysis


Introduction

COVID-19 is a global pandemic that is known to have originated in mainland China. The SARS-CoV-2 virus is known to infect the respiratory tract first. Infected individuals have exhibited a wide range of symptoms, and these symptoms are known to point to differential immune responses (Paces et al. 2020). However, the respiratory immune landscape is still largely unknown at high resolution. Since bronchoalveolar lavage fluid (BAL) mirrors the local immune landscape, the authors of the reference paper performed single-cell sequencing on it for three subject populations: healthy controls, moderately ill patients, and severely ill patients. With carefully laid-out definitions of severity, they sequenced the BAL of a total of 13 subjects, including 3 healthy controls.

Method

QC & Integration


  • QC was done according to the paper
    • Minimum RNA features = 200
    • Maximum RNA features = 6000
    • Minimum RNA counts = 1000
    • Maximum mitochondrial percentage = 10%
  • Integration was also done according to the paper (first 50 dimensions)
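The thresholds above can be illustrated with a minimal base R sketch. It is not the author's script: the helper `qc_filter`, the toy `cells` table, and the use of strict inequalities are assumptions made for illustration. The column names follow Seurat's usual per-cell metadata (`nFeature_RNA`, `nCount_RNA`) plus an assumed `percent.mito` column.

```r
# Hypothetical helper: keep only the cells that pass all four QC thresholds.
# Strict inequalities are an assumption; the README only lists the cut-offs.
qc_filter <- function(meta,
                      min_features = 200,
                      max_features = 6000,
                      min_counts = 1000,
                      max_percent_mito = 10) {
  keep <- meta$nFeature_RNA > min_features &
    meta$nFeature_RNA < max_features &
    meta$nCount_RNA > min_counts &
    meta$percent.mito < max_percent_mito
  meta[keep, , drop = FALSE]
}

# Made-up per-cell metadata, one row per cell (not data from the study).
cells <- data.frame(
  nFeature_RNA = c(150, 2500, 7000, 3000, 1200),
  nCount_RNA   = c(400, 8000, 50000, 900, 4000),
  percent.mito = c(2, 4, 3, 5, 25),
  row.names    = paste0("cell", 1:5)
)

filtered <- qc_filter(cells)
print(rownames(filtered))
```

On a Seurat object that carries the same metadata columns, the equivalent filter would presumably be written with `subset()`, for example `subset(obj, subset = nFeature_RNA > 200 & nFeature_RNA < 6000 & nCount_RNA > 1000 & percent.mito < 10)`, where `obj` is a placeholder name.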

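library(Seurat)
# Combine the QC-filtered Seurat objects of the three groups into one list
all <- c(healthy.df.filtered, moderate.df.filtered, severe.df.filtered)
# Find integration anchors across samples using the first 50 dimensions
nCoV <- FindIntegrationAnchors(object.list = all, dims = 1:50)
# Integrate the samples using the same 50 dimensions
nCoV.integrated <- IntegrateData(anchorset = nCoV, dims = 1:50, features.to.integrate = rownames(nCoV))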

Clustering


  • Clustering was done according to the paper
    • Normalization using the 'LogNormalize' method
    • 'vst' method to identify the top 2000 variable genes
    • Scaling was done with the variables 'nCount_RNA' and 'percent.mito'.
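The preprocessing steps listed above map onto three Seurat calls. The sketch below is an assumption about how they fit together, not the code used for this report: `preprocess_sample` is a hypothetical helper, `obj` is a placeholder Seurat object, and passing the two variables to `vars.to.regress` is a guess at what "scaling with the variables" means. Where these steps sit relative to integration is not stated here.

```r
# Sketch only: assumes the Seurat package is installed and that `obj` is a
# Seurat object whose metadata has nCount_RNA and percent.mito columns.
preprocess_sample <- function(obj) {
  # Log-normalise the raw counts
  obj <- Seurat::NormalizeData(obj, normalization.method = "LogNormalize")
  # Select the top 2000 variable genes with the 'vst' method
  obj <- Seurat::FindVariableFeatures(obj, selection.method = "vst",
                                      nfeatures = 2000)
  # Scale the data while regressing out the two technical covariates (assumed)
  obj <- Seurat::ScaleData(obj,
                           vars.to.regress = c("nCount_RNA", "percent.mito"))
  obj
}
```

Defining the function does not load Seurat; the package is only needed once `preprocess_sample()` is called on a real object.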

Results:



Observation of Pre-QC

After downloading the files (h5 format), I looked at the raw distributions of (1) the number of RNA features, (2) the number of RNA (read) counts, and (3) the percentage of mitochondrial reads across samples.

I noticed the following:

Normal subjects (corresponding to C51, C52, and C100) all have a distinct bimodal distribution of RNA feature numbers. On the other hand, COVID-19-infected subjects have variable distributions. Although it would be nice to run a statistical test on the distribution shape (manifold?), I have not done that analysis.

For the three moderately ill patients (C141, C142, C144), I noted highly diverse distributions of RNA features, ranging from bimodal to long-tailed distributions concentrated very close to 0. The remaining samples (N = 6) are from severely ill patients; note their generally skewed distributions close to 0. In addition, the RNAs of severely ill patients are very diverse in both counts and feature numbers, as seen from the many points concentrated at the bottom of the plots.

Figure 1.1: Pre-QC of representative normal subject (C51)

Figure 1.2: Pre-QC of representative moderately ill subject (C142)

Figure 1.3: Pre-QC of representative severely ill subject (C145)

Observation of Integration and Post-QC

After QC and integration, the landscape of RNA features and read counts across subjects changed greatly. The distinct bimodal distribution of RNA feature numbers disappeared from the normal subjects' data. Except for one subject (C100), fairly uniform distributions of both RNA feature numbers and counts were observed. On the other hand, a bimodal distribution of both RNA feature numbers and counts was seen in moderately ill patients (except for one subject). Increased diversity was observed in severely ill patients.


Figure 2: Post-QC of all subjects

Identifying distinct distribution patterns by patient group (i.e. severity) and COVID-19 presence


Using the metadata file to differentiate and identify clusters failed for two reasons: (1) a mismatch in barcode numbers and (2) technical difficulties. To jump to the conclusion, resolving subject information into the Seurat object was successful.

Hence, I assessed the UMAP projection of all cells to look for distinct clusters that are specific to a subgroup (i.e. COVID-19 and non-COVID-19). Figures 3 and 4 show the distinct patterns across patients. To see which clusters are specific to a subgroup, I used the proportion of clusters by disease presence (Figure 5) to identify the clusters that deserve more attention. Clusters were selected based on (1) the relative contribution of each disease group to the cluster, and (2) how modestly convoluted the cluster is. Specifically, I refrained from selecting clusters that are admixed with different samples (Figure 6).

Finally, I decided to look into cluster 0, cluster 2, and cluster 12. Cluster 0 seems very specific to COVID-19 patients; cluster 2 is specific to healthy controls; cluster 12 consists mostly of healthy controls but is moderately admixed with COVID-19-infected subjects.
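The cluster-by-group proportions used for this selection can be sketched in base R with `table()` and `prop.table()`. The labels below are made up for illustration and are not the study's data; normalising within each cluster (`margin = 1`) is an assumption about how the proportions were computed.

```r
# Made-up per-cell labels. In a Seurat object these would come from the
# cluster identities and a group column in the metadata (assumed names).
cell_meta <- data.frame(
  cluster = c("0", "0", "0", "2", "2", "2", "12", "12", "12", "12"),
  group   = c("S", "S", "M", "HC", "HC", "HC", "HC", "HC", "HC", "M")
)

# Cells per (cluster, group), then the fraction of each cluster
# contributed by each group (every row sums to 1).
counts <- table(cell_meta$cluster, cell_meta$group)
props  <- prop.table(counts, margin = 1)
print(round(props, 2))
```

In this toy table, a cluster whose row is dominated by one group would be a candidate for closer inspection, while a row spread evenly across groups would be treated as admixed. Because the groups contribute different numbers of cells, raw proportions like these may need normalising by group size before they are compared.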

Figure 3: UMAP projection by group: Healthy Control (HC), Moderate (M), and Severe (S)

Figure 4: UMAP projection across samples starting with group names

Figure 5: Proportion of clusters by disease presence

Figure 6: UMAP projection with a total of 29 clusters

Finding differential markers in Clusters 0, 2, and 12

Cluster 0 (COVID-19-dominant)

Figure 7 shows the dot-plot expression of the most variable genes (Figure 8) across subjects from Severe (0_S) to Healthy Control (0_HC) in cluster 0. For instance, expression of the WFDC2 gene increases with the severity of disease. The 0_NA label most likely corresponds to the Moderate (0_M) patients (not shown). The original authors also provide a curated gene list (i.e. {MARCO, CD48, FCGR3A, TREM2, FCN1, SPP1, FABP4, and CD8A}). Figure 9 shows the dot-plot expression of the curated genes across subjects. In cluster 0, the FABP4 gene is notable because its expression decreases with the severity of disease.

Cluster 2 (Healthy Control-dominant)

Figure 10 shows the dot-plot expression of the most variable genes (Figure 8) across subjects from Severe (0_S) to Healthy Control (0_HC) in cluster 2. A trend of increasing WFDC2 expression with the severity of disease was also observed. A dot plot of the curated gene list was also made (Figure 11). A similar pattern was observed, in which FABP4 expression decreases with the severity of the disease.

Cluster 12 (Admixed)

Figure 12 shows the dot-plot expression of the most variable genes (Figure 8) across subjects from Severe (0_S) to Healthy Control (0_HC) in cluster 12. Those genes are generally expressed exclusively in the COVID-19 patients (0_S, 0_M). No pattern was observed in the curated gene list (not shown).
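A dot plot of this kind summarises two quantities per gene and group: the percentage of cells expressing the gene (dot size) and the average expression (dot colour). The base R sketch below computes both for a single gene. The numbers are invented for illustration and are not WFDC2 or FABP4 values from this analysis; Seurat's `DotPlot()` additionally scales the averages across groups.

```r
# Invented normalised expression of one gene in the cells of a single cluster.
expr  <- c(0, 0, 0.5, 0, 1.2, 2.0, 0, 2.4, 3.1, 2.8)
group <- c("HC", "HC", "HC", "HC", "M", "M", "M", "S", "S", "S")

# Percentage of cells with non-zero expression, and mean expression, per group.
pct <- tapply(expr > 0, group, mean) * 100
avg <- tapply(expr, group, mean)

dot_summary <- data.frame(
  group           = names(pct),
  pct_expressing  = as.numeric(pct),
  mean_expression = as.numeric(avg)
)
print(dot_summary)
```

With these toy values, both columns rise from HC to M to S, which is the kind of trend with severity described for WFDC2 above.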

Figure 7: Dot-plot expression in cluster 0

Figure 8: Most-variable genes

Figure 9: Dot-plot expression in cluster 0 for curated genes

Figure 10: Dot-plot expression in cluster 2

Figure 11: Dot-plot expression in cluster 2 for curated genes

Figure 12: Dot-plot expression in cluster 12

Discussion:



The results show differential expression of the WFDC2 gene in COVID-19 patients. WFDC2 encodes the protein HE4 (human epididymis protein 4). HE4 is a secretory protein that serves as a biomarker for various diseases, from epithelial ovarian cancer to chronic kidney disease (Li et al. 2020). Normally, HE4 expression is highest in the human trachea and salivary gland and modest in the lung. Upon an insult, HE4 immunoreactivity is found together with correspondingly heightened gene expression (Galgano et al. 2006). In parallel with these results, Wei and colleagues observed a correlation between the HE4 protein level and COVID-19 severity (Wei et al. 2020; Schirinzi et al. 2020).

Conclusion:



A high-resolution landscape of the respiratory immune system in COVID-19 has been laid out. This analysis supports HE4 as a biomarker for COVID-19.

References besides this paper:



1. Galgano, M. T., Hampton, G. M., & Frierson, H. F. (2006). Comprehensive analysis of HE4 expression in normal and malignant human tissues. Modern Pathology, 19(6), 847–853. https://doi.org/10.1038/modpathol.3800612
2. Li, L., Yao, Y., Liang, J., Zhan, X., Wang, F., Yue, C., Wu, B.-Q., Hu, S., Liu, M., Wan, J., & Luo, J. (2020). Serum human epididymis protein 4 concentrations are associated with severity of patients with pulmonary tuberculosis. Clinica Chimica Acta, 502, 255–260. https://doi.org/10.1016/j.cca.2019.11.009
3. Paces, J., Strizova, Z., Smrz, D., & Cerny, J. (2020). COVID-19 and the Immune System. Physiological Research, 379–388. https://doi.org/10.33549/physiolres.934492
4. Schirinzi, A., Cazzolla, A. P., Lovero, R., Lo Muzio, L., Testa, N. F., Ciavarella, D., Palmieri, G., Pozzessere, P., Procacci, V., Di Serio, F., & Santacroce, L. (2020). New Insights in Laboratory Testing for COVID-19 Patients: Looking for the Role and Predictive Value of Human epididymis secretory protein 4 (HE4) and the Innate Immunity of the Oral Cavity and Respiratory Tract. Microorganisms, 8(11), 1718. https://doi.org/10.3390/microorganisms8111718
