Latent Dirichlet Allocation with application on Longitudinal Flow Cytometry Data

xiyupeng/topic_modeling

This repository contains all the code used in the paper “Uncovering the hidden structure of dynamic T cell composition in peripheral blood during cancer immunotherapy: a topic modeling approach”.

Prerequisites

Below are the three core R packages we used for the LDA analysis:

library(topicmodels)
library(slam)
library(tidytext)

We use the R package topicmodels for model inference, slam for preparing the required input data, and tidytext for extracting the output data. These three packages are needed to reproduce the LDA analysis described in the paper on other datasets.

The scripts also use other R packages, such as Seurat for UMAP visualization and clustering and ComplexHeatmap for heatmaps. You can substitute other clustering methods, such as FlowSOM, or other visualization tools.
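As a rough illustration of how the three packages fit together, the sketch below fits LDA to a small simulated count matrix. The matrix values, the sample and cell type names, and the choice of `k = 2` are assumptions made for illustration only; they are not taken from the paper or from the scripts in this repository.

```r
library(topicmodels)
library(slam)
library(tidytext)

# Toy sample-by-cell-type count matrix (simulated values, for illustration only)
set.seed(1)
counts <- matrix(
  rpois(6 * 5, lambda = 50) + 1L,
  nrow = 6,
  dimnames = list(paste0("sample", 1:6), paste0("celltype", 1:5))
)

# slam: convert the dense integer matrix to the sparse triplet format that LDA() accepts
stm <- as.simple_triplet_matrix(counts)

# topicmodels: fit LDA with k topics (k = 2 is an arbitrary choice here)
fit <- LDA(stm, k = 2, method = "VEM", control = list(seed = 1))

# tidytext: extract per-topic cell type weights (beta)
# and per-sample topic proportions (gamma) as tidy tables
beta  <- tidy(fit, matrix = "beta")
gamma <- tidy(fit, matrix = "gamma")
```

Here `beta` has one row per topic and cell type, and `gamma` has one row per sample and topic. On real data the number of topics would need to be chosen with more care than in this toy example.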

Preparing input

In text mining, LDA takes a document-by-term count matrix as input, where each row represents a document, each column represents a term, and each entry is the number of occurrences of that term in that document (a word is a single occurrence of a term). Motivated by the similarities between text mining and single-cell analysis, we apply LDA to single-cell data by treating cells as words, cell types as terms, patient samples as documents, and biological processes as topics.

Before applying the LDA model to a single-cell dataset, we need to prepare the cell type-by-sample count matrix as the input to LDA. One common way to obtain this matrix is to pool all cells together and cluster them; before pooled clustering, you may want to check for batch effects between samples. X50_single_cell_analysis.R contains all the code we used to cluster the 17M+ cells. The Louvain method in the Seurat package was used to cluster cells and prepare the cell type-by-sample matrix. Cell types were manually annotated based on their marker expression.
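Once every cell has a sample label and a cluster (cell type) label, the count matrix can be tabulated directly. The following is a minimal base R sketch, assuming a hypothetical per-cell table `cells` with columns `sample` and `cluster`; the labels are made up and this is not the code from X50_single_cell_analysis.R. Samples are placed in rows to match the document-by-term layout described above; transpose with `t()` if your pipeline stores the matrix the other way round.

```r
# Hypothetical per-cell table: one row per cell, with its sample and assigned cluster
cells <- data.frame(
  sample  = c("P1_wk0", "P1_wk0", "P1_wk0", "P1_wk6", "P1_wk6", "P2_wk0"),
  cluster = c("CD8_naive", "CD4_Treg", "CD8_naive", "CD8_naive", "CD4_Treg", "CD4_Treg")
)

# Count the cells of each cluster within each sample
tab <- table(cells$sample, cells$cluster)

# Store as a plain integer matrix: rows = samples (documents), columns = cell types (terms)
counts <- matrix(
  as.integer(tab),
  nrow = nrow(tab),
  dimnames = list(rownames(tab), colnames(tab))
)

# LDA needs at least one cell in every sample
stopifnot(all(rowSums(counts) > 0))
```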

Usage

For more detailed usage of the LDA method, please see the tutorial, which shows two examples of applying LDA to single-cell datasets:

  • scRNA-seq data of liver cancer patients. (Data from a Nature paper)
  • Longitudinal flow cytometry data of melanoma patients. (this paper)

Data

Additional data used to generate figures are provided in:

Peng, Xiyu (2023), “flow cytometry dataset of melanoma PBMC”, Mendeley Data, V1, doi: 10.17632/d7nkgfhc8z.1

Citation

If you find the method useful, please cite:

Peng, X., Lee, J., Adamow, M., Maher, C., Postow, M. A., Callahan, M. K., Panageas, K. S., & Shen, R. (2023). A topic modeling approach reveals the dynamic T cell composition of peripheral blood during cancer immunotherapy. Cell Reports Methods, 3(8), 100546. https://doi.org/10.1016/j.crmeth.2023.100546

Contact us

If you have any problems, please contact:

Xiyu Peng ([email protected], [email protected])
