
How do you handle merging samples across different tissue arrays, leading to redundant coordinates? #249

Open
AmosFong1 opened this issue Oct 25, 2024 · 4 comments



AmosFong1 commented Oct 25, 2024

Hello, how does CellChat handle duplicated coordinates when it runs on a dataset combined from multiple tissue arrays? For example, I am working with CosMx data. I have defined samples as a string combining the flow cell name and the FOV, so that CellChat treats each flow cell/FOV combination as a separate sample. When I run CellChat on a single tissue array, where no two cells share coordinates, I get significant results. However, when I run CellChat on a merged dataset of multiple tissue arrays whose coordinate systems overlap, I get zero results. @sqjin

@sqjin
Member

sqjin commented Dec 12, 2024

@AmosFong1 Have you solved this issue? I think you should assign different cell barcodes to cells from different samples and provide the batch labels.
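A minimal sketch of that advice, assuming a plain character vector of barcodes and a per-cell sample label (the helper name make_cell_ids is hypothetical, not a CellChat function): prefix each barcode with its sample so cell IDs stay unique after merging, and keep the sample label as a factor column in the metadata.

```r
# Hypothetical helper: prefix barcodes with their sample label so that
# identical barcodes from different tissue arrays no longer collide.
make_cell_ids <- function(barcodes, samples) {
  paste(samples, barcodes, sep = "_")
}

barcodes <- c("c_1", "c_2", "c_1", "c_2")  # same barcodes reused on two arrays
samples  <- c("FC1_FOV1", "FC1_FOV1", "FC2_FOV1", "FC2_FOV1")

cell_ids <- make_cell_ids(barcodes, samples)
meta <- data.frame(
  labels    = factor(c("T", "B", "T", "B")),
  samples   = factor(samples),            # batch labels, one level per sample
  row.names = cell_ids                    # data.frame() errors on duplicate row names
)
print(anyDuplicated(rownames(meta)) == 0) # TRUE
```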

@AmosFong1
Author

Hi @sqjin, I have not resolved this issue yet. I have already assigned unique barcodes to the cells of each sample and provided batch labels in the samples column of the metadata. I noticed that computeCellDistance() returns all NA values when coordinates are duplicated. Likewise, when I inspect the CellChat object after running with duplicated coordinates, all the stored cell distances are NA.
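To check whether a merged dataset actually has this problem before running CellChat, a quick base-R diagnostic can list coordinate pairs that occur in more than one sample. This is a sketch: shared_coordinates is a hypothetical helper, and it tests exact equality of coordinates, matching the "exact same coordinates" case discussed in this thread.

```r
# Hypothetical helper: return the coordinate pairs that occur in more than one sample.
# `coords` is a data frame whose first two columns are x and y;
# `samples` is a per-cell sample label.
shared_coordinates <- function(coords, samples) {
  key <- paste(coords[[1]], coords[[2]], sep = ",")
  # number of distinct samples observed at each coordinate pair
  n_samples_per_key <- tapply(samples, key, function(s) length(unique(s)))
  names(n_samples_per_key)[n_samples_per_key > 1]
}

coords  <- data.frame(x = c(10, 20, 10, 35), y = c(5, 8, 5, 40))
samples <- c("TMA1", "TMA1", "TMA2", "TMA2")
shared_coordinates(coords, samples)  # "10,5"
```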

I have a hacky workaround: I shift the coordinates of cells from each TMA (flow cell) by a multiple of 1e6 pixels, so that the arrays no longer overlap.
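The offset idea can be sketched in isolation. Assuming distances are Euclidean, translating a whole array leaves every within-array distance unchanged while pushing different arrays far apart. The helper offset_arrays and the default step of 1e6 px are illustrative assumptions; the step must exceed the coordinate range of a single array, otherwise shifted arrays can still overlap.

```r
# Sketch of the offset workaround: shift each array by (index - 1) * step on both axes.
offset_arrays <- function(x, y, array_id, step = 1e6) {
  # conservative check: if step exceeds the coordinate range on both axes,
  # the shifted arrays cannot overlap
  if (diff(range(x)) >= step || diff(range(y)) >= step) {
    stop("step must exceed the coordinate range of the arrays")
  }
  idx <- match(array_id, unique(array_id))  # 1, 2, ... in order of first appearance
  data.frame(x = x + (idx - 1) * step, y = y + (idx - 1) * step)
}

x   <- c(0, 3, 0, 3)
y   <- c(0, 4, 0, 4)
arr <- c("FC_A", "FC_A", "FC_B", "FC_B")  # identical coordinates on two arrays
shifted <- offset_arrays(x, y, arr)

dist(shifted[arr == "FC_A", ])  # 5: distances within an array are unchanged
dist(shifted[c(1, 3), ])        # large: the two arrays are now far apart
anyDuplicated(shifted)          # 0: no two cells share coordinates any more
```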

Below is my current working implementation:

library(Seurat)
library(dplyr)
library(CellChat)

# write function
cellchat <- function(seurat_object, name) {
  # get counts
  counts <- GetAssayData(seurat_object, layer = "data", assay = "RNA") 
  
  # get meta data
  meta_data <- data.frame(labels = factor(seurat_object$labels), samples = factor(seurat_object$samples))
  rownames(meta_data) <- colnames(counts)

  # get coordinates
  coordinates <- select(seurat_object@meta.data, CenterX_global_px, CenterY_global_px, flow_cell_name)
  # number each flow cell (TMA) 1, 2, ... and shift its cells by (idx - 1) * 1e6 px on both axes so the arrays no longer overlap
  coordinates <- mutate(coordinates, idx = match(flow_cell_name, unique(flow_cell_name)))
  coordinates <- mutate(coordinates, CenterX_global_px = CenterX_global_px + (idx - 1) * 1e6)
  coordinates <- mutate(coordinates, CenterY_global_px = CenterY_global_px + (idx - 1) * 1e6)
  coordinates <- select(coordinates, CenterX_global_px, CenterY_global_px)
  
  # get spatial factors
  ratio <- 0.121  # pixel-to-micrometer conversion factor
  # compute cell distances within each flow cell separately, where coordinates are not duplicated
  cell_distances <- list()
  for (i in unique(seurat_object$flow_cell_name)) {
    m <- filter(seurat_object@meta.data, flow_cell_name == i)
    xy <- select(m, CenterX_global_px, CenterY_global_px)  # named xy to avoid masking base::c
    cell_distances[[i]] <- computeCellDistance(xy)
  }
  min_distances <- lapply(cell_distances, min)
  spot_size <- min(unlist(min_distances)) * ratio
  # one row per sample; row names must match the sample labels in meta_data
  sample_names <- levels(meta_data$samples)
  spatial_factors <- data.frame(ratio = rep(ratio, length(sample_names)), tol = rep(spot_size / 2, length(sample_names)), row.names = sample_names)
  
  # create cellchat object
  cellchat_object <- createCellChat(object = counts, meta = meta_data, group.by = "labels", datatype = "spatial", coordinates = coordinates, spatial.factors = spatial_factors)
  
  # add database
  cellchat_object@DB <- subsetDB(CellChatDB.human, search = c("Secreted Signaling", "Cell-Cell Contact"))
  
  # subset cellchat object
  cellchat_object <- subsetData(cellchat_object)

  # identify over expressed genes
  cellchat_object <- identifyOverExpressedGenes(cellchat_object)

  # identify over expressed interactions
  cellchat_object <- identifyOverExpressedInteractions(cellchat_object)
  
  # compute communication probability
  cellchat_object <- computeCommunProb(cellchat_object, type = "truncatedMean", distance.use = FALSE, interaction.range = 250, scale.distance = NULL, contact.range = 100)
  
  # filter communication
  cellchat_object <- filterCommunication(cellchat_object, min.cells = 10)
  
  # subset communication
  communications <- subsetCommunication(cellchat_object)
  
  # add evidence
  communications <- mutate(communications, evidence = gsub(",", " ", evidence))
  
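  # save communication (project_dir is assumed to be defined in the calling environment)
  write.table(communications, file = file.path(project_dir, "data", "cellchat", paste0("cosmx_", tolower(gsub("[^a-zA-Z]", "", name)), "_communications.csv")), quote = FALSE, sep = ",", row.names = FALSE, col.names = TRUE)

  # return the communication table invisibly for further use
  invisible(communications)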
}
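The per-sample spatial.factors table built in the function above can be factored into a small helper and checked on its own. This is a sketch: make_spatial_factors is hypothetical, and min_dist_px stands in for the minimum of the computeCellDistance() results. It keeps the same layout as the code above, one row per sample with a shared ratio and a tolerance of half the smallest cell-to-cell distance in micrometers.

```r
# Hypothetical helper mirroring the spatial_factors step: one row per sample,
# row names equal to the sample labels.
make_spatial_factors <- function(samples, ratio, min_dist_px) {
  sample_names <- levels(factor(samples))
  spot_size <- min_dist_px * ratio  # convert pixels to micrometers
  data.frame(
    ratio     = rep(ratio, length(sample_names)),
    tol       = rep(spot_size / 2, length(sample_names)),
    row.names = sample_names
  )
}

samples <- c("FC1_FOV1", "FC1_FOV2", "FC2_FOV1", "FC1_FOV1")
sf <- make_spatial_factors(samples, ratio = 0.121, min_dist_px = 50)
sf
# every sample label needs a matching row name
all(unique(samples) %in% rownames(sf))  # TRUE
```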

@sqjin
Member

sqjin commented Dec 16, 2024

@AmosFong1 Do you mean that some cells have exactly the same coordinates across different FOVs? Are you working with CosMx data? If so, I suggest setting contact.range = 10 instead of contact.range = 100.
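As I understand CellChat's documentation, contact.range is given in micrometers and is meant to approximate a cell diameter (center-to-center distance), so 10 is a more natural choice than 100 for single-cell CosMx data. The sketch below is an illustration only, not CellChat's internal computation: pairs_within is a hypothetical helper that converts pixel distances with the 0.121 ratio used above and counts how many cell pairs fall within a given range.

```r
# Illustration only (not CellChat internals): count cell pairs whose
# center-to-center distance, converted from pixels to micrometers,
# is within range_um.
pairs_within <- function(x_px, y_px, ratio, range_um) {
  d_um <- as.matrix(dist(cbind(x_px, y_px))) * ratio
  sum(d_um[upper.tri(d_um)] <= range_um)  # each unordered pair counted once
}

x_px <- c(0, 60, 400, 900)
y_px <- c(0, 0, 0, 0)
pairs_within(x_px, y_px, ratio = 0.121, range_um = 10)   # 1
pairs_within(x_px, y_px, ratio = 0.121, range_um = 100)  # 4
```

A 100 µm range treats cells several diameters apart as "in contact", which inflates contact-dependent signaling.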

@AmosFong1
Author

@sqjin Thanks for the suggestion; I will use contact.range = 10. Yes, some of the cells have exactly the same coordinates across different FOVs, because my dataset includes ~14 different slides.
