DR-SC: simulation

Wei Liu

2024-03-19

Generate the simulated data

First, we generate spatial transcriptomics data with a lattice neighborhood (i.e., the ST platform) using the function gendata_RNAExp in the DR.SC package, which returns a Seurat object. Note that the meta.data must include the spatial coordinates in columns named “row” (x coordinates) and “col” (y coordinates)!

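library(DR.SC)
# Simulate a 30 x 30 lattice of spots with p = 500 genes and K = 4 spatial clusters
seu <- gendata_RNAExp(height = 30, width = 30, p = 500, K = 4)
# The meta.data holds the "row"/"col" spatial coordinates
head(seu@meta.data)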
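On the ST platform, the “lattice neighborhood” means that each spot is adjacent to the spots directly above, below, left, and right of it. The sketch below illustrates this with a hypothetical base-R helper, `lattice_adj`, which builds the 4-neighbor adjacency matrix from row/col coordinates. It is not DR.SC's internal code; DR.SC constructs its own neighborhood when fitting.

```r
# Hypothetical helper (not part of DR.SC): 4-neighbor lattice adjacency.
# Two spots are neighbors when their Manhattan distance on the grid is exactly 1.
lattice_adj <- function(row, col) {
  d <- as.matrix(dist(cbind(row, col), method = "manhattan"))
  (d == 1) * 1L  # 0/1 integer matrix with a zero diagonal
}

# A 30 x 30 grid, matching height = 30 and width = 30 above
grid <- expand.grid(row = 1:30, col = 1:30)
A <- lattice_adj(grid$row, grid$col)

# Corner spots have 2 neighbors, border spots 3, interior spots 4
table(rowSums(A))
```

With the simulated object, the same helper can be applied to the metadata columns, e.g. `lattice_adj(seu$row, seu$col)`.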

Fit DR-SC using simulated data

Data preprocessing

Preprocessing consists of log-normalization and feature selection. As a first example, we select highly variable genes (HVGs). The selected genes’ names are saved in “
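Here is a minimal sketch of this step using standard Seurat functions. It assumes `seu` is the simulated object created above. The number of genes kept (`nfeatures = 300`) is our own illustrative choice, not a value prescribed by DR-SC.

```r
library(Seurat)
# Log-normalization (Seurat's default method, "LogNormalize")
seu <- NormalizeData(seu, verbose = FALSE)
# Select highly variable genes; nfeatures = 300 is an illustrative choice
seu <- FindVariableFeatures(seu, nfeatures = 300, verbose = FALSE)
# Names of the selected HVGs
hvgs <- VariableFeatures(seu)
head(hvgs)
```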

Fit DR-SC based on highly variable genes (HVGs)

For the function DR.SC, users can either specify the number of clusters \(K\) or set \(K\) to an integer vector of candidate values, in which case the modified BIC (MBIC) is used to determine \(K\). We first fit the model with a user-specified number of clusters and then show the version in which MBIC chooses \(K\).
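A sketch of the fit with a user-specified \(K = 4\), assuming `seu` holds the preprocessed object from the preprocessing step. We also assume that `platform = "ST"` is the appropriate setting for square-lattice data and that the fitted labels are stored in the metadata column `spatial.drsc.cluster`. Please check `?DR.SC` for the exact arguments in your installed version.

```r
library(DR.SC)
# Fit DR-SC with a user-specified number of clusters, K = 4.
# platform = "ST" (assumed) matches the square-lattice neighborhood of the simulation.
seu <- DR.SC(seu, K = 4, platform = "ST", verbose = FALSE)
# Cluster sizes; labels assumed to be in the metadata column spatial.drsc.cluster
table(seu$spatial.drsc.cluster)
```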

After fitting the model, we use the adjusted Rand index (ARI) to check the clustering performance against the true cluster labels of the simulated data.
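ARI compares two partitions while ignoring how the cluster labels are named. It equals 1 for identical partitions and is close to 0 for random agreement. The base-R function below, `ari`, is a hypothetical helper that spells out the standard Hubert and Arabie formula from the contingency table. A packaged implementation such as `mclust::adjustedRandIndex()` computes the same quantity.

```r
# Hypothetical helper: adjusted Rand index from the contingency table of two labelings.
ari <- function(x, y) {
  tab <- table(x, y)
  pairs_ij <- sum(choose(tab, 2))           # pairs grouped together in both labelings
  pairs_a  <- sum(choose(rowSums(tab), 2))  # pairs grouped together in x
  pairs_b  <- sum(choose(colSums(tab), 2))  # pairs grouped together in y
  expected <- pairs_a * pairs_b / choose(length(x), 2)
  max_idx  <- (pairs_a + pairs_b) / 2
  (pairs_ij - expected) / (max_idx - expected)
}

# Label names do not matter: the same partition under different labels scores 1
ari(c(1, 1, 2, 2), c("b", "b", "a", "a"))
```

For the fitted object, `ari(seu$spatial.drsc.cluster, seu$true_clusters)` would score the DR-SC clusters. This assumes the simulated truth is stored in a metadata column named `true_clusters`, which you can confirm with `colnames(seu@meta.data)`.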

Next, we show how DR-SC can be used for visualization. First, we visualize the DR-SC clusters on the spatial coordinates.
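A short sketch, assuming `seu` is the object fitted with \(K = 4\) in the model-fitting step. We also assume that DR.SC's `spatialPlotClusters()` reads the coordinates from the “row”/“col” metadata columns.

```r
library(DR.SC)
# Scatter plot of the DR-SC cluster labels on the (row, col) spatial coordinates
p_spatial <- spatialPlotClusters(seu)
p_spatial
```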

We can also visualize the DR-SC clusters on a two-dimensional tSNE embedding computed from the features extracted by DR-SC.
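A sketch using DR.SC's `drscPlot()`. It assumes `seu` is the fitted object from above and that `visu.method = "tSNE"` selects the tSNE view; consult `?drscPlot` for the options in your version.

```r
library(DR.SC)
# tSNE embedding of the low-dimensional DR-SC features, colored by cluster
p_tsne <- drscPlot(seu, visu.method = "tSNE")
p_tsne
```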

Show the UMAP plot based on the extracted features from DR-SC.
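This is the same call as the tSNE view, switched to UMAP. It assumes `seu` is the fitted object and that `visu.method = "UMAP"` is an accepted value of `drscPlot()`.

```r
library(DR.SC)
# UMAP embedding of the same DR-SC features, colored by cluster
p_umap <- drscPlot(seu, visu.method = "UMAP")
p_umap
```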

Use MBIC to choose the number of clusters:
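A sketch of model selection over a range of candidate \(K\) values, assuming `seu` holds the preprocessed HVG-based object and that `mbicPlot()` draws the MBIC curve of such a fit. The result is stored under a new, hypothetical name, `seu_mbic`, so that the \(K = 4\) fit in `seu` is kept.

```r
library(DR.SC)
# Supply a vector of candidate K values; MBIC then selects the number of clusters.
seu_mbic <- DR.SC(seu, K = 2:6, platform = "ST", verbose = FALSE)
# MBIC value for each candidate K
mbicPlot(seu_mbic)
# Number of clusters in the selected model
length(unique(seu_mbic$spatial.drsc.cluster))
```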

Fit DR-SC based on spatially variable genes (SVGs)

First, we select the spatially variable genes using the function FindSVGs.
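A sketch of the SVG-based pipeline, assuming `seu` is the simulated object. It also assumes that `FindSVGs()` records the selected SVGs as the features that `DR.SC()` then uses, and that `topSVGs()` returns the top-ranked SVG names. `nfeatures = 300` is an illustrative choice.

```r
library(Seurat)
library(DR.SC)
# Log-normalize the simulated counts
seu <- NormalizeData(seu, verbose = FALSE)
# Select spatially variable genes (uses the row/col coordinates)
seu <- FindSVGs(seu, nfeatures = 300)
# Peek at the top-ranked SVGs
topSVGs(seu, ntop = 5)
# Refit DR-SC with K = 4 on the selected SVGs
seu <- DR.SC(seu, K = 4, platform = "ST", verbose = FALSE)
```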

Use ARI to check the clustering performance.
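This time we use the packaged implementation from mclust. The sketch assumes mclust is installed, that `seu` holds the SVG-based fit, and that the simulated truth is stored in the (assumed) metadata column `true_clusters`.

```r
# adjustedRandIndex() from mclust; true labels assumed to be in seu$true_clusters
ari_svg <- mclust::adjustedRandIndex(seu$spatial.drsc.cluster, seu$true_clusters)
ari_svg
```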

DR-SC can enhance visualization

Show the spatial scatter plot for clusters
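This is the same spatial view as before, now for the SVG-based fit. It assumes `seu` holds that fit.

```r
library(DR.SC)
# Spatial scatter plot of the clusters from the SVG-based DR-SC fit
p_spatial_svg <- spatialPlotClusters(seu)
p_spatial_svg
```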

Show the tSNE plot based on the extracted features from DR-SC.
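This sketch assumes `seu` holds the SVG-based fit and that `visu.method = "tSNE"` is accepted by `drscPlot()`.

```r
library(DR.SC)
# tSNE of the DR-SC features learned from the SVGs, colored by cluster
p_tsne_svg <- drscPlot(seu, visu.method = "tSNE")
p_tsne_svg
```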

Show the UMAP plot based on the extracted features from DR-SC.
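This sketch makes the same assumptions as the tSNE view above, with `visu.method = "UMAP"`.

```r
library(DR.SC)
# UMAP of the DR-SC features learned from the SVGs, colored by cluster
p_umap_svg <- drscPlot(seu, visu.method = "UMAP")
p_umap_svg
```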

DR-SC can automatically determine the number of clusters

Use MBIC to choose the number of clusters:
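The marker-gene code in the next section works on an object called `seu2`. In this sketch we assume `seu2` is the MBIC-selected fit on the SVGs, with `seu` holding the SVG-selected object.

```r
library(DR.SC)
# Candidate K values 2..6; MBIC picks the number of clusters for the SVG-based fit
seu2 <- DR.SC(seu, K = 2:6, platform = "ST", verbose = FALSE)
# MBIC value for each candidate K
mbicPlot(seu2)
```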

DR-SC can help differential expression analysis

Conduct visualization of marker gene expression.

Ridge plots

Visualize single-cell expression distributions in each cluster using Seurat.

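library(Seurat)
suppressPackageStartupMessages(library(dplyr))
# Find marker genes for every cluster of the DR-SC fit stored in seu2
dat <- FindAllMarkers(seu2)
# Keep the top marker gene per cluster; change n to access more marker genes
top <- dat %>%
  group_by(cluster) %>%
  slice_max(order_by = avg_log2FC, n = 1)
# The same gene can top several clusters, so drop duplicates
genes <- unique(top$gene)
RidgePlot(seu2, features = genes, ncol = 2)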

Violin plot

Visualize single cell expression distributions in each cluster
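This sketch reuses `seu2` and the marker vector `genes` from the ridge-plot step with Seurat's `VlnPlot()`. `unique()` guards against a gene that tops more than one cluster.

```r
library(Seurat)
# Violin plots of the top marker genes in each DR-SC cluster
p_vln <- VlnPlot(seu2, features = unique(genes), ncol = 2)
p_vln
```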

Feature plot

We compute a tSNE embedding from the features extracted by DR-SC and then visualize feature expression in this low-dimensional space.
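Here is a sketch with Seurat's `RunTSNE()` and `FeaturePlot()`. We assume that DR.SC stores its low-dimensional features as a reduction named `"dr-sc"` (check `Reductions(seu2)`) with at least 10 dimensions. `dims = 1:10` is an illustrative choice.

```r
library(Seurat)
# tSNE computed from the DR-SC features (reduction name "dr-sc" is assumed)
seu2 <- RunTSNE(seu2, reduction = "dr-sc", dims = 1:10)
# Expression of the top marker genes on the tSNE coordinates
FeaturePlot(seu2, features = unique(genes), reduction = "tsne", ncol = 2)
```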

Dot plots

The size of each dot corresponds to the percentage of cells expressing the feature in each cluster, and the color represents the average expression level.
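This sketch uses Seurat's `DotPlot()` on the same `seu2` object and marker vector `genes`. `RotatedAxis()` tilts the gene labels for readability.

```r
library(Seurat)
# Dot size: percent of cells expressing; color: average expression per cluster
p_dot <- DotPlot(seu2, features = unique(genes)) + RotatedAxis()
p_dot
```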

Heatmap plot

Single cell heatmap of feature expression
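Seurat's `DoHeatmap()` plots scaled data, so this sketch first scales the marker genes with `ScaleData()`. It assumes `seu2` and `genes` from the ridge-plot step.

```r
library(Seurat)
# Scale the marker genes, since DoHeatmap() draws from the scaled data
seu2 <- ScaleData(seu2, features = unique(genes), verbose = FALSE)
# One column per cell, grouped by DR-SC cluster
p_heat <- DoHeatmap(seu2, features = unique(genes))
p_heat
```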

Session information

sessionInfo()