SignacX 2.2.3

Get the most out of your single cell data.

What is SignacX?

SignacX is software developed by the Savova lab at Sanofi with a focus on single cell genomics for clinical applications. SignacX classifies the cellular phenotype for each individual cell in single cell RNA-sequencing data using neural networks trained with sorted bulk gene expression data from the Human Primary Cell Atlas. In this R implementation, we provide functions and vignettes that demonstrate how to: integrate single cell data (mapping cells from one data set to another), classify non-human data, identify novel cell types, and classify single cell data across many tissues, diseases and technologies. To learn more, check out the pre-print here.

Data portal

Here, we provide interactive access to data from the pre-print with SPRING Viewer. Just click the “Explore” links below, and search for your favorite gene:

| Links | Tissue | Disease | Number of cells | Number of samples | Source | Signac version |
|---|---|---|---|---|---|---|
| Explore | Kidney | Cancer | 48,037 | 47 | Stewart et al. 2019 | v2.0.7 |
| Explore | Kidney and urine | Lupus nephritis and healthy | 5,886 | 39 | Arazi et al. 2019 | v2.0.7 |
| Explore | Lung | Cancer | 42,844 | 18 | Zilionis et al. 2020 | v2.0.7 |
| Explore | Lung | Fibrosis | 96,461 | 31 | Habermann et al. 2020 | v2.0.7 |
| Explore | Lung | Fibrosis | 109,421 | 16 | Reyfman et al. 2019 | v2.0.7 |
| Explore | Monkey PBMCs | Healthy | 5,491 | 1 | Chamberlain et al. 2021 | v2.0.7 |
| Explore | Monkey PBMCs | Healthy | 5,220 | 1 | Chamberlain et al. 2021 | v2.0.7 |
| Explore | Monkey T cells | Healthy | 5,496 | 1 | Chamberlain et al. 2021 | v2.0.7 |
| Explore | PBMCs | Cancer | 14,048 | 8 | Zilionis et al. 2020 | v2.0.7 |
| Explore | PBMCs | Healthy | 7,902 | 1 | 10X Genomics | v2.0.7 |
| Explore | PBMCs | Healthy | 4,784 | 1 | 10X Genomics | v2.0.7 |
| Explore | Skin | Atopic dermatitis | 36,690 | 17 | He et al. 2020 | v2.0.7 |
| Explore | Synovium | Rheumatoid arthritis and osteoarthritis | 8,920 | 26 | Zhang et al. 2019 | v2.0.7 |

Note:
* Cell type annotations are provided at four levels (immune, celltypes, cellstates and novel celltypes).
* When available, we also provide information about sample covariates (e.g., disease, age, gender, FACS).
* Cell type annotations for all 13 data sets were generated with the Signac function using its default settings.

Special thanks to Allon Klein’s lab (particularly Caleb Weinreb and Sam Wolock) for hosting the data.

Getting Started

Installation

To install SignacX in R, run:

install.packages("SignacX")

Quick start

The main functions in SignacX are:

# load the library
library(SignacX)

# Generate initial labels
labels = Signac(E = your_data_here)

# Get cell type labels
celltypes = GenerateLabels(labels, E = your_data_here)
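
As a minimal sketch of what `your_data_here` might look like: the package documentation describes `E` as a sparse gene-by-cell matrix (or a Seurat object) with gene symbols as row names. The toy matrix and the `check_expression_matrix` helper below are hypothetical and only illustrate that layout; they are not part of SignacX.

```r
library(Matrix)

# Toy counts: 4 genes (rows) x 3 cells (columns). Values are made up.
counts <- matrix(
  c(0, 3, 1,
    5, 0, 0,
    0, 0, 2,
    1, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("CD3E", "CD14", "MS4A1", "NKG7"),   # gene symbols
    c("cell_1", "cell_2", "cell_3")       # cell barcodes
  )
)
E <- Matrix(counts, sparse = TRUE)

# Hypothetical helper: basic sanity checks before classification.
check_expression_matrix <- function(E) {
  stopifnot(
    inherits(E, "Matrix") || is.matrix(E),
    !is.null(rownames(E)), !anyDuplicated(rownames(E)),
    !is.null(colnames(E)), !anyDuplicated(colnames(E))
  )
  invisible(TRUE)
}

check_expression_matrix(E)
dim(E)
```

A real data set would of course have thousands of genes and cells; the point here is only the orientation (genes in rows, cells in columns) and the unique row and column names.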

Sometimes we don’t have time to run Signac, and need a quick solution. Although Signac scales fine with large data sets (>300,000 cells), we developed SignacFast to quickly classify single cell data:

# load the library
library(SignacX)

# generate labels with pre-trained model
labels_fast <- SignacFast(E = your_data_here, num.cores = 4)
celltypes_fast = GenerateLabels(labels_fast, E = your_data_here)

Usage

To make life easier, SignacX was integrated with Seurat (versions 3 and 4), and with SPRING. We provide a few vignettes:

SPRING

In the pre-print, we often used Signac integrated with SPRING. To reproduce our findings and to generate new results with SPRING, please visit the SPRING repository which has example notebooks and installation instructions, particularly for processing CITE-seq and scRNA-seq data from 10X Genomics. Briefly, Signac is integrated seamlessly with the output files of SPRING in R, requiring only a few functions:

# load the Signac library
library(SignacX)

# dir points to the "FullDataset_v1" directory generated by the SPRING Jupyter notebook
dir = "./FullDataset_v1" 

# load the expression data
E = CID.LoadData(dir)

# generate cellular phenotype labels
labels = Signac(E, spring.dir = dir)
celltypes = GenerateLabels(labels, E = E, spring.dir = dir)

# write cell types and Louvain clusters to SPRING
dat <- CID.writeJSON(celltypes, spring.dir = dir)

After running the above functions, cellular phenotypes and Louvain clusters are ready to be visualized with SPRING Viewer, which can be set up locally as described here.

Seurat

Another way to use Signac is with Seurat. In this vignette, we performed multi-modal analysis of CITE-seq PBMCs from 10X Genomics using Signac integrated with Seurat.

Note: This same data set was also processed using SPRING in this notebook and subsequently classified with Signac. The resulting SPRING layouts were used in the pre-print (Figures 2-4) and are available for interactive exploration here.
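
The steps in the Seurat vignette can be sketched roughly as follows. This is an outline under stated assumptions, not the vignette itself: the function name `annotate_with_seurat` is hypothetical, the preprocessing is a generic Seurat workflow, and it assumes (as described in the package documentation) that `Signac` accepts a Seurat object with a nearest-neighbor graph and that `GenerateLabels` returns a list with a `CellTypes` element.

```r
# Hypothetical wrapper; requires the Seurat and SignacX packages.
annotate_with_seurat <- function(counts, num.cores = 1) {
  stopifnot(requireNamespace("Seurat", quietly = TRUE),
            requireNamespace("SignacX", quietly = TRUE))

  # Generic Seurat preprocessing up to the neighbor graph
  obj <- Seurat::CreateSeuratObject(counts = counts)
  obj <- Seurat::NormalizeData(obj, verbose = FALSE)
  obj <- Seurat::FindVariableFeatures(obj, verbose = FALSE)
  obj <- Seurat::ScaleData(obj, verbose = FALSE)
  obj <- Seurat::RunPCA(obj, verbose = FALSE)
  obj <- Seurat::FindNeighbors(obj, dims = 1:30, verbose = FALSE)

  # Classify cells, then attach one annotation level as metadata
  labels <- SignacX::Signac(E = obj, num.cores = num.cores)
  celltypes <- SignacX::GenerateLabels(labels, E = obj)
  Seurat::AddMetaData(obj, metadata = celltypes$CellTypes,
                      col.name = "celltypes")
}
```

Calling `annotate_with_seurat(counts)` on a gene-by-cell count matrix would return the Seurat object with a `celltypes` metadata column; the vignette remains the reference for the exact settings used in the pre-print.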

MASC

Sometimes, we have single cell genomics data with disease information, and we want to know which cellular phenotypes are enriched for disease. In this vignette, we applied Signac to classify cellular phenotypes in healthy and lupus nephritis kidney cells, and then we used MASC to identify which cellular phenotypes were disease-enriched.

Note: MASC typically requires equal numbers of cells and samples between case and control: an unequal number might skew the clustering of cells towards one sample (i.e., a “batch effect”), which could cause spurious disease enrichment in the mixed-effects model. Since Signac classifies each cell independently (without using clusters), Signac annotations can be used with MASC without a priori balancing samples or cells, unlike cluster-based annotation methods.
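
Before running an enrichment test, it can help to see how cells are distributed across samples and conditions. The sketch below uses made-up per-cell metadata (the column names `sample`, `status` and `celltype` are hypothetical) and base R only; it does not call MASC.

```r
# Hypothetical per-cell metadata: one row per cell
meta <- data.frame(
  sample   = c("S1", "S1", "S1", "S2", "S2", "S3", "S3", "S3", "S3"),
  status   = c("case", "case", "case", "case", "case",
               "control", "control", "control", "control"),
  celltype = c("T", "B", "T", "T", "NK", "T", "B", "B", "NK")
)

# Cells per sample and per condition
cells_per_sample <- table(meta$sample)
cells_per_status <- table(meta$status)

# Number of distinct samples per condition
samples_per_status <- tapply(meta$sample, meta$status,
                             function(s) length(unique(s)))

# Fraction of each cell type within each condition (rows sum to 1)
celltype_fraction <- prop.table(table(meta$status, meta$celltype), margin = 1)

cells_per_status
samples_per_status
round(celltype_fraction, 2)
```

In this toy example the case group has more cells and more samples than the control group, which is the kind of imbalance the note above refers to.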

Non-human data

In Supplemental Figure 8 of the pre-print, we classified single cell data from a model organism (cynomolgus monkey), for which flow-sorted data sets were generally lacking, without any additional species-specific training. Instead, we mapped homologous genes from the Macaca fascicularis genome to the human genome in the single cell data, and then performed cell type classification with Signac. We demonstrate how we mapped the gene symbols here.

Note:
* This code can be used to identify homologous genes between any two species.
* The monkey data used in Supplemental Figure 8 are available for interactive exploration in the Data portal table above.
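
The gene-symbol mapping step can be sketched as follows. This is a toy illustration, not the code from the vignette: the `orthologs` lookup table, the monkey gene names and the `map_to_human` helper are all hypothetical, and in practice the table would come from an orthology resource. Genes without a human homolog are dropped, and monkey genes that map to the same human symbol are summed.

```r
# Toy monkey expression matrix: genes (rows) x cells (columns)
E_monkey <- matrix(
  c(1, 0,
    2, 3,
    0, 4,
    5, 5),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("mfGeneA", "mfGeneB", "mfGeneC", "mfGeneD"),
                  c("cell_1", "cell_2"))
)

# Hypothetical lookup table of monkey-to-human homologs
orthologs <- data.frame(
  monkey = c("mfGeneA", "mfGeneB", "mfGeneC"),
  human  = c("CD3E", "CD14", "CD14")
)

# Hypothetical helper: rename rows to human symbols
map_to_human <- function(E, orthologs) {
  idx <- match(rownames(E), orthologs$monkey)
  keep <- !is.na(idx)                             # drop genes with no homolog
  human <- orthologs$human[idx[keep]]
  rowsum(E[keep, , drop = FALSE], group = human)  # sum duplicated symbols
}

E_human <- map_to_human(E_monkey, orthologs)
E_human
```

Summing duplicated symbols is only one possible design choice; keeping the most highly expressed homolog would be another.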

Genes of interest

In Figure 6 of the pre-print, we compiled data from three sources (CellPhoneDB, GWAS catalog and Fang et al. 2020) to find genes of immunological / pharmacological interest. These genes and their annotations can be accessed from within SignacX:

# load the library
library(SignacX)

# See ?Genes_Of_Interest
data("Genes_Of_Interest")

Learning from single cell data

In Figure 4 of the pre-print, we demonstrated that Signac can map cell type labels from one single cell data set to another by learning CD56bright NK cells from CITE-seq data. Here, we provide a vignette for reproducing this analysis, which can be used to map cell populations (or clusters of cells) from one data set to another. We also provide interactive access to the single cell data that were annotated with the CD56bright NK cell model (Note: the CD56bright NK cells appear in the “CellStates” annotation layer as red cells).

| Links | Tissue | Disease | Number of cells | Number of samples | Source | Signac version |
|---|---|---|---|---|---|---|
| Explore | Kidney | Cancer | 48,037 | 47 | Stewart et al. 2019 | v2.0.7 + CD56bright NK |
| Explore | Kidney and urine | Lupus nephritis and healthy | 5,886 | 39 | Arazi et al. 2019 | v2.0.7 + CD56bright NK |
| Explore | Lung | Cancer | 42,844 | 18 | Zilionis et al. 2020 | v2.0.7 + CD56bright NK |
| Explore | Lung | Fibrosis | 96,461 | 31 | Habermann et al. 2020 | v2.0.7 + CD56bright NK |
| Explore | Lung | Fibrosis | 109,421 | 16 | Reyfman et al. 2019 | v2.0.7 + CD56bright NK |
| Explore | Monkey PBMCs | Healthy | 5,491 | 1 | Chamberlain et al. 2021 | v2.0.7 + CD56bright NK |
| Explore | Monkey PBMCs | Healthy | 5,220 | 1 | Chamberlain et al. 2021 | v2.0.7 + CD56bright NK |
| Explore | Monkey T cells | Healthy | 5,496 | 1 | Chamberlain et al. 2021 | v2.0.7 + CD56bright NK |
| Explore | PBMCs | Cancer | 14,048 | 8 | Zilionis et al. 2020 | v2.0.7 + CD56bright NK |
| Explore | PBMCs | Healthy | 4,784 | 1 | 10X Genomics | v2.0.7 + CD56bright NK |
| Explore | Skin | Atopic dermatitis | 36,690 | 17 | He et al. 2020 | v2.0.7 + CD56bright NK |
| Explore | Synovium | Rheumatoid arthritis and osteoarthritis | 8,920 | 26 | Zhang et al. 2019 | v2.0.7 + CD56bright NK |

Fast Signac

Sometimes we don’t have time to run Signac and need a faster solution. Although Signac scales well to large data sets (>300,000 cells), typically finishing in less than an hour, we developed SignacFast to classify single cell data more quickly:

# load the library
library(SignacX)

# generate labels with pre-trained model
labels_fast <- SignacFast(E = your_data_here, num.cores = 4)
celltypes_fast = GenerateLabels(labels_fast, E = your_data_here)

Unlike Signac, SignacFast uses a pre-trained ensemble of neural network models generated from the HPCA reference data, which speeds up classification roughly 5-10 fold. These models were generated from the HPCA training data like so:

# load the library
library(SignacX)

# load the HPCA training data
training_HPCA = GetTrainingData_HPCA()

# generate models
Models_HPCA = ModelGenerator(R = training_HPCA, N = 100, num.cores = 4)

The “Models_HPCA” object can be accessed from within the R package:

# load the library
library(SignacX)

# load pre-trained neural network ensemble model
Models = GetModels_HPCA()

We demonstrate how to use SignacFast in this vignette, which shows that the results are broadly consistent with running Signac.

Note: If you are only interested in major cell types (i.e., TNK and MPh cells), then SignacFast is a fine alternative to Signac.
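
One way to check how consistent the two classifiers are on your own data is to cross-tabulate their labels. The sketch below uses made-up label vectors in place of real `Signac` and `SignacFast` output; with real results you would substitute the corresponding per-cell label vectors.

```r
# Made-up per-cell labels standing in for the two classifiers' output
labels_signac <- c("T.CD4", "T.CD8", "NK", "B", "Mon", "T.CD4", "NK", "B")
labels_fast   <- c("T.CD4", "T.CD4", "NK", "B", "Mon", "T.CD4", "NK", "B")

# Confusion table: rows are Signac labels, columns are SignacFast labels
confusion <- table(Signac = labels_signac, SignacFast = labels_fast)

# Fraction of cells that receive the same label from both
agreement <- mean(labels_signac == labels_fast)

confusion
agreement
```

Off-diagonal entries in the confusion table show which labels the two approaches disagree on, which is usually more informative than the overall agreement alone.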

Benchmarking

CITE-seq

In Figures 2-3 of the pre-print, we validated Signac with CITE-seq PBMCs. Here, we reproduced that analysis with SPRING (in this vignette; as was performed in the pre-print) and additionally with Seurat (in this vignette), and provide interactive access to the data here.

Flow-sorted synovial cells

In Figure 3 of the pre-print, we validated Signac with flow cytometry and compared Signac to SingleR. We reproduced that analysis using Seurat in this vignette, and provide interactive access to the data here.

PBMCs

In Table 1 of the pre-print, we benchmarked Signac across seven different technologies: CEL-seq, Drop-Seq, inDrop, 10X (v2), 10X (v3), Seq-Well and Smart-Seq2; this analysis was reproduced here.

Roadmap

See the open issues for a list of proposed features (and known issues).

Contributing

Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

You can also open a pull request to commit to the master branch.

License

Distributed under the GPL v3.0 License. See LICENSE for more information.

Contact

Mathew Chamberlain - chamberlainphd@gmail.com

Project Link: https://github.com/mathewchamberlain/SignacX