A new platform for automated cell type identification

Researchers at FIMM have developed an effective computational approach that can automatically identify various cell types based on single-cell RNA-sequencing data. The method has great potential for unbiased profiling of mixtures of cells, both in large-scale projects and in samples derived from a single patient.

It is not yet known how many different cell types there are in the human body. Previous estimates have put the number between 200 and 300, but recent large-scale sequencing projects such as the Human Cell Atlas are constantly revealing new cell types.

Cells can be defined by the activity of their ~20,000 genes. Rapid advancements in next-generation sequencing methodology have made profiling at the level of individual cells feasible. Of the many single-cell analysis methods, single-cell RNA sequencing (scRNA-seq), which profiles gene expression, is the most common technique.

With the new scRNA-seq methods, researchers can now profile hundreds of thousands of cells in human specimens. This new field has huge scientific potential because it allows the study of cell-to-cell variation within a complex tissue. However, the downstream analyses of such data are complicated and computationally heavy.

A group led by Professor Tero Aittokallio, consisting of researchers from the Institute for Molecular Medicine Finland FIMM (University of Helsinki) and the Helsinki Institute of Information Technology HIIT (Aalto University), has developed a computational platform that can make this tedious work much easier.

In their recent Nature Communications publication, the team describes their newly developed and freely available tool, called ScType, which enables accurate cell type identification by guaranteeing the specificity of positive and negative marker genes both across cell clusters and cell types.

An infographic demonstrating the various steps of the ScType cell identification process: Single cell RNA-seq, Cell clustering, ScType Database, Automated cell type annotation, Single cell SNV calling to separate healthy and malignant cells.

The ScType platform enables fast and accurate cell type identification, and distinguishes between healthy and malignant cell populations based on single-cell calling of single-nucleotide variants (SNVs). Credit: Aleksandr Ianevski

“The existing cell type identification methods are mainly based on unsupervised clustering of cells based on the similarity of their scRNA-seq profiles, followed by manual annotation of cell clusters using established marker genes. This is a time-consuming process that may lead to sub-optimal results”, explains Doctoral Researcher Aleksandr Ianevski, the first author of the study and the main developer of the method.

“By contrast, the ScType platform enables data-driven, fully-automated and ultra-fast cell-type identification based solely on given scRNA-seq data, combined with a comprehensive cell marker database as background information”, says Anil K Giri, another lead author of the work.
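
To give a rough idea of what marker-based annotation involves, the sketch below scores each cell cluster against a small table of positive and negative marker genes and assigns the best-matching cell type. It is a simplified illustration only and does not reproduce the ScType scoring algorithm. The marker table, the expression values, and the function names score_cluster and annotate are all assumptions made for this example.

```python
from statistics import mean

# Hypothetical, hand-picked marker table (not the ScType database).
# Positive markers are expected to be expressed in the cell type;
# negative markers are expected to be absent.
MARKERS = {
    "T cell": {"positive": ["CD3E", "CD3D"], "negative": ["CD19"]},
    "B cell": {"positive": ["CD19", "MS4A1"], "negative": ["CD3E"]},
}


def score_cluster(expression, markers):
    """Mean expression of positive markers minus mean of negative markers."""
    pos = mean(expression.get(gene, 0.0) for gene in markers["positive"])
    if markers["negative"]:
        neg = mean(expression.get(gene, 0.0) for gene in markers["negative"])
    else:
        neg = 0.0
    return pos - neg


def annotate(cluster_expression, marker_db=MARKERS):
    """Label each cluster with its top-scoring cell type, or 'Unknown'
    when no cell type reaches a positive score."""
    labels = {}
    for cluster, expression in cluster_expression.items():
        scores = {
            cell_type: score_cluster(expression, markers)
            for cell_type, markers in marker_db.items()
        }
        best = max(scores, key=scores.get)
        labels[cluster] = best if scores[best] > 0 else "Unknown"
    return labels


# Made-up average expression per cluster, for illustration only.
clusters = {
    "cluster_0": {"CD3E": 2.1, "CD3D": 1.8, "CD19": 0.0, "MS4A1": 0.1},
    "cluster_1": {"CD3E": 0.0, "CD3D": 0.1, "CD19": 1.9, "MS4A1": 2.4},
    "cluster_2": {"CD3E": 0.0, "CD3D": 0.0, "CD19": 0.0, "MS4A1": 0.0},
}
labels = annotate(clusters)
for cluster, label in labels.items():
    print(cluster, "->", label)
```

Subtracting the negative-marker signal is what keeps a cluster from being labelled with a cell type whose exclusion markers it clearly expresses; a cluster matching no marker set is left as "Unknown" rather than forced into a category.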

The team demonstrated the feasibility of the method by re-analyzing six scRNA-seq datasets representing both human and mouse tissues. The results showed that the ScType platform correctly annotated 72 out of 73 cell types (almost 99% accuracy), including eight newly reannotated cell types that were incorrectly or non-specifically annotated in the original studies.

Furthermore, ScType can distinguish between healthy and malignant cell populations, making it a versatile tool for exploring and using single-cell transcriptomic data in anticancer applications.

“We anticipate the ScType platform will accelerate unbiased phenotypic profiling of cells when applied either to large-scale single-cell sequencing projects or smaller-scale profiling of patient-derived samples”, said Professor Tero Aittokallio.

To promote its wide application, either as a stand-alone tool or together with other popular single-cell data analysis software, the group has deployed ScType both as an interactive web platform and as an open-source R package, connected with a comprehensive ScType database of specific markers.

Original publication: Aleksandr Ianevski, Anil K. Giri & Tero Aittokallio. Fully-automated and ultra-fast cell-type identification using specific marker combinations from single-cell transcriptomic data. Nature Communications 13, Article number: 1246 (2022).


Further information:

Aleksandr Ianevski, FIMM PhD student

Institute for Molecular Medicine Finland FIMM, HiLIFE, University of Helsinki

E-mail: aleksandr.ianevski@helsinki.fi


Tero Aittokallio, PhD, professor

Institute for Molecular Medicine Finland FIMM, HiLIFE, University of Helsinki

E-mail: tero.aittokallio@helsinki.fi