Cited by Analytical Workflows for Single‐Cell Multiomic Data Using the BD Rhapsody Platform

Best practices for single-cell analysis across modalities

Lukas Heumos, Anna C. Schaar, Christopher Lance, et al.

Nature Reviews Genetics, Year: 2023, Vol. 24(8), pp. 550-572

Published: March 31, 2023

Language: English

Cited by: 545

Comparison of transformations for single-cell RNA-seq data

Constantin Ahlmann-Eltze, Wolfgang Huber

Nature Methods, Year: 2023, Vol. 20(5), pp. 665-672

Published: April 10, 2023

The count table, a numeric matrix of genes × cells, is the basic input data structure in the analysis of single-cell RNA-sequencing data. A common preprocessing step is to adjust the counts for variable sampling efficiency and to transform them so that the variance is similar across the dynamic range. These steps are intended to make subsequent application of generic statistical methods more palatable. Here, we describe four transformation approaches based on the delta method, model residuals, inferred latent expression state and factor analysis. We compare their strengths and weaknesses and find that the latter three have appealing theoretical properties; however, in benchmarks using simulated and real-world data, it turns out that a rather simple approach, namely, the logarithm with a pseudo-count followed by principal-component analysis, performs as well or better than the more sophisticated alternatives. This result highlights limitations of current theoretical analysis as assessed by bottom-line performance benchmarks.

Language: English

Cited by
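
As a rough illustration of the "logarithm with a pseudo-count" transformation named in the abstract above, the following Python sketch scales each cell by a size factor and then applies log(y / s + c). The function name `shifted_log`, the size-factor definition (cell total divided by the mean cell total), and the default pseudo-count of 1 are assumptions made for this example, not details taken from the paper. The principal-component analysis that would follow is omitted.

```python
import math


def shifted_log(counts, pseudo_count=1.0):
    """Apply a size-factor-scaled log transform to a genes x cells count table.

    counts: list of rows, one row per gene, one column per cell.
    Assumes every cell has a non-zero total count.
    """
    n_cells = len(counts[0])
    # Total counts per cell (column sums).
    totals = [sum(row[j] for row in counts) for j in range(n_cells)]
    mean_total = sum(totals) / n_cells
    # Hypothetical size factor: cell total relative to the average cell total.
    size_factors = [t / mean_total for t in totals]
    return [
        [math.log(row[j] / size_factors[j] + pseudo_count) for j in range(n_cells)]
        for row in counts
    ]


# Toy table: 3 genes x 2 cells; the second cell was sampled twice as deeply.
counts = [[10, 20], [0, 0], [5, 10]]
transformed = shifted_log(counts)
```

In this toy table the two cells differ only in sequencing depth, so after size-factor scaling they receive the same transformed values for every gene.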

Aitchison’s Compositional Data Analysis 40 Years on: A Reappraisal

Michael Greenacre, Eric Grunsky, John Bacon‐Shone, et al.

Statistical Science, Year: 2023, Vol. 38(3)

Published: March 22, 2023

The development of John Aitchison's approach to compositional data analysis is followed since his paper was read to the Royal Statistical Society in 1982. Aitchison's logratio approach, which was proposed to solve the problematic aspects of working with data with a fixed-sum constraint, is summarized and reappraised. It is maintained that the properties on which this approach was originally built, the main one being subcompositional coherence, are not required to be satisfied exactly: quasi-coherence is sufficient, that is, near enough to being coherent for all practical purposes. This opens up the field to using simpler data transformations, such as power transformations, that permit zero values in the data. The additional property of exact isometry, which was subsequently introduced and was not in Aitchison's original conception, imposed the use of isometric logratio transformations, but these are complicated to interpret, involving ratios of geometric means. If this property is regarded as important in certain analytical contexts, for example, unsupervised learning, it can be relaxed by showing that regular pairwise logratios, as well as the alternative quasi-coherent transformations, can also be quasi-isometric, meaning they are close to exact isometry. It is concluded that the isometric and related logratio transformations, such as pivot logratios, are not a prerequisite for good practice, although many authors insist on their obligatory use. This conclusion is fully supported here by case studies in geochemistry and in genomics, where the good performance is demonstrated of pairwise logratios, as originally proposed by Aitchison, or of Box–Cox power transforms of the original compositions where no zero replacements are necessary.

Language: English

Cited by

Single-cell RNA-seq differential expression tests within a sample should use pseudo-bulk data of pseudo-replicates

Christoph Hafemeister, Florian Halbritter

bioRxiv (Cold Spring Harbor Laboratory), Year: 2023, Vol. unknown

Published: March 29, 2023

Single-cell RNA sequencing (scRNA-seq) has become a standard approach to investigate molecular differences between cell states. Comparisons of bioinformatics methods for the count matrix transformation (normalization) and differential expression (DE) analysis of these data have already highlighted recommendations for effective between-sample comparisons and visualization. Here, we examine two remaining open questions: (i) What are the best combinations of data transformations and statistical test methods, and (ii) how do pseudo-bulk approaches perform in single-sample designs? We evaluated the performance of 343 DE pipelines (combinations of eight types of count matrix transformations and ten statistical tests) on simulated and real-world data, in terms of precision, sensitivity, and false discovery rate. We confirm superior performance of pseudo-bulk approaches without prior transformation. For within-sample comparisons, we advise the use of three pseudo-replicates, and provide a simple R package, DElegate, to facilitate application of this approach.

Language: English

Cited by
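
The idea of building pseudo-bulk data from pseudo-replicates, as recommended in the abstract above, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: cells from one group of one sample are assigned at random to three pseudo-replicates and their counts are summed per gene. The function name `pseudo_bulk`, the random round-robin assignment, and the fixed seed are hypothetical choices for this example and are not taken from the DElegate package; the downstream DE test on the summed counts is not shown.

```python
import random


def pseudo_bulk(cells, n_replicates=3, seed=0):
    """Sum per-cell counts into n_replicates pseudo-replicates.

    cells: list of per-cell count vectors (cells x genes) from one group.
    Returns a list of n_replicates summed count vectors.
    """
    rng = random.Random(seed)
    order = list(range(len(cells)))
    rng.shuffle(order)  # random assignment of cells (an assumption of this sketch)
    n_genes = len(cells[0])
    bulks = [[0] * n_genes for _ in range(n_replicates)]
    for position, cell_index in enumerate(order):
        replicate = position % n_replicates  # deal shuffled cells out evenly
        for gene, count in enumerate(cells[cell_index]):
            bulks[replicate][gene] += count
    return bulks


# Toy group: 6 cells x 2 genes.
group = [[1, 0], [2, 1], [0, 3], [4, 0], [1, 1], [0, 2]]
bulks = pseudo_bulk(group)
```

Each cell contributes to exactly one pseudo-replicate, so the per-gene totals across the three summed vectors equal the per-gene totals of the original cells.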

Studying stochastic systems biology of the cell with single-cell genomics data

Gennady Gorin, John J. Vastola, Lior Pachter, et al.

Cell Systems, Year: 2023, Vol. 14(10), pp. 822-843.e22

Published: September 25, 2023

Language: English

Cited by

Deep Learning in Single-cell Analysis

Dylan Molho, Jiayuan Ding, Wenzhuo Tang, et al.

ACM Transactions on Intelligent Systems and Technology, Year: 2024, Vol. 15(3), pp. 1-62

Published: January 26, 2024

Single-cell technologies are revolutionizing the entire field of biology. The large volumes of data generated by single-cell technologies are high dimensional, sparse, and heterogeneous and have complicated dependency structures, making analyses using conventional machine learning approaches challenging and impractical. In tackling these challenges, deep learning often demonstrates superior performance compared to traditional machine learning methods. In this work, we give a comprehensive survey on deep learning in single-cell analysis. We first introduce background on single-cell technologies and their development, as well as fundamental concepts of deep learning, including the most popular deep architectures. We present an overview of the single-cell analytic pipeline pursued in research applications while noting divergences due to data sources or specific applications. We then review seven popular tasks spanning different stages of the single-cell analysis pipeline, including multimodal integration, imputation, clustering, spatial domain identification, cell-type deconvolution, cell segmentation, and cell-type annotation. Under each task, we describe the most recent developments in classical and deep learning methods and discuss their advantages and disadvantages. Deep learning tools and benchmark datasets are also summarized for each task. Finally, we discuss the future directions and the most recent challenges. This survey will serve as a reference for biologists and computer scientists, encouraging collaborations.

Language: English

Cited by

scLENS: data-driven signal detection for unbiased scRNA-seq data analysis

Hyun Kim, Won Chang, Seok Joo Chae, et al.

Nature Communications, Year: 2024, Vol. 15(1)

Published: April 27, 2024

High dimensionality and noise have limited the new biological insights that can be discovered in scRNA-seq data. While dimensionality reduction tools have been developed to extract biological signals from the data, they often require manual determination of signal dimension, introducing user bias. Furthermore, a common data preprocessing method, log normalization, can unintentionally distort signals in the data. Here, we develop scLENS, a dimensionality reduction tool that circumvents the long-standing issues of signal distortion and manual input. Specifically, we identify the primary cause of signal distortion during log normalization and effectively address it by uniformizing cell vector lengths with L2 normalization. Furthermore, we utilize random matrix theory-based noise filtering and a signal robustness test to enable data-driven determination of the threshold for signal dimensions. Our method outperforms 11 widely used dimensionality reduction tools and performs particularly well for challenging scRNA-seq datasets with high sparsity and variability. To facilitate the use of scLENS, we provide a user-friendly package that automates accurate signal detection without time-consuming tuning.

Language: English

Cited by
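
The step of "uniformizing cell vector lengths with L2 normalization" mentioned in the abstract above can be illustrated with a short Python sketch. It only shows the generic operation of dividing each cell's expression vector by its Euclidean length. The function name `l2_normalize_cells` and the handling of all-zero cells are assumptions for this example; the random matrix theory filtering and robustness test of scLENS are not reproduced here.

```python
import math


def l2_normalize_cells(cells):
    """Scale each cell vector (one list of values per cell) to unit L2 length.

    All-zero cells are returned unchanged (an assumption of this sketch).
    """
    normalized = []
    for cell in cells:
        length = math.sqrt(sum(value * value for value in cell))
        if length > 0:
            normalized.append([value / length for value in cell])
        else:
            normalized.append(list(cell))
    return normalized


# Toy input: log-normalized values for 3 cells x 2 genes.
cells = [[3.0, 4.0], [0.6, 0.8], [0.0, 0.0]]
unit = l2_normalize_cells(cells)
```

The first two toy cells point in the same direction but have different lengths, so they map to the same unit vector after normalization.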

ClusterDE: a post-clustering differential expression (DE) method robust to false-positive inflation caused by double dipping

Dongyuan Song, Siqi Chen, Christy Lee, et al.

bioRxiv (Cold Spring Harbor Laboratory), Year: 2023, Vol. unknown

Published: July 25, 2023

Double dipping is a well-known pitfall in single-cell and spatial transcriptomics data analysis: after a clustering algorithm finds clusters as putative cell types or spatial domains, statistical tests are applied to the same data to identify differentially expressed (DE) genes as potential cell-type or spatial-domain markers. Because the genes that contribute to clustering are inherently likely to be identified as DE genes, double dipping can result in false-positive cell-type or spatial-domain markers, especially when clusters are spurious, leading to ambiguously defined cell types or spatial domains. To address this challenge, we propose ClusterDE, a statistical method designed to identify post-clustering DE genes as reliable markers of cell types and spatial domains, while controlling the false discovery rate (FDR) regardless of clustering quality. The core of ClusterDE involves generating synthetic null data as an in silico negative control that contains only one cell type or spatial domain, allowing for the detection and removal of spurious discoveries caused by double dipping. We demonstrate that ClusterDE controls the FDR and identifies canonical cell-type and spatial-domain markers as top DE genes, distinguishing them from housekeeping genes. ClusterDE's ability to discover reliable markers, or the absence of such markers, can be used to determine whether two ambiguous clusters should be merged. Additionally, ClusterDE is compatible with state-of-the-art analysis pipelines like Seurat and Scanpy.

Language: English

Cited by

A systematic evaluation of highly variable gene selection methods for single-cell RNA-sequencing

Ruzhang Zhao, Jiuyao Lu, Weiqiang Zhou, et al.

bioRxiv (Cold Spring Harbor Laboratory), Year: 2024, Vol. unknown

Published: August 26, 2024

Background: Selecting highly variable features is a crucial step in most analysis pipelines of single-cell RNA-sequencing (scRNA-seq) data. Despite numerous methods proposed in recent years, a systematic understanding of the best solution is still lacking. Results: Here, we systematically evaluate 47 highly variable gene (HVG) selection methods, consisting of 21 baseline methods developed based on different data transformations and mean-variance adjustment techniques and 26 hybrid methods developed based on mixtures of baseline methods. Across 19 diverse benchmark datasets, 18 objective evaluation criteria per method, and 5,358 analysis settings, we observe that no single baseline method consistently outperforms the others across all datasets and criteria. However, hybrid methods as a group robustly outperform individual baseline methods. Based on these findings, we propose a new HVG selection approach, mixture HVG selection (mixHVG), that incorporates top-ranked features from multiple baseline methods as a better solution to HVG selection. An open source R package, mixhvg, is developed to enable convenient use of mixHVG and its integration into users’ data analysis pipelines. Conclusion: Our study not only provides a systematic comparison of existing methods, leading to a better HVG selection solution, but also creates a pipeline and resource for evaluating new methods in the future.

Language: English

Cited by

Erasure of Biologically Meaningful Signal by Unsupervised scRNAseq Batch-correction Methods

Scott R. Tyler, Ernesto Guccione, Eric E. Schadt, et al.

bioRxiv (Cold Spring Harbor Laboratory), Year: 2021, Vol. unknown

Published: November 19, 2021

Single cell RNAseq (scRNAseq) batches range from technical-replicates to multi-tissue atlases, thus requiring robust batch-correction methods that operate effectively across this spectrum of between-batch similarity. Commonly employed benchmarks quantify removal of batch effects and preservation of within-batch variation, but the preservation of biologically meaningful differences between batches has been under-researched. Here, we address these gaps, quantifying batch effects at the level of cluster composition along with overlapping topologies through the introduction of two new measures. We discovered that standard approaches of scRNAseq batch-correction erase cell-type and cell-state variation in real-world biological datasets, single cell gene expression atlases, and in silico experiments. We highlight through examples that these issues may create the artefactual appearance of external validation/replication of findings. Our results demonstrate that either biological effects, if known, must be balanced between batches (like bulk-techniques), or technical effects that vary between batches must be explicitly modeled to prevent erasure of biological variation by unsupervised batch correction approaches.

Language: English

Cited by