
Cell Genomics, Journal Year: 2024, Volume and Issue: unknown, P. 100701 - 100701
Published: Dec. 1, 2024
Language: English
Nature Reviews Genetics, Journal Year: 2023, Volume and Issue: 24(8), P. 550 - 572
Published: March 31, 2023
Language: English
Citations
513
Nature Methods, Journal Year: 2023, Volume and Issue: 20(5), P. 665 - 672
Published: April 10, 2023
Abstract. The count table, a numeric matrix of genes × cells, is the basic input data structure in the analysis of single-cell RNA-sequencing data. A common preprocessing step is to adjust the counts for variable sampling efficiency and to transform them so that the variance is similar across the dynamic range. These steps are intended to make the subsequent application of generic statistical methods more palatable. Here, we describe four transformation approaches based on the delta method, model residuals, inferred latent expression state and factor analysis. We compare their strengths and weaknesses and find that the latter three have appealing theoretical properties; however, in benchmarks using simulated and real-world data, it turns out that a rather simple approach, namely, the logarithm with a pseudo-count followed by principal-component analysis, performs as well as or better than the more sophisticated alternatives. This result highlights limitations of current theoretical analysis as assessed by bottom-line performance benchmarks.
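The shifted-logarithm transform named in this abstract can be sketched as below. This is a minimal illustration, not the authors' code: it assumes a genes × cells count table stored as a Python list of lists, size factors defined as each cell's total count divided by the mean total count, and a pseudo-count of 1. The function name `shifted_log` is hypothetical, and the principal-component analysis that follows in the benchmarked pipeline is omitted.

```python
import math


def shifted_log(counts, pseudo_count=1.0):
    """Shifted-logarithm transform of a genes x cells count table (sketch).

    counts: list of rows, one per gene, each a list of per-cell counts.
    Assumption: size factor = cell total / mean cell total, and every cell
    has a nonzero total. Returns log(count / size_factor + pseudo_count).
    """
    n_cells = len(counts[0])
    # Total count per cell (column sums).
    totals = [sum(row[c] for row in counts) for c in range(n_cells)]
    mean_total = sum(totals) / n_cells
    # Size factors scaled so that their mean is 1.
    size_factors = [t / mean_total for t in totals]
    return [
        [math.log(row[c] / size_factors[c] + pseudo_count) for c in range(n_cells)]
        for row in counts
    ]
```

With `shifted_log([[1, 4], [1, 4]])`, the two cells differ only in sequencing depth, so both end up with the same transformed value for each gene.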
Language: English
Citations
84
Statistical Science, Journal Year: 2023, Volume and Issue: 38(3)
Published: March 22, 2023
The development of John Aitchison's approach to compositional data analysis is followed since his paper read to the Royal Statistical Society in 1982. Aitchison's logratio approach, which was proposed to solve the problematic aspects of working with data subject to a fixed-sum constraint, is summarized and reappraised. It is maintained that the properties on which this approach was originally built, the main one being subcompositional coherence, are not required to be satisfied exactly: quasi-coherence is sufficient, that is, near enough to being coherent for all practical purposes. This opens up the field to using simpler data transformations, such as power transformations, that permit zero values in the data. The additional property of exact isometry, which was subsequently introduced and was not part of Aitchison's original conception, imposed the use of isometric logratio transformations, but these are complicated to interpret, involving ratios of geometric means. If this property is regarded as important in certain analytical contexts, for example, unsupervised learning, it can be relaxed by showing that regular pairwise logratios, as well as the alternative quasi-coherent transformations, are also quasi-isometric, meaning they are close enough to exact isometry for all practical purposes. It is concluded that isometric and related transformations, such as pivot logratios, are not a prerequisite for good practice, although many authors insist on their obligatory use. This conclusion is fully supported here by case studies in geochemistry and in genomics, where the good performance is demonstrated of pairwise logratios, as originally proposed by Aitchison, or of Box–Cox power transforms of the original compositions, where no zero replacements are necessary.
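The two families of transformations contrasted here, pairwise logratios and Box–Cox power transforms, can be sketched as follows. The helper names `close`, `pairwise_logratio` and `box_cox` are hypothetical, and the choice of λ is left to the caller; the sketch only shows that the power transform stays finite at a zero part, whereas any logratio involving that part is undefined.

```python
import math


def close(parts):
    """Closure: rescale a composition so its parts sum to 1."""
    total = sum(parts)
    return [p / total for p in parts]


def pairwise_logratio(parts, j, k):
    """Pairwise logratio log(x_j / x_k); fails if either part is zero."""
    return math.log(parts[j] / parts[k])


def box_cox(x, lam):
    """Box-Cox power transform (x**lam - 1) / lam.

    It tends to log(x) as lam -> 0, and for lam > 0 it is finite at x = 0,
    so zero parts need no replacement.
    """
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1) / lam
```

For example, after `comp = close([2, 6, 0, 2])`, `pairwise_logratio(comp, 1, 0)` equals log 3, while `box_cox(comp[2], 0.5)` returns -2.0 for the zero part instead of raising an error.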
Language: English
Citations
35
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown
Published: March 29, 2023
Abstract. Single-cell RNA sequencing (scRNA-seq) has become a standard approach to investigate molecular differences between cell states. Comparisons of bioinformatics methods for count matrix transformation (normalization) and differential expression (DE) analysis of these data have already highlighted recommendations for effective between-sample comparisons and visualization. Here, we examine two remaining open questions: (i) What are the best combinations of transformations and statistical test methods, and (ii) how do pseudo-bulk approaches perform in single-sample designs? We evaluated the performance of 343 DE pipelines (combinations of eight types of transformations and ten statistical tests) on simulated and real-world data, in terms of precision, sensitivity and false discovery rate. We confirm the superior performance of pseudo-bulk approaches without prior transformation. For within-sample comparisons, we advise the use of three pseudo-replicates, and we provide a simple R package, DElegate, to facilitate the application of this approach.
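A minimal sketch of forming pseudo-bulk profiles from pseudo-replicates within a single sample is shown below. It assumes the cells of one group are randomly dealt into three pseudo-replicates and their counts summed per gene. The function name `pseudo_replicate_bulk` and the shuffled round-robin assignment are illustrative assumptions, not DElegate's implementation or API, and the downstream DE test is not shown.

```python
import random


def pseudo_replicate_bulk(cell_counts, n_reps=3, seed=0):
    """Aggregate one group's cells into n_reps pseudo-bulk profiles (sketch).

    cell_counts: list of per-cell count vectors (each a list over genes).
    Cells are shuffled with a seeded RNG, dealt round-robin into n_reps
    pseudo-replicates, and summed gene by gene within each pseudo-replicate.
    """
    rng = random.Random(seed)
    order = list(range(len(cell_counts)))
    rng.shuffle(order)
    n_genes = len(cell_counts[0])
    bulks = [[0] * n_genes for _ in range(n_reps)]
    for i, cell in enumerate(order):
        rep = bulks[i % n_reps]  # round-robin assignment of shuffled cells
        for g in range(n_genes):
            rep[g] += cell_counts[cell][g]
    return bulks
```

Each group being compared would be aggregated this way, giving three "samples" per group that a bulk DE test can then treat as replicates.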
Language: English
Citations
20
Cell Systems, Journal Year: 2023, Volume and Issue: 14(10), P. 822 - 843.e22
Published: Sept. 25, 2023
Language: English
Citations
19
ACM Transactions on Intelligent Systems and Technology, Journal Year: 2024, Volume and Issue: 15(3), P. 1 - 62
Published: Jan. 26, 2024
Single-cell technologies are revolutionizing the entire field of biology. The large volumes of data generated by single-cell technologies are high dimensional, sparse and heterogeneous, and have complicated dependency structures, making analyses using conventional machine learning approaches challenging and impractical. In tackling these challenges, deep learning often demonstrates superior performance compared to traditional machine learning methods. In this work, we give a comprehensive survey on deep learning in single-cell analysis. We first introduce background on single-cell technologies and their development, as well as fundamental concepts of deep learning, including the most popular deep architectures. We present an overview of the single-cell analytic pipeline pursued in research applications while noting divergences due to data sources or specific applications. We then review seven tasks spanning different stages of the single-cell analysis pipeline, including multimodal integration, imputation, clustering, spatial domain identification, cell-type deconvolution, cell segmentation and cell-type annotation. Under each task, we describe the most recent developments in classical and deep learning methods and discuss their advantages and disadvantages. Deep learning tools and benchmark datasets are also summarized for each task. Finally, we discuss future directions and challenges. This survey will serve as a reference for biologists and computer scientists, encouraging collaborations.
Language: English
Citations
7
Nature Communications, Journal Year: 2024, Volume and Issue: 15(1)
Published: April 27, 2024
Abstract. High dimensionality and noise have limited the new biological insights that can be discovered in scRNA-seq data. While dimensionality reduction tools have been developed to extract biological signals from the data, they often require manual determination of signal dimension, introducing user bias. Furthermore, a common data preprocessing method, log normalization, can unintentionally distort signals in the data. Here, we develop scLENS, a dimensionality reduction tool that circumvents the long-standing issues of signal distortion and manual input. Specifically, we identify the primary cause of signal distortion during log normalization and effectively address it by uniformizing cell vector lengths with L2 normalization. We also utilize random matrix theory-based noise filtering and a signal robustness test to enable data-driven determination of the threshold for signal dimensions. Our method outperforms 11 widely used dimensionality reduction tools and performs particularly well for challenging scRNA-seq datasets with high sparsity and variability. To facilitate the use of scLENS, we provide a user-friendly package that automates accurate signal detection in scRNA-seq data without manual, time-consuming tuning.
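The length-uniformizing step described here can be sketched as an L2 normalization of each cell's (already log-normalized) expression vector. The function name `l2_normalize_cells` is hypothetical, leaving all-zero cells unchanged is an assumption of this sketch, and scLENS's random matrix theory-based filtering and robustness test are not reproduced.

```python
import math


def l2_normalize_cells(cells):
    """Scale each cell's expression vector to unit Euclidean (L2) length.

    cells: list of per-cell vectors, e.g. log-normalized expression values.
    All-zero cells are returned unchanged to avoid division by zero.
    """
    out = []
    for vec in cells:
        norm = math.sqrt(sum(v * v for v in vec))
        out.append([v / norm for v in vec] if norm > 0 else list(vec))
    return out
```

After this step every nonzero cell vector has length 1, so cells that differ only in overall magnitude after log normalization become directly comparable.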
Language: English
Citations
6
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown
Published: July 25, 2023
Double dipping is a well-known pitfall in single-cell and spatial transcriptomics data analysis: after a clustering algorithm finds clusters as putative cell types or spatial domains, statistical tests are applied to the same data to identify differentially expressed (DE) genes as potential cell-type or spatial-domain markers. Because the genes that contribute to clustering are inherently likely to be identified as DE genes, double dipping can result in false-positive cell-type or spatial-domain markers, especially when clusters are spurious, leading to ambiguously defined cell types or spatial domains. To address this challenge, we propose ClusterDE, a statistical method designed to identify post-clustering DE genes as reliable markers of cell types and spatial domains while controlling the false discovery rate (FDR) regardless of clustering quality. The core of ClusterDE involves generating synthetic null data as an in silico negative control that contains only one cell type or spatial domain, allowing for the detection and removal of spurious discoveries caused by double dipping. We demonstrate that ClusterDE controls the FDR and identifies canonical cell-type and spatial-domain markers as top DE genes, distinguishing them from housekeeping genes. ClusterDE's ability to discover markers, or the absence of such markers, can be used to determine whether two ambiguous clusters should be merged. Additionally, ClusterDE is compatible with state-of-the-art analysis pipelines such as Seurat and Scanpy.
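The double-dipping pitfall itself (not ClusterDE) can be illustrated with a toy simulation. Every cell comes from a single population, yet splitting cells at the median of one gene and then comparing the two resulting "clusters" makes that gene look strongly differentially expressed. The function name, the Gaussian data and the median-split "clustering" are all illustrative assumptions.

```python
import random
import statistics


def double_dipping_demo(n_cells=200, n_genes=5, seed=1):
    """Show how clustering and testing on the same data inflates DE.

    All cells are drawn from ONE population (independent standard normal
    genes), so there are no true cell types. Cells are "clustered" by
    splitting at the median of gene 0, and the absolute difference in mean
    expression between the two clusters is returned for every gene.
    """
    rng = random.Random(seed)
    cells = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_cells)]
    cut = statistics.median(c[0] for c in cells)
    left = [c for c in cells if c[0] <= cut]
    right = [c for c in cells if c[0] > cut]
    return [
        abs(statistics.mean(c[g] for c in right) - statistics.mean(c[g] for c in left))
        for g in range(n_genes)
    ]
```

Gene 0, which defined the split, shows a large mean difference (around 1.6 standard deviations on average for this setup), while the other genes show only sampling noise, even though no real cell types exist.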
Language: English
Citations
13
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: Aug. 26, 2024
Abstract. Background: Selecting highly variable features is a crucial step in most analysis pipelines of single-cell RNA-sequencing (scRNA-seq) data. Despite numerous methods proposed in recent years, a systematic understanding of the best solution is still lacking. Results: Here, we systematically evaluate 47 highly variable gene (HVG) selection methods, consisting of 21 baseline methods developed based on different data transformations and mean-variance adjustment techniques, and 26 hybrid methods that are mixtures of baseline methods. Across 19 diverse benchmark datasets, 18 objective evaluation criteria per method and 5,358 evaluation settings, we observe that no single baseline method consistently outperforms the others across all datasets and criteria. However, mixture methods as a group robustly outperform individual baseline methods. Based on these findings, we propose a new HVG selection approach, mixture HVG (mixHVG), which incorporates top-ranked HVGs from multiple baseline methods for better HVG selection. An open-source R package, mixhvg, enables convenient use of mixHVG and its integration into users' pipelines. Conclusion: Our study not only provides a comparison of existing HVG selection methods and a leading solution, but also creates a pipeline and resource for evaluating HVG selection methods in the future.
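One way to combine several HVG rankings is sketched below, assuming each baseline method returns genes ordered from most to least variable. The rule used here (score each gene by its best rank across methods, break ties by gene name) is a hypothetical stand-in: the actual combination rule in the mixhvg package may differ, and `mix_hvg` is not its API.

```python
def mix_hvg(rankings, n_top):
    """Combine several HVG rankings into one selection of n_top genes (sketch).

    rankings: list of gene lists, each ordered from most to least variable
    by one baseline method. Each gene is scored by its best (smallest)
    0-based rank across methods; genes are taken in order of that score,
    with ties broken alphabetically by gene name.
    """
    best_rank = {}
    for ranking in rankings:
        for rank, gene in enumerate(ranking):
            if gene not in best_rank or rank < best_rank[gene]:
                best_rank[gene] = rank
    ordered = sorted(best_rank, key=lambda g: (best_rank[g], g))
    return ordered[:n_top]
```

For example, `mix_hvg([["A", "B", "C", "D"], ["C", "A", "E", "B"]], 3)` returns `['A', 'C', 'B']`: the top gene of each method is kept first, so no single method's ranking dominates the selection.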
Language: English
Citations
4
Nature Cell Biology, Journal Year: 2025, Volume and Issue: unknown
Published: Feb. 19, 2025
Language: English
Citations
0