Accurate quantification of nascent and mature RNAs from single-cell and single-nucleus RNA-seq
Delaney K. Sullivan, Kristján Eldjárn Hjörleifsson, Nikhila Swarna et al.

Nucleic Acids Research, Journal Year: 2024, Volume and Issue: unknown

Published: Dec. 6, 2024

In single-cell and single-nucleus RNA sequencing (RNA-seq), the coexistence of nascent (unprocessed) and mature (processed) messenger RNA (mRNA) poses challenges for accurate read mapping and for the interpretation of count matrices. The traditional transcriptome reference, which defines the "region of interest" in bulk RNA-seq, restricts its focus to mature mRNA transcripts. This restriction leads to two problems: reads originating outside the region of interest are prone to mismapping within it, and such external reads cannot be matched to specific transcript targets. Expanding the region of interest to encompass both nascent and mature mRNA targets provides a more comprehensive framework for RNA-seq analysis. Here, we introduce the concept of distinguishing flanking k-mers (DFKs) to improve the mapping of reads. We have developed an algorithm to identify DFKs, which serve as a sophisticated "background filter" that enhances the accuracy of mRNA quantification. This dual strategy, an expanded region of interest coupled with the use of DFKs, improves precision in quantifying both mature and nascent mRNA molecules, as well as in delineating reads of ambiguous status.
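To make the "background filter" idea concrete, here is a toy Python sketch. It is not the published DFK algorithm. It rests on our own simplified reading of the name: a k-mer is taken as "distinguishing flanking" if it sits immediately next to a target-shared k-mer in a background sequence (for example, a genome) but never occurs in any target. Reads containing such a k-mer are flagged as likely off-target. The function names are hypothetical. Real tools also handle reverse complements, much larger k, and far larger indexes.

```python
def kmers(seq: str, k: int) -> list[str]:
    """All overlapping k-mers of seq, in order of position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def distinguishing_flanking_kmers(targets: list[str], background: str, k: int) -> set[str]:
    """Toy DFK finder (an assumption, not the paper's algorithm).

    Wherever a background k-mer also occurs in a target, collect the
    immediately adjacent background k-mers that occur in no target.
    """
    target_set = {km for t in targets for km in kmers(t, k)}
    bg = kmers(background, k)
    dfks: set[str] = set()
    for i, km in enumerate(bg):
        if km not in target_set:
            continue
        for j in (i - 1, i + 1):  # left and right neighbouring k-mers
            if 0 <= j < len(bg) and bg[j] not in target_set:
                dfks.add(bg[j])
    return dfks


def is_background_read(read: str, dfks: set[str], k: int) -> bool:
    """Flag a read as likely originating outside the targets if it contains any DFK."""
    return any(km in dfks for km in kmers(read, k))


if __name__ == "__main__":
    dfks = distinguishing_flanking_kmers(["ACGTAC"], "GGACGTACTT", k=4)
    print(sorted(dfks))                              # ['GACG', 'TACT']
    print(is_background_read("GACGTA", dfks, k=4))   # True: spans the target's left boundary
    print(is_background_read("ACGTAC", dfks, k=4))   # False: lies entirely inside the target
```

The point of the sketch is the asymmetry. A read lying wholly inside the target maps normally, but a read that crosses into flanking sequence carries a k-mer the target never contains, so it can be filtered rather than mismapped.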

Language: English

RNA velocity unraveled
Gennady Gorin, Meichen Fang, Tara Chari et al.

PLoS Computational Biology, Journal Year: 2022, Volume and Issue: 18(9), P. e1010492 - e1010492

Published: Sept. 12, 2022

We perform a thorough analysis of RNA velocity methods, with a view towards understanding the suitability of the various assumptions underlying popular implementations. In addition to providing a self-contained exposition of the underlying mathematics, we undertake simulations and controlled experiments on biological datasets to assess the sensitivity of the workflow to parameter choices and to the underlying biology. Finally, we argue for a more rigorous approach to RNA velocity and present a Markovian framework that points to directions for improvement and for mitigation of current problems.
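For readers new to the mathematics under analysis, this sketches the common "steady-state" velocity estimate. The model pairs unspliced counts u with spliced counts s: du/dt = α − βu and ds/dt = βu − γs. At steady state u/s = γ/β, so fixing β = 1 lets γ be fit as a slope, and the velocity of each cell is u − γs. The sketch rests on two assumptions. First, it fits γ by least squares through the origin on all cells, whereas popular implementations restrict the fit to extreme quantiles. Second, it works on one gene in pure Python rather than with any package's API.

```python
def fit_gamma(u: list[float], s: list[float]) -> float:
    """Least-squares slope of u on s through the origin: sum(u*s) / sum(s*s)."""
    num = sum(ui * si for ui, si in zip(u, s))
    den = sum(si * si for si in s)
    if den == 0:
        raise ValueError("spliced counts are all zero; gamma is undefined")
    return num / den


def steady_state_velocity(u: list[float], s: list[float]) -> list[float]:
    """Per-cell velocity ds/dt with beta fixed to 1: v = u - gamma * s."""
    gamma = fit_gamma(u, s)
    return [ui - gamma * si for ui, si in zip(u, s)]


if __name__ == "__main__":
    # Two cells with equal spliced counts but different unspliced counts.
    print(steady_state_velocity([1.0, 3.0], [2.0, 2.0]))  # [-1.0, 1.0]
```

Cells above the fitted line (excess unspliced RNA) get positive velocity and are read as inducing the gene. Cells below it get negative velocity and are read as repressing it. The fixed β and the choice of cells used to fit γ are among the assumptions whose suitability the paper examines.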

Language: English

Citations: 117

Interpretable and tractable models of transcriptional noise for the rational design of single-molecule quantification experiments
Gennady Gorin, John J. Vastola, Meichen Fang et al.

Nature Communications, Journal Year: 2022, Volume and Issue: 13(1)

Published: Dec. 9, 2022

The question of how cell-to-cell differences in transcription rate affect RNA count distributions is fundamental for understanding the biological processes underlying transcription. Answering this question requires quantitative models that are both interpretable (describing concrete biophysical phenomena) and tractable (amenable to mathematical analysis). Such models enable the identification of the experiments that best discriminate between competing hypotheses. As a proof of principle, we introduce a simple but flexible class of models in which a continuous stochastic transcription rate drives a discrete transcription and splicing process. We use it to compare and contrast two biologically plausible hypotheses about transcription rate variation: one assumes the variation is due to DNA experiencing mechanical strain, while the other assumes it is due to fluctuations in regulator number. We present a framework for studying such models numerically and analytically, and apply Bayesian model selection to identify candidate genes that show signatures of each model in single-cell transcriptomic data from mouse glutamatergic neurons.

Language: English

Citations: 41

Modeling bursty transcription and splicing with the chemical master equation
Gennady Gorin, Lior Pachter

Biophysical Journal, Journal Year: 2022, Volume and Issue: 121(6), P. 1056 - 1069

Published: Feb. 7, 2022

Language: English

Citations: 34

Signal and noise in metabarcoding data
Zachary Gold, Andrew O. Shelton, Helen R. Casendino et al.

PLoS ONE, Journal Year: 2023, Volume and Issue: 18(5), P. e0285674 - e0285674

Published: May 11, 2023

Metabarcoding is a powerful molecular tool for simultaneously surveying hundreds to thousands of species from a single sample, underpinning microbiome and environmental DNA (eDNA) methods. Deriving quantitative estimates of the underlying biological communities from metabarcoding is critical for enhancing the utility of such approaches for health and conservation. Recent work has demonstrated that correcting for amplification biases in metabarcoding data can yield quantitative estimates of template concentrations. However, a major source of uncertainty stems from non-detections across technical PCR replicates, where one replicate fails to detect a species observed in the other replicates. Such non-detections are a special case of variability among technical replicates in metabarcoding data. While many sampling processes underlie variation in metabarcoding data, understanding the causes of non-detections is an important step in distinguishing signal from noise in metabarcoding studies. Here, we use both simulated and empirical data to 1) suggest how non-detections may arise, 2) outline steps to recognize uninformative data in practice, and 3) identify the conditions under which amplicon sequence data reliably reflect biological signals. We show with simulations that, for a given species, the non-detection rate is a function of template concentration and species-specific amplification efficiency. Consequently, we conclude that metabarcoding datasets are strongly affected by (1) deterministic amplification biases during PCR and (2) stochastic sampling of amplicons during sequencing, both of which can be modeled, but also by (3) stochastic sampling of rare molecules prior to PCR, which remains a frontier for quantitative metabarcoding. Our results highlight the importance of estimating amplification efficiencies and critically evaluating patterns of non-detection to better distinguish signal from the noise inherent in detections of rare targets.
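To illustrate two of the mechanisms named above, here is a toy Python sketch. It is not the model used in the study and omits sequencing-stage sampling. The first mechanism is stochastic sampling of rare molecules before PCR. It is modeled here, as an assumption, as Poisson sampling of templates into an aliquot, so a replicate receives no template at all with probability exp(−concentration × volume). The second is deterministic, species-specific amplification. Each species is assumed to grow as copies × (1 + efficiency)^cycles. All function names are hypothetical.

```python
import math


def prob_zero_templates(concentration: float, volume: float) -> float:
    """Poisson probability that a PCR aliquot receives no template copies."""
    return math.exp(-concentration * volume)


def amplicon_proportions(copies: dict[str, float],
                         efficiency: dict[str, float],
                         cycles: int) -> dict[str, float]:
    """Expected share of amplicons per species after deterministic PCR,
    where each species grows as copies * (1 + efficiency) ** cycles."""
    grown = {sp: n * (1.0 + efficiency[sp]) ** cycles for sp, n in copies.items()}
    total = sum(grown.values())
    return {sp: g / total for sp, g in grown.items()}


if __name__ == "__main__":
    # A rare species averaging 0.5 copies per aliquot is missed in ~61% of replicates.
    print(round(prob_zero_templates(0.5, 1.0), 3))  # 0.607
    # Equal starting copies, unequal efficiencies: the efficient species dominates.
    print(amplicon_proportions({"a": 1.0, "b": 1.0}, {"a": 1.0, "b": 0.0}, cycles=3))
```

Even this crude version reproduces the qualitative claim. Non-detection depends on concentration through the sampling step. Amplification efficiency then skews whatever templates were captured, so a species can be missed or under-represented for two distinct reasons.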

Language: English

Citations: 19

Studying stochastic systems biology of the cell with single-cell genomics data
Gennady Gorin, John J. Vastola, Lior Pachter et al.

Cell Systems, Journal Year: 2023, Volume and Issue: 14(10), P. 822 - 843.e22

Published: Sept. 25, 2023

Language: English

Citations: 19

Differences in molecular sampling and data processing explain variation among single-cell and single-nucleus RNA-seq experiments
John Chamberlin, Younghee Lee, Gábor Marth et al.

Genome Research, Journal Year: 2024, Volume and Issue: 34(2), P. 179 - 188

Published: Feb. 1, 2024

A mechanistic understanding of the biological and technical factors that impact transcript measurements is essential to designing and analyzing single-cell and single-nucleus RNA sequencing experiments. Nuclei contain the same pre-mRNA population as cells, but they contain only a small subset of the mRNAs. Nonetheless, early studies argued that single-nucleus analysis yielded results comparable to cellular samples if pre-mRNA reads were included. However, typical workflows do not distinguish between pre-mRNA and mRNA when estimating gene expression, and variation in their relative abundances across cell types has received limited attention. These gaps are especially important given that incorporating pre-mRNA has become commonplace for both assays, despite a known gene length bias in pre-mRNA capture. Here, we reanalyze public data sets from mouse and human to describe the mechanisms and contrasting effects of pre-mRNA sampling on gene expression and marker gene selection in single-cell and single-nucleus RNA-seq. We show that pre-mRNA levels vary considerably among cell types, which mediates the degree of gene length bias and limits the generalizability of a recently published normalization method intended to correct this bias. As an alternative, we repurpose an existing post hoc gene length-based correction from conventional RNA-seq gene set enrichment analysis. Finally, we show that the inclusion of pre-mRNA in bioinformatic processing can impart a larger effect than the choice of assay itself, which is pivotal for the effective reuse of existing data. These analyses advance our understanding of the sources of variation in single-cell and single-nucleus RNA-seq experiments and provide useful guidance for future studies.

Language: English

Citations: 5

Dissection and integration of bursty transcriptional dynamics for complex systems
Cheng Gao, Suriyanarayanan Vaikuntanathan, Samantha J. Riesenfeld et al.

Proceedings of the National Academy of Sciences, Journal Year: 2024, Volume and Issue: 121(18)

Published: April 26, 2024

RNA velocity estimation is a potentially powerful tool to reveal the directionality of transcriptional changes in single-cell RNA-sequencing data, but it lacks accuracy, absent advanced metabolic labeling techniques. We developed an approach,

Language: English

Citations: 5

Biophysical modeling with variational autoencoders for bimodal, single-cell RNA sequencing data
Maria Carilli, Gennady Gorin, Yongin Choi et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown

Published: Jan. 14, 2023

We motivate and present biVI, which combines the variational autoencoder framework of scVI with biophysically motivated, bivariate models for nascent and mature RNA distributions. While previous approaches to integrating bimodal data via the variational autoencoder framework ignore the causal relationship between measurements, biVI models the biophysical processes that give rise to the observations. We demonstrate through simulated benchmarking that biVI captures cell type structure in a low-dimensional space and accurately recapitulates parameter values and copy number distributions. On biological data, biVI provides a scalable route for identifying the biophysical mechanisms underlying gene expression. This analytical approach outlines a generalizable strategy for treating multimodal datasets generated by high-throughput, single-cell genomic assays.

Language: English

Citations: 12

Benchmarking Machine Learning Models for Cell Type Annotation in Single-Cell vs Single-Nucleus RNA-Seq Data
Giovane G. Tortelote

Research Square (Research Square), Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 8, 2025

Background: Machine learning (ML) models can automate cell type annotation and reduce human bias. However, it remains unclear which ML model best suits the characteristics of single-cell RNA sequencing data, and whether a model trained on such data can be applied to transcriptomes collected from nuclei rather than whole cells. This study evaluates the performance of eight ML models selected for cell type annotation in single-cell (scRNA-seq) vs single-nucleus (snRNA-seq) datasets, focusing on their ability to generalize across datasets with varying cell populations and transcriptome isolation techniques. Results: In the first part, we use two publicly available scRNA-seq datasets of Peripheral Blood Mononuclear Cells (PBMC3K and PBMC10K) to assess each model's cell type classification within datasets. XGBoost achieved high accuracy (95.4%-95.8%), precision, and F1-scores, outperforming simpler models such as Logistic Regression and Naive Bayes. Ensemble methods such as Random Forest demonstrated strong precision and recall. Elastic Net performed nearly as well in generalizability, achieving 94.7%-95.1% accuracy. In the second part, we investigated the impact of transcriptome isolation techniques (single-cell vs. single-nucleus RNA-seq) using a cardiomyocyte differentiation dataset (GSE129096). Although the models excelled on scRNA-seq data (accuracy and F1-scores > 95%), performance declined notably on snRNA-seq data, suggesting that inherent transcriptomic differences limit generalization capacity. Notably, all models struggled to classify intermediate-stage cells, highlighting the challenge of distinguishing transitional populations such as cardiac progenitors, which retain stem cell markers while expressing differentiated markers. Conclusion: ML models can classify cells of different origin in both scRNA-seq and snRNA-seq data. Tree-based models and penalized (elastic net) regression performed best across diverse datasets, emphasizing the importance of model selection for robust annotation. These findings underscore the need for tailored computational approaches when working with heterogeneous data.
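The benchmark compares annotation models by accuracy, precision, recall and F1-score. As a reminder of how these are computed for multi-class cell type labels, here is a small pure-Python sketch. Macro-averaging (an unweighted mean of per-class F1) is our assumption, and the study may report a different averaging scheme. The cell type labels in the example are made up.

```python
from collections import Counter


def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of cells whose predicted type matches the reference type."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean over classes of F1 = 2PR / (P + R)."""
    labels = sorted(set(y_true) | set(y_pred))
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n_pred = Counter(y_pred)
    n_true = Counter(y_true)
    scores = []
    for lab in labels:
        precision = true_pos[lab] / n_pred[lab] if n_pred[lab] else 0.0
        recall = true_pos[lab] / n_true[lab] if n_true[lab] else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    truth = ["T cell", "T cell", "B cell", "B cell"]
    pred = ["T cell", "B cell", "B cell", "B cell"]
    print(accuracy(truth, pred))            # 0.75
    print(round(macro_f1(truth, pred), 4))  # 0.7333
```

Macro-F1 weights every cell type equally, so a model that misclassifies a small transitional population (such as cardiac progenitors) is penalized more visibly than under plain accuracy.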

Language: English

Citations: 0

Trajectory inference from single-cell genomics data with a process time model
Meichen Fang, Gennady Gorin, Lior Pachter et al.

PLoS Computational Biology, Journal Year: 2025, Volume and Issue: 21(1), P. e1012752 - e1012752

Published: Jan. 21, 2025

Single-cell transcriptomics experiments provide gene expression snapshots of heterogeneous cell populations across cell states. These snapshots have been used to infer trajectories and dynamic information, even without intensive time-series data, by ordering cells according to expression similarity. However, while single-cell snapshots sometimes offer valuable insights into dynamic processes, current methods for ordering cells are limited by descriptive notions of "pseudotime" that lack intrinsic physical meaning. Instead of pseudotime, we propose inference of "process time" via a principled modeling approach that formulates trajectories and infers latent variables corresponding to the timing of cells subject to a biophysical process. Our implementation of this approach, called Chronocell, provides a formulation of trajectories built on cell state transitions. The Chronocell model is identifiable, making parameter inference meaningful. Furthermore, Chronocell can interpolate between trajectory inference, when cell states lie on a continuum, and clustering, when cells cluster into discrete states. Using a variety of datasets ranging from cluster-like to continuous, we show that Chronocell enables us to assess the suitability of trajectory inference and reveals distinct cellular distributions along process time that are consistent with biological process times. We also compare our estimates of degradation rates with those derived from metabolic labeling datasets, thereby showcasing the utility of Chronocell. Nevertheless, based on performance characterization on simulations, we find that process time inference can be challenging, highlighting the importance of dataset quality and careful model assessment.

Language: English

Citations: 0