Biophysically Interpretable Inference of Cell Types from Multimodal Sequencing Data
Tara Chari, Gennady Gorin, Lior Pachter et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown

Published: Sept. 17, 2023

Multimodal, single-cell genomics technologies enable simultaneous capture of multiple facets of DNA and RNA processing in the cell. This creates opportunities for transcriptome-wide, mechanistic studies of cellular processes across heterogeneous cell types, with applications ranging from inferring kinetic differences between cells to understanding the role of stochasticity in driving heterogeneity. However, current methods for determining cell types or 'clusters' present in multimodal data often rely on ad hoc or independent treatment of the modalities and on assumptions that ignore inherent properties of count data. To enable interpretable and consistent cluster determination from multimodal data, we present meK-Means (mechanistic K-Means), which integrates modalities and learns underlying, shared biophysical states through a unifying model of transcription. In particular, we demonstrate how meK-Means can be used to cluster cells from unspliced and spliced mRNA modalities. By utilizing the causal, physical relationships underlying these modalities, we identify the transcriptional kinetics across cell types that induce the observed gene expression profiles, and provide an alternative definition of clusters through the governing parameters of cellular processes.

Language: English

The specious art of single-cell genomics
Tara Chari, Lior Pachter

PLoS Computational Biology, Journal Year: 2023, Volume and Issue: 19(8), P. e1011288 - e1011288

Published: Aug. 17, 2023

Dimensionality reduction is standard practice for filtering noise and identifying relevant features in large-scale data analyses. In biology, single-cell genomics studies typically begin with reduction to 2 or 3 dimensions to produce "all-in-one" visuals of the data that are amenable to the human eye, and these are subsequently used for qualitative and quantitative exploratory analysis. However, there is little theoretical support for this practice, and we show that extreme dimension reduction, from hundreds or thousands of dimensions to 2, inevitably induces significant distortion of high-dimensional datasets. We therefore examine the practical implications of low-dimensional embedding of single-cell data and find that extensive distortions and inconsistent practices make such embeddings counter-productive for exploratory, biological analyses. In lieu of this, we discuss alternative approaches for conducting targeted embedding and feature exploration to enable hypothesis-driven biological discovery.

Language: English

Citations: 182

RNA velocity unraveled
Gennady Gorin, Meichen Fang, Tara Chari et al.

PLoS Computational Biology, Journal Year: 2022, Volume and Issue: 18(9), P. e1010492 - e1010492

Published: Sept. 12, 2022

We perform a thorough analysis of RNA velocity methods, with a view towards understanding the suitability of the various assumptions underlying popular implementations. In addition to providing a self-contained exposition of the underlying mathematics, we undertake simulations and controlled experiments on biological datasets to assess the sensitivity of the workflow to parameter choices and underlying biology. Finally, we argue for a more rigorous approach to RNA velocity and present a framework for Markovian analysis that points to directions for improvement and mitigation of current problems.

Language: English

Citations: 117

Interpretable and tractable models of transcriptional noise for the rational design of single-molecule quantification experiments
Gennady Gorin, John J. Vastola, Meichen Fang et al.

Nature Communications, Journal Year: 2022, Volume and Issue: 13(1)

Published: Dec. 9, 2022

The question of how cell-to-cell differences in transcription rate affect RNA count distributions is fundamental for understanding the biological processes underlying transcription. Answering this question requires quantitative models that are both interpretable (describing concrete biophysical phenomena) and tractable (amenable to mathematical analysis). Such models enable the identification of the experiments that best discriminate between competing hypotheses. As a proof of principle, we introduce a simple but flexible class of models involving a continuous stochastic transcription rate driving a discrete RNA transcription and splicing process, and we compare and contrast two biologically plausible hypotheses about transcription rate variation. One assumes that the variation is due to DNA experiencing mechanical strain, while the other assumes that it is due to fluctuations in regulator number. We introduce a framework for numerically and analytically studying such models and apply Bayesian model selection to identify candidate genes that show signatures of each model in single-cell transcriptomic data from mouse glutamatergic neurons.

Language: English

Citations: 41

Biophysical modeling with variational autoencoders for bimodal, single-cell RNA sequencing data
Maria Carilli, Gennady Gorin, Yongin Choi et al.

Nature Methods, Journal Year: 2024, Volume and Issue: 21(8), P. 1466 - 1469

Published: July 25, 2024

Language: English

Citations: 16

Modeling bursty transcription and splicing with the chemical master equation
Gennady Gorin, Lior Pachter

Biophysical Journal, Journal Year: 2022, Volume and Issue: 121(6), P. 1056 - 1069

Published: Feb. 7, 2022

Language: English

Citations: 34

Signal and noise in metabarcoding data
Zachary Gold, Andrew O. Shelton, Helen R. Casendino et al.

PLoS ONE, Journal Year: 2023, Volume and Issue: 18(5), P. e0285674 - e0285674

Published: May 11, 2023

Metabarcoding is a powerful molecular tool for simultaneously surveying hundreds to thousands of species from a single sample, underpinning microbiome and environmental DNA (eDNA) methods. Deriving quantitative estimates of the underlying biological communities from metabarcoding is critical for enhancing the utility of such approaches for health and conservation. Recent work has demonstrated that correcting for amplification biases in genetic metabarcoding data can yield quantitative estimates of template DNA concentrations. However, a major source of uncertainty stems from non-detections across technical PCR replicates, where one replicate fails to detect a species observed in other replicates. Such non-detections are a special case of variability among technical replicates in metabarcoding data. While many sampling processes underlie the variation in metabarcoding data, understanding the causes of non-detections is an important step in distinguishing signal from noise in metabarcoding studies. Here, we use both simulated and empirical data to 1) suggest how non-detections may arise, 2) outline steps to recognize uninformative data in practice, and 3) identify the conditions under which amplicon sequence data reliably reflect biological signals. We show with simulations that, for a given species, the rate of non-detection is a function of both template DNA concentration and species-specific amplification efficiency. Consequently, we conclude that metabarcoding datasets are strongly affected by (1) deterministic amplification biases during PCR and (2) stochastic sampling of amplicons during sequencing, both of which we can model, but also by (3) stochastic sampling of rare molecules prior to PCR, which remains a frontier for quantitative metabarcoding. Our results highlight the importance of estimating amplification efficiencies and critically evaluating patterns of non-detection to better distinguish noise inherent in metabarcoding data from true detections of target species.

Language: English

Citations: 19

Studying stochastic systems biology of the cell with single-cell genomics data
Gennady Gorin, John J. Vastola, Lior Pachter et al.

Cell Systems, Journal Year: 2023, Volume and Issue: 14(10), P. 822 - 843.e22

Published: Sept. 25, 2023

Language: English

Citations: 19

kallisto, bustools and kb-python for quantifying bulk, single-cell and single-nucleus RNA-seq
Delaney K. Sullivan, Kyung Hoi Min, Kristján Eldjárn Hjörleifsson et al.

Nature Protocols, Journal Year: 2024, Volume and Issue: unknown

Published: Oct. 10, 2024

Language: English

Citations: 9

Spectral neural approximations for models of transcriptional dynamics
Gennady Gorin, Maria Carilli, Tara Chari et al.

Biophysical Journal, Journal Year: 2024, Volume and Issue: 123(17), P. 2892 - 2901

Published: May 6, 2024

Language: English

Citations: 8

kallisto, bustools, and kb-python for quantifying bulk, single-cell, and single-nucleus RNA-seq
Delaney K. Sullivan, Kyung Hoi Min, Kristján Eldjárn Hjörleifsson et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown

Published: Nov. 22, 2023

The term “RNA-seq” refers to a collection of assays based on sequencing experiments that involve quantifying RNA species from bulk tissue, single cells, or single nuclei. The kallisto, bustools, and kb-python programs are free, open-source software tools for performing this analysis that together can produce gene expression quantification from raw sequencing reads. The quantifications can be individualized for multiple cells, multiple samples, or both. Additionally, these tools allow gene expression values to be classified as originating from nascent or mature RNA species, making this workflow amenable to both cell-based and nucleus-based assays. This protocol describes in detail how to use kallisto and bustools in conjunction with a wrapper, kb-python, to preprocess RNA-seq data.

Language: English

Citations: 16