Modeling and predicting single-cell multi-gene perturbation responses with scLAMBDA
Gefei Wang, Tian-Yu Liu, Jia Zhao, et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Dec. 8, 2024

Abstract: Understanding cellular responses to genetic perturbations is essential for understanding gene regulation and phenotype formation. While high-throughput single-cell RNA-sequencing has facilitated detailed profiling of heterogeneous transcriptional responses at the single-cell level, there remains a pressing need for computational models that can decode the mechanisms driving these responses and accurately predict outcomes to prioritize target genes for experimental design. Here, we present scLAMBDA, a deep generative learning framework designed to model perturbations, including single-gene and combinatorial multi-gene perturbations. By leveraging embeddings derived from large language models, scLAMBDA effectively integrates prior biological knowledge and disentangles basal cell states from perturbation-specific salient representations. Through comprehensive evaluations on multiple CRISPR Perturb-seq datasets, scLAMBDA consistently outperformed state-of-the-art methods in predicting perturbation outcomes, achieving higher prediction accuracy. Notably, scLAMBDA demonstrated robust generalization to unseen perturbations, and its predictions captured both average expression changes and the heterogeneity of responses. Furthermore, its predictions enable diverse downstream analyses, including the identification of differentially expressed genes and the exploration of interactions, demonstrating its utility and versatility.

Language: English

Simple controls exceed best deep learning algorithms and reveal foundation model effectiveness for predicting genetic perturbations
Daniel R. Wong, Abby S. Hill, Rocco Moccia, et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 8, 2025

Abstract: Modeling genetic perturbations and their effect on the transcriptome is a key area of pharmaceutical research. Due to the complexity of the transcriptome, there has been much excitement and development in deep learning (DL) because of its ability to model complex relationships. In particular, the transformer-based foundation model paradigm has emerged as the gold standard for predicting post-perturbation responses. However, understanding of these increasingly complex models and evaluation of their practical utility are lacking, along with simple but appropriate benchmarks to compare predictive methods. Here, we present a simple baseline method that outperforms both state-of-the-art (SOTA) DL models and other proposed simpler neural architectures, setting a necessary benchmark to evaluate in the field of perturbation prediction. We also elucidate foundation model effectiveness for the task of perturbation prediction via generalizable fine-tuning experiments that can be translated to different applications and tasks of interest. Furthermore, we provide a corrected version of a popular dataset used for benchmarking perturbation prediction models. Our hope is that this work will properly contextualize further development in this space with control procedures.

Language: English

Citations: 1

Causal models and prediction in cell line perturbation experiments
James P. Long, Yumeng Yang, Shohei Shimizu, et al.

BMC Bioinformatics, Journal Year: 2025, Volume and Issue: 26(1)

Published: Jan. 7, 2025

Abstract: In cell line perturbation experiments, a collection of cells is perturbed with external agents and responses such as protein expression are measured. Due to cost constraints, only a small fraction of all possible perturbations can be tested in vitro. This has led to the development of computational models that can predict cellular responses in silico. A central challenge for these models is to predict the effect of new, previously untested perturbations that were not used in the training data. Here we propose causal structural equations for modeling how perturbations affect cells. From this model, we derive two estimators for predicting responses: a Linear Regression (LR) estimator and a structure learning estimator that we term Causal Structure Regression (CSR). The CSR estimator requires more assumptions than LR, but can predict the effects of drugs that were not applied in the training data. Next we present Cellbox, a recently proposed model based on a system of ordinary differential equations (ODEs) that obtained the best prediction performance on a Melanoma data set (Yuan et al. Cell Syst 12:128–140, 2021). We derive analytic results that show a close connection between CSR and Cellbox, providing a new interpretation of the Cellbox model. We compare LR and CSR/Cellbox in simulations, highlighting the strengths and weaknesses of the two approaches. Finally, we benchmark the methods on the Melanoma data set. We find that LR performs comparably to or slightly better than Cellbox.

Language: English

Citations: 0

AUC-PR is a More Informative Metric for Assessing the Biological Relevance of In Silico Cellular Perturbation Prediction Models
Hongxu Zhu, Amir Asiaee, Leila Azinfar, et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown

Published: March 11, 2025

Abstract: In silico perturbation models, computational methods which can predict cellular responses to perturbations, present an opportunity to reduce the need for costly and time-intensive in vitro experiments. Many recently proposed models predict high-dimensional cellular responses, such as gene or protein expression, to perturbations such as gene knockout or drugs. However, evaluating the performance of these models has largely relied on metrics such as R², which assess overall prediction accuracy but fail to capture biologically significant outcomes like the identification of differentially expressed (DE) genes. In this study, we present a novel evaluation framework that introduces the AUC-PR metric to assess the precision and recall of DE gene predictions. By applying this framework to both single-cell and pseudo-bulked datasets, we systematically benchmark simple and advanced models. Our results highlight a discrepancy between R² and AUC-PR, with models achieving high R² values but struggling to identify differentially expressed genes accurately, as reflected in their low AUC-PR values. This finding underscores the limitations of traditional evaluation metrics and the importance of biologically relevant assessments. Our framework provides a more comprehensive understanding of model capabilities, advancing the application of computational approaches in cellular perturbation research.

Language: English

Citations: 0
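
The contrast drawn in the Zhu et al. abstract above, between R² and AUC-PR for identifying differentially expressed (DE) genes, can be illustrated with a minimal Python sketch. This is not the authors' evaluation code. The function names, the toy log fold changes, the DE threshold of 1.0, and the use of average precision as a step-wise estimate of the area under the precision-recall curve are all assumptions made here for illustration.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def average_precision(is_de: Sequence[bool], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve.

    Genes are ranked by score (highest first); precision is averaged over
    the ranks at which a truly DE gene is recovered. Tied scores keep their
    input order, which is a simplification.
    """
    n_pos = sum(1 for flag in is_de if flag)
    if n_pos == 0:
        raise ValueError("need at least one truly DE gene")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, idx in enumerate(order, start=1):
        if is_de[idx]:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


# Hypothetical per-gene log fold changes (true vs. predicted).
true_lfc = [3.0, -2.5, 0.1, 0.0, -0.2, 1.2, 0.05, -1.1]
pred_lfc = [2.6, -2.2, 0.6, 0.1, -0.7, 0.3, 0.0, -0.2]

# Assumed DE definition: absolute true log fold change of at least 1.0.
is_de = [abs(v) >= 1.0 for v in true_lfc]
# Assumed DE score: magnitude of the predicted change.
scores = [abs(v) for v in pred_lfc]

print("R^2:", r_squared(true_lfc, pred_lfc))
print("AUC-PR (average precision):", average_precision(is_de, scores))
```

R² is dominated by the genes with the largest effects, while the ranking-based score drops whenever a truly DE gene is ranked below a non-DE gene, so the two metrics can diverge on the same predictions.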

New horizons at the interface of artificial intelligence and translational cancer research
Josephine Yates, Eliezer M. Van Allen

Cancer Cell, Journal Year: 2025, Volume and Issue: 43(4), P. 708 - 727

Published: April 1, 2025

Language: English

Citations: 0

GenePert: Leveraging GenePT Embeddings for Gene Perturbation Prediction
Yiqun T. Chen, James Zou

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Oct. 29, 2024

Abstract: Predicting how perturbation of a target gene affects the expression of other genes is a critical component of understanding cell biology. This is a challenging prediction problem, as the model must capture complex gene-gene relationships and the output is high-dimensional and sparse. To address this challenge, we present GenePert, a simple approach that leverages GenePT embeddings, which are derived using ChatGPT from text descriptions of individual genes, to predict expression changes due to perturbations via regularized regression models. Benchmarked on eight CRISPR screen datasets across multiple cell types and five different pretrained embedding models, GenePert consistently outperforms all state-of-the-art models, measured in both Pearson correlation and mean squared error metrics. Even with limited training data, our model generalizes effectively, offering a scalable solution for predicting perturbation outcomes. These findings underscore the power of informative gene embeddings in predicting the outcomes of unseen genetic perturbation experiments in silico. GenePert is available at https://github.com/zou-group/GenePert

Language: English

Citations: 2
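
The GenePert abstract above describes predicting expression changes from gene embeddings via regularized regression. A minimal sketch of that idea, assuming ridge (L2) regularization and NumPy, might look as follows. The function names (`fit_ridge`, `predict`), the penalty value, and the randomly generated stand-ins for embeddings and expression changes are hypothetical. This is not the GenePert implementation, which is available at the repository linked in the abstract.

```python
import numpy as np


def fit_ridge(X: np.ndarray, Y: np.ndarray, alpha: float = 1.0):
    """Closed-form ridge regression with an unpenalized intercept.

    X: (n_perturbations, embedding_dim) embeddings of the perturbed genes.
    Y: (n_perturbations, n_genes) mean expression change per perturbation.
    Solves W = (Xc^T Xc + alpha * I)^{-1} Xc^T Yc on centered data.
    """
    x_mean = X.mean(axis=0)
    y_mean = Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    W = np.linalg.solve(gram, Xc.T @ Yc)
    b = y_mean - x_mean @ W
    return W, b


def predict(X: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Predict expression changes for perturbations given their embeddings."""
    return X @ W + b


# Random placeholders standing in for real embeddings and measured changes.
rng = np.random.default_rng(0)
train_emb = rng.normal(size=(40, 16))     # 40 seen perturbations, 16-d embeddings
train_delta = rng.normal(size=(40, 100))  # expression change over 100 genes
unseen_emb = rng.normal(size=(5, 16))     # 5 held-out perturbations

W, b = fit_ridge(train_emb, train_delta, alpha=10.0)
print(predict(unseen_emb, W, b).shape)
```

Because the model only needs an embedding for the perturbed gene, it can produce a prediction for any gene with an embedding, including genes never perturbed in the training data. In practice the penalty `alpha` would be chosen by cross-validation.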

A systematic comparison of computational methods for expression forecasting
Eric Kernfeld, Yunxiao Yang, Joshua S. Weinstock, et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown

Published: July 31, 2023

Abstract: Expression forecasting methods use machine learning models to predict how a cell will alter its transcriptome upon perturbation. Such methods are enticing because they promise to answer pressing questions in fields ranging from developmental genetics to cell fate engineering, and because they are a fast, cheap, and accessible complement to the corresponding experiments. However, the absolute and relative accuracy of these methods is poorly characterized, limiting their informed use, their improvement, and the interpretation of their predictions. To address these issues, we created a benchmarking platform that combines a panel of 11 large-scale perturbation datasets with an expression forecasting software engine that encompasses or interfaces to a wide variety of methods. We used our platform to systematically assess methods, parameters, and sources of auxiliary data, finding that performance strongly depends on the choice of metric, and that, especially for simple metrics like mean squared error, it is uncommon for methods to out-perform simple baselines. Our platform will serve as a resource to improve methods and to identify contexts in which expression forecasting can succeed.

Language: English

Citations: 5

A cross-species foundation model for single cells
Korbinian Traeuble, Matthias Heinig

Cell Research, Journal Year: 2024, Volume and Issue: unknown

Published: Oct. 31, 2024

Language: English

Citations: 0

Integrative Computational Framework, Dyscovr, Links Mutated Driver Genes to Expression Dysregulation Across 19 Cancer Types
Sara Geraghty, Jacob A. Boyer, Mahya Fazel-Zarandi, et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Nov. 21, 2024

Abstract: Though somatic mutations play a critical role in driving cancer initiation and progression, the systems-level functional impacts of these mutations (particularly, how they alter expression across the genome and give rise to cancer hallmarks) are not yet well understood, even for well-studied driver genes. To address this, we designed an integrative machine learning model, Dyscovr, that leverages mutation, gene expression, copy number alteration (CNA), methylation, and clinical data to uncover putative relationships between nonsynonymous mutations in key driver genes and transcriptional changes across the genome. We applied Dyscovr pan-cancer and within 19 individual cancer types, finding both broadly relevant and cancer type-specific links between driver genes and putative targets, including a subset that we further identify as exhibiting negative genetic relationships. Our work newly implicates-and validates cell lines-

Language: English

Citations: 0
