Deep learning modeling of rare noncoding genetic variants in human motor neurons defines CCDC146 as a therapeutic target for ALS DOI Creative Commons
Sai Zhang, Tobias Moll, Jasper Rubin-Sigler et al.

medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: April 1, 2024

Amyotrophic lateral sclerosis (ALS) is a fatal and incurable neurodegenerative disease caused by the selective and progressive death of motor neurons (MNs). Understanding the genetic and molecular factors influencing ALS survival is crucial for disease management and therapeutics. In this study, we introduce a deep learning-powered genetic analysis framework to link rare noncoding genetic variants to ALS survival. Using data from human induced pluripotent stem cell (iPSC)-derived MNs, this method prioritizes functional noncoding variants using deep learning, links cis-regulatory elements (CREs) to target genes using epigenomics data, and integrates these data through gene-level burden tests to identify survival-modifying variants, CREs, and genes. We apply this approach to analyze 6,715 ALS genomes and pinpoint four novel rare noncoding variants associated with survival, including chr7:76,009,472:C>T linked to CCDC146. CRISPR-Cas9 editing of this variant increases CCDC146 expression in iPSC-derived MNs and exacerbates ALS-specific phenotypes, including TDP-43 mislocalization. Suppressing CCDC146 with an antisense oligonucleotide (ASO), showing no toxicity, completely rescues ALS-associated survival defects in neurons derived from sporadic ALS patients and from carriers of the ALS-associated G4C2-repeat expansion within C9ORF72. ASO targeting of CCDC146 may be a broadly effective therapeutic approach for ALS. Our framework provides a generic and powerful approach for studying noncoding genetics of complex human diseases.
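The gene-level burden step mentioned in this abstract can be illustrated with a minimal sketch. It assumes each rare variant already has a functional weight and a linked target gene; the variant IDs, gene names, weights, and the `burden_by_gene` helper are hypothetical and are not taken from the study, which additionally tests such burdens for association with survival.

```python
from collections import defaultdict


def burden_by_gene(genotypes, variant_to_gene, variant_score):
    """Collapse per-variant allele counts into one weighted burden per gene.

    genotypes:       {individual: {variant_id: alt_allele_count}}
    variant_to_gene: {variant_id: gene}, e.g. derived from CRE-to-gene links
    variant_score:   {variant_id: weight}, e.g. a model's functional score
    """
    burdens = {}
    for individual, calls in genotypes.items():
        per_gene = defaultdict(float)
        for variant, count in calls.items():
            gene = variant_to_gene.get(variant)
            if gene is None:
                continue  # variant is not linked to any gene, so it is skipped
            # Unscored variants get weight 0.0 and do not add to the burden.
            per_gene[gene] += count * variant_score.get(variant, 0.0)
        burdens[individual] = dict(per_gene)
    return burdens


# Hypothetical toy data (not from the paper).
genotypes = {"p1": {"v1": 1, "v2": 2}, "p2": {"v3": 1}}
variant_to_gene = {"v1": "GENE_A", "v2": "GENE_A"}
variant_score = {"v1": 0.5, "v2": 0.25}
result = burden_by_gene(genotypes, variant_to_gene, variant_score)
```

In this toy example, individual `p1` carries two weighted variants in the same gene, while the only variant of `p2` has no gene link and therefore contributes no burden.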

Language: English

Applications of deep learning in understanding gene regulation DOI Creative Commons
Zhongxiao Li, Elva Gao, Juexiao Zhou et al.

Cell Reports Methods, Journal Year: 2023, Volume and Issue: 3(1), P. 100384 - 100384

Published: Jan. 1, 2023

Gene regulation is a central topic in cell biology. Advances in omics technologies and the accumulation of omics data have provided better opportunities for gene regulation studies than ever before. For this reason, deep learning, as a data-driven predictive modeling approach, has been successfully applied to this field during the past decade. In this article, we aim to give a brief yet comprehensive overview of representative deep-learning methods for gene regulation. Specifically, we discuss and compare the design principles and datasets used by each method, creating a reference for researchers who wish to replicate or improve existing methods. We also discuss the common problems of existing approaches and prospectively introduce the emerging deep-learning paradigms that will potentially alleviate them. We hope that this article will provide a rich and up-to-date resource and shed light on future research directions in this area.

Language: English

Citations

29

Single-cell omics: experimental workflow, data analyses and applications DOI
Fengying Sun, Haoyan Li, Dongqing Sun et al.

Science China Life Sciences, Journal Year: 2024, Volume and Issue: unknown

Published: July 23, 2024

Language: English

Citations

11

Benchmarking algorithms for single-cell multi-omics prediction and integration DOI
Yinlei Hu, Siyuan Wan, Yuanhanyu Luo et al.

Nature Methods, Journal Year: 2024, Volume and Issue: unknown

Published: Sept. 25, 2024

Language: English

Citations

9

Topological identification and interpretation for single-cell epigenetic regulation elucidation in multi-tasks using scAGDE DOI Creative Commons

Guoqian Hao, Fan Yi, Zhuohan Yu et al.

Nature Communications, Journal Year: 2025, Volume and Issue: 16(1)

Published: Feb. 16, 2025

Single-cell ATAC-seq technology advances our understanding of single-cell heterogeneity in gene regulation by enabling exploration of epigenetic landscapes and regulatory elements. However, low sequencing depth per cell leads to data sparsity and high dimensionality, limiting the characterization of regulatory elements. Here, we develop scAGDE, a single-cell chromatin accessibility model-based deep graph representation learning method that simultaneously learns representation and clustering through explicit modeling of data generation. Our evaluations demonstrated that scAGDE outperforms existing methods in cell segregation, key marker identification, and visualization across diverse datasets while mitigating dropout events and unveiling hidden chromatin-accessible regions. We find that scAGDE preferentially identifies enhancer-like regions and elucidates complex regulatory landscapes, pinpointing putative enhancers regulating the constitutive expression of CTLA4 and the transcriptional dynamics of CD8A in immune cells. When applied to human brain tissue, scAGDE successfully annotated cis-regulatory element-specified cell types and revealed the functional diversity and regulatory mechanisms of glutamatergic neurons. Single-cell ATAC-seq reveals gene regulation at individual cell levels but struggles with data sparsity. Here, the authors introduce scAGDE, a framework that improves cell embedding and clustering, outperforming existing methods and uncovering regulatory mechanisms.

Language: English

Citations

1

Benchmarking of deep neural networks for predicting personal gene expression from DNA sequence highlights shortcomings DOI Creative Commons
Alexander Sasse, Bernard Ng, Anna Spiro et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown

Published: March 20, 2023

Deep learning methods have recently become the state-of-the-art in a variety of regulatory genomic tasks [1-6], including the prediction of gene expression from genomic DNA. As such, these methods promise to serve as important tools in interpreting the full spectrum of genetic variation observed in personal genomes. Previous evaluation strategies have assessed their predictions of gene expression across genomic regions; however, systematic benchmarking is lacking to assess their predictions across individuals, which would directly evaluate their utility as personal DNA interpreters. We used paired Whole Genome Sequencing and gene expression from 839 individuals in the ROSMAP study [7] to evaluate the ability of current methods to predict gene expression variation across individuals at varied loci. Our approach identifies a limitation of current methods to correctly predict the direction of variant effects. We show that this limitation stems from insufficiently learnt sequence motif grammar, and suggest new model training strategies to improve performance.

Language: English

Citations

21

scCASE: accurate and interpretable enhancement for single-cell chromatin accessibility sequencing data DOI Creative Commons

Songming Tang, Xuejian Cui, Rongxiang Wang et al.

Nature Communications, Journal Year: 2024, Volume and Issue: 15(1)

Published: Feb. 22, 2024

Single-cell chromatin accessibility sequencing (scCAS) has emerged as a valuable tool for interrogating and elucidating epigenomic heterogeneity and gene regulation. However, scCAS data inherently suffers from limitations such as high sparsity and dimensionality, which pose significant challenges for downstream analyses. Although several methods are proposed to enhance scCAS data, there are still challenges and limitations that hinder the effectiveness of these methods. Here, we propose scCASE, a scCAS data enhancement method based on non-negative matrix factorization that incorporates an iteratively updating cell-to-cell similarity matrix. Through comprehensive experiments on multiple datasets, we demonstrate the advantages of scCASE over existing methods for scCAS data enhancement. The interpretable cell type-specific peaks identified by scCASE can provide valuable biological insights into cell subpopulations. Moreover, to leverage the large compendia of available omics data as a reference, we further expand scCASE to scCASER, which enables the incorporation of external reference data to improve enhancement performance.

Language: English

Citations

7

Advances and applications in single-cell and spatial genomics DOI
Jingjing Wang, Fang Ye, Haoxi Chai et al.

Science China Life Sciences, Journal Year: 2024, Volume and Issue: unknown

Published: Dec. 20, 2024

Language: English

Citations

7

Joint Representation Learning for Retrieval and Annotation of Genomic Interval Sets DOI Creative Commons
Erfaneh Gharavi, Nathan J. LeRoy, Guangtao Zheng et al.

Bioengineering, Journal Year: 2024, Volume and Issue: 11(3), P. 263 - 263

Published: March 8, 2024

As available genomic interval data increase in scale, we require fast systems to search them. A common approach is simple string matching to compare a search term to metadata, but this is limited by incomplete or inaccurate annotations. An alternative is to compare data directly through genomic region overlap analysis, but this approach leads to challenges like sparsity, high dimensionality, and computational expense. We require novel methods to quickly and flexibly query large, messy genomic interval databases. Here, we develop a genomic interval search system using representation learning. We train numerical embeddings for a collection of region sets simultaneously with their metadata labels, capturing similarity between region sets and their metadata in a low-dimensional space. Using these learned co-embeddings, we develop a system that solves three related information retrieval tasks using embedding distance computations: retrieving region sets related to a user query string, suggesting new labels for database region sets, and retrieving database region sets similar to a query region set. We evaluate these use cases and show that jointly learned representations of region sets and metadata are a promising approach for fast, flexible, and accurate genomic region information retrieval.
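The retrieval tasks in this abstract rest on distance computations in a shared embedding space. The following is a minimal sketch of that idea, assuming cosine similarity as the measure (the abstract does not specify one); the vectors, set names, and the `retrieve` helper are hypothetical and do not reproduce the authors' system.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec, database, top_k=2):
    """Return the names of the top_k database entries most similar to the query."""
    ranked = sorted(
        database.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]


# Hypothetical 2-D embeddings of three region sets and one query.
database = {"set_a": [1.0, 0.0], "set_b": [0.0, 1.0], "set_c": [0.9, 0.1]}
query = [1.0, 0.0]
hits = retrieve(query, database)
```

Because queries and database entries live in the same space, the same ranking function would serve whether the query embedding comes from a text label or from another region set.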

Language: English

Citations

6

Deciphering cell types by integrating scATAC-seq data with genome sequences DOI
Yuansong Zeng, Mai Luo, Ningyuan Shangguan et al.

Nature Computational Science, Journal Year: 2024, Volume and Issue: 4(4), P. 285 - 298

Published: April 10, 2024

Language: English

Citations

6

Scalable and unbiased sequence-informed embedding of single-cell ATAC-seq data with CellSpace DOI Creative Commons
Zakieh Tayyebi, Allison R. Pine, Christina Leslie et al.

Nature Methods, Journal Year: 2024, Volume and Issue: 21(6), P. 1014 - 1022

Published: May 9, 2024

Standard scATAC sequencing (scATAC-seq) analysis pipelines represent cells as sparse numeric vectors relative to an atlas of peaks or genomic tiles and consequently ignore genomic sequence information at accessible loci. Here we present CellSpace, an efficient and scalable sequence-informed embedding algorithm for scATAC-seq that learns a mapping of DNA k-mers and cells to the same space, to address this limitation. We show that CellSpace captures meaningful latent structure in scATAC-seq datasets, including cell subpopulations and developmental hierarchies, and can score transcription factor activities in single cells based on proximity to binding motifs embedded in the same space. Importantly, CellSpace implicitly mitigates batch effects arising from multiple samples, donors or assays, even when individual datasets are processed relative to different peak atlases. Thus, CellSpace provides a powerful tool for integrating and interpreting large-scale scATAC-seq compendia.
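For readers unfamiliar with the term, a DNA k-mer is a length-k substring of a sequence. The sketch below shows only the decomposition of one sequence into overlapping k-mers; it is not CellSpace's implementation, the `kmers` helper is hypothetical, and skipping windows that contain ambiguous bases such as N is an assumption made here for illustration.

```python
def kmers(sequence, k):
    """List the overlapping k-mers of a DNA sequence, left to right.

    Windows containing anything other than A, C, G, or T are skipped.
    """
    sequence = sequence.upper()
    valid = set("ACGT")
    return [
        sequence[i:i + k]
        for i in range(len(sequence) - k + 1)
        if set(sequence[i:i + k]) <= valid
    ]


# Hypothetical toy sequence with one ambiguous base.
example = kmers("ACGTNAC", 3)
```

A sequence shorter than k yields an empty list, since no full window fits.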

Language: English

Citations

6