Effective gene expression prediction from sequence by integrating long-range interactions
Žiga Avsec, Vikram Agarwal, Daniel Visentin

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2021, Volume and Issue: unknown

Published: April 8, 2021

Abstract The next phase of genome biology research requires understanding how DNA sequence encodes phenotypes, from the molecular to organismal levels. How noncoding DNA determines gene expression in different cell types is a major unsolved problem, and critical downstream applications in human genetics depend on improved solutions. Here, we report substantially improved gene expression prediction accuracy from DNA sequence through the use of a new deep learning architecture called Enformer that is able to integrate long-range interactions (up to 100 kb away) in the genome. This improvement yielded more accurate variant effect predictions on gene expression for both natural genetic variants and saturation mutagenesis measured by massively parallel reporter assays. Notably, Enformer outperformed the best team on the critical assessment of genome interpretation (CAGI5) challenge for noncoding variant interpretation with no additional training. Furthermore, Enformer learned to predict promoter-enhancer interactions directly from DNA sequence competitively with methods that take direct experimental data as input. We expect that these advances will enable more effective fine-mapping of growing human disease associations to cell-type-specific gene regulatory mechanisms and provide a framework to interpret cis-regulatory evolution. To foster these downstream applications, we have made the pre-trained Enformer model openly available, and provide pre-computed effect predictions for all common variants in the 1000 Genomes dataset. One-sentence summary: Improved noncoding variant effect prediction and candidate enhancer prioritization from a more accurate sequence to expression model driven by extended long-range interaction modelling.

Language: English

Effective gene expression prediction from sequence by integrating long-range interactions
Žiga Avsec, Vikram Agarwal, Daniel Visentin

et al.

Nature Methods, Journal Year: 2021, Volume and Issue: 18(10), P. 1196 - 1203

Published: Oct. 1, 2021

Abstract How noncoding DNA determines gene expression in different cell types is a major unsolved problem, and critical downstream applications in human genetics depend on improved solutions. Here, we report substantially improved gene expression prediction accuracy from DNA sequences through the use of a deep learning architecture, called Enformer, that is able to integrate information from long-range interactions (up to 100 kb away) in the genome. This improvement yielded more accurate variant effect predictions on gene expression for both natural genetic variants and saturation mutagenesis measured by massively parallel reporter assays. Furthermore, Enformer learned to predict enhancer–promoter interactions directly from the DNA sequence competitively with methods that take direct experimental data as input. We expect that these advances will enable more effective fine-mapping of human disease associations and provide a framework to interpret cis-regulatory evolution.

Language: English

Citations

717

Artificial intelligence in clinical and genomic diagnostics
Raquel Dias, Ali Torkamani

Genome Medicine, Journal Year: 2019, Volume and Issue: 11(1)

Published: Nov. 19, 2019

Abstract Artificial intelligence (AI) is the development of computer systems that are able to perform tasks that normally require human intelligence. Advances in AI software and hardware, especially deep learning algorithms and the graphics processing units (GPUs) that power their training, have led to a recent and rapidly increasing interest in medical AI applications. In clinical diagnostics, AI-based computer vision approaches are poised to revolutionize image-based diagnostics, while other AI subtypes have begun to show similar promise in various diagnostic modalities. In some areas, such as clinical genomics, a specific type of AI algorithm known as deep learning is used to process large and complex genomic datasets. In this review, we first summarize the main classes of problems that AI systems are well suited to solve and describe the clinical diagnostic tasks that benefit from these solutions. Next, we focus on emerging methods for specific tasks in clinical genomics, including variant calling, genome annotation and variant classification, and phenotype-to-genotype correspondence. Finally, we end with a discussion on the future potential of AI in individualized medicine applications, especially for risk prediction in common complex diseases, and the challenges, limitations, and biases that must be carefully addressed for the successful deployment of AI in medical applications, particularly those utilizing human genetics and genomics data.

Language: English

Citations

340

Chromatin and gene-regulatory dynamics of the developing human cerebral cortex at single-cell resolution
Alexandro E. Trevino, Fabian Müller, Jimena Andersen

et al.

Cell, Journal Year: 2021, Volume and Issue: 184(19), P. 5053 - 5069.e23

Published: Aug. 13, 2021

Language: English

Citations

330

A survey on deep learning in medicine: Why, how and when?
Francesco Piccialli, Vittorio Di Somma, Fabio Giampaolo

et al.

Information Fusion, Journal Year: 2020, Volume and Issue: 66, P. 111 - 137

Published: Sept. 15, 2020

Language: English

Citations

295

Chromatin accessibility dynamics in a model of human forebrain development
Alexandro E. Trevino, Nasa Sinnott-Armstrong, Jimena Andersen

et al.

Science, Journal Year: 2020, Volume and Issue: 367(6476)

Published: Jan. 24, 2020

Organoids recapitulate brain development. Gene expression changes and their control by accessible chromatin in the human brain during development is of great interest but limited accessibility. Trevino et al. avoided this problem by developing three-dimensional organoid models of human forebrain development and examining chromatin accessibility and gene expression at the single-cell level. From this analysis, they matched developmental profiles between the organoids and fetal samples, identified transcription factor binding profiles, and predicted how transcription factors are linked to cortical development. The researchers were able to correlate neurodevelopmental disease risk loci and genes with specific cell types. Science, this issue p. eaay1645

Language: English

Citations

222

iLearnPlus: a comprehensive and automated machine-learning platform for nucleic acid and protein sequence analysis, prediction and visualization
Zhen Chen, Pei Zhao, Chen Li

et al.

Nucleic Acids Research, Journal Year: 2021, Volume and Issue: 49(10), P. e60 - e60

Published: Feb. 25, 2021

Abstract Sequence-based analysis and prediction are fundamental bioinformatic tasks that facilitate understanding of the sequence(-structure)-function paradigm for DNAs, RNAs and proteins. Rapid accumulation of sequences requires equally pervasive development of new predictive models, which depends on the availability of effective tools that support these efforts. We introduce iLearnPlus, the first machine-learning platform with graphical- and web-based interfaces for the construction of machine-learning pipelines for analysis and predictions using nucleic acid and protein sequences. iLearnPlus provides a comprehensive set of algorithms and automates sequence-based feature extraction and analysis, construction and deployment of models, assessment of predictive performance, statistical analysis, and data visualization; all without programming. iLearnPlus includes a wide range of feature sets which encode information from the input sequences and over twenty machine-learning algorithms that cover several deep-learning approaches, outnumbering the current solutions by a wide margin. Our solution caters to experienced bioinformaticians, given the broad range of options, and biologists with no programming background, given the point-and-click interface and easy-to-follow design process. We showcase iLearnPlus with two case studies concerning prediction of long noncoding RNAs (lncRNAs) from RNA transcripts and prediction of crotonylation sites in protein chains. iLearnPlus is an open-source platform available at https://github.com/Superzchen/iLearnPlus/ with the webserver at http://ilearnplus.erc.monash.edu/.

Language: English

Citations

199

Computational network biology: Data, models, and applications
Chuang Liu, Yifang Ma, Jing Zhao

et al.

Physics Reports, Journal Year: 2019, Volume and Issue: 846, P. 1 - 66

Published: Dec. 30, 2019

Language: English

Citations

182

Cross-species regulatory sequence activity prediction
David R. Kelley

PLoS Computational Biology, Journal Year: 2020, Volume and Issue: 16(7), P. e1008050 - e1008050

Published: July 20, 2020

Machine learning algorithms trained to predict the regulatory activity of nucleic acid sequences have revealed principles of gene regulation and guided genetic variation analysis. While the human genome has been extensively annotated and studied, model organisms have been less explored. Model organism genomes offer both additional training sequences and unique annotations describing tissue and cell states unavailable in humans. Here, we develop a strategy to train deep convolutional neural networks simultaneously on multiple genomes and apply it to learn sequence predictors for large compendia of human and mouse data. Training on both genomes improves gene expression prediction accuracy on held out and variant sequences. We further demonstrate a novel and powerful approach to apply mouse regulatory models to analyze human genetic variants associated with molecular phenotypes and disease. Together these techniques unleash thousands of non-human epigenetic and transcriptional profiles toward more effective investigation of how gene regulation affects human disease.

Language: English

Citations

179

Patterns of de novo tandem repeat mutations and their role in autism
Ileena Mitra, Bonnie Huang, Nima Mousavi

et al.

Nature, Journal Year: 2021, Volume and Issue: 589(7841), P. 246 - 250

Published: Jan. 13, 2021

Language: English

Citations

152

Deep learning for plant genomics and crop improvement
Hai Wang, Emre Çimen, Nisha Singh

et al.

Current Opinion in Plant Biology, Journal Year: 2020, Volume and Issue: 54, P. 34 - 41

Published: Jan. 24, 2020

Our era has witnessed tremendous advances in plant genomics, characterized by an explosion of high-throughput techniques to identify multi-dimensional genome-wide molecular phenotypes at low costs. More importantly, genomics is not merely acquiring molecular phenotypes, but also leveraging powerful data mining tools to predict and explain them. In recent years, deep learning has been found extremely effective in these tasks. This review highlights two prominent questions at the intersection of genomics and deep learning: 1) how can the flow of information from genomic DNA sequences to molecular phenotypes be modeled; 2) how can we identify functional variants in natural populations using deep learning models? Additionally, we discuss the possibility of unleashing the power of deep learning in synthetic biology to create novel genomic elements with desirable functions. Taken together, we propose a central role of deep learning in future plant genomics research and crop genetic improvement.

Language: English

Citations

150