A review of deep learning applications in human genomics using next-generation sequencing data DOI Creative Commons
W. Alharbi, Mamoon Rashid

Human Genomics, Journal Year: 2022, Volume and Issue: 16(1)

Published: July 25, 2022

Genomics is advancing towards data-driven science. With the advent of high-throughput data-generating technologies in human genomics, we are overwhelmed with a heap of genomic data. To extract knowledge and patterns from this data, artificial intelligence, especially deep learning methods, has been instrumental. In the current review, we address the development and application of deep learning methods/models in different subareas of human genomics. We assessed over- and under-charted areas of genomics by deep learning techniques. Deep learning algorithms underlying the genomic tools are discussed briefly in the later part of the review. Finally, we briefly discuss late applications of deep learning tools in genomics. Conclusively, this review is timely for biotechnology or genomic scientists, in order to guide them on why, when and how to use deep learning methods to analyse human genomic data.

Language: English

Cross-species regulatory sequence activity prediction DOI Creative Commons
David R. Kelley

PLoS Computational Biology, Journal Year: 2020, Volume and Issue: 16(7), P. e1008050 - e1008050

Published: July 20, 2020

Machine learning algorithms trained to predict the regulatory activity of nucleic acid sequences have revealed principles of gene regulation and guided genetic variation analysis. While the human genome has been extensively annotated and studied, model organisms have been less explored. Model organism genomes offer both additional training sequences and unique annotations describing tissue and cell states unavailable in humans. Here, we develop a strategy to train deep convolutional neural networks simultaneously on multiple genomes and apply it to learn sequence predictors for large compendia of human and mouse data. Training on both genomes improves gene expression prediction accuracy on held out and variant sequences. We further demonstrate a novel and powerful approach to apply mouse regulatory models to analyze human genetic variants associated with molecular phenotypes and disease. Together these techniques unleash thousands of non-human epigenetic and transcriptional profiles toward more effective investigation of how gene regulation affects human disease.

Language: English

Citations

179

DeepC: predicting 3D genome folding using megabase-scale transfer learning DOI
Ron Schweßinger, Matthew Gosden, Damien J. Downes

et al.

Nature Methods, Journal Year: 2020, Volume and Issue: 17(11), P. 1118 - 1124

Published: Oct. 12, 2020

Language: English

Citations

170

Improving the trans-ancestry portability of polygenic risk scores by prioritizing variants in predicted cell-type-specific regulatory elements DOI
Tiffany Amariuta, Kazuyoshi Ishigaki, Hiroki Sugishita

et al.

Nature Genetics, Journal Year: 2020, Volume and Issue: 52(12), P. 1346 - 1354

Published: Nov. 30, 2020

Language: English

Citations

158

A sequence-based global map of regulatory activity for deciphering human genetics DOI Creative Commons
Kathleen Chen, Aaron K. Wong, Olga G. Troyanskaya

et al.

Nature Genetics, Journal Year: 2022, Volume and Issue: 54(7), P. 940 - 949

Published: July 1, 2022

Epigenomic profiling has enabled large-scale identification of regulatory elements, yet we still lack a systematic mapping from any sequence or variant to regulatory activities. We address this challenge with Sei, a framework for integrating human genetics data with sequence information to discover the regulatory basis of traits and diseases. Sei learns a vocabulary of regulatory activities, called sequence classes, using a deep learning model that predicts 21,907 chromatin profiles across >1,300 cell lines and tissues. Sequence classes provide a global classification and quantification of sequence and variant effects based on diverse regulatory activities, such as cell type-specific enhancer functions. These predictions are supported by tissue-specific expression, expression quantitative trait loci and evolutionary constraint data. Furthermore, sequence classes enable characterization of the tissue-specific, regulatory architecture of complex traits and generate mechanistic hypotheses for individual regulatory pathogenic mutations. We provide Sei as a resource to elucidate the regulatory basis of human health and disease.

Language: English

Citations

157

Multimodal single-cell chromatin analysis with Signac DOI Creative Commons
Tim Stuart, Avi Srivastava, Caleb A. Lareau

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2020, Volume and Issue: unknown

Published: Nov. 10, 2020

The recent development of experimental methods for measuring chromatin state at single-cell resolution has created a need for computational tools capable of analyzing these datasets. Here we developed Signac, a framework for the analysis of single-cell chromatin data, as an extension of the Seurat R toolkit for single-cell multimodal analysis. Signac enables an end-to-end analysis of single-cell chromatin data, including peak calling, quantification, quality control, dimension reduction, clustering, integration with single-cell gene expression datasets, DNA motif analysis, and interactive visualization. Furthermore, Signac facilitates the analysis of multimodal single-cell chromatin datasets, including datasets that co-assay DNA accessibility with gene expression, protein abundance, and mitochondrial genotype. We demonstrate scaling of the Signac framework to datasets containing over 700,000 cells. Availability: Installation instructions, documentation, and tutorials are available at: https://satijalab.org/signac/

Language: English

Citations

155

Explaining Deep Neural Networks and Beyond: A Review of Methods and Applications DOI Creative Commons
Wojciech Samek, Grégoire Montavon, Sebastian Lapuschkin

et al.

Proceedings of the IEEE, Journal Year: 2021, Volume and Issue: 109(3), P. 247 - 278

Published: March 1, 2021

With the broader and highly successful usage of machine learning in industry and the sciences, there has been a growing demand for Explainable AI. Interpretability and explanation methods for gaining a better understanding about the problem solving abilities and strategies of nonlinear Machine Learning, in particular, deep neural networks, are therefore receiving increased attention. In this work we aim to (1) provide a timely overview of this active emerging field, with a focus on 'post-hoc' explanations, and explain its theoretical foundations, (2) put interpretability algorithms to a test both from a theory and comparative evaluation perspective using extensive simulations, (3) outline best practice aspects, i.e. how to best include interpretation methods into the standard usage of machine learning, and (4) demonstrate successful usage of explainable AI in a representative selection of application scenarios. Finally, we discuss challenges and possible future directions of this exciting foundational field of machine learning.

Language: English

Citations

153

Comprehensive evaluation of deep learning architectures for prediction of DNA/RNA sequence binding specificities DOI

Ameni Trabelsi, Mohamed Chaabane, Asa Ben‐Hur

et al.

Bioinformatics, Journal Year: 2019, Volume and Issue: 35(14), P. i269 - i277

Published: May 14, 2019

Deep learning architectures have recently demonstrated their power in predicting DNA- and RNA-binding specificity. Existing methods fall into three classes: some are based on convolutional neural networks (CNNs), others use recurrent neural networks (RNNs), and others rely on hybrid architectures combining CNNs and RNNs. However, based on existing studies the relative merit of the various architectures remains unclear. In this study we present a systematic exploration of deep learning architectures for predicting DNA- and RNA-binding specificity. For this purpose, we present deepRAM, an end-to-end deep learning tool that provides an implementation of a wide selection of architectures; its fully automatic model selection procedure allows us to perform a fair and unbiased comparison of deep learning architectures. We find that deeper, more complex architectures provide a clear advantage with sufficient training data, and that hybrid CNN/RNN architectures outperform other methods in terms of accuracy. Our work provides guidelines that can assist the practitioner in choosing an appropriate network architecture, and provides insight on the difference between the models learned by convolutional and recurrent networks. In particular, we find that although recurrent networks improve model accuracy, this comes at the expense of a loss in the interpretability of the features learned by the model. The source code for deepRAM is available at https://github.com/MedChaabane/deepRAM. Supplementary data are available at Bioinformatics online.

Language: English

Citations

151

Sequence-based modeling of three-dimensional genome architecture from kilobase to chromosome scale DOI
Jian Zhou

Nature Genetics, Journal Year: 2022, Volume and Issue: 54(5), P. 725 - 734

Published: May 1, 2022

Language: English

Citations

127

Predicting RNA splicing from DNA sequence using Pangolin DOI Creative Commons

Tony Zeng, Yang Li

Genome biology, Journal Year: 2022, Volume and Issue: 23(1)

Published: April 21, 2022

Recent progress in deep learning has greatly improved the prediction of RNA splicing from DNA sequence. Here, we present Pangolin, a deep learning model to predict splice site strength in multiple tissues. Pangolin outperforms state-of-the-art methods for predicting RNA splicing on a variety of prediction tasks. Pangolin improves prediction of the impact of genetic variants on RNA splicing, including common, rare, and lineage-specific genetic variation. In addition, Pangolin identifies loss-of-function mutations with high accuracy and recall, particularly for mutations that are not missense or nonsense, demonstrating remarkable potential for identifying pathogenic variants.

Language: English

Citations

119

Machine learning meets omics: applications and perspectives DOI

Rufeng Li, Lixin Li, Yungang Xu

et al.

Briefings in Bioinformatics, Journal Year: 2021, Volume and Issue: 23(1)

Published: Oct. 8, 2021

The innovation of biotechnologies has allowed the accumulation of omics data at an alarming rate, thus introducing the era of ‘big data’. Extracting inherent valuable knowledge from various omics data remains a daunting problem in bioinformatics. Better solutions often need some kind of more innovative methods for efficient handling and effective results. Recent advancements in integrated analysis and computational modeling of multi-omics data have helped address such needs in an increasingly harmonious manner. The development and application of machine learning have largely advanced our insights into biology and biomedicine and greatly promoted the development of therapeutic strategies, especially for precision medicine. Here, we propose a comprehensive survey and discussion on what happened, is happening and will happen when machine learning meets omics. Specifically, we describe how artificial intelligence can be applied to omics studies and review recent advancements at the interface between machine learning and the ever-widest range of omics, including genomics, transcriptomics, proteomics, metabolomics, radiomics, as well as those at the single-cell resolution. We also discuss and provide a synthesis of ideas, new insights, current challenges and perspectives of machine learning in omics.

Language: English

Citations

117