Current sequence-based models capture gene expression determinants in promoters but mostly ignore distal enhancers
Alexander Karollus, Thomas Mauermeier, Julien Gagneur

et al.

Genome biology, Journal Year: 2023, Volume and Issue: 24(1)

Published: March 27, 2023

Abstract Background: The largest sequence-based models of transcription control to date are obtained by predicting genome-wide gene regulatory assays across the human genome. This setting is fundamentally correlative, as those models are exposed during training solely to the sequence variation between human genes that arose through evolution, questioning the extent to which those models capture genuine causal signals. Results: Here we confront predictions of state-of-the-art models of transcription regulation against data from two large-scale observational studies and five deep perturbation assays. The most advanced of these sequence-based models, Enformer, by and large, captures causal determinants of human promoters. However, models fail to capture the causal effects of enhancers on expression, notably in medium to long distances and particularly for highly expressed promoters. More generally, the predicted impact of distal elements on gene expression predictions is small and the ability to correctly integrate long-range information is significantly more limited than the receptive fields of the models suggest. This is likely caused by the escalating class imbalance between actual and candidate regulatory elements as distance increases. Conclusions: Our results suggest that sequence-based models have advanced to the point that in silico study of promoter regions and promoter variants can provide meaningful insights, and we provide practical guidance on how to use them. Moreover, we foresee that it will require significantly more and particularly new kinds of data to train models accurately accounting for distal elements.

Language: English

Ensembl 2022
Fiona Cunningham, James E. Allen, Jamie Allen

et al.

Nucleic Acids Research, Journal Year: 2021, Volume and Issue: 50(D1), P. D988 - D995

Published: Oct. 19, 2021

Ensembl (https://www.ensembl.org) is unique in its flexible infrastructure for access to genomic data and annotation. It has been designed to efficiently deliver annotation at scale for all eukaryotic life, and it also provides deep comprehensive annotation for key species. Genomes representing a greater diversity of species are increasingly being sequenced. In response, we have focussed our recent efforts on expediting the annotation of new assemblies. Here, we report the release of the greatest annual number of newly annotated genomes in the history of Ensembl via our dedicated Ensembl Rapid Release platform (http://rapid.ensembl.org). We have also developed a new method to generate comparative analyses at scale for these assemblies and, for the first time, we have annotated non-vertebrate eukaryotes. Meanwhile, we continually improve, extend and update the annotation for our high-value reference vertebrate genomes and report the details here. We have a range of specific software tools for specific tasks, such as the Ensembl Variant Effect Predictor (VEP) and the newly developed interface for the Variant Recoder. All Ensembl data, software and tools are freely available for download and are accessible programmatically.

Language: English

Citations

1644

Scientific discovery in the age of artificial intelligence
Hanchen Wang, Tianfan Fu, Yuanqi Du

et al.

Nature, Journal Year: 2023, Volume and Issue: 620(7972), P. 47 - 60

Published: Aug. 2, 2023

Language: English

Citations

723

SCENIC+: single-cell multiomic inference of enhancers and gene regulatory networks
Carmen Bravo González‐Blas, Seppe De Winter, Gert Hulselmans

et al.

Nature Methods, Journal Year: 2023, Volume and Issue: 20(9), P. 1355 - 1367

Published: July 13, 2023

Abstract: Joint profiling of chromatin accessibility and gene expression in individual cells provides an opportunity to decipher enhancer-driven gene regulatory networks (GRNs). Here we present a method for the inference of enhancer-driven GRNs, called SCENIC+. SCENIC+ predicts genomic enhancers along with candidate upstream transcription factors (TFs) and links these enhancers to candidate target genes. To improve both recall and precision of TF identification, we curated and clustered a motif collection of more than 30,000 motifs. We benchmarked SCENIC+ on diverse datasets from different species, including human peripheral blood mononuclear cells, ENCODE cell lines, melanoma cell states and Drosophila retinal development. Next, we exploit SCENIC+ predictions to study conserved TFs, enhancers and GRNs between human and mouse cell types in the cerebral cortex. Finally, we use SCENIC+ to study the dynamics of gene regulation along differentiation trajectories and the effect of TF perturbations on cell state. SCENIC+ is available at scenicplus.readthedocs.io.

Language: English

Citations

284

Scaling Vision Transformers to Gigapixel Images via Hierarchical Self-Supervised Learning
Richard J. Chen, Chengkuan Chen, Yicong Li

et al.

2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Journal Year: 2022, Volume and Issue: unknown, P. 16123 - 16134

Published: June 1, 2022

Vision Transformers (ViTs) and their multi-scale and hierarchical variations have been successful at capturing image representations, but their use has generally been studied for low-resolution images (e.g. 256 × 256, 384 × 384). For gigapixel whole-slide imaging (WSI) in computational pathology, WSIs can be as large as 150000 × 150000 pixels at 20× magnification and exhibit a hierarchical structure of visual tokens across varying resolutions: from 16 × 16 images capturing individual cells, to 4096 × 4096 images characterizing interactions within the tissue microenvironment. We introduce a new ViT architecture called the Hierarchical Image Pyramid Transformer (HIPT), which leverages the natural hierarchical structure inherent in WSIs using two levels of self-supervised learning to learn high-resolution image representations. HIPT is pretrained across 33 cancer types using 10,678 gigapixel WSIs, 408,218 4096 × 4096 images, and 104M 256 × 256 images. We benchmark HIPT representations on 9 slide-level tasks, and demonstrate that: 1) HIPT with hierarchical pretraining outperforms current state-of-the-art methods for cancer subtyping and survival prediction, 2) self-supervised ViTs are able to model important inductive biases about the hierarchical structure of phenotypes in the tumor microenvironment.

Language: English

Citations

279

scGPT: toward building a foundation model for single-cell multi-omics using generative AI
Haotian Cui, Xiaoming Wang, Hassaan Maan

et al.

Nature Methods, Journal Year: 2024, Volume and Issue: 21(8), P. 1470 - 1480

Published: Feb. 26, 2024

Language: English

Citations

262

Ensembl 2024
Peter W. Harrison, M Ridwan Amode, Olanrewaju Austine-Orimoloye

et al.

Nucleic Acids Research, Journal Year: 2023, Volume and Issue: 52(D1), P. D891 - D899

Published: Nov. 11, 2023

Abstract: Ensembl (https://www.ensembl.org) is a freely available genomic resource that has produced high-quality annotations, tools, and services for vertebrates and model organisms for more than two decades. In recent years, there has been a dramatic shift in the genomic landscape, with a large increase in the number and phylogenetic breadth of high-quality reference genomes, alongside major advances in the pan-genome representations of higher species. In order to support these efforts and accelerate downstream research, Ensembl continues to focus on scaling for the rapid annotation of new genome assemblies, developing new methods for comparative analysis, and expanding the depth and quality of our genome annotations. This year we have continued our expansion to support global biodiversity research, doubling the number of annotated genomes we support on our Rapid Release site to over 1700, driven by our close collaboration with biodiversity projects such as Darwin Tree of Life. We have also strengthened support for key agricultural species, including the first regulatory builds for farmed animals, and have updated key tools and resources that support the global scientific community, notably the Ensembl Variant Effect Predictor. Ensembl data, software, and tools are freely available.

Language: English

Citations

256

Obtaining genetics insights from deep learning via explainable artificial intelligence
Gherman Novakovsky, Nick Dexter, Maxwell W. Libbrecht

et al.

Nature Reviews Genetics, Journal Year: 2022, Volume and Issue: 24(2), P. 125 - 137

Published: Oct. 3, 2022

Language: English

Citations

223

The evolution, evolvability and engineering of gene regulatory DNA
Eeshit Dhaval Vaishnav, Carl G. de Boer, Jennifer Molinet

et al.

Nature, Journal Year: 2022, Volume and Issue: 603(7901), P. 455 - 463

Published: March 9, 2022

Language: English

Citations

196

DeepSTARR predicts enhancer activity from DNA sequence and enables the de novo design of synthetic enhancers
Bernardo P. de Almeida, Franziska Reiter, Michaela Pagani

et al.

Nature Genetics, Journal Year: 2022, Volume and Issue: 54(5), P. 613 - 624

Published: May 1, 2022

Language: English

Citations

188

Gene regulatory network inference in the era of single-cell multi-omics
Pau Badia-i-Mompel, Lorna Wessels, Sophia Müller‐Dott

et al.

Nature Reviews Genetics, Journal Year: 2023, Volume and Issue: 24(11), P. 739 - 754

Published: June 26, 2023

Language: English

Citations

187