Transfer learning reveals sequence determinants of the quantitative response to transcription factor dosage

Sahin Naqvi, Seungsoo Kim, Saman Tabatabaee

et al.

bioRxiv (Cold Spring Harbor Laboratory), Year: 2024, Number: unknown

Published: May 29, 2024

Deep learning approaches have made significant advances in predicting cell type-specific chromatin patterns from the identity and arrangement of transcription factor (TF) binding motifs. However, most models have been applied in unperturbed contexts, precluding a predictive understanding of how chromatin state responds to TF perturbation. Here, we used transfer learning to train and interpret deep learning models that use DNA sequence to predict, with accuracy approaching experimental reproducibility, how the concentration of two dosage-sensitive TFs (TWIST1, SOX9) affects regulatory element (RE) chromatin accessibility in facial progenitor cells. High-affinity motifs that allow for heterotypic TF co-binding and are concentrated at the center of REs buffer against quantitative changes in TF dosage and strongly predict unperturbed accessibility. In contrast, low-affinity or homotypic binding motifs distributed throughout REs lead to sensitive responses with minimal contributions to unperturbed accessibility. Both buffering and sensitizing features show signatures of purifying selection. We validated these predictive sequence features using reporter assays and showed that a biophysical model of TF-nucleosome competition can explain the sensitizing effect of low-affinity motifs. Our approach of combining transfer learning and quantitative measurements of the chromatin response to TF dosage therefore represents a powerful method to reveal additional layers of the cis-regulatory code.

Language: English

CADD v1.7: using protein language models, regulatory CNNs and other nucleotide-level scores to improve genome-wide variant predictions
Max Schubach, Thorben Maaß, Lusiné Nazaretyan

et al.

Nucleic Acids Research, Year: 2024, Number: 52(D1), pp. D1143-D1154

Published: Jan. 5, 2024

Machine Learning-based scoring and classification of genetic variants aids the assessment of clinical findings and is employed to prioritize variants in diverse genetic studies and analyses. Combined Annotation-Dependent Depletion (CADD) is one of the first methods for the genome-wide prioritization of variants across different molecular functions and has been continuously developed and improved since its original publication. Here, we present our most recent release, CADD v1.7. We explored and integrated new annotation features, among them state-of-the-art protein language model scores (Meta ESM-1v), regulatory variant effect predictions (from sequence-based convolutional neural networks) and sequence conservation scores (Zoonomia). We evaluated the new version on data sets derived from ClinVar, ExAC/gnomAD and 1000 Genomes variants. For coding effects, we tested CADD on 31 Deep Mutational Scanning (DMS) data sets from ProteinGym and, for regulatory effect prediction, we used saturation mutagenesis reporter assay data of promoter and enhancer sequences. The inclusion of new features further improved the overall performance of CADD. As with previous releases, all data sets, genome-wide CADD v1.7 scores, scripts for on-site scoring and an easy-to-use webserver are readily provided via https://cadd.bihealth.org/ or https://cadd.gs.washington.edu/ to the community.

Language: English

Cited by: 116

Transcription factor binding site orientation and order are major drivers of gene regulatory activity
Ilias Georgakopoulos-Soares, Chengyu Deng, Vikram Agarwal

et al.

Nature Communications, Year: 2023, Number: 14(1)

Published: April 22, 2023

The gene regulatory code and grammar remain largely unknown, precluding our ability to link phenotype to genotype in regulatory sequences. Here, using a massively parallel reporter assay (MPRA) of 209,440 sequences, we examine all possible pair and triplet combinations, permutations and orientations of eighteen liver-associated transcription factor binding sites (TFBS). We find that TFBS orientation and order have a major effect on gene regulatory activity. Corroborating these results with genomic analyses, we find clear human promoter TFBS orientation biases and similar TFBS orientation and order transcriptional effects in an MPRA that tested 164,307 liver candidate regulatory elements. Additionally, by adding TFBS orientation to a model that predicts expression from sequence we improve performance by 7.7%. Collectively, our results show that TFBS orientation and order have a significant effect on gene regulatory activity and need to be considered when analyzing the functional effect of variants.

Language: English

Cited by: 43

Multiplex profiling of developmental cis-regulatory elements with quantitative single-cell expression reporters
Jean‐Benoît Lalanne, Samuel G. Regalado, Silvia Domcke

et al.

Nature Methods, Year: 2024, Number: 21(6), pp. 983-993

Published: May 9, 2024

The inability to scalably and precisely measure the activity of developmental cis-regulatory elements (CREs) in multicellular systems is a bottleneck in genomics. Here we develop a dual RNA cassette that decouples the detection and quantification tasks inherent to multiplex single-cell reporter assays. The resulting measurement of reporter expression is accurate over multiple orders of magnitude, with a precision approaching the limit set by Poisson counting noise. Together with RNA barcode stabilization via circularization, these scalable single-cell quantitative expression reporters provide high-contrast readouts, analogous to classic in situ assays but entirely from sequencing. Screening >200 regions of accessible chromatin in a multicellular in vitro model of early mammalian development, we identify 13 (8 previously uncharacterized) autonomous and cell-type-specific developmental CREs. We further demonstrate that chimeric CRE pairs generate cognate two-cell-type activity profiles and assess gain- and loss-of-function multicellular expression phenotypes from CRE variants with perturbed transcription factor binding sites. Single-cell quantitative expression reporters can be applied in developmental and multicellular systems to quantitatively characterize native, perturbed and synthetic CREs at scale, with high sensitivity and at single-cell resolution.

Language: English

Cited by: 18

Deciphering the impact of genomic variation on function
J Engreitz, Heather A. Lawson, Harinder Singh

et al.

Nature, Year: 2024, Number: 633(8028), pp. 47-57

Published: Sept. 4, 2024

Language: English

Cited by: 18

Machine-guided design of cell-type-targeting cis-regulatory elements
Sager J. Gosai, Rodrigo Castro, Natalia Fuentes

et al.

Nature, Year: 2024, Number: 634(8036), pp. 1211-1220

Published: Oct. 23, 2024

Cis-regulatory elements (CREs) control gene expression, orchestrating tissue identity, developmental timing and stimulus responses, which collectively define the thousands of unique cell types in the body.

Language: English

Cited by: 18

A foundation model of transcription across human cell types
Xi Fu, Shentong Mo, Alejandro Buendia

et al.

Nature, Year: 2025, Number: 637(8047), pp. 965-973

Published: Jan. 8, 2025

Transcriptional regulation, which involves a complex interplay between regulatory sequences and proteins, directs all biological processes. Computational models of transcription lack generalizability to accurately extrapolate to unseen cell types and conditions. Here we introduce GET (general expression transformer), an interpretable foundation model designed to uncover regulatory grammars across 213 human fetal and adult cell types [1,2]. Relying exclusively on chromatin accessibility data and sequence information, GET achieves experimental-level accuracy in predicting gene expression even in previously unseen cell types [3]. GET also shows remarkable adaptability across new sequencing platforms and assays, enabling regulatory inference across a broad range of cell types and conditions, and uncovers universal and cell-type-specific transcription factor interaction networks. We evaluated its performance in prediction of regulatory activity, inference of regulatory elements and regulators, and identification of physical interactions between transcription factors and found that it outperforms current models [4] in predicting lentivirus-based massively parallel reporter assay readout [5,6]. In fetal erythroblasts [7], we identified distal (greater than 1 Mbp) regulatory regions that were missed by previous models, and, in B cells, we identified a lymphocyte-specific transcription factor-transcription factor interaction that explains the functional significance of a leukaemia risk predisposing germline mutation [8-10]. In sum, we provide a generalizable and accurate model for transcription together with catalogues of gene regulation and transcription factor interactions, all with cell type specificity.

Language: English

Cited by: 4

DNA-Diffusion: Leveraging Generative Models for Controlling Chromatin Accessibility and Gene Expression via Synthetic Regulatory Elements
Lucas F. daSilva, Simon Senan, Z. Patel

et al.

bioRxiv (Cold Spring Harbor Laboratory), Year: 2024, Number: unknown

Published: Feb. 1, 2024

The challenge of systematically modifying and optimizing regulatory elements for precise gene expression control is central to modern genomics and synthetic biology. Advancements in generative AI have paved the way for designing synthetic sequences with the aim of safely and accurately modulating gene expression. We leverage diffusion models to design context-specific DNA regulatory sequences, which hold significant potential toward enabling novel therapeutic applications requiring precise modulation of gene expression. Our framework uses a cell type-specific diffusion model to generate synthetic 200 bp regulatory elements based on chromatin accessibility across different cell types. We evaluate the generated synthetic sequences based on key metrics to ensure they retain properties of endogenous sequences: transcription factor binding site composition, potential for cell type-specific chromatin accessibility, and capacity for sequences generated by DNA diffusion to activate gene expression in different cell contexts using state-of-the-art prediction models. Our results demonstrate the ability to robustly generate DNA sequences with cell type-specific regulatory potential. DNA-Diffusion paves the way for revolutionizing the approach to mammalian synthetic biology and precision gene therapy.

Language: English

Cited by: 14

Evaluating the representational power of pre-trained DNA language models for regulatory genomics
Ziqi Tang, Nikunj V. Somia, Yiyang Yu

et al.

bioRxiv (Cold Spring Harbor Laboratory), Year: 2024, Number: unknown

Published: March 4, 2024

The emergence of genomic language models (gLMs) offers an unsupervised approach to learning a wide diversity of cis-regulatory patterns in the non-coding genome without requiring labels of functional activity generated by wet-lab experiments. Previous evaluations have shown that pre-trained gLMs can be leveraged to improve predictive performance across a broad range of regulatory genomics tasks, albeit using relatively simple benchmark datasets and baseline models. Since the gLMs in these studies were tested upon fine-tuning their weights for each downstream task, determining whether gLM representations embody a foundational understanding of cis-regulatory biology remains an open question. Here we evaluate the representational power of pre-trained gLMs to predict and interpret cell-type-specific functional genomics data that span DNA and RNA regulation. Our findings suggest that probing the representations of pre-trained gLMs do not offer substantial advantages over conventional machine learning approaches that use one-hot encoded sequences. This work highlights a major gap with current gLMs, raising potential issues in conventional pre-training strategies for the non-coding genome.

Language: English

Cited by: 12

Massively parallel characterization of transcriptional regulatory elements
Vikram Agarwal, Fumitaka Inoue, Max Schubach

et al.

Nature, Year: 2025, Number: unknown

Published: Jan. 15, 2025

The human genome contains millions of candidate cis-regulatory elements (cCREs) with cell-type-specific activities that shape both health and many disease states [1]. However, we lack a functional understanding of the sequence features that control the activity of these cCREs. Here we used lentivirus-based massively parallel reporter assays (lentiMPRAs) to test the regulatory activity of more than 680,000 sequences, representing an extensive set of annotated cCREs among three cell types (HepG2, K562 and WTC11), and found that 41.7% of these sequences were active. By testing sequences in both orientations, we find promoters to have strand-orientation biases and their 200-nucleotide cores to function as non-cell-type-specific ‘on switches’ that provide similar expression levels to their associated gene. In contrast, enhancers have weaker orientation biases, but increased tissue-specific characteristics. Utilizing our lentiMPRA data, we develop sequence-based models to predict cCRE function and variant effects with high accuracy, delineate regulatory motifs and model their combinatorial effects. Testing a lentiMPRA library encompassing 60,000 cCREs in all three cell types further identified factors that determine cell-type specificity. Collectively, our work provides an extensive catalogue of functional CREs in three widely used cell lines and showcases how large-scale functional measurements can be used to dissect regulatory grammar.

Language: English

Cited by: 1

Uncovering the whole genome silencers of human cells via Ss-STARR-seq
Xiusheng Zhu, Lei Huang, Chao Wang

et al.

Nature Communications, Year: 2025, Number: 16(1)

Published: Jan. 16, 2025

Silencers, the yin to enhancers' yang, play a pivotal role in fine-tuning gene expression throughout the genome. However, despite their recognized importance, comprehensive identification of these regulatory elements in the genome is still in its early stages. We developed a method called Ss-STARR-seq to directly determine the activity of silencers in the whole genome. In this study, we applied Ss-STARR-seq to the human cell lines K562, LNCaP, and 293T, and identified 134,171, 137,753, and 125,307 silencers on a genome-wide scale, respectively; these silencers function in various cells in a cell-specific manner. Silencers exhibited substantial enrichment of transcriptional-inhibitory motifs, including REST, and demonstrated overlap with the binding sites of repressor transcription factors within the endogenous environment. Interestingly, H3K27me3 did not reflect the activity of silencers but facilitated silencer's inhibitory expression. Additionally, silencers did not have any significant histone markers at the genome-wide level. Our findings unveil that aspect-silencers not only transition into enhancers in diverse cells but also achieve functional conversion into insulators. Regarding biological effects, knockout experiments underscored the redundancy and specificity of silencers in regulating cell proliferation. In summary, this study pioneers the elucidation of the silencer landscape in human cells, delineates their global features, and identifies specific silencers influencing cancer critical regulation. Here, the authors developed the Ss-STARR-seq technique to identify tens of thousands of silencers in human cells. These silencers possess unique epigenetic features and are capable of regulating cellular phenotypes.

Language: English

Cited by: 1