Perspectives on Codebook: sequence specificity of uncharacterized human transcription factors
Arttu Jolma, Kaitlin U. Laverty, Ali Fathi

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Nov. 12, 2024

SUMMARY We describe an effort (“Codebook”) to determine the sequence specificity of 332 putative and largely uncharacterized human transcription factors (TFs), as well as 61 control TFs. Nearly 5,000 independent experiments across multiple in vitro and in vivo assays produced motifs for just over half of the putative TFs analyzed (177, or 53%), of which most are unique to a single TF. The data highlight the extensive contribution of transposable elements to TF evolution, both in cis and in trans, and identify tens of thousands of conserved, base-level binding sites in the human genome. The use of multiple assays provides an unprecedented opportunity to benchmark and analyze TF sequence specificity, function, and evolution, as further explored in accompanying manuscripts. 1,421 human TFs are now associated with a DNA binding motif. Extrapolation from the Codebook benchmarking, however, suggests that many of the currently known binding motifs for well-studied TFs may inaccurately describe the TF’s true sequence preferences.

Language: English

Massively parallel characterization of transcriptional regulatory elements
Vikram Agarwal, Fumitaka Inoue, Max Schubach

et al.

Nature, Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 15, 2025

Abstract The human genome contains millions of candidate cis-regulatory elements (cCREs) with cell-type-specific activities that shape both health and many disease states. However, we lack a functional understanding of the sequence features that control the activity and cell-type-specific features of these cCREs. Here we used lentivirus-based massively parallel reporter assays (lentiMPRAs) to test the regulatory activity of more than 680,000 sequences, representing an extensive set of annotated cCREs among three cell types (HepG2, K562 and WTC11), and found that 41.7% of these sequences were active. By testing sequences in both orientations, we find promoters to have strand-orientation biases and their 200-nucleotide cores to function as non-cell-type-specific ‘on switches’ that provide similar expression levels to their associated gene. By contrast, enhancers have weaker orientation biases, but increased tissue-specific characteristics. Utilizing our lentiMPRA data, we develop sequence-based models to predict cCRE function and variant effects with high accuracy, delineate regulatory motifs and model their combinatorial effects. Testing a lentiMPRA library encompassing 60,000 cCREs in all three cell types further identified factors that determine cell-type specificity. Collectively, our work provides an extensive catalogue of CREs in three widely used cell lines and showcases how large-scale functional measurements can be used to dissect regulatory grammar.

Language: English

Citations

2
