bioRxiv (Cold Spring Harbor Laboratory), Journal year: 2024, Issue: unknown
Published: May 29, 2024
Deep learning approaches have made significant advances in predicting cell type-specific chromatin patterns from the identity and arrangement of transcription factor (TF) binding motifs. However, most models have been applied in unperturbed contexts, precluding a predictive understanding of how chromatin state responds to TF perturbation. Here, we used transfer learning to train and interpret deep learning models that use DNA sequence to predict, with accuracy approaching experimental reproducibility, how the concentration of two dosage-sensitive TFs (TWIST1, SOX9) affects regulatory element (RE) chromatin accessibility in facial progenitor cells. High-affinity motifs that allow for heterotypic TF co-binding and are concentrated at the center of REs buffer against quantitative changes in TF dosage and strongly predict unperturbed accessibility. In contrast, motifs with low-affinity or homotypic binding distributed throughout REs lead to sensitive responses with minimal contributions to unperturbed accessibility. Both buffering and sensitizing features show signatures of purifying selection. We validated these predictive sequence features using reporter assays and showed that a biophysical model of TF-nucleosome competition can explain the sensitizing effect of low-affinity motifs. Our approach of combining transfer learning and quantitative measurements of the chromatin response to TF dosage therefore represents a powerful method to reveal additional layers of the cis-regulatory code.
Nucleic Acids Research, Journal year: 2024, Issue: 52(D1), pp. D1143-D1154
Published: Jan. 5, 2024
Machine Learning-based scoring and classification of genetic variants aids the assessment of clinical findings and is employed to prioritize variants in diverse genetic studies and analyses. Combined Annotation-Dependent Depletion (CADD) is one of the first methods for the genome-wide prioritization of variants across different molecular functions and has been continuously developed and improved since its original publication. Here, we present our most recent release, CADD v1.7. We explored and integrated new annotation features, among them state-of-the-art protein language model scores (Meta ESM-1v), regulatory variant effect predictions (from sequence-based convolutional neural networks) and sequence conservation scores (Zoonomia). We evaluated the new version on data sets derived from ClinVar, ExAC/gnomAD and 1000 Genomes variants. For coding effects, we tested CADD on 31 Deep Mutational Scanning (DMS) data sets from ProteinGym and, for regulatory effect prediction, we used saturation mutagenesis reporter assay data of promoter and enhancer sequences. The inclusion of new features further improved the overall performance of CADD. As with previous releases, all data sets, genome-wide CADD v1.7 scores, scripts for on-site scoring and an easy-to-use webserver are readily provided via https://cadd.bihealth.org/ or https://cadd.gs.washington.edu/ to the community.
Nature Communications, Journal year: 2023, Issue: 14(1)
Published: April 22, 2023
Abstract
The gene regulatory code and grammar remain largely unknown, precluding our ability to link phenotype to genotype in regulatory sequences. Here, using a massively parallel reporter assay (MPRA) of 209,440 sequences, we examine all possible pair and triplet combinations, permutations and orientations of eighteen liver-associated transcription factor binding sites (TFBS). We find that TFBS orientation and order have a major effect on gene regulatory activity. Corroborating these results with genomic analyses, we find clear human promoter TFBS orientation biases and similar TFBS orientation and order transcriptional effects in an MPRA that tested 164,307 liver candidate regulatory elements. Additionally, by adding TFBS orientation to a model that predicts expression from sequence we improve performance by 7.7%. Collectively, our results show that TFBS orientation and order have a significant effect on gene regulatory activity and need to be considered when analyzing the functional effect of variants on regulatory activity.
Nature Methods, Journal year: 2024, Issue: 21(6), pp. 983-993
Published: May 9, 2024
Abstract
The inability to scalably and precisely measure the activity of developmental cis-regulatory elements (CREs) in multicellular systems is a bottleneck in genomics. Here we develop a dual RNA cassette that decouples the detection and quantification tasks inherent to multiplex single-cell reporter assays. The resulting measurement of reporter expression is accurate over multiple orders of magnitude, with a precision approaching the limit set by Poisson counting noise. Together with RNA barcode stabilization via circularization, these scalable single-cell quantitative expression reporters provide high-contrast readouts, analogous to classic in situ assays but entirely from sequencing. Screening >200 regions of accessible chromatin in a multicellular in vitro model of early mammalian development, we identify 13 (8 previously uncharacterized) autonomous and cell-type-specific developmental CREs. We further demonstrate that chimeric CRE pairs generate cognate two-cell-type activity profiles and assess gain- and loss-of-function multicellular expression phenotypes from CRE variants with perturbed transcription factor binding sites. Single-cell quantitative expression reporters can be applied in developmental and multicellular systems to quantitatively characterize native, perturbed and synthetic CREs at scale, with high sensitivity and at single-cell resolution.
Nature, Journal year: 2024, Issue: 634(8036), pp. 1211-1220
Published: Oct. 23, 2024
Cis-regulatory elements (CREs) control gene expression, orchestrating tissue identity, developmental timing and stimulus responses, which collectively define the thousands of unique cell types in the body.
Nature, Journal year: 2025, Issue: 637(8047), pp. 965-973
Published: Jan. 8, 2025
Transcriptional regulation, which involves a complex interplay between regulatory sequences and proteins, directs all biological processes. Computational models of transcription lack generalizability to accurately extrapolate to unseen cell types and conditions. Here we introduce GET (general expression transformer), an interpretable foundation model designed to uncover regulatory grammars across 213 human fetal and adult cell types [1,2]. Relying exclusively on chromatin accessibility data and sequence information, GET achieves experimental-level accuracy in predicting gene expression even in previously unseen cell types [3]. GET also shows remarkable adaptability across new sequencing platforms and assays, enabling regulatory inference across a broad range of cell types and conditions, and uncovers universal and cell-type-specific transcription factor interaction networks. We evaluated its performance in prediction of regulatory activity, inference of regulatory elements and regulators, and identification of physical interactions between transcription factors and found that it outperforms current models [4] in predicting lentivirus-based massively parallel reporter assay readout [5,6]. In fetal erythroblasts [7], we identified distal (greater than 1 Mbp) regulatory regions that were missed by previous models, and, in B cells, we identified a lymphocyte-specific transcription factor-transcription factor interaction that explains the functional significance of a leukaemia risk predisposing germline mutation [8-10]. In sum, we provide a generalizable and accurate model for transcription together with catalogues of gene regulation and transcription factor interactions, all with cell type specificity.
bioRxiv (Cold Spring Harbor Laboratory), Journal year: 2024, Issue: unknown
Published: Feb. 1, 2024
Abstract
The challenge of systematically modifying and optimizing regulatory elements for precise gene expression control is central to modern genomics and synthetic biology. Advancements in generative AI have paved the way for designing synthetic sequences with the aim of safely and accurately modulating gene expression. We leverage diffusion models to design context-specific DNA regulatory sequences, which hold significant potential toward enabling novel therapeutic applications requiring precise modulation of gene expression. Our framework uses a cell type-specific diffusion model to generate synthetic 200 bp regulatory elements based on chromatin accessibility across different cell types. We evaluate the generated sequences based on key metrics to ensure they retain properties of endogenous sequences: transcription factor binding site composition, potential for cell type-specific chromatin accessibility, and capacity for sequences generated by DNA diffusion to activate gene expression in different cell contexts using state-of-the-art prediction models. Our results demonstrate the ability to robustly generate DNA sequences with cell type-specific regulatory potential. DNA-Diffusion paves the way for revolutionizing the approach to mammalian synthetic biology and precision gene therapy.
bioRxiv (Cold Spring Harbor Laboratory), Journal year: 2024, Issue: unknown
Published: March 4, 2024
ABSTRACT
The emergence of genomic language models (gLMs) offers an unsupervised approach to learning a wide diversity of cis-regulatory patterns in the non-coding genome without requiring labels of functional activity generated by wet-lab experiments. Previous evaluations have shown that pre-trained gLMs can be leveraged to improve predictive performance across a broad range of regulatory genomics tasks, albeit using relatively simple benchmark datasets and baseline models. Since the gLMs in these studies were tested upon fine-tuning their weights for each downstream task, determining whether gLM representations embody a foundational understanding of cis-regulatory biology remains an open question. Here we evaluate the representational power of pre-trained gLMs to predict and interpret cell-type-specific functional genomics data that span DNA and RNA regulation. Our findings suggest that probing the representations of pre-trained gLMs do not offer substantial advantages over conventional machine learning approaches that use one-hot encoded sequences. This work highlights a major gap with current gLMs, raising potential issues in conventional pre-training strategies for the non-coding genome.
Abstract
The human genome contains millions of candidate cis-regulatory elements (cCREs) with cell-type-specific activities that shape both health and many disease states [1]. However, we lack a functional understanding of the sequence features that control the activity and cell-type-specific features of these cCREs. Here we used lentivirus-based massively parallel reporter assays (lentiMPRAs) to test the regulatory activity of more than 680,000 sequences, representing an extensive set of annotated cCREs among three cell types (HepG2, K562 and WTC11), and found that 41.7% of these sequences were active. By testing sequences in both orientations, we find promoters to have strand-orientation biases and their 200-nucleotide cores to function as non-cell-type-specific ‘on switches’ that provide similar expression levels to their associated gene. By contrast, enhancers have weaker orientation biases, but increased tissue-specific characteristics. Utilizing our lentiMPRA data, we develop sequence-based models to predict cCRE function and variant effects with high accuracy, delineate regulatory motifs and model their combinatorial effects. Testing a lentiMPRA library encompassing 60,000 cCREs in all three cell types, we further identified factors that determine cell-type specificity. Collectively, our work provides an extensive catalogue of functional CREs in three widely used cell lines and showcases how large-scale functional measurements can be used to dissect regulatory grammar.
Nature Communications, Journal year: 2025, Issue: 16(1)
Published: Jan. 16, 2025
Silencers, the yin to enhancers' yang, play a pivotal role in fine-tuning gene expression throughout the genome. However, despite their recognized importance, the comprehensive identification of these regulatory elements in the genome is still in its early stages. We developed a method called Ss-STARR-seq to directly determine the activity of silencers in the whole genome. In this study, we applied Ss-STARR-seq to the human cell lines K562, LNCaP, and 293T, and identified 134,171, 137,753, and 125,307 silencers on a genome-wide scale, respectively; these silencers function in various cells in a cell-specific manner. Silencers exhibited substantial enrichment of transcriptional-inhibitory motifs, including REST, and demonstrated overlap with the binding sites of repressor transcription factors within the endogenous environment. Interestingly, H3K27me3 did not reflect the activity of the silencer but facilitated the silencer's inhibitory role in gene expression. Additionally, silencers did not have any significant histone markers at the genome-wide level. Our findings unveil that silencers not only transition into enhancers in diverse cell lines but also achieve functional conversion to insulators. Regarding biological effects, knockout experiments underscored the functional redundancy and specificity of silencers in regulating gene expression and cell proliferation. In summary, this study pioneers the elucidation of the silencer landscape in human cells, delineates their global features, and identifies specific silencers influencing cancer and their critical role in gene regulation.
Here, the authors developed a technique to identify tens of thousands of silencers in human cells. These silencers possess unique epigenetic features and are capable of regulating cellular phenotypes.