Frontiers in Genome Editing, Journal year: 2024, Number: 6
Published: Oct. 31, 2024
Large-scale cancer genomic studies in patients have unveiled millions of non-coding variants. While a handful have been shown to drive cancer development, the vast majority have unknown function. This review describes the challenges of functionally annotating non-coding cancer variants and understanding how they contribute to cancer. We summarize recently developed high-throughput technologies to address these challenges. Finally, we outline future prospects for non-coding cancer genetics to help catalyze personalized cancer therapy.
bioRxiv (Cold Spring Harbor Laboratory), Journal year: 2024, Number: unknown
Published: Dec. 25, 2024
Despite extensive mapping of cis-regulatory elements (cREs) across cellular contexts with chromatin accessibility assays, the sequence syntax and genetic variants that regulate transcription factor (TF) binding and chromatin accessibility at context-specific cREs remain elusive. We introduce ChromBPNet, a deep learning DNA sequence model of base-resolution accessibility profiles that detects, learns and deconvolves assay-specific enzyme biases from regulatory sequence determinants of accessibility, enabling robust discovery of compact TF motif lexicons, cooperative motif syntax and precision footprints across assays and sequencing depths. Extensive benchmarks show that ChromBPNet, despite its lightweight design, is competitive with much larger contemporary models at predicting variant effects on chromatin accessibility, pioneer TF binding and reporter activity across assays, cell contexts and ancestry, while providing interpretation of disrupted regulatory syntax. ChromBPNet also helps prioritize and interpret regulatory variants that influence complex traits and rare diseases, thereby providing a powerful lens to decode regulatory DNA and genetic variation.
bioRxiv (Cold Spring Harbor Laboratory), Journal year: 2024, Number: unknown
Published: June 1, 2024
Despite extensive characterization of mammalian Pol II transcription, the DNA sequence determinants of transcription initiation at a third of human promoters and most enhancers remain poorly understood. We trained and interpreted a neural network called ProCapNet that accurately models base-resolution initiation profiles from PRO-cap experiments using local DNA sequence. ProCapNet learns sequence motifs with distinct effects on initiation rates and TSS positioning and uncovers context-specific cryptic initiator elements intertwined within other TF motifs. ProCapNet annotates predictive motifs in nearly all actively transcribed regulatory elements across multiple cell-lines, revealing a shared...
Prime editing installs precise edits into the genome with minimal unwanted byproducts, but low and variable editing efficiencies have complicated application of the approach to high-throughput functional genomics. Here we assembled a prime editing platform capable of high-efficiency substitution editing suitable for functional interrogation of small genetic variants. We benchmarked this platform for pooled, loss-of-function screening using a library of ~240,000 engineered prime editing guide RNAs (epegRNAs) targeting ~17,000 codons with 1–3 bp substitutions. Comparing the abundance of these epegRNAs across screen samples identified negative selection phenotypes for 7,996 nonsense mutations targeted to 1,149 essential genes and for synonymous mutations that disrupted splice site motifs at 3′ exon boundaries. Rigorous evaluation of codon-matched controls demonstrated that these phenotypes were highly specific to the intended edit. Altogether, we established a prime editing approach for multiplexed, functional characterization of genetic variants with simple readouts.
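This work establishes a prime editing platform for high-throughput (up to tens of thousands) characterization of genetic variants with simple phenotypes.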
Biochemical Society Transactions, Journal year: 2024, Number: 52(2), pp. 803–819
Published: April 17, 2024
Recent advances in genome editing technologies are allowing investigators to engineer and study cancer-associated mutations in their endogenous genetic contexts with high precision and efficiency. Of these, base editing and prime editing are quickly becoming gold standards in the field due to their versatility and scalability. Here, we review the merits and limitations of these technologies and their application in modern cancer research, and speculate how they could be integrated to address future directions in the field.
bioRxiv (Cold Spring Harbor Laboratory), Journal year: 2024, Number: unknown
Published: July 27, 2024
Models that predict RNA levels from DNA sequences show tremendous promise for decoding tissue-specific gene regulatory mechanisms, revealing the genetic architecture of traits, and interpreting noncoding genetic variation. Existing methods take two different approaches: 1) associating expression with linear combinations of common genetic variants (training across individuals on single genes), or 2) learning genome-wide sequence-to-expression rules with neural networks (training across loci using a reference genome). Since limitations of both strategies have been highlighted recently, we sought to combine the sequence context provided by deep learning with the information provided by cross-individual training. We utilized fine-tuning to develop Performer, a model with accuracy approaching the cis-heritability of most genes. Performer prioritizes genetic variants across the allele frequency spectrum that disrupt motifs, fall in annotated regulatory elements, and have functional evidence for modulating gene expression. While obstacles remain in personalized expression prediction, our findings establish deep learning as a viable strategy.
bioRxiv (Cold Spring Harbor Laboratory), Journal year: 2024, Number: unknown
Published: May 24, 2024
Engineering regulatory DNA sequences with precise activity levels in specific cell types holds immense potential for medicine and biotechnology. However, the vast combinatorial space of possible sequences and the complex regulatory grammars governing gene regulation have proven challenging for existing approaches. Supervised deep learning models that score sequences proposed by local search algorithms ignore the global structure of functional sequence space. While diffusion-based generative models have shown promise in learning these distributions, their application to regulatory DNA has been limited. Evaluating the quality of generated sequences also remains challenging due to a lack of a unified framework that characterizes key properties of regulatory DNA. Here we introduce DNA Discrete Diffusion (D3), a generative framework for conditionally sampling regulatory sequences with targeted functional activity levels. We develop a comprehensive suite of evaluation metrics that assess the functional similarity, sequence similarity, and regulatory composition of generated sequences. Through benchmarking on three high-quality functional genomics datasets spanning human promoters and fly enhancers, we demonstrate that D3 outperforms existing methods in capturing the diversity of cis-regulatory grammars and generating sequences that more accurately reflect the properties of genomic regulatory DNA. Furthermore, we show that D3-generated sequences can effectively augment supervised models and improve their predictive performance, even in data-limited scenarios.
The rapid advancement of sequencing technologies has led to the identification of numerous mutations in cancer genomes, many of which are variants of unknown significance (VUS). Computational models are increasingly being used to predict the functional impact of these mutations, in both coding and noncoding regions. Integration with emerging genomic datasets will refine our understanding of mutation effects and guide clinical decision making. Future advancements in modeling protein interactions and transcriptional regulation will further enhance our ability to interpret VUS. Periodic incorporation of these developments into VUS reclassification practice has the potential to significantly improve personalized cancer care.
medRxiv (Cold Spring Harbor Laboratory), Journal year: 2024, Number: unknown
Published: Nov. 21, 2024
Congenital heart defects (CHD) arise in part due to inherited genetic variants that alter genes and noncoding regulatory elements in the human genome. These variants are thought to act during fetal development to influence the formation of different heart structures. However, identifying the genes, pathways, and cell types that mediate these effects has been challenging due to the immense diversity of cell types involved in heart development as well as the superimposed complexities of interpreting noncoding sequences. As such, understanding the molecular functions of both noncoding and coding variants remains paramount to our fundamental understanding of cardiac development and CHD. Here, we created a gene regulation map of the healthy human fetal heart across developmental time, and applied it to interpret the functions of variants associated with CHD and quantitative cardiac traits. We collected single-cell multiomic data from 734,000 single cells sampled from 41 fetal hearts spanning post-conception weeks 6 to 22, enabling the construction of gene regulation maps in 90 cardiac cell types and states, including rare populations of cardiac conduction cells.
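Through an unbiased analysis of all 90 cell types, we find that both rare coding variants associated with CHD and common noncoding variants associated with valve traits converge to affect valvular interstitial cells (VICs). VICs are enriched for high expression of known CHD genes previously identified through mapping of rare coding variants. Eight CHD genes, as well as other genes in similar molecular pathways, are linked to common noncoding variants associated with other valve diseases or traits via enhancers in VICs. In addition, certain common noncoding variants impact enhancers with activities highly specific to particular subanatomic structures in the heart, illuminating how such variants can impact specific aspects of heart structure and function. Together, these results implicate new enhancers, genes, and cell types in the genetic etiology of CHD, identify molecular convergence of common noncoding and rare coding variants on VICs, and suggest a more expansive view of the cell types instrumental in genetic risk for CHD, beyond the working cardiomyocyte. This regulatory map of the human fetal heart will provide a foundational resource for understanding cardiac development, interpreting genetic variants associated with heart disease, and discovering targets for cell-type specific therapies.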