bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown, Published: April 20, 2023
A protein’s genetic architecture, the set of causal rules by which its sequence produces its functions, also determines its possible evolutionary trajectories. Prior research has proposed that the genetic architecture of proteins is very complex, with pervasive epistatic interactions that constrain evolution and make function difficult to predict from sequence. Most of this work has analyzed only the direct paths between two proteins of interest, excluding the vast majority of possible genotypes and evolutionary trajectories, and has considered only a single protein function, leaving unaddressed the genetic architecture of functional specificity and its impact on the evolution of new functions. Here we develop a method based on ordinal logistic regression to directly characterize the global genetic determinants of multiple protein functions from 20-state combinatorial deep mutational scanning (DMS) experiments. We use it to dissect the genetic architecture and evolution of a transcription factor’s specificity for DNA, using data from a combinatorial DMS of an ancient steroid hormone receptor’s capacity to activate transcription from two biologically relevant DNA elements. We show that the genetic architecture of DNA recognition consists of a dense set of main and pairwise effects that involve virtually every possible amino acid state in the protein-DNA interface, but higher-order epistasis plays only a tiny role. Pairwise interactions enlarge the set of functional sequences and are the primary determinants of specificity for different DNA elements. They also massively expand the number of opportunities for single-residue mutations to switch specificity from one DNA target to another. By bringing variants with different functions close together in sequence space, pairwise epistasis therefore facilitates rather than constrains the evolution of new functions.
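To make the modeling idea in this abstract concrete, the following is a minimal sketch of an ordinal logistic model whose latent score is a sum of main and pairwise effects. It assumes a standard proportional-odds form; the abstract does not give the authors' exact parameterization, and every name and number here (`genetic_score`, `ordinal_probs`, the effect values, the cutpoints) is hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def genetic_score(seq, intercept, main, pair):
    """Latent score = intercept + main effects + pairwise effects.

    main maps (site, state) -> effect.
    pair maps ((site_i, state_i), (site_j, state_j)) with site_i < site_j -> effect.
    Terms missing from either dictionary contribute zero.
    """
    terms = list(enumerate(seq))  # [(0, 'A'), (1, 'K'), ...]
    score = intercept
    score += sum(main.get(t, 0.0) for t in terms)
    score += sum(pair.get((a, b), 0.0) for a, b in combinations(terms, 2))
    return score


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def ordinal_probs(score, cutpoints):
    """Proportional-odds model: P(Y <= k) = sigmoid(cutpoint_k - score).

    cutpoints must be sorted in increasing order; the result has
    len(cutpoints) + 1 class probabilities, ordered from lowest to highest class.
    """
    cumulative = [sigmoid(c - score) for c in cutpoints] + [1.0]
    return [cumulative[0]] + [
        cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))
    ]


# Hypothetical two-site example with three ordered activity classes.
main_effects = {(0, "A"): 0.5, (1, "K"): 1.0}
pair_effects = {((0, "A"), (1, "K")): 2.0}
score = genetic_score("AK", -1.0, main_effects, pair_effects)
probs = ordinal_probs(score, cutpoints=[0.0, 2.0])
print(score, probs)
```

In this toy case the pairwise term contributes more to the latent score than both main effects combined, which shifts most of the probability mass into the highest activity class.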
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown, Published: Sept. 24, 2024
Epistasis complicates our understanding of protein sequence-function relationships and impedes our ability to build accurate predictive models for novel genotypes. Although pairwise epistasis has been extensively studied in proteins, the significance of higher-order epistasis for protein sequence-function relationships remains contentious, largely due to challenges in fitting higher-order epistatic interactions for full-length proteins. Here, we introduce a transformer-based approach. The key feature of our method is that we can adjust the order of interactions fit by the model by changing the number of attention layers, while also accounting for any global nonlinearity induced by experimental conditions. This allows us to test if the inclusion of higher-order interactions leads to enhanced model performance. Applying our method to 10 large protein sequence-function datasets, we found that the importance of higher-order epistasis differs substantially between proteins, accounting for up to 60% of the total variance attributed to epistasis. We also found that including higher-order epistasis is particularly important for generalizing locally sampled fitness data to distant regions of sequence space and for modeling an additional multipeak fitness landscape derived from combining mutagenesis data from 4 orthologous green fluorescent proteins. Our findings suggest that higher-order epistasis often does play an important role in protein sequence-function relationships, and thus should be properly incorporated during protein engineering and evolutionary analysis.
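For readers unfamiliar with the term, the "order" of an epistatic interaction can be illustrated with the textbook reference-based definition, in which the coefficient for a set of mutations is an inclusion-exclusion sum over the fitness of all its sub-genotypes. The sketch below is not the transformer method described in this abstract and ignores any global nonlinearity; the function name `epistasis_coefficient` and the fitness values are hypothetical.

```python
from itertools import combinations


def epistasis_coefficient(fitness, mutations):
    """Reference-based epistatic term for a set of mutations.

    fitness maps a frozenset of mutations to a measured value; the empty
    frozenset is the reference genotype. With one mutation this returns the
    single-mutant effect, with two the pairwise term, with three the
    third-order term, and so on.
    """
    n = len(mutations)
    total = 0.0
    for k in range(n + 1):
        sign = (-1) ** (n - k)  # alternate signs by subset size
        for subset in combinations(mutations, k):
            total += sign * fitness[frozenset(subset)]
    return total


# Hypothetical landscape: additive effects a=1, b=2, c=3 plus a 0.5 a-b interaction.
fitness = {
    frozenset(): 0.0,
    frozenset("a"): 1.0,
    frozenset("b"): 2.0,
    frozenset("c"): 3.0,
    frozenset("ab"): 3.5,
    frozenset("ac"): 4.0,
    frozenset("bc"): 5.0,
    frozenset("abc"): 6.5,
}
pairwise_ab = epistasis_coefficient(fitness, ("a", "b"))
third_order = epistasis_coefficient(fitness, ("a", "b", "c"))
print(pairwise_ab, third_order)
```

Because the triple mutant here is fully explained by the additive effects and the single a-b interaction, the third-order term vanishes; any deviation of the triple mutant from 6.5 would appear as higher-order epistasis.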