bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: July 23, 2024
Functional innovation at the protein level is a key source of evolutionary novelties. The constraints on functional innovations are likely to be highly specific in different proteins, which are shaped by their unique histories and the extent of global epistasis that arises from their structures and biochemistries. These contextual nuances in the sequence-function relationship have implications both for a basic understanding of the evolutionary process and for engineering proteins with desirable properties. Here, we investigated the molecular basis of novel function in a model member of an ancient, conserved, and biotechnologically relevant protein family. The Major Facilitator Superfamily sugar porters are a functionally diverse group of proteins that are thought to be plastic and evolvable. By dissecting a recent innovation in an α-glucoside transporter from the yeast ...
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown
Published: Sept. 5, 2023
How complicated is the genetic architecture of proteins - the set of causal effects by which sequence determines function? High-order epistatic interactions among residues are thought to be pervasive, making a protein's function difficult to predict or understand from its sequence. Most studies, however, used methods that overestimate epistasis, because they analyze genetic architecture relative to a designated reference sequence, causing measurement noise and small local idiosyncrasies to propagate into pervasive high-order interactions, or have not effectively accounted for global nonlinearity in the sequence-function relationship. Here we present a new reference-free method that jointly estimates global nonlinearity and specific epistatic interactions across a protein's entire genotype-phenotype map. This method yields a maximally efficient explanation of a protein's genetic architecture and is more robust than existing methods to measurement noise, partial sampling, and model misspecification. We reanalyze 20 combinatorial mutagenesis experiments from a diverse set of proteins and find that additive and pairwise effects, along with a simple nonlinearity to account for limited dynamic range, explain a median of 96% of total variance in measured phenotypes (and >92% in every case). Only a tiny fraction of genotypes are strongly affected by third- or higher-order epistasis. Genetic architecture is also sparse: the number of terms required to explain the vast majority of variance is smaller than the number of genotypes by many orders of magnitude. The sequence-function relationship in most proteins is therefore far simpler than previously thought, opening the way for new and tractable approaches to characterize it.
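As a toy illustration of the reference-based versus reference-free distinction drawn in this abstract, the sketch below computes both decompositions for a hypothetical two-site, two-state landscape. The phenotype values and all function names are invented for illustration and are not taken from the paper, and the global nonlinearity that the abstract describes is omitted here.

```python
from itertools import product

# Hypothetical phenotypes for a two-site, two-state protein (illustrative values only).
phenotype = {(0, 0): 0.0, (1, 0): 1.0, (0, 1): 1.0, (1, 1): 3.0}
genotypes = list(product((0, 1), repeat=2))


def reference_based(y):
    """Effects defined as deviations from the single reference genotype (0, 0)."""
    base = y[(0, 0)]
    site1 = y[(1, 0)] - base
    site2 = y[(0, 1)] - base
    pair = y[(1, 1)] - base - site1 - site2
    return {"intercept": base, "site1": site1, "site2": site2, "pair": pair}


def reference_free(y):
    """Effects defined as deviations from averages taken over all genotypes."""
    mean = sum(y[g] for g in genotypes) / len(genotypes)
    first = {}
    for site in range(2):
        for state in (0, 1):
            carriers = [g for g in genotypes if g[site] == state]
            # Mean phenotype of genotypes carrying this state, minus the global mean.
            first[(site, state)] = sum(y[g] for g in carriers) / len(carriers) - mean
    # Pairwise term: what remains after the mean and the first-order effects.
    pair = {g: y[g] - mean - first[(0, g[0])] - first[(1, g[1])] for g in genotypes}
    return mean, first, pair


def additive_variance_fraction(y):
    """Fraction of phenotypic variance captured by first-order reference-free effects."""
    mean, _, pair = reference_free(y)
    total = sum((y[g] - mean) ** 2 for g in genotypes)
    residual = sum(v ** 2 for v in pair.values())
    return 1 - residual / total


rb = reference_based(phenotype)
mean, first, pair = reference_free(phenotype)
print(rb["pair"], pair[(1, 1)], round(additive_variance_fraction(phenotype), 3))
```

With these made-up values, the reference-based pairwise term (1.0) is as large as either single-mutant effect, while the reference-free decomposition attributes about 95% of the variance to first-order effects.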
Nature Immunology, Journal Year: 2025, Volume and Issue: 26(5), P. 760 - 774
Published: April 30, 2025
Adaptive immunity and the five vertebrate NF-κB family members first emerged in cartilaginous fish, suggesting that NF-κB family divergence helped to facilitate adaptive immunity. One specialized function of the Rel protein in macrophages is activation of Il12b, which encodes a key regulator of T cell development. We found that Il12b exhibits much greater Rel dependence than inducible innate genes in macrophages, with unique dimers depending on heightened intrinsic DNA-binding affinity. Chromatin immunoprecipitation followed by sequencing experiments defined differential preferences genome-wide, and X-ray crystallography revealed a residue that supports the heightened affinity of the dimers. Unexpectedly, this residue, the heightened affinity of the dimers, and the portion of the Il12b promoter bound by the dimers were largely restricted to mammals. Our findings reveal that major structural transitions in an NF-κB family member and one of its target promoters at a late stage of evolution apparently contributed to immunoregulatory rewiring in mammalian species.
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: Jan. 30, 2024
In a recent preprint, Park, Metzger, and Thornton reanalyze 20 empirical protein sequence-function landscapes using a "reference-free analysis" (RFA) method they recently developed. They argue that these landscapes are simpler and less epistatic than earlier work suggested, and attribute the difference to limitations of the methods used in the original analyses of these landscapes, which they claim are more sensitive to measurement noise, missing data, and other artifacts. Here, we show that these claims are incorrect. Instead, we find that the RFA method introduced by Park et al. is exactly equivalent to the reference-based least-squares methods used in the original analysis of many of these landscapes (and also to a Hadamard-based approach they implement). Because the reanalyzed and original landscapes are in fact identical, the different conclusions drawn by Park et al. instead reflect different interpretations of the parameters describing the inferred landscapes; these do not support the conclusion that epistasis plays only a small role in these landscapes.
Proceedings of the National Academy of Sciences, Journal Year: 2024, Volume and Issue: 121(34)
Published: Aug. 12, 2024
Mutations in protein active sites can dramatically improve function. The active site, however, is densely packed and extremely sensitive to mutations. Therefore, some mutations may only be tolerated in combination with others, a phenomenon known as epistasis. Epistasis reduces the likelihood of obtaining improved functional variants and slows natural and lab evolutionary processes. Research has shed light on the molecular origins of epistasis and its role in shaping evolutionary trajectories and outcomes. In addition, sequence- and AI-based strategies that infer epistatic relationships from mutational patterns or experimental evolution data have been used to design variants. In recent years, combinations of such approaches and atomistic calculations have successfully predicted highly functional combinatorial mutations in active sites. These were used to design thousands of functional active-site variants, demonstrating that, while our understanding of epistasis remains incomplete, the determinants that are critical for accurate design are now sufficiently understood. We conclude that the space explored by evolution can be expanded to enhance natural activities or discover new ones. Furthermore, design opens the way to systematically exploring sequence and structure impacts on function, deepening our understanding of and control over protein activity.
A protein's genetic architecture - the set of causal rules by which its sequence produces its functions - also determines its possible evolutionary trajectories. Prior research has proposed that the genetic architecture of proteins is very complex, with pervasive epistatic interactions that constrain evolution and make function difficult to predict from sequence. Most of this work has analyzed only the direct paths between two proteins of interest, excluding the vast majority of possible genotypes and evolutionary trajectories, and has considered only a single protein function, leaving unaddressed the genetic architecture of functional specificity and its impact on the evolution of new functions. Here, we develop a new method based on ordinal logistic regression to directly characterize the global genetic determinants of multiple protein functions from 20-state combinatorial deep mutational scanning (DMS) experiments. We use it to dissect the genetic architecture and evolution of a transcription factor's specificity for DNA, using data from a combinatorial DMS of an ancient steroid hormone receptor's capacity to activate transcription from two biologically relevant DNA elements. We show that the genetic architecture of DNA recognition consists of a dense set of main and pairwise effects that involve virtually every possible amino acid state in the protein-DNA interface, but higher-order epistasis plays only a tiny role. Pairwise interactions enlarge the set of functional sequences and are the primary determinants of specificity for different DNA elements. They also massively expand the number of opportunities for single-residue mutations to switch specificity from one DNA target to another. By bringing variants with different functions close together in sequence space, pairwise epistasis therefore facilitates rather than constrains the evolution of new functions.
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown
Published: Jan. 29, 2025
Biological systems may be biased in the phenotypes they can access by mutation [1–7], but the extent of these biases and their causal role in the evolution of extant phenotypic diversity remains unclear. There are three major challenges: it is difficult to isolate the effect of bias in the genotype-phenotype (GP) map from that of natural selection in producing extant diversity [6,8–11]; the universe of possible genotypes and phenotypes is so vast and complex that a direct characterization has been impossible; and most extant phenotypes evolved long ago in species whose GP maps cannot be recovered. Here we develop exhaustive multi-phenotype deep mutational scanning to experimentally characterize the complete GP maps of two reconstructed ancestral steroid receptor proteins, which existed during an ancient phylogenetic interval when a new phenotype (specific binding of a new DNA response element) evolved [12]. We measured all possible DNA specificity phenotypes encoded by all possible amino acid combinations at sites in the protein’s DNA binding interface. We found that the ancestral GP maps are structured by strong global bias (unequal propensity to encode the various phenotypes) and extreme heterogeneity in the phenotypes accessible around each genotype, and that these properties strongly affect evolution on both long and short timescales. Distinct biases in the two ancestral maps steered evolution toward the lineage-specific functional phenotypes that evolved during history. Our findings establish that biases in the ancient GP relationship were causal factors in the evolutionary process that produced present-day patterns of phenotypic conservation and diversity in this protein family.
PLoS Computational Biology, Journal Year: 2025, Volume and Issue: 21(3), P. e1012818 - e1012818
Published: March 20, 2025
Quantitative models of sequence-function relationships are ubiquitous in computational biology, e.g., for modeling the DNA binding of transcription factors or the fitness landscapes of proteins. Interpreting these models, however, is complicated by the fact that the values of model parameters can often be changed without affecting model predictions. Before the values of model parameters can be meaningfully interpreted, one must remove these degrees of freedom (called “gauge freedoms” in physics) by imposing additional constraints (a process called “fixing the gauge”). However, strategies for fixing the gauge of sequence-function relationships have received little attention. Here we derive an analytically tractable family of gauges for a large class of sequence-function relationships. These gauges are derived in the context of models with all-order interactions, but an important subset of these gauges can be applied to diverse types of models, including additive models, pairwise-interaction models, and models with higher-order interactions. Many commonly used gauges are special cases of gauges within this family. We demonstrate the utility of this family of gauges by showing how different choices of gauge can be used both to explore complex activity landscapes and to reveal simplified models that are approximately correct within localized regions of sequence space. The results provide practical gauge-fixing strategies and demonstrate the utility of gauge-fixing for model exploration and interpretation.
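The gauge freedom described in this abstract can be seen in even the simplest additive model. The sketch below is an illustration under stated assumptions, not the paper's derivation: it uses a hypothetical two-position DNA model with made-up parameter values, and fixes the gauge with one common convention in which the parameters at each position sum to zero. The function names are invented here.

```python
from itertools import product

ALPHABET = "ACGT"


def predict(theta0, theta, seq):
    """Additive model: constant term plus one parameter per (position, character)."""
    return theta0 + sum(theta[pos][ch] for pos, ch in enumerate(seq))


def fix_zero_sum_gauge(theta0, theta):
    """Move each position's mean into the constant term so each position sums to zero."""
    new_theta0 = theta0
    new_theta = []
    for pos_params in theta:
        mean = sum(pos_params.values()) / len(pos_params)
        new_theta0 += mean
        new_theta.append({ch: value - mean for ch, value in pos_params.items()})
    return new_theta0, new_theta


# Hypothetical parameter values, chosen only for illustration.
theta0 = 1.0
theta = [
    {"A": 0.5, "C": 1.5, "G": -0.5, "T": 2.5},
    {"A": 3.0, "C": 1.0, "G": 1.0, "T": -1.0},
]

fixed_theta0, fixed_theta = fix_zero_sum_gauge(theta0, theta)

# The parameters changed, but every prediction is the same.
for seq in product(ALPHABET, repeat=2):
    assert abs(predict(theta0, theta, seq) - predict(fixed_theta0, fixed_theta, seq)) < 1e-12

print(fixed_theta0, fixed_theta)
```

Under this convention the constant term becomes the mean prediction over all sequences, so each remaining parameter reads as a deviation from that mean; other conventions would give different but equally predictive parameter values.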
Biochemistry, Journal Year: 2025, Volume and Issue: unknown
Published: March 25, 2025
Proteins evolve through complex sequence spaces, with fitness landscapes serving as a conceptual framework that links sequence to function. Fitness landscapes can be smooth, where multiple similarly accessible evolutionary paths are available, or rugged, where the presence of multiple local optima can complicate evolution and prediction. Indeed, many proteins, especially those with multiple functions or under multiple selection pressures, exist on rugged fitness landscapes. Here we discuss the theoretical framework that underpins our understanding of fitness landscapes, alongside recent work that has advanced our understanding, particularly of the biophysical basis for smoothness versus ruggedness. Finally, we address the rapid advances that have been made in the computational and experimental exploration and exploitation of fitness landscapes, and how these can identify efficient routes to protein optimization.
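The smooth versus rugged distinction in this abstract can be made concrete by counting local optima. The following is a minimal sketch on a hypothetical three-site binary landscape; the fitness values and function names are invented for illustration and do not come from the review.

```python
from itertools import product


def neighbors(genotype):
    """All genotypes that differ from `genotype` at exactly one site."""
    for i in range(len(genotype)):
        yield genotype[:i] + (1 - genotype[i],) + genotype[i + 1:]


def local_optima(fitness):
    """Genotypes whose fitness is strictly higher than that of every single-mutant neighbor."""
    return [g for g, f in fitness.items() if all(f > fitness[n] for n in neighbors(g))]


genotypes = list(product((0, 1), repeat=3))

# Smooth (purely additive) landscape: each mutation adds one unit of fitness.
smooth = {g: float(sum(g)) for g in genotypes}

# Rugged variant: raising the fitness of (0, 0, 0) makes every first step deleterious.
rugged = dict(smooth)
rugged[(0, 0, 0)] = 1.5

print(local_optima(smooth))
print(local_optima(rugged))
```

The additive landscape has a single peak at (1, 1, 1), reachable from anywhere by uphill steps. In the rugged variant, (0, 0, 0) becomes a second local optimum, so an adaptive walk starting there cannot reach the global peak without first passing through a less fit genotype.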
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: May 13, 2024
Quantitative models of sequence-function relationships are ubiquitous in computational biology, e.g., for modeling the DNA binding of transcription factors or the fitness landscapes of proteins. Interpreting these models, however, is complicated by the fact that the values of model parameters can often be changed without affecting model predictions. Before the values of model parameters can be meaningfully interpreted, one must remove these degrees of freedom (called "gauge freedoms" in physics) by imposing additional constraints (a process called "fixing the gauge"). However, strategies for fixing the gauge of sequence-function relationships have received little attention. Here we derive an analytically tractable family of gauges for a large class of sequence-function relationships. These gauges are derived in the context of models with all-order interactions, but an important subset of these gauges can be applied to diverse types of models, including additive models, pairwise-interaction models, and models with higher-order interactions. Many commonly used gauges are special cases of gauges within this family. We demonstrate the utility of this family of gauges by showing how different choices of gauge can be used both to explore complex activity landscapes and to reveal simplified models that are approximately correct within localized regions of sequence space. The results provide practical gauge-fixing strategies and demonstrate the utility of gauge-fixing for model exploration and interpretation.
Molecular Biology and Evolution, Journal Year: 2024, Volume and Issue: 41(11)
Published: Oct. 30, 2024
Functional innovation at the protein level is a key source of evolutionary novelties. The constraints on functional innovations are likely to be highly specific in different proteins, which are shaped by their unique histories and the extent of global epistasis that arises from their structures and biochemistries. These contextual nuances in the sequence–function relationship have implications both for a basic understanding of the evolutionary process and for engineering proteins with desirable properties. Here, we investigated the molecular basis of novel function in a model member of an ancient, conserved, and biotechnologically relevant protein family. The Major Facilitator Superfamily sugar porters are a functionally diverse group of proteins that are thought to be plastic and evolvable. By dissecting a recent innovation in an α-glucoside transporter from the yeast Saccharomyces eubayanus, we show that the ability to transport a novel substrate requires high-order interactions between many protein regions and numerous residues proximal to the transport channel. To reconcile the functional diversity of this family with the constrained evolution of this model protein, we generated new, state-of-the-art genome annotations for 332 Saccharomycotina species spanning ∼400 My of evolution. By integrating phylogenetic and phenotypic analyses across these species, we show that the transporters evolved from a multifunctional ancestor that became subfunctionalized. The accumulation of additive and epistatic substitutions entrenched this subfunction, which made the simultaneous acquisition of multiple interacting substitutions the only reasonably accessible path to novelty.