How well do contextual protein encodings learn structure, function, and evolutionary context?
Cell Systems
Year: 2025
Issue: 16(3), pp. 101201-101201
Published: March 1, 2025
Language: English
Language models for protein design
Current Opinion in Structural Biology
Year: 2025
Issue: 92, pp. 103027-103027
Published: March 6, 2025
Language: English
Scaling unlocks broader generation and deeper functional understanding of proteins
bioRxiv (Cold Spring Harbor Laboratory)
Year: 2025
Issue: unknown
Published: April 16, 2025
Abstract
Generative protein language models (PLMs) are powerful tools for designing proteins purpose-built to solve problems in medicine, agriculture, and industrial processes. Recent work has trained ever larger models, but there has been little systematic study of the optimal training distributions and of the influence of model scale on the sequences generated by PLMs. We introduce the ProGen3 family of sparse generative PLMs, and we develop compute-optimal scaling laws to scale up to a 46B-parameter model pre-trained on 1.5T amino acid tokens. ProGen3’s pre-training data is sampled from an optimized distribution over the Profluent Protein Atlas v1, a carefully curated dataset of 3.4B full-length proteins. We evaluate, for the first time in the wet lab, the influence of model scale on the sequences generated by PLMs, and we find that larger models generate viable proteins for a much wider diversity of protein families. Finally, we find both computationally and experimentally that larger models are more responsive to alignment with laboratory data, resulting in improved fitness prediction and sequence generation capabilities. These results indicate that larger PLMs like ProGen3-46B, trained on larger, well-curated datasets, are powerful foundation models that push the frontier of protein design.
Language: English
Tracing the stepwise Darwinian evolution of a plant halogenase
bioRxiv (Cold Spring Harbor Laboratory)
Year: 2024
Issue: unknown
Published: December 11, 2024
Abstract
Halogenation chemistry is rare in plant metabolism, with the chloroalkaloid acutumine produced by Menispermaceae species being the only well-characterized example, involving a specialized dechloroacutumine halogenase (DAH) from the iron(II)- and 2-oxoglutarate-dependent dioxygenase (2ODD) superfamily. While DAH is presumed to have evolved from an ancestral 2ODD enzyme, the broader question of how new enzymes arise through Darwinian processes, such as the birth of DAH in Menispermaceae, remains a fundamental challenge in understanding metabolic evolution. Here, we investigate DAH’s evolutionary trajectory using a chromosomal-level genome assembly of Menispermum canadense. By analyzing the genomic context in M. canadense and syntenic regions in related plants, we show that DAH evolved through tandem duplication of a flavonol synthase (FLS) gene, followed by a series of neofunctionalization and gene loss events. Through structural modeling, molecular dynamics simulations, and site-directed mutagenesis, we identify the residue changes enabling the transition from FLS to DAH. This functional switch required traversing a complex landscape where adaptive peaks were separated by deep fitness valleys. Our work illustrates how new enzymatic functions can arise through lineage-specific pathways that gradually reshape active site architecture through permissive mutations, ultimately enabling mechanism-switching mutations to establish novel catalytic activities.
Language: English
AI-generated small binder improves prime editing
bioRxiv (Cold Spring Harbor Laboratory)
Year: 2024
Issue: unknown
Published: September 14, 2024
Abstract
The prime editing 2 (PE2) system comprises a nickase Cas9 fused to a reverse transcriptase, utilizing a prime editing guide RNA (pegRNA) to introduce desired mutations at target genomic sites. However, PE efficiency is limited by mismatch repair (MMR), which excises the DNA strand containing the edits. Inhibiting key components of the MMR complex through transient expression of a dominant negative MLH1 (MLH1dn) yielded an approximately 7.7-fold increase in PE efficiency over PE2, generating PE4. Herein, utilizing the generative artificial intelligence (AI) technologies RFdiffusion and AlphaFold 3, we generated a de novo small binder (named MLH1-SB) that binds to the dimeric interface of MLH1 and PMS2 to disrupt the formation of key MMR components. MLH1-SB’s small size (82 amino acids) allowed it to be integrated into pre-existing PE architectures via a 2A system, creating a novel PE-SB platform. As a result, by incorporating MLH1-SB into PE7, we have developed an improved architecture called PE7-SB, which demonstrates the highest PE efficiency to date (29.4-fold over PE2 and 2.4-fold over PE7 in HeLa cells), providing insight into how AI technologies will boost the improvement of genome editing tools.
Language: English
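The two fold-changes reported for PE7-SB can be combined to estimate how PE7 itself compares with PE2. This is a back-of-the-envelope calculation, not a figure reported in the abstract: it assumes both ratios were measured under the same conditions (HeLa cells, same targets), and the symbol E for editing efficiency is introduced here only for illustration.

```latex
\frac{E_{\mathrm{PE7}}}{E_{\mathrm{PE2}}}
  = \frac{E_{\mathrm{PE7\text{-}SB}} / E_{\mathrm{PE2}}}{E_{\mathrm{PE7\text{-}SB}} / E_{\mathrm{PE7}}}
  \approx \frac{29.4}{2.4} = 12.25
```

If the two ratios came from different experiments or target sites, this implied value of about 12-fold would only be indicative.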
AI-Generated Small Binder Improves Prime Editing
Published: January 1, 2024
Language: English
Re‐engineering of a carotenoid‐binding protein based on NMR structure
Protein Science
Year: 2024
Issue: 33(12)
Published: November 16, 2024
Abstract
Recently, a number of message passing neural network (MPNN)-based methods have been introduced that, based on backbone atom coordinates, efficiently recover the native amino acid sequences of proteins and predict modifications that result in better-expressing, more soluble, and more stable variants. However, X-ray structures, or artificial structures generated by algorithms trained on X-ray structures, were usually employed to define the target conformations. Here, we show that the commonly used ProteinMPNN and SolubleMPNN display low sequence recovery on structures determined using NMR. We subsequently propose a computational approach and successfully apply it to re-engineer AstaP, a protein that natively binds the large hydrophobic ligand astaxanthin (C₄₀H₅₂O₄), for which only a structure determined by NMR is currently available. The engineered variants, designated NeuroAstaP, are 51 amino acids shorter than the 22 kDa parent protein, have 38%–42% identity to it, exhibit good yields, are expressed mostly in monomeric form, and demonstrate efficient binding of carotenoids in vitro and in cells. Altogether, our work further tests the limits of machine learning for protein engineering and paves the way for MPNN-based modification of NMR-derived structures.
Language: English
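The abstract above reports that ProteinMPNN and SolubleMPNN show low sequence recovery on NMR-derived structures. As a minimal sketch, assuming the common definition of sequence recovery as the fraction of positions at which a designed sequence matches the native one (the paper's exact evaluation protocol is not described here), the metric can be computed as follows. The function name and the toy sequences are hypothetical.

```python
def sequence_recovery(native: str, designed: str) -> float:
    """Fraction of positions where the designed residue equals the native residue.

    Assumes both sequences are already position-aligned (same length),
    as when a design model outputs one residue per backbone position.
    """
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length for a position-wise comparison")
    if not native:
        raise ValueError("sequences must be non-empty")
    # Count identical residues at each aligned position.
    matches = sum(1 for n, d in zip(native, designed) if n == d)
    return matches / len(native)


# Toy example with made-up sequences (not taken from the paper).
native = "MKTAYIAKQR"
designed = "MKSAYLAKER"
print(f"recovery = {sequence_recovery(native, designed):.2f}")  # prints "recovery = 0.70"
```

The same position-wise comparison gives percent identity of the kind quoted for NeuroAstaP versus AstaP, but because those sequences differ in length, a real comparison would first require a sequence alignment, which this sketch does not perform.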