Nature Biotechnology, Journal year: 2024, Issue: unknown. Published: April 23, 2024.
In recent years, generative protein sequence models have been developed to sample novel sequences. However, predicting whether generated proteins will fold and function remains challenging. We evaluate a set of 20 diverse computational metrics to assess the quality of enzyme sequences produced by three contrasting generative models: ancestral sequence reconstruction, a generative adversarial network and a protein language model. Focusing on two enzyme families, we expressed and purified over 500 natural and generated sequences with 70-90% identity to the most similar natural sequences to benchmark computational metrics for predicting in vitro activity. Over multiple rounds of experiments, we developed a computational filter that improved the rate of experimental success by 50-150%. The proposed metrics can drive protein engineering research by serving as a benchmark for generative sequence models and by helping to select active variants for experimental testing.
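To make the filtering idea concrete, the sketch below scores generated sequences by percent identity to their closest natural homolog and keeps candidates inside a target identity band. This is a minimal illustration of one possible metric, not the paper's 20-metric pipeline; the sequences, thresholds, and function names are hypothetical, and the identity calculation assumes pre-aligned, equal-length sequences.

```python
# Minimal sketch (not the paper's pipeline): score generated enzyme sequences by
# percent identity to their closest natural homolog and keep only candidates in a
# target identity band. Assumes pre-aligned, equal-length sequences; all names are
# illustrative.

def percent_identity(seq_a: str, seq_b: str) -> float:
    """Fraction of identical positions between two equal-length aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a)

def max_identity_to_natural(generated: str, natural_set: list[str]) -> float:
    """Identity of a generated sequence to its most similar natural sequence."""
    return max(percent_identity(generated, nat) for nat in natural_set)

def filter_candidates(generated_seqs, natural_seqs, min_identity=0.7, max_identity=0.9):
    """Keep candidates whose nearest-natural identity falls in a target band,
    mirroring the 70-90% identity range benchmarked in the study."""
    kept = []
    for seq in generated_seqs:
        ident = max_identity_to_natural(seq, natural_seqs)
        if min_identity <= ident <= max_identity:
            kept.append((seq, ident))
    return kept

# Example with toy aligned sequences
natural = ["MKTAYIAKQR", "MKSAYIAKQR"]
generated = ["MKTAYIVKQR", "MKTTTTTTQR"]
print(filter_candidates(generated, natural))  # keeps only the 90%-identity candidate
```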
Journal of Medical Internet Research, Journal year: 2024, Issue: 26, P. e59505. Published: Aug. 20, 2024.
In the complex and multidimensional field of medicine, multimodal data are prevalent and crucial for informed clinical decisions. Multimodal data span a broad spectrum of data types, including medical images (eg, MRI and CT scans), time-series data (eg, sensor data from wearable devices and electronic health records), audio recordings (eg, heart and respiratory sounds and patient interviews), text (eg, clinical notes and research articles), videos (eg, surgical procedures), and omics data (eg, genomics and proteomics). While advancements in large language models (LLMs) have enabled new applications for knowledge retrieval and processing in the medical field, most LLMs remain limited to unimodal data, typically text-based content, and often overlook the importance of integrating the diverse data modalities encountered in clinical practice. This paper aims to present a detailed, practical, and solution-oriented perspective on the use of multimodal LLMs (M-LLMs) in the medical field. Our investigation spanned M-LLM foundational principles, current and potential applications, technical and ethical challenges, and future directions. By connecting these elements, we aimed to provide a comprehensive framework that links the diverse aspects of M-LLMs, offering a unified vision for their role in health care. This approach can guide both research and practical implementations of M-LLMs in health care, positioning them as a paradigm shift toward integrated, multimodal data–driven medicine. We anticipate that this work will spark further discussion and inspire the development of innovative approaches for the next generation of such systems.
Directed protein evolution is central to biomedical applications but faces challenges like experimental complexity, inefficient multi-property optimization, and local maxima traps. While in silico methods using protein language models (PLMs) can provide modeled fitness landscape guidance, they struggle to generalize across diverse protein families and to map to protein activity. We present EVOLVEpro, a few-shot active learning framework that combines PLMs and regression models to rapidly improve protein activity. EVOLVEpro surpasses current methods, yielding up to 100-fold improvements in desired properties. We demonstrate its effectiveness across six proteins in RNA production, genome editing, and antibody binding applications. These results highlight the advantages of active learning with minimal experimental data over zero-shot predictions. EVOLVEpro opens new possibilities for AI-guided protein engineering in biology and medicine.
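As a rough illustration of the few-shot idea (fitting a lightweight regression model on top of fixed protein language model embeddings and ranking untested variants for the next experimental round), the sketch below uses scikit-learn with random placeholder embeddings. It is not the EVOLVEpro implementation: the variant names, activity values, and embedding function are hypothetical, and a real workflow would substitute actual PLM features.

```python
# Minimal sketch of few-shot activity prediction on top of a PLM (not EVOLVEpro):
# fit a small regression model on the embeddings of a handful of assayed variants,
# then rank an untested pool by predicted activity. Embeddings are random
# placeholders standing in for real PLM features.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
EMBED_DIM = 1280  # typical PLM embedding width; placeholder assumption

def embed(variants: list[str]) -> np.ndarray:
    """Placeholder for per-variant PLM embeddings (e.g., mean-pooled residue features)."""
    return rng.normal(size=(len(variants), EMBED_DIM))

# A few variants with measured activity (few-shot labels) and a larger untested pool.
assayed = ["WT", "A23V", "K51R", "D77E"]          # hypothetical variants
activities = np.array([1.0, 1.8, 0.6, 2.3])        # hypothetical assay readouts
candidates = [f"mut_{i}" for i in range(200)]

# Fit the top-layer regression on the labeled embeddings.
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(embed(assayed), activities)

# Rank the candidate pool and propose the top variants for the next round.
scores = model.predict(embed(candidates))
top = np.argsort(scores)[::-1][:10]
for i in top:
    print(candidates[i], round(float(scores[i]), 3))
```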
bioRxiv (Cold Spring Harbor Laboratory), Journal year: 2024, Issue: unknown. Published: Feb. 8, 2024.
Abstract
Large pretrained protein language models (PLMs) have improved property and structure prediction from sequences via transfer learning, in which weights and representations from PLMs are repurposed for downstream tasks. Although PLMs have shown great promise, currently there is little understanding of how the features learned by pretraining relate to and are useful for downstream tasks. We perform a systematic analysis of transfer learning using PLMs, conducting 370 experiments across a comprehensive suite of factors including different downstream tasks, architectures, model sizes, model depths, and pretraining time. We observe that while almost all downstream tasks do benefit from pretraining compared to naive sequence representations, for the majority of tasks performance does not scale with pretraining and instead relies on low-level features learned early in pretraining. Our results point to a mismatch between current PLM pretraining paradigms and most applications of these models, indicating a need for better pretraining methods.
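A minimal sketch of the kind of comparison described, under the assumption that PLM embeddings have been precomputed: the same ridge regression is trained on naive one-hot sequence encodings and on embedding features, and held-out Spearman correlation is compared. Here the embedding matrix is a random placeholder carrying no signal, so only the evaluation harness is illustrated; with real pretrained features the comparison becomes meaningful.

```python
# Minimal sketch of a transfer-learning comparison harness (not the study's actual
# benchmark): fit the same ridge regression on a naive one-hot encoding and on
# "PLM embeddings", then compare held-out Spearman correlation. The embedding
# matrix below is a random placeholder; in practice it would be extracted from a
# pretrained protein language model.
import numpy as np
from scipy.stats import spearmanr
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(seq: str) -> np.ndarray:
    """Naive baseline representation: flattened one-hot encoding of the sequence."""
    idx = {a: i for i, a in enumerate(AMINO_ACIDS)}
    mat = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        mat[pos, idx[aa]] = 1.0
    return mat.ravel()

# Toy downstream task: predict a simple compositional property of each sequence.
seqs = ["".join(rng.choice(list(AMINO_ACIDS), size=50)) for _ in range(300)]
labels = np.array([s.count("A") / len(s) for s in seqs])
labels += rng.normal(scale=0.02, size=len(seqs))

X_onehot = np.stack([one_hot(s) for s in seqs])
X_plm = rng.normal(size=(len(seqs), 1280))  # placeholder for pretrained embeddings

for name, X in [("one-hot baseline", X_onehot), ("PLM embeddings (placeholder)", X_plm)]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2, random_state=0)
    model = Ridge(alpha=1.0).fit(X_tr, y_tr)
    rho, _ = spearmanr(y_te, model.predict(X_te))
    print(f"{name}: held-out Spearman rho = {rho:.2f}")
```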