Abstract
Protein structure prediction (PSP) has been a prominent topic in bioinformatics and computational biology, aiming to predict protein function from sequence data. The three‐dimensional conformation of proteins is pivotal for their intricate biological roles. With the advancement of computational capabilities and the adoption of deep learning (DL) technologies (especially Transformer network architectures), the PSP field has ushered in a brand‐new era of “neuralization.” Here, we focus on reviewing the evolution from traditional to modern deep learning‐based approaches and the characteristics of various structural prediction methods. This emphasizes the advantages of deep learning‐based hybrid prediction methods over traditional approaches. This study also provides a summary analysis of widely used databases and the latest structure prediction models. It discusses deep learning networks and algorithmic optimization for model training, validation, and evaluation. In addition, a discussion of the major advances in deep learning‐based protein structure prediction is presented. The update of AlphaFold 3 further extends the boundaries of prediction models, especially in protein‐small molecule structure prediction. This marks a key shift toward a holistic approach in biomolecular structure elucidation, aiming at solving almost all sequence‐to‐structure puzzles in biological phenomena.
Chemical Reviews, Journal Year: 2024, Volume and Issue: 124(4), P. 1899 - 1949
Published: Feb. 8, 2024
Macromolecular crowding affects the activity of proteins and functional macromolecular complexes in all cells, including bacteria. Crowding, together with physicochemical parameters such as pH, ionic strength, and the energy status, influences the structure of the cytoplasm and thereby indirectly macromolecular function. Notably, crowding also promotes the formation of biomolecular condensates by phase separation, initially identified in eukaryotic cells but more recently discovered to play key functions in bacteria. Bacterial cells require a variety of mechanisms to maintain physicochemical homeostasis, in particular in environments with fluctuating conditions, and the formation of biomolecular condensates is emerging as one such mechanism. In this work, we connect physicochemical homeostasis and macromolecular crowding with the formation and function of biomolecular condensates in the bacterial cell and compare the supramolecular structures found in bacteria with those of eukaryotic cells. We focus on the effects of crowding and phase separation on the control of bacterial chromosome replication, segregation, and cell division, and we discuss the contribution of biomolecular condensates to bacterial cell fitness and adaptation to environmental stress.
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: April 12, 2024
Protein-protein interactions (PPIs) are ubiquitous in biology, yet a comprehensive structural characterization of the PPIs underlying biochemical processes is lacking. Although AlphaFold-Multimer (AF-M) has the potential to fill this knowledge gap, standard AF-M confidence metrics do not reliably separate relevant PPIs from an abundance of false positive predictions. To address this limitation, we used machine learning on well curated datasets to train a Structure Prediction and Omics informed Classifier called SPOC that shows excellent performance in separating true and false PPIs, including in proteome-wide screens. We applied SPOC to an all-by-all matrix of nearly 300 human genome maintenance proteins, generating ~40,000 predictions that can be viewed at predictomes.org, where users can also score their own predictions with SPOC. High confidence PPIs discovered using our approach suggest novel hypotheses in genome maintenance. Our results provide a framework for interpreting large scale AF-M screens and help lay the foundation for a proteome-wide structural interactome.
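As a rough sanity check on the scale of such an all-by-all screen, the number of unordered protein pairs among n proteins is n(n-1)/2. The sketch below only illustrates that arithmetic; the protein counts tried are hypothetical values near 300, and whether the screen also scored self-pairs (homodimers) is an assumption exposed as a flag, not something stated in the abstract.

```python
from math import comb


def unordered_pairs(n, include_self=False):
    """Count unordered pairs among n items, optionally adding the n self-pairs."""
    return comb(n, 2) + (n if include_self else 0)


# Hypothetical protein counts close to "nearly 300"; the exact number
# used in the screen is not given in the abstract.
pair_counts = {n: unordered_pairs(n) for n in (280, 285, 290, 300)}

for n, count in pair_counts.items():
    print(f"{n} proteins -> {count} unordered pairs")
```

Counts in this range are of the same order as the ~40,000 predictions reported.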
Proceedings of the National Academy of Sciences, Journal Year: 2024, Volume and Issue: 121(26)
Published: June 20, 2024
Proteomics has been revolutionized by large protein language models (PLMs), which learn unsupervised representations from large corpora of sequences. These models are typically fine-tuned in a supervised setting to adapt the model to specific downstream tasks. However, the computational and memory footprint of fine-tuning (FT) large PLMs presents a barrier for many research groups with limited computational resources. Natural language processing has seen a similar explosion in the size of models, where these challenges have been addressed by methods for parameter-efficient fine-tuning (PEFT). In this work, we introduce this paradigm to proteomics through leveraging the parameter-efficient method LoRA and training new models for two important tasks: predicting protein–protein interactions (PPIs) and predicting the symmetry of homooligomer quaternary structures. We show that these approaches are competitive with traditional FT while requiring reduced memory and substantially fewer parameters. We additionally show that for the PPI prediction task, training only the classification head also remains competitive with full FT, using five orders of magnitude fewer parameters, and that each of these methods outperforms state-of-the-art PPI prediction methods with substantially reduced compute. We further perform a comprehensive evaluation of the hyperparameter space, demonstrate that PEFT of PLMs is robust to variations in these hyperparameters, and elucidate where best practices for PEFT in proteomics differ from those in natural language processing. All our model adaptation and evaluation code is available open-source at https://github.com/microsoft/peft_proteomics. Thus, we provide a blueprint to democratize the power of PLM adaptation to groups with limited computational resources.
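The low-rank idea behind LoRA can be sketched in a few lines. A frozen weight matrix W of shape (d_out, d_in) is augmented with a trainable product B·A of rank r, scaled by alpha/r, so that only r·(d_in + d_out) parameters are trained instead of d_out·d_in. This is a minimal pure-Python illustration of that general technique. The layer size, rank, scaling value, and function names are assumptions chosen for the example and are not taken from the cited work or its repository.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * (B @ A), where r is the LoRA rank."""
    r = len(a)  # A has shape (r, d_in); B has shape (d_out, r)
    delta = matmul(b, a)
    scale = alpha / r
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def trainable_params(d_out, d_in, r):
    """Parameter counts for full fine-tuning versus a rank-r LoRA update."""
    return d_out * d_in, r * (d_in + d_out)


# Hypothetical layer size, rank, and scaling, chosen only for illustration.
d_out, d_in, r, alpha = 64, 64, 4, 8.0
rng = random.Random(0)
w = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen weights
a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]   # trainable
b = [[0.0] * r for _ in range(d_out)]                               # trainable, zero-initialised

w_eff = lora_effective_weight(w, a, b, alpha)
full, lora = trainable_params(d_out, d_in, r)
print(f"full fine-tuning: {full} parameters; LoRA (r={r}): {lora} parameters")
```

Because B starts at zero, the adapted layer initially behaves exactly like the frozen one, and training only moves the small A and B matrices.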
Molecular Cell, Journal Year: 2025, Volume and Issue: unknown
Published: Feb. 1, 2025
Protein-protein interactions (PPIs) are ubiquitous in biology, yet a comprehensive structural characterization of the PPIs underlying cellular processes is lacking. AlphaFold-Multimer (AF-M) has the potential to fill this knowledge gap, but standard AF-M confidence metrics do not reliably separate relevant PPIs from an abundance of false positive predictions. To address this limitation, we used machine learning on curated datasets to train a structure prediction and omics-informed classifier (SPOC) that effectively separates true and false predictions of PPIs, including in proteome-wide screens. We applied SPOC to an all-by-all matrix of nearly 300 human genome maintenance proteins, generating ∼40,000 predictions that can be viewed at predictomes.org, where users can also score their own predictions with SPOC. High-confidence PPIs discovered using our approach enable hypothesis generation in genome maintenance. Our results provide a framework for interpreting large-scale AF-M screens and help lay the foundation for a proteome-wide structural interactome.
Nature Methods, Journal Year: 2025, Volume and Issue: unknown
Published: Feb. 5, 2025
Abstract
Advances in computational structure prediction will vastly augment the hundreds of thousands of currently available protein complex structures. Translating these into discoveries requires aligning them, which is computationally prohibitive. Foldseek-Multimer computes complex alignments from compatible chain-to-chain alignments, identified by efficiently clustering their superposition vectors. Foldseek-Multimer is 3–4 orders of magnitude faster than the gold standard, while producing comparable alignments; this allows it to compare billions of complex pairs in 11 h. Foldseek-Multimer is open-source software available at GitHub via https://github.com/steineggerlab/foldseek/, https://search.foldseek.com/search/ and the BFMD database.
Journal of Chemical Theory and Computation, Journal Year: 2025, Volume and Issue: unknown
Published: Feb. 5, 2025
Biomolecular interactions are essential in many biological processes, including complex formation and phase separation processes. Coarse-grained computational models are especially valuable for studying such processes via simulation. Here, we present COCOMO2, an updated residue-based coarse-grained model that extends its applicability from intrinsically disordered peptides to folded proteins. This is accomplished with the introduction of a surface exposure scaling factor, which adjusts interaction strengths based on solvent accessibility, to enable a more realistic modeling of interactions involving folded domains without additional computational costs. COCOMO2 was parametrized directly with solubility data to improve its performance in predicting concentration-dependent phase separation for a broader range of biomolecular systems compared to the original version. COCOMO2 enables new applications, including the study of condensates that involve IDPs together with folded domains and the study of complex assembly processes. COCOMO2 also provides an expanded foundation for the development of multiscale approaches for modeling biomolecular interactions that span from residue-level to atomistic resolution.
Science, Journal Year: 2025, Volume and Issue: unknown
Published: March 13, 2025
Homo-oligomerization of biological macromolecules leads to functional assemblies that are critical to understanding various cellular processes. However, RNA quaternary structures have been rarely reported. Comparative genomics analysis has identified RNA families containing hundreds of sequences that adopt conserved secondary structures and likely fold into complex three-dimensional (3D) structures. We use cryo-electron microscopy (cryo-EM) to determine structures from four RNA families, including ARRPOF and OLE forming dimers, and ROOL and GOLLD forming hexameric, octameric and dodecameric nanostructures, at 2.6 to 4.6 Å resolutions. These homo-oligomeric assemblies reveal a plethora of structural motifs that contribute to RNA multivalency, including kissing loop, palindromic base-pairing, A-stacking, metal ion coordination, pseudoknot and minor-groove interactions. These results provide the molecular basis of intermolecular interactions driving RNA multivalency with potential functional relevance.
Science, Journal Year: 2024, Volume and Issue: 386(6720), P. 439 - 445
Published: Oct. 24, 2024
Machine learning (ML)–based design approaches have advanced the field of de novo protein design, with diffusion-based generative methods increasingly dominating protein design pipelines. Here, we report a “hallucination”-based protein design approach that functions in relaxed sequence space, enabling the efficient design of high-quality protein backbones over multiple scales and with broad scope of application without the need for any form of retraining. We experimentally produced and characterized more than 100 proteins. Three high-resolution crystal structures and two cryo–electron microscopy density maps of designed single-chain proteins comprising up to 1000 amino acids validate the accuracy of the method. Our pipeline can also be used to design synthetic protein-protein interactions, as validated experimentally by a set of protein heterodimers. Relaxed sequence optimization offers attractive performance with respect to designability, scope of applicability for different design problems, and scalability across protein sizes.
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: May 29, 2024
Abstract
Genome sequencing efforts have led to the discovery of tens of millions of protein missense variants found in the human population, with the majority of these having no annotated role and some likely contributing to trait variation and disease. Sequence-based artificial intelligence approaches have become highly accurate at predicting variants that are detrimental to the function of proteins, but they do not inform on mechanisms of disruption. Here we combined sequence and structure-based methods to perform proteome-wide prediction of deleterious variants with information on their impact on protein stability, protein-protein interactions and small-molecule binding pockets. AlphaFold2 structures were used to predict approximately 100,000 small-molecule binding pockets and stability changes for over 200 million variants. To inform on protein-protein interfaces we used AlphaFold2 to predict structures for nearly 500,000 protein complexes. We illustrate the value of mechanism-aware variant effect predictions to study the relation between protein abundance and the structural properties of proteins underlying trans protein quantitative trait loci (pQTLs). We characterised the distribution of mechanistic impacts of protein variants found in patients and experimentally studied example disease linked variants in FGFR1.