Communications Chemistry,
Journal Year: 2024, Volume and Issue: 7(1)
Published: Feb. 14, 2024
Abstract
Metal-organic frameworks (MOFs) exhibit great promise for CO2 capture. However, finding the best performing materials poses computational and experimental grand challenges in view of the vast chemical space of potential building blocks. Here, we introduce GHP-MOFassemble, a generative artificial intelligence (AI), high performance framework for the rational and accelerated design of MOFs with high CO2 adsorption capacity and synthesizable linkers. GHP-MOFassemble generates novel linkers, assembled with one of three pre-selected metal nodes (Cu paddlewheel, Zn paddlewheel, Zn tetramer) into MOFs in a primitive cubic topology. GHP-MOFassemble screens and validates AI-generated MOFs for uniqueness, synthesizability, and structural validity, uses molecular dynamics simulations to study their stability and chemical consistency, and uses crystal graph neural networks and Grand Canonical Monte Carlo simulations to quantify their CO2 adsorption capacities. We present the top six AI-generated MOFs with CO2 capacities greater than 2 mmol g−1, i.e., higher than 96.9% of structures in the hypothetical MOF dataset.
Nature Machine Intelligence,
Journal Year: 2024, Volume and Issue: 6(4), P. 417 - 427
Published: April 11, 2024
Abstract
Fragment-based drug discovery has been an effective paradigm in early-stage drug development. An open challenge in this area is designing linkers between disconnected molecular fragments of interest to obtain chemically relevant candidate drug molecules. In this work, we propose DiffLinker, an E(3)-equivariant three-dimensional conditional diffusion model for molecular linker design. Given a set of disconnected fragments, our model places missing atoms in between and designs a molecule incorporating all the initial fragments. Unlike previous approaches that are only able to connect pairs of molecular fragments, our method can link an arbitrary number of fragments. Additionally, the model automatically determines the number of atoms in the linker and its attachment points to the input fragments. We demonstrate that DiffLinker outperforms other methods on the standard datasets, generating more diverse and synthetically accessible molecules. We experimentally test our method in real-world applications, showing that it can successfully generate valid linkers conditioned on target protein pockets.
Briefings in Bioinformatics,
Journal Year: 2024, Volume and Issue: 25(2)
Published: Jan. 22, 2024
Abstract
Antimicrobial peptides (AMPs), short peptides with diverse functions, effectively target and combat various organisms. The widespread misuse of chemical antibiotics has led to increasing microbial resistance. Due to their low drug resistance and toxicity, AMPs are considered promising substitutes for traditional antibiotics. While existing deep learning technology enhances AMP generation, it also presents certain challenges. Firstly, AMP generation overlooks the complex interdependencies among amino acids. Secondly, current models fail to integrate crucial tasks like screening, attribute prediction and iterative optimization. Consequently, we develop an integrated deep learning framework, Diff-AMP, that automates AMP generation, identification, attribute prediction and iterative optimization. We innovatively integrate kinetic diffusion and attention mechanisms into the reinforcement learning framework for efficient AMP generation. Additionally, our prediction module incorporates pre-training and transfer learning strategies for precise AMP identification and screening. We employ a convolutional neural network for multi-attribute prediction and a reinforcement learning-based iterative optimization strategy to produce diverse AMPs. This framework automates molecule generation, screening, attribute prediction and optimization, thereby advancing AMP research. We have deployed Diff-AMP on a web server, with code, data and server details available in the Data Availability section.
ACM Computing Surveys,
Journal Year: 2025, Volume and Issue: unknown
Published: Jan. 26, 2025
Large Language Models (LLMs) have emerged as a transformative power in enhancing natural language comprehension, representing a significant stride toward artificial general intelligence. The application of LLMs extends beyond conventional linguistic boundaries, encompassing specialized linguistic systems developed within various scientific disciplines. This growing interest has led to the advent of scientific LLMs, a novel subclass specifically engineered for facilitating scientific discovery. As a burgeoning area in the community of AI for Science, scientific LLMs warrant comprehensive exploration. However, a systematic and up-to-date survey introducing them is currently lacking. In this paper, we endeavor to methodically delineate the concept of “scientific language”, whilst providing a thorough review of the latest advancements in scientific LLMs. Given the expansive realm of scientific disciplines, our analysis adopts a focused lens, concentrating on the biological and chemical domains. This includes an in-depth examination of LLMs for textual knowledge, small molecules, macromolecular proteins, genomic sequences, and their combinations, analyzing them in terms of model architectures, capabilities, datasets, and evaluation. Finally, we critically examine the prevailing challenges and point out promising research directions along with the advances of LLMs. By offering a comprehensive overview of technical developments in this field, this survey aspires to be an invaluable resource for researchers navigating the intricate landscape of scientific LLMs.
arXiv (Cornell University),
Journal Year: 2021, Volume and Issue: unknown
Published: Jan. 1, 2021
Molecular graph representation learning is a fundamental problem in modern drug and material discovery. Molecular graphs are typically modeled by their 2D topological structures, but it has been recently discovered that 3D geometric information plays a more vital role in predicting molecular functionalities. However, the lack of 3D information in real-world scenarios has significantly impeded the learning of geometric graph representation. To cope with this challenge, we propose the Graph Multi-View Pre-training (GraphMVP) framework where self-supervised learning (SSL) is performed by leveraging the correspondence and consistency between 2D topological structures and 3D geometric views. GraphMVP effectively learns a 2D molecular graph encoder that is enhanced by richer and more discriminative 3D geometry. We further provide theoretical insights to justify the effectiveness of GraphMVP. Finally, comprehensive experiments show that GraphMVP can consistently outperform existing graph SSL methods.
Molecular representation learning (MRL) has gained tremendous attention due to its critical role in learning from limited supervised data for applications like drug design. In most MRL methods, molecules are treated as 1D sequential tokens or 2D topology graphs, limiting their ability to incorporate 3D information for downstream tasks and, in particular, making it almost impossible to perform 3D geometry prediction or generation. Herein, we propose Uni-Mol, a universal MRL framework that significantly enlarges the representation ability and application scope of MRL schemes. Uni-Mol is composed of two models with the same SE(3)-equivariant transformer architecture: a molecular pretraining model trained by 209M molecular conformations; a pocket pretraining model trained by 3M candidate protein pocket data. The two models are used independently for separate tasks, and are combined when used in protein-ligand binding tasks. By properly incorporating 3D information, Uni-Mol outperforms SOTA in 14/15 molecular property prediction tasks. Moreover, Uni-Mol achieves superior performance in 3D spatial tasks, including protein-ligand binding pose prediction, molecular conformation generation, etc. Finally, we show that Uni-Mol can be successfully applied to tasks with few-shot data, like pocket druggability prediction. The model and data will be made publicly available at https://github.com/dptech-corp/Uni-Mol
ACS Central Science,
Journal Year: 2023, Volume and Issue: 9(4), P. 563 - 581
Published: March 10, 2023
The vastness of the materials design space makes it impractical to explore using traditional brute-force methods, particularly in reticular chemistry. However, machine learning has shown promise in expediting and guiding materials design. Despite numerous successful applications of machine learning to reticular materials, progress in the field has stagnated, possibly because digital chemistry is more an art than a science and because of its limited accessibility to inexperienced researchers. To address this issue, we present mofdscribe, a software ecosystem tailored to novice and seasoned digital chemists that streamlines the ideation, modeling, and publication process. Though optimized for reticular chemistry, our tools are versatile and can be used in nonreticular materials research. We believe that mofdscribe will enable a more reliable, efficient, and comparable field of digital chemistry.
Journal of Chemical Information and Modeling,
Journal Year: 2023, Volume and Issue: 64(7), P. 2174 - 2194
Published: Nov. 7, 2023
The discovery of new drugs has important implications for human health. Traditional methods for drug discovery rely on experiments to optimize the structure of lead molecules, which are time-consuming and high-cost. Recently, artificial intelligence has exhibited promising and efficient performance for drug-like molecule generation. In particular, deep generative models achieve great success in de novo generation of drug-like molecules with desired properties, showing massive potential for novel drug discovery. In this study, we review recent progress in molecule generation using deep generative models, mainly focusing on representations, public databases, data processing tools, and advanced artificial intelligence based frameworks. We present a comprehensive comparison of state-of-the-art models and a summary of commonly used molecular design strategies. We identify research gaps and challenges such as the need for better [...], missing 3D information in representation, and the lack of high-precision evaluation metrics. We suggest future directions.
Proceedings of the AAAI Conference on Artificial Intelligence,
Journal Year: 2023, Volume and Issue: 37(4), P. 5105 - 5112
Published: June 26, 2023
Molecule generation, especially generating 3D molecular geometries from scratch (i.e., 3D de novo generation), has become a fundamental task in drug design. Existing diffusion based 3D molecule generation methods could suffer from unsatisfactory performances, especially when generating large molecules. At the same time, the generated molecules lack enough diversity. This paper proposes a novel diffusion model to address those two challenges. First, interatomic relations are not included in molecules' 3D point cloud representations. Thus, it is difficult for existing generative models to capture the potential interatomic forces and abundant local constraints. To tackle this challenge, we propose to augment the potential interatomic forces and further involve dual equivariant encoders to encode interatomic forces of different strengths. Second, existing diffusion-based models essentially shift elements in geometry along the gradient of data density. Such a process lacks enough exploration in the intermediate steps of the Langevin dynamics. To address this issue, we introduce a distributional controlling variable in each diffusion/reverse step to enforce thorough explorations and further improve generation diversity. Extensive experiments on multiple benchmarks demonstrate that the proposed model significantly outperforms existing methods for both unconditional and conditional generation tasks. We also conduct case studies to help understand the physicochemical properties of the generated molecules. The codes are available at https://github.com/tencent-ailab/MDM.
Deep learning has achieved remarkable success in learning representations for molecules, which is crucial for various biochemical applications, ranging from property prediction to drug design. However, training Deep Neural Networks (DNNs) from scratch often requires abundant labeled molecules, which are expensive to acquire in the real world. To alleviate this issue, tremendous efforts have been devoted to Chemical Pre-trained Models (CPMs), where DNNs are pre-trained using large-scale unlabeled molecular databases and then fine-tuned over specific downstream tasks. Despite the prosperity, a systematic review of this fast-growing field is lacking. In this paper, we present the first survey that summarizes the current progress of CPMs. We first highlight the limitations of training molecular representation models from scratch to motivate CPM studies. Next, we systematically review recent advances on this topic from several key perspectives, including molecular descriptors, encoder architectures, pre-training strategies, and applications. We also highlight the challenges and promising avenues for future research, providing a useful resource for both machine learning and scientific communities.