Beilstein Journal of Organic Chemistry,
Journal year: 2024, Volume 20, pp. 2476-2492
Published: Oct. 4, 2024
This review surveys the recent advances and challenges in predicting and optimizing reaction conditions using machine learning techniques. The paper emphasizes the importance of acquiring and processing large and diverse datasets of chemical reactions, and the use of both global and local models to guide the design of synthetic processes. Global models exploit the information from comprehensive databases to suggest general reaction conditions for new reactions, while local models fine-tune the specific parameters for a given reaction family to improve yield and selectivity. The paper also identifies the current limitations and opportunities in this field, such as data quality and availability, and the integration of high-throughput experimentation. The paper demonstrates how the combination of chemical engineering, data science, and ML algorithms can enhance the efficiency and effectiveness of reaction conditions design, and enable novel discoveries in synthetic chemistry.
Engineering,
Journal year: 2021, Volume 7(9), pp. 1201-1211
Published: July 29, 2021
Chemical engineers rely on models for design, research, and daily decision-making, often with potentially large financial and safety implications. Previous efforts a few decades ago to combine artificial intelligence and chemical engineering for modeling were unable to fulfill the expectations. In the last five years, the increasing availability of data and computational resources has led to a resurgence in machine learning-based research. Many recent efforts have facilitated the roll-out of machine learning techniques in the research field by developing large databases, benchmarks, and representations for chemical applications and new machine learning frameworks. Machine learning has significant advantages over traditional modeling techniques, including flexibility, accuracy, and execution speed. These strengths also come with weaknesses, such as the lack of interpretability of these black-box models. The greatest opportunities involve using machine learning in time-limited applications such as real-time optimization and planning that require high accuracy and that can build on models with a self-learning ability to recognize patterns, learn from data, and become more intelligent over time. The greatest threat in artificial intelligence research today is inappropriate use because most chemical engineers have had limited training in computer science and data analysis. Nevertheless, machine learning will definitely become a trustworthy element in the modeling toolbox of chemical engineers.
Chemical Reviews,
Journal year: 2023, Volume 123(13), pp. 8736-8780
Published: June 29, 2023
Small data are often used in scientific and engineering research due to the presence of various constraints, such as time, cost, ethics, privacy, security, and technical limitations in data acquisition. However, while big data have been the focus for the past decade, small data and their challenges have received little attention, even though they are technically more severe in machine learning (ML) and deep learning (DL) studies. Overall, the small data challenge is often compounded by issues such as data diversity, imputation, noise, imbalance, and high-dimensionality. Fortunately, the current big data era is characterized by technological breakthroughs in ML, DL, and artificial intelligence (AI), which enable data-driven scientific discovery, and many advanced ML and DL technologies developed for big data have inadvertently provided solutions for small data problems. As a result, significant progress has been made in ML and DL for small data challenges in the past decade. In this review, we summarize and analyze several emerging potential solutions to small data challenges in molecular science, including chemical and biological sciences. We review both basic machine learning algorithms, such as linear regression, logistic regression (LR),
Nucleic Acids Research,
Journal year: 2021, Volume 50(D1), pp. D693-D700
Published: Nov. 9, 2021
Abstract
Rhea (https://www.rhea-db.org) is an expert-curated knowledgebase of biochemical reactions based on the chemical ontology ChEBI (Chemical Entities of Biological Interest) (https://www.ebi.ac.uk/chebi). In this paper, we describe a number of key developments in Rhea since our last report in the database issue of Nucleic Acids Research in 2019. These include improved reaction coverage in Rhea, the adoption of Rhea as the reference vocabulary for enzyme annotation in the UniProt knowledgebase UniProtKB (https://www.uniprot.org), the development of a new Rhea website, and the designation of Rhea as an ELIXIR Core Data Resource. We hope that these and other developments will enhance the utility of Rhea as a reference resource to study and engineer enzymes and the metabolic systems in which they function.
Machine Learning Science and Technology,
Journal year: 2021, Volume 3(1), p. 015022
Published: Dec. 7, 2021
Abstract
Transformer models coupled with a simplified molecular line entry system (SMILES) have recently proven to be a powerful combination for solving challenges in cheminformatics. These models, however, are often developed specifically for a single application and can be very resource-intensive to train. In this work we present the Chemformer model, a Transformer-based model which can be quickly applied to both sequence-to-sequence and discriminative cheminformatics tasks. Additionally, we show that self-supervised pre-training can improve performance and significantly speed up convergence on downstream tasks. On direct synthesis and retrosynthesis prediction benchmark datasets we publish state-of-the-art results for top-1 accuracy. We also improve on existing approaches for a molecular optimisation task and show that Chemformer can optimise on multiple discriminative tasks simultaneously. Models, datasets and code will be made available after publication.
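The self-supervised pre-training on SMILES mentioned in the abstract above can be illustrated with a toy token-masking objective. The following is a minimal sketch and not Chemformer's implementation: the regular expression, the helper names `tokenize_smiles` and `mask_tokens`, the `<MASK>` token, and the 15% masking rate are all illustrative assumptions.

```python
import random
import re

# Assumed token pattern: bracket atoms, two-letter halogens, single-letter
# atoms (aliphatic and aromatic), then bonds, branches and ring closures.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnops]|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles):
    """Split a SMILES string into tokens; reject characters the pattern misses."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"Unrecognised characters in SMILES: {smiles!r}")
    return tokens


def mask_tokens(tokens, mask_rate=0.15, mask_token="<MASK>", seed=None):
    """Randomly replace tokens with a mask token.

    Returns (masked_tokens, targets), where targets holds the original token
    at masked positions and None elsewhere. A model would be trained to
    predict the non-None targets from the masked sequence.
    """
    rng = random.Random(seed)
    masked, targets = [], []
    for token in tokens:
        if rng.random() < mask_rate:
            masked.append(mask_token)
            targets.append(token)
        else:
            masked.append(token)
            targets.append(None)
    return masked, targets


if __name__ == "__main__":
    aspirin = tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O")
    masked, targets = mask_tokens(aspirin, seed=0)
    print(masked)
    print(targets)
```

Tokenizing before masking means that multi-character atoms such as `Cl` or `[NH4+]` are hidden and predicted as single units rather than character by character.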
Briefings in Bioinformatics,
Journal year: 2023, Volume 25(1)
Published: Nov. 22, 2023
Abstract
Recently, attention mechanism and derived models have gained significant traction in drug development due to their outstanding performance and interpretability in handling complex data structures. This review offers an in-depth exploration of the principles underlying attention-based models and their advantages in drug discovery. We further elaborate on their applications in various aspects of drug development, from molecular screening and target binding to property prediction and molecule generation. Finally, we discuss the current challenges faced in the application of attention mechanisms and Artificial Intelligence technologies, including data quality, model interpretability and computational resource constraints, along with future directions for research. Given the accelerating pace of technological advancement, we believe that attention-based models will have an increasingly prominent role in future drug discovery. We anticipate that these models will usher in revolutionary breakthroughs in the pharmaceutical domain, significantly accelerating the pace of drug development.
Briefings in Bioinformatics,
Journal year: 2021, Volume 23(1)
Published: Sept. 21, 2021
Artificial intelligence (AI) has been transforming the practice of drug discovery in the past decade. Various AI techniques have been used in many drug discovery applications, such as virtual screening and drug design. In this survey, we first give an overview on drug discovery and discuss related applications, which can be reduced to two major tasks, i.e. molecular property prediction and molecule generation. We then present common data resources, molecule representations and benchmark platforms. As a major part of the survey, AI techniques are dissected into model architectures and learning paradigms. To reflect the technical development of AI in drug discovery over the years, the surveyed works are organized chronologically. We expect that this survey provides a comprehensive review on AI in drug discovery. We also provide a GitHub repository with a collection of papers (and codes, if applicable) as a learning resource, which is regularly updated.
Nature Communications,
Journal year: 2023, Volume 14(1)
Published: July 11, 2023
Polymers are a vital part of everyday life. Their chemical universe is so large that it presents unprecedented opportunities as well as significant challenges to identify suitable application-specific candidates. We present a complete end-to-end machine-driven polymer informatics pipeline that can search this space for suitable candidates at unprecedented speed and accuracy. This pipeline includes a polymer chemical fingerprinting capability called polyBERT (inspired by Natural Language Processing concepts), and a multitask learning approach that maps the polyBERT fingerprints to a host of properties. polyBERT is a chemical linguist that treats the chemical structure of polymers as a chemical language. The present approach outstrips the best presently available concepts for polymer property prediction based on handcrafted fingerprint schemes in speed by two orders of magnitude while preserving accuracy, thus making it a strong candidate for deployment in scalable architectures including cloud infrastructures.
Recent years have witnessed the prosperity of pre-training graph neural networks (GNNs) for molecules. Typically, atom types as node attributes are randomly masked and GNNs are then trained to predict masked types as in AttrMask (Hu et al., 2020), following the Masked Language Modeling (MLM) task of BERT (Devlin et al., 2019). However, unlike MLM where the vocabulary is large, the AttrMask pre-training does not learn informative molecular representations due to the small and unbalanced atom 'vocabulary'. To amend this problem, we propose a variant of VQ-VAE (van den Oord et al., 2017) as a context-aware tokenizer to encode atom attributes into chemically meaningful discrete codes. This can enlarge the atom vocabulary size and mitigate the quantitative divergence between dominant (e.g., carbons) and rare atoms (e.g., phosphorus). With the enlarged atom 'vocabulary', we propose a novel node-level pre-training task, dubbed Masked Atoms Modeling (MAM), to mask some discrete codes randomly and then pre-train GNNs to predict them. MAM also mitigates another issue of AttrMask, namely the negative transfer. It can be easily combined with various pre-training tasks to improve their performance. Furthermore, we propose triplet masked contrastive learning (TMCL) for graph-level pre-training to model the heterogeneous semantic similarity between molecules for effective molecule retrieval. MAM and TMCL constitute a novel pre-training framework, Mole-BERT, which can match or outperform state-of-the-art methods in a fully data-driven manner. We release the code at https://github.com/junxia97/Mole-BERT.
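The argument that a small, unbalanced atom vocabulary weakens masked-atom pre-training can be made concrete with a toy calculation. The sketch below is not the Mole-BERT code: the molecules are hypothetical atom-symbol lists, and `atom_type_frequencies`, `mask_atoms` and `majority_baseline_accuracy` are illustrative names. It shows that a predictor which always answers "carbon" already scores highly on this toy data, so the masking task gives little learning signal.

```python
import random
from collections import Counter

# Hypothetical toy "molecules", each reduced to a list of heavy-atom symbols.
TOY_MOLECULES = [
    ["C", "C", "C", "C", "C", "C"],
    ["C", "C", "O"],
    ["C", "N", "C", "C", "O"],
    ["C", "C", "C", "P"],
]


def atom_type_frequencies(molecules):
    """Relative frequency of each atom type over all molecules."""
    counts = Counter(atom for mol in molecules for atom in mol)
    total = sum(counts.values())
    return {atom: n / total for atom, n in counts.items()}


def mask_atoms(atoms, mask_rate=0.15, mask_token="[MASK]", seed=None):
    """Mask a random subset of atoms (at least one), AttrMask-style.

    Returns the masked atom list and a dict {position: original atom type}
    holding the prediction targets.
    """
    rng = random.Random(seed)
    n_mask = max(1, round(mask_rate * len(atoms)))
    positions = sorted(rng.sample(range(len(atoms)), n_mask))
    masked = list(atoms)
    for i in positions:
        masked[i] = mask_token
    return masked, {i: atoms[i] for i in positions}


def majority_baseline_accuracy(molecules):
    """Expected accuracy of always predicting the most frequent atom type,
    assuming masked positions are drawn uniformly over all atoms."""
    return max(atom_type_frequencies(molecules).values())


if __name__ == "__main__":
    print(atom_type_frequencies(TOY_MOLECULES))
    print(mask_atoms(TOY_MOLECULES[2], seed=0))
    print(f"majority baseline: {majority_baseline_accuracy(TOY_MOLECULES):.3f}")
```

Replacing raw atom types with a larger set of context-aware discrete codes, as the abstract describes, is one way to lower such a trivial baseline. The tokenizer itself is not sketched here.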