arXiv (Cornell University), Journal year: 2024, Issue: unknown
Published: Jan. 1, 2024
Large language models (LLMs) are a class of artificial intelligence models based on deep learning that perform strongly across various tasks, especially natural language processing (NLP). LLMs typically consist of artificial neural networks with numerous parameters, trained on large amounts of unlabeled input using self-supervised or semi-supervised learning. However, their potential for solving bioinformatics problems may even exceed their proficiency in modeling human language. In this review, we present a summary of the prominent LLMs used in natural language processing, such as BERT and GPT, and focus on exploring their applications at different omics levels in bioinformatics, mainly including genomics, transcriptomics, proteomics, drug discovery and single-cell analysis. Finally, the review summarizes the potential and prospects of LLMs for solving bioinformatic problems.
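
To illustrate the self-supervised training on unlabeled input described above, the sketch below splits a DNA sequence into overlapping k-mer tokens and masks a fraction of them. A BERT-style model would then learn to reconstruct the masked tokens. The k-mer size, the 15% mask rate and the `[MASK]` token are illustrative assumptions, not details taken from the review. Full BERT masking also swaps some selected tokens for random or unchanged ones; this sketch omits that step.

```python
import random

def kmer_tokenize(seq: str, k: int = 3) -> list[str]:
    """Split a DNA sequence into overlapping k-mers (stride 1)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]", seed=0):
    """Replace a random subset of tokens with a mask token.

    Returns the masked sequence and {position: original token}; the latter
    is the training label for the masked-token (self-supervised) objective.
    """
    rng = random.Random(seed)
    n_mask = max(1, int(round(mask_rate * len(tokens))))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    labels = {}
    for p in positions:
        labels[p] = masked[p]
        masked[p] = mask_token
    return masked, labels

tokens = kmer_tokenize("ATGCGTACGTTAG", k=3)
masked, labels = mask_tokens(tokens, mask_rate=0.15)
print(len(tokens))  # 11 k-mers for a 13-nt sequence with k=3
print(masked)
print(labels)
```

No labels are needed beyond the sequence itself, which is what lets such models use large unlabeled corpora.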

Drug Discovery Today, Journal year: 2024, Issue: 29(6), pp. 104009-104009
Published: Apr. 30, 2024
AI techniques are making inroads into the field of drug discovery. As a result, a growing number of drugs and vaccines have been discovered using AI. However, questions remain about the success of these molecules in clinical trials. To address these questions, we conducted the first analysis of the clinical pipelines of AI-native Biotech companies. In Phase I, we find that AI-discovered molecules have an 80–90% success rate, substantially higher than historic industry averages. This suggests, we argue, that AI is highly capable of designing or identifying molecules with drug-like properties. In Phase II, the success rate is ∼40%, albeit on a limited sample size, comparable to historic industry averages. Our findings highlight early signs of the clinical potential of AI-discovered molecules.
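
One way to read the per-phase figures together is to multiply them. Suppose, as our own simplifying assumption and not the authors', that Phase I and Phase II outcomes are independent. The probability that a molecule clears both phases is then the product of the per-phase rates. Given the small Phase II sample, the result below is only a back-of-the-envelope illustration.

```python
def cumulative_success(*phase_rates: float) -> float:
    """Probability of passing every phase, assuming phases are independent."""
    p = 1.0
    for r in phase_rates:
        if not 0.0 <= r <= 1.0:
            raise ValueError("rates must be in [0, 1]")
        p *= r
    return p

low = cumulative_success(0.80, 0.40)   # lower end of the Phase I range
high = cumulative_success(0.90, 0.40)  # upper end of the Phase I range
print(f"{low:.2f}-{high:.2f}")  # 0.32-0.36
```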

Bioinformatics Advances, Journal year: 2024, Issue: 4(1)
Published: Jan. 1, 2024
Abstract
Summary: Network biology is an interdisciplinary field bridging computational and biological sciences that has proved pivotal in advancing the understanding of cellular functions and diseases across biological systems and scales. Although the field has been around for two decades, it remains nascent. It has witnessed rapid evolution, accompanied by emerging challenges. These stem from various factors, notably the growing complexity and volume of data together with the increased diversity of data types describing different tiers of biological organization. We discuss prevailing research directions in network biology, focusing on molecular/cellular networks but also on other biological network types such as biomedical knowledge graphs, patient similarity networks, brain networks, and social/contact networks relevant to disease spread. In more detail, we highlight areas of inference and comparison of biological networks, multimodal data integration and heterogeneous networks, higher-order network analysis, machine learning on networks, and network-based personalized medicine. Following an overview of recent breakthroughs across these five areas, we offer a perspective on future directions of network biology. Additionally, we discuss scientific communities, educational initiatives, and the importance of fostering diversity within the field. This article establishes a roadmap for an immediate and long-term vision for network biology.
Availability and implementation: Not applicable.
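
As a concrete picture of one network type listed above, the sketch below builds a patient similarity network. Patients are nodes, and an edge links two patients whose feature vectors are sufficiently similar. The feature vectors, the choice of cosine similarity and the 0.9 threshold are illustrative assumptions, not choices made in the article.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def patient_similarity_network(profiles: dict[str, list[float]], threshold: float = 0.9):
    """Adjacency sets linking patients whose cosine similarity >= threshold."""
    adj = {pid: set() for pid in profiles}
    for a, b in combinations(profiles, 2):
        if cosine(profiles[a], profiles[b]) >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj

# Hypothetical patients, each described by three normalized features.
profiles = {
    "P1": [1.0, 0.2, 0.0],
    "P2": [0.9, 0.3, 0.1],
    "P3": [0.0, 0.1, 1.0],
    "P4": [0.1, 0.0, 0.9],
}
net = patient_similarity_network(profiles, threshold=0.9)
print(net)
```

Real pipelines typically use richer similarity measures, k-nearest-neighbour sparsification, or fusion across several data types. A fixed threshold simply keeps the example short.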

Nature Machine Intelligence, Journal year: 2024, Issue: 6(4), pp. 437-448
Published: Mar. 29, 2024
Abstract
Generative machine learning models have attracted intense interest for their ability to sample novel molecules with desired chemical or biological properties. Among these, language models trained on SMILES (Simplified Molecular-Input Line-Entry System) representations have been subject to the most extensive experimental validation and have been widely adopted. However, these models have what is perceived to be a major limitation: some fraction of the SMILES strings that they generate are invalid, meaning that they cannot be decoded to a chemical structure. This perceived shortcoming has motivated a remarkably broad spectrum of work designed to mitigate the generation of invalid SMILES or to correct them post hoc. Here I provide causal evidence that the ability to produce invalid outputs is not harmful but is instead beneficial to chemical language models. I show that the generation of invalid outputs provides a self-corrective mechanism that filters low-likelihood samples from the language model output. Conversely, enforcing valid outputs produces structural biases in the generated molecules, impairing distribution learning and limiting generalization to unseen chemical space. Together, these results refute the prevailing assumption that invalid SMILES are a shortcoming of chemical language models and reframe them as a feature, not a bug.
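
The filtering effect described above can be mimicked with a toy example. When invalid samples are discarded, the surviving set skews toward higher model likelihood. The sketch below uses a deliberately crude syntactic check as a stand-in for a real SMILES parser such as RDKit: parentheses and square brackets must balance, and each single-digit ring-closure label outside brackets must appear an even number of times. The generated strings and their log-likelihoods are invented for illustration.

```python
from statistics import mean

def crude_smiles_check(s: str) -> bool:
    """Rough syntactic proxy for SMILES validity (NOT a real parser).

    Ignores %nn ring labels, valence, aromaticity and many other rules.
    """
    depth = 0
    in_bracket = False
    ring_counts = {}
    for ch in s:
        if in_bracket:
            if ch == "]":
                in_bracket = False
            elif ch == "[":
                return False  # nested brackets are not allowed
            continue  # skip atom specs such as [NH3+] or [13C]
        if ch == "[":
            in_bracket = True
        elif ch == "]":
            return False
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        elif ch.isdigit():
            ring_counts[ch] = ring_counts.get(ch, 0) + 1
    return depth == 0 and not in_bracket and all(c % 2 == 0 for c in ring_counts.values())

# Hypothetical sampler output: (string, log-likelihood under the model).
samples = [
    ("CCO", -2.1), ("c1ccccc1", -3.0), ("CC(=O)O", -3.4),
    ("C1CC(C", -9.5), ("CC)O", -8.7), ("c1ccc", -7.9),
]
kept = [(s, lp) for s, lp in samples if crude_smiles_check(s)]
print([s for s, _ in kept])
print(f"mean log-likelihood, all:  {mean(lp for _, lp in samples):.2f}")  # -5.77
print(f"mean log-likelihood, kept: {mean(lp for _, lp in kept):.2f}")     # -2.83
```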

Accounts of Chemical Research, Journal year: 2023, Issue: 56(3), pp. 402-412
Published: Jan. 30, 2023
Conspectus
In the domain of reaction development, one aims to obtain higher efficacies as measured in terms of yield and/or selectivities. During empirical cycles, an admixture of outcomes from low to high yields/selectivities is expected. While it is not easy to identify all the factors that might impact reaction efficiency, a complex and nonlinear dependence on the nature of the reactants, catalysts, solvents, etc. is quite likely. The developmental stages of newer reactions would typically offer a few hundreds of samples with variations in the participating molecules and/or reaction conditions. These "observations" and their "output" can be harnessed as valuable labeled data for developing molecular machine learning (ML) models. Once a robust ML model is built for a specific reaction under development, it can predict the reaction outcome for any new choice of substrates/catalyst in a few seconds/minutes and thus expedite the identification of promising candidates for experimental validation. Recent years have witnessed impressive applications of ML in the molecular world, most of them aimed at predicting important chemical or biological properties. We believe that the integration of effective ML workflows can be made richly beneficial to reaction discovery.

As with any new technology, the direct adaptation of ML methods used in well-developed domains, such as natural language processing (NLP) and image recognition, is unlikely to succeed in reaction discovery. Some of the challenges stem from ineffective featurization of the chemical space, the unavailability of quality data and its distribution, and making the right choice of ML models and their technically robust deployment. It should be noted that there is no universal ML model suitable for the inherently high-dimensional problem of chemical reactions. Given these backgrounds, rendering ML tools conducive to reaction discovery is an exciting as well as challenging endeavor. With the increased availability of efficient ML algorithms, we focused on tapping their potential for small-data reaction discovery (a few hundreds to thousands of samples).

In this Account, we describe both feature engineering and feature learning approaches applied to diverse reactions of contemporary interest. Among these, the catalytic asymmetric hydrogenation of imines/alkenes, β-C(sp³)–H bond functionalization, and the relay Heck reaction employed the feature engineering approach, using quantum-chemically derived physical organic descriptors as molecular features, all designed to predict enantioselectivity. The selection of molecular features to customize ML models for the reaction of interest is described, with emphasis on the chemical insights that could be gathered through the use of such features. Feature learning methods for predicting the yield of Buchwald–Hartwig cross-coupling and the deoxyfluorination of alcohols, and the enantioselectivity of N,S-acetal formation, are found to offer excellent predictions. As an alternative for small-data reactions (10²–10³ reactions), we propose a transfer learning protocol, wherein a model trained on a large number of molecules (10⁵–10⁶) is fine-tuned on a library of target-task reactions. The exploitation of deep neural network latent space as a method for generative tasks, to identify useful substrates for a reaction, is demonstrated as a promising strategy.
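
The transfer learning protocol described above pretrains on a large dataset and then fine-tunes on a small target set. The sketch below shows that idea with a deliberately tiny model. A one-feature linear regressor is pretrained by gradient descent on a large synthetic "source" task, then fine-tuned for a few steps on a small, related "target" task. It is compared against the same few steps started from scratch. The synthetic data, the linear model, the learning rate and the step counts are all hypothetical stand-ins for the deep networks and reaction datasets in the Account.

```python
import random

def fit_linear(xs, ys, w=0.0, b=0.0, lr=0.1, steps=100):
    """Full-batch gradient descent on mean squared error for y ≈ w*x + b."""
    n = len(xs)
    for _ in range(steps):
        gw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        gb = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * gw
        b -= lr * gb
    return w, b

def mse(xs, ys, w, b):
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(42)
# Large synthetic source task (stand-in for a big pretraining set).
src_x = [rng.uniform(-1, 1) for _ in range(5_000)]
src_y = [2.0 * x + 1.0 + rng.gauss(0, 0.1) for x in src_x]
# Small, related target task (stand-in for a small reaction library).
tgt_x = [rng.uniform(-1, 1) for _ in range(20)]
tgt_y = [2.2 * x + 0.8 + rng.gauss(0, 0.1) for x in tgt_x]

w0, b0 = fit_linear(src_x, src_y, steps=100)            # pretraining
w_ft, b_ft = fit_linear(tgt_x, tgt_y, w0, b0, steps=5)  # brief fine-tuning
w_sc, b_sc = fit_linear(tgt_x, tgt_y, steps=5)          # same budget, from scratch
print(f"fine-tuned MSE:   {mse(tgt_x, tgt_y, w_ft, b_ft):.4f}")
print(f"from-scratch MSE: {mse(tgt_x, tgt_y, w_sc, b_sc):.4f}")
```

The fine-tuned model starts near a good solution, so a handful of updates on 20 points suffices. The from-scratch model has not yet converged with the same budget.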

Molecules, Journal year: 2023, Issue: 28(11), pp. 4430-4430
Published: May 30, 2023
Deep generative models applied to the generation of novel compounds in small-molecule drug design have attracted a lot of attention in recent years. To design compounds that interact with specific target proteins, we propose a Generative Pre-Trained Transformer (GPT)-inspired model for de novo target-specific molecular design. By implementing different keys and values for the multi-head attention conditional on a specified target, the proposed method can generate drug-like compounds both with and without a specific target. The results show that our approach (cMolGPT) is capable of generating SMILES strings that correspond to both drug-like and active compounds. Moreover, the compounds generated from the conditional model closely match the chemical space of real target-specific molecules and cover a significant portion of novel compounds. Thus, the proposed Conditional Generative Pre-Trained Transformer (cMolGPT) is a valuable tool for de novo molecule design and has the potential to accelerate the molecular optimization cycle time.
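
The abstract says conditioning works by using different keys and values in multi-head attention depending on the specified target. The sketch below shows one plausible reading, under our own assumptions. It is a single attention head in which queries come from the molecule's token states, while keys and values are projected from a target embedding. A zero "no-target" embedding serves the unconditional case. The toy dimensions, the fixed projection matrices and names such as `target_memory` are hypothetical; this is not cMolGPT's implementation.

```python
import math

def matmul(a, b):
    """Multiply matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values):
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(keys[0])
    scores = matmul(queries, [list(col) for col in zip(*keys)])
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, values)

def target_conditioned_attention(token_states, target_memory, w_k, w_v):
    """Queries from the token states; keys/values projected from the target.

    target_memory: hypothetical vectors encoding the protein target; pass a
    zero vector for the unconditional (no-target) case.
    """
    keys = matmul(target_memory, w_k)
    values = matmul(target_memory, w_v)
    return attention(token_states, keys, values)

tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy token states
identity = [[1.0, 0.0], [0.0, 1.0]]            # toy projections
out_a = target_conditioned_attention(tokens, [[1.0, 0.0], [0.0, 1.0]], identity, identity)
out_b = target_conditioned_attention(tokens, [[2.0, 0.0], [0.0, 0.5]], identity, identity)
out_none = target_conditioned_attention(tokens, [[0.0, 0.0]], identity, identity)
print(out_a)
print(out_b)
print(out_none)  # zero memory: every token attends to a zero value vector
```

Swapping the target changes only the keys and values, so the same decoder weights yield different attention outputs per target.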

Journal of Agricultural and Food Chemistry, Journal year: 2023, Issue: 71(18), pp. 6789-6802
Published: Apr. 27, 2023
Flavor molecules are commonly used in the food industry to enhance product quality and consumer experiences but are associated with potential human health risks, highlighting the need for safer alternatives. To address these health-associated challenges and promote reasonable application, several databases of flavor molecules have been constructed. However, no existing studies have comprehensively summarized these data resources according to their quality, focused fields, and potential gaps. Here, we systematically summarized 25 flavor molecule databases published within the last 20 years and revealed that data inaccessibility, untimely updates, and nonstandard flavor descriptions are the main limitations of current studies. We examined the development of computational approaches (e.g., machine learning and molecular simulation) for the identification of novel flavor molecules and discussed their major challenges regarding throughput, model interpretability, and the lack of gold-standard data sets for equitable model evaluation. Additionally, we discussed future strategies for mining and designing novel flavor molecules based on multi-omics and artificial intelligence, to provide a new foundation for flavor science research.

Journal of Chemical Information and Modeling, Journal year: 2023, Issue: 63(15), pp. 4505-4532
Published: Jul. 19, 2023
The field of computational chemistry has seen a significant increase in the integration of machine learning concepts and algorithms. In this Perspective, we surveyed 179 open-source software projects, with corresponding peer-reviewed papers published within the last 5 years, to better understand the topics within the field being investigated by machine learning approaches. For each project, we provide a short description, a link to the code, the accompanying license type, and whether the training data and resulting models are made publicly available. Based on those deposited GitHub repositories, the most popular employed Python libraries are identified. We hope that this survey will serve as a resource to learn about machine learning or specific architectures thereof by identifying accessible codes with accompanying papers on a topic basis. To this end, we also include computational chemistry open-source software for generating training data and fundamental Python libraries for machine learning. Based on our observations and considering the three pillars of collaborative machine learning work, namely open data, open source (code), and open models, we provide some suggestions to the community.
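
The per-project attributes the survey records can be held in a simple record type. The survey also identifies the most popular Python libraries across repositories, which amounts to a tally. The sketch below assumes one record per project and counts each library at most once per project. The project names, URLs and library lists are placeholders, not entries from the survey.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Project:
    """One surveyed project; fields follow the attributes listed in the survey."""
    name: str
    description: str
    code_url: str
    license_type: str
    training_data_public: bool
    models_public: bool
    python_libraries: list[str] = field(default_factory=list)

def most_common_libraries(projects: list[Project], n: int = 3) -> list[tuple[str, int]]:
    """Count how many projects use each library (each library once per project)."""
    counts = Counter(
        lib for p in projects for lib in dict.fromkeys(p.python_libraries)
    )
    return counts.most_common(n)

# Hypothetical entries; names and URLs are placeholders, not surveyed projects.
projects = [
    Project("proj-a", "property prediction", "https://example.org/a", "MIT", True, True,
            ["numpy", "torch", "rdkit"]),
    Project("proj-b", "force field fitting", "https://example.org/b", "BSD-3-Clause", False, True,
            ["numpy", "torch"]),
    Project("proj-c", "reaction yield model", "https://example.org/c", "Apache-2.0", True, False,
            ["numpy", "scikit-learn", "numpy"]),
]

print(most_common_libraries(projects, n=2))  # [('numpy', 3), ('torch', 2)]
open_data = sum(p.training_data_public for p in projects)
print(f"{open_data}/{len(projects)} projects share training data")
```

`dict.fromkeys` deduplicates while preserving order, so a library listed twice in one project (as in `proj-c`) is counted once.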