Organic Process Research & Development,
Journal Year:
2024,
Volume and Issue:
unknown
Published: Nov. 11, 2024
In the pharmaceutical industry, solubility is a critical parameter influencing various stages of drug development, from early discovery to commercial manufacturing. This work showcases a high-throughput solubility screening workflow and describes the steps required to standardize and curate the data to suitably allow automated data flow. Using the high-quality data, we developed a quantitative structure–property relationship model using gradient boosting and molecular descriptors, requiring only a 2D structure to generate predictions. The accuracy is competitive with alternative approaches where additional physical data are not required. A key use case for predictions made in this way is developing control strategies for mutagenic impurities, allowing a data-driven and consistent method for calculating the solubility contribution to purge calculations. Further perspective is given on future application as a prediction algorithm and on the approach to data methodologies supporting development in general, highlighting the potential of federated learning, which is a technological approach to overcome the barrier to cross-industry data sharing.
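As an illustration of the gradient boosting idea named in the abstract above, the following is a minimal sketch using only the Python standard library. It is not the authors' model: the two descriptor columns, the target values, the use of depth-one trees (stumps), and the names `fit_gbm` and `predict_gbm` are all assumptions made for this example. Each round fits a stump to the current residuals and adds a damped copy of it to the prediction.

```python
from statistics import mean

def fit_stump(X, residuals):
    """Find the single-feature threshold split that minimises squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm, rm = mean(left), mean(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:]  # (feature index, threshold, left value, right value)

def predict_stump(stump, row):
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value

def fit_gbm(X, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared loss: each stump is fitted to the residuals."""
    base = mean(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, row)
                 for p, row in zip(preds, X)]
    return base, learning_rate, stumps

def predict_gbm(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)

def mse(y_true, y_pred):
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))

# Hypothetical toy data: two made-up descriptor columns and made-up
# log-solubility targets. None of these numbers come from the paper.
X_toy = [[1.2, 0.5], [1.8, 1.1], [2.5, 2.0], [3.1, 2.8], [3.6, 3.5], [4.2, 4.1]]
y_toy = [0.8, 0.3, -0.5, -1.2, -1.9, -2.6]

model = fit_gbm(X_toy, y_toy)
baseline_mse = mse(y_toy, [mean(y_toy)] * len(y_toy))
boosted_mse = mse(y_toy, [predict_gbm(model, row) for row in X_toy])
```

In practice a library implementation with descriptors computed from the 2D structure would replace both the stumps and the toy table; the sketch only shows why the training error falls as rounds are added.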
Crystal Growth & Design,
Journal Year:
2024,
Volume and Issue:
24(13), P. 5417 - 5438
Published: June 24, 2024
A workflow for the digital design of crystallization processes starting from the chemical structure of the active pharmaceutical ingredient (API) is a multistep, multidisciplinary process. A simple version would be to first predict the API crystal structure and, from it, the corresponding properties of solubility, morphology, and growth rates, assuming that nucleation is controlled by seeding, and then use these parameters to design the crystallization process. This is usually an oversimplification, as most APIs are polymorphic, and the most stable structure alone may not have the required properties for development into a drug product. This perspective, from the experience of the Lilly Digital Design project, considers the fundamental theoretical basis of crystal structure prediction (CSP), free energy, solubility, morphology, and growth rate prediction, and the current state of crystallization process simulation. This is illustrated by applying the modeling techniques to real examples, olanzapine and succinic acid. We demonstrate the promise of using ab initio computer modeling for solid form selection and process design in pharmaceutical development. We also identify the open problems in the application of computational modeling and in achieving the accuracy required for immediate implementation that currently limit the applicability of the approach.
Soft Matter,
Journal Year:
2024,
Volume and Issue:
20(29), P. 5652 - 5669
Published: Jan. 1, 2024
Advances in physical models and data science are improving predictions of polymer–solvent phase behavior; we discuss the different approaches taken today and the remaining barriers to making broadly useful predictions.
Digital Discovery,
Journal Year:
2024,
Volume and Issue:
unknown
Published: Jan. 1, 2024
Three ML models and their ensemble predict aqueous solubility of small organic molecules using different representations: GCN with molecular graphs, EdgeConv with ESP maps, and XGBoost with tabular features from Mordred descriptors.
Molecules,
Journal Year:
2024,
Volume and Issue:
29(20), P. 4894 - 4894
Published: Oct. 16, 2024
Deep eutectic solvents (DESs) are popular green media used for various industrial, pharmaceutical, and biomedical applications. However, the possible compositions of eutectic systems are so numerous that it is impossible to study all of them experimentally. To remedy this limitation, the solubility landscape of selected active pharmaceutical ingredients (APIs) in choline chloride- and betaine-based deep eutectic solvents was explored using theoretical models based on machine learning. The available solubility data for the APIs, comprising a total of 8014 data points, were collected in neat solvents, binary solvent mixtures, and DESs. This set was augmented with new measurements of sulfa drugs in dry DESs. The descriptors used in the machine learning protocol were obtained from the σ-profiles of the considered molecules computed within the COSMO-RS framework. A combination of six sets of descriptors and 36 regressors was tested. Taking into account both accuracy and generalization, it was concluded that the best regressor is the nuSVR regressor-based predictive model trained using the relative intermolecular interactions and a twelve-step averaged simplification of the σ-profiles.
Journal of Cheminformatics,
Journal Year:
2024,
Volume and Issue:
16(1)
Published: Oct. 28, 2024
Drug solubility is an important parameter in the drug development process, yet it is often tedious and challenging to measure, especially for expensive drugs or those available in small quantities. To alleviate these challenges, machine learning (ML) has been applied to predict drug solubility as an alternative approach. However, the majority of existing ML research has focused on predictions of aqueous solubility and/or solubility at specific temperatures, which restricts model applicability in pharmaceutical development. To bridge this gap, we compiled a dataset of 27,000 solubility datapoints, including molecules measured in a range of binary solvent mixtures under various temperatures. Next, a panel of ML models were trained with their hyperparameters tuned using Bayesian optimization. The resulting top-performing models, both gradient boosted decision trees (light gradient boosting machine and extreme gradient boosting), achieved mean absolute errors (MAE) of 0.33 for LogS (S in g/100 g) on the holdout set. These models were further validated through a prospective study, wherein the solubility of four drug molecules was predicted by the models and then validated with in-house experiments. This study demonstrated that the models accurately predicted the solubility of solutes in different solvent mixtures whose features closely align with those within the dataset (MAE < 0.5 LogS). To support future research and facilitate advancements in the field, we have made the dataset and code openly available.
Scientific contribution
Our research advances the state-of-the-art in predicting solubility by leveraging ML and a uniquely comprehensive dataset. Unlike existing ML studies that predominantly focus on aqueous solvents at fixed temperatures, our work enables the prediction of drug solubility in a variety of binary solvent mixtures over a broad temperature range, providing practical insights for modeling solubility in realistic pharmaceutical applications. Our models, along with the open access dataset and code, are significant steps in the drug development process for new molecule discovery, analysis and formulation.
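To make the error metric quoted above concrete, here is a minimal sketch of how an MAE in LogS units can be computed from solubilities expressed in g/100 g. It assumes LogS is a base-10 logarithm, which the abstract does not state explicitly. The function names and the three measured/predicted pairs are hypothetical and are not taken from the paper's dataset.

```python
import math

def log_s(solubility_g_per_100g):
    """Convert a solubility in g/100 g to LogS (assumed here to be log10)."""
    return math.log10(solubility_g_per_100g)

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between paired values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical measured and predicted solubilities in g/100 g.
measured = [10.0, 1.0, 0.1]
predicted = [5.0, 2.0, 0.1]

mae_log_s = mean_absolute_error(
    [log_s(s) for s in measured],
    [log_s(s) for s in predicted],
)
```

Because the error is taken on a logarithmic scale, an MAE of 0.33 LogS corresponds to predictions that are typically off by roughly a factor of two in solubility (10^0.33 ≈ 2.1), under the base-10 assumption.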
CrystEngComm,
Journal Year:
2024,
Volume and Issue:
26(6), P. 822 - 834
Published: Jan. 1, 2024
A model-driven workflow that uses digital tools and small-scale experiments to maximise the efficiency in achieving a desired set of crystallisation responses, kinetics and objectives.
Solubility regression modeling is foundational for several chemical engineering applications, particularly for crystallization process development. Traditionally, these models rely on parametric semimechanistic approaches such as the Van't Hoff Jouyban-Acree (VH-JA) cosolvency model. Although these models generally provide narrow prediction intervals, they can exhibit increased bias when dealing with significant solute heat capacities or complex mixture effects. This study explores machine learning approaches, including Random Forests, Support Vector Machines, Gaussian Process Regression, and Neural Networks, as potential alternatives. While most machine learning models offered a lower training error, it was observed that their predictive quality quickly deteriorates further from the training data. Hence, a hybrid approach was explored to leverage the low bias of machine learning and the low variance of the VH-JA model through heterogeneous locally weighted bagging ensembles. Key to the methodology is quantifying, tracking, and minimizing the prediction uncertainty using the ensemble. This is illustrated through a case study on the solubility of ketoconazole in binary mixtures of 2-propanol and water. The optimal ensemble, comprising 58% stepwise VH-JA and 42% machine learning models, reduced the root-mean-squared error and the maximum absolute percentage error by ≈30% compared to the full VH-JA, while preserving a comparable prediction interval.
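For readers unfamiliar with the VH-JA cosolvency model mentioned above, the sketch below evaluates one commonly written form: a van't Hoff term ln x_i = A_i + B_i/T for each neat solvent, mixed by solute-free solvent fraction, plus the Jouyban-Acree excess term (f1·f2/T)·Σ J_i·(f1 − f2)^i. The exact parameterisation used in the paper, including its stepwise variant, may differ. The function name and all numerical parameters here are hypothetical.

```python
def ln_x_vh_ja(f1, T, vh1, vh2, J):
    """Log mole-fraction solubility in a binary solvent mixture.

    f1  : solute-free fraction of solvent 1 (solvent 2 is 1 - f1)
    T   : absolute temperature in kelvin
    vh1 : (A1, B1) van't Hoff constants for neat solvent 1, ln x1 = A1 + B1/T
    vh2 : (A2, B2) van't Hoff constants for neat solvent 2
    J   : Jouyban-Acree interaction constants J_0, J_1, ...
    """
    f2 = 1.0 - f1
    A1, B1 = vh1
    A2, B2 = vh2
    # Fraction-weighted van't Hoff contributions of the two neat solvents.
    ideal = f1 * (A1 + B1 / T) + f2 * (A2 + B2 / T)
    # Jouyban-Acree excess term capturing solvent-solvent mixing effects.
    excess = f1 * f2 / T * sum(Ji * (f1 - f2) ** i for i, Ji in enumerate(J))
    return ideal + excess

# Hypothetical parameters chosen only to exercise the function.
T_example = 298.15
vh_solvent_1 = (2.0, -3000.0)
vh_solvent_2 = (1.0, -4000.0)
J_example = [100.0, -50.0]

ln_x_neat_1 = ln_x_vh_ja(1.0, T_example, vh_solvent_1, vh_solvent_2, J_example)
ln_x_equal_mix = ln_x_vh_ja(0.5, T_example, vh_solvent_1, vh_solvent_2, J_example)
```

The excess term vanishes at f1 = 0 and f1 = 1, so the model reduces to the neat-solvent van't Hoff lines at the composition limits. Having only a few fitted constants is what gives such models low variance, at the cost of the bias the abstract describes.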
ACS Sustainable Chemistry & Engineering,
Journal Year:
2025,
Volume and Issue:
13(11), P. 4349 - 4368
Published: March 14, 2025
Selecting more sustainable solvents is a crucial component to mitigating the environmental impacts of chemical processes. Numerous tools have been developed to address this problem within the pharmaceutical industry, employing data-driven approaches such as multidimensional scaling or principal component analysis (PCA). Interactive knowledge-based kernel PCA is a variant of PCA that allows users to shape the 2D solvent maps by defining the positions of data points, imparting expert knowledge that was not included in the original descriptor set. We applied interactive knowledge-based kernel PCA to the task of solvent selection and present an intuitive interface integrated into AI4Green, an electronic laboratory notebook that encourages sustainable chemistry. A set of evidence-based user guidelines were used in combination with interactive kernel PCA to identify four potential substitutions for an example thioesterification reaction.