In recent data-driven approaches to materials discovery, scenarios where target quantities are expensive to compute or measure are often overlooked. In such cases, it becomes imperative to construct a training set that includes the most diverse, representative, and informative samples. Here, a novel regression tree-based active learning algorithm is employed for that purpose. It is applied to predict the band gap and adsorption properties of metal-organic frameworks (MOFs), a class of materials that results from virtually infinite combinations of their building units. Simpler, low-dimensional descriptors, such as Stoichiometric-120 and geometric properties, are found here to better represent MOFs in the low-data regime and are used as the feature space for this model. The partition given by the regression tree constructed on the labeled part of the dataset is used to select new samples to be added to the training set, thereby limiting its size while maximizing the prediction quality. Through tests on the QMOF, hMOF, and dMOF data sets, we show that our method is effective in constructing small training sets on which models can be learned well, thus reducing the labeling cost. Specifically, the approach is highly beneficial when labels are unevenly distributed in the descriptor space or the label distribution is imbalanced, which is often the case in real-world data. This approach offers a unique tool to efficiently analyze complex structure-property relationships and accelerate materials discovery.
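The abstract describes the query step only at a high level. Below is a minimal pure-Python sketch of one plausible reading. A small regression tree (CART, squared-error splits) is fit on the labeled MOFs. Each leaf is then given a priority equal to its share of the unlabeled pool times the label variance of its labeled members. Within the chosen leaf, the pool sample farthest from everything already labeled is taken. The leaf-scoring rule, the maximin diversity criterion and all function names are hypothetical illustrations, not the authors' published algorithm. Descriptors are plain tuples of floats standing in for features such as Stoichiometric-120 or geometric properties.

```python
import statistics


def sse(values):
    """Sum of squared deviations from the mean (the CART split criterion)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def fit_tree(X, y, depth=0, max_depth=3, min_leaf=2):
    """Greedy regression tree on descriptor tuples; only the partition is kept."""
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return {"leaf": True}
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            left = [i for i, x in enumerate(X) if x[j] <= t]
            right = [i for i, x in enumerate(X) if x[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, j, t, left, right)
    if best is None or best[0] >= sse(y):
        return {"leaf": True}
    _, j, t, left, right = best

    def child(idx):
        return fit_tree([X[i] for i in idx], [y[i] for i in idx],
                        depth + 1, max_depth, min_leaf)

    return {"leaf": False, "feature": j, "threshold": t,
            "left": child(left), "right": child(right)}


def leaf_of(tree, x):
    """Return the path ("L"/"R" string) of the leaf that descriptor x falls into."""
    path, node = "", tree
    while not node["leaf"]:
        go_left = x[node["feature"]] <= node["threshold"]
        path += "L" if go_left else "R"
        node = node["left"] if go_left else node["right"]
    return path


def dist2(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def select_queries(X_lab, y_lab, X_pool, n_new, max_depth=3):
    """Indices of pool samples to label next (hypothetical scoring rule)."""
    tree = fit_tree(X_lab, y_lab, max_depth=max_depth)
    labels_in, pool_in = {}, {}
    for x, v in zip(X_lab, y_lab):
        labels_in.setdefault(leaf_of(tree, x), []).append(v)
    for i, x in enumerate(X_pool):
        pool_in.setdefault(leaf_of(tree, x), []).append(i)
    # Leaf priority: fraction of the pool in the leaf times label variance there.
    score = {}
    for leaf, idx in pool_in.items():
        labs = labels_in.get(leaf, [])
        var = statistics.pvariance(labs) if len(labs) > 1 else 1.0
        score[leaf] = len(idx) / len(X_pool) * var
    chosen, taken = [], {leaf: 0 for leaf in pool_in}
    while len(chosen) < min(n_new, len(X_pool)):
        # Diminishing returns: each extra query from a leaf divides its priority.
        open_leaves = [lf for lf in pool_in if taken[lf] < len(pool_in[lf])]
        leaf = max(open_leaves, key=lambda lf: score[lf] / (1 + taken[lf]))
        # Diversity inside the leaf: farthest point from labeled + already chosen.
        ref = list(X_lab) + [X_pool[i] for i in chosen]
        cands = [i for i in pool_in[leaf] if i not in chosen]
        pick = max(cands, key=lambda i: min(dist2(X_pool[i], r) for r in ref))
        chosen.append(pick)
        taken[leaf] += 1
    return chosen


# Toy 1-descriptor data: flat labels near x = 0..3, noisy labels near x = 10..13.
X_lab = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,), (11.0,), (12.0,), (13.0,)]
y_lab = [1.0, 1.0, 1.0, 1.0, 0.0, 8.0, 2.0, 9.0]
X_pool = [(0.5,), (1.5,), (2.5,), (10.5,), (11.5,), (12.5,)]
picks = select_queries(X_lab, y_lab, X_pool, n_new=3, max_depth=1)
```

In this toy setting the tree splits at x = 10, and all three queries go to the high-variance leaf. This mirrors the abstract's point that the method helps most when labels are unevenly distributed in descriptor space.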
Communications Chemistry, Journal Year: 2024, Volume and Issue: 7(1)
Published: May 8, 2024
Abstract
Breakthroughs in the efficient use of biogas fuel depend on the successful separation of carbon dioxide/methane streams and the identification of appropriate materials. In this work, machine learning models are trained to predict the properties of metal-organic frameworks (MOFs). Training data are obtained using grand canonical Monte Carlo simulations of experimental MOFs, which have been carefully curated to ensure data quality and structural viability. The models show excellent performance in predicting gas uptake and in classifying MOFs according to the trade-off between gas uptake and selectivity, with R2 values consistently above 0.9 for the validation set. We make prospective predictions on an independent external set of hypothetical MOFs and examine these predictions in comparison with the results of grand canonical Monte Carlo calculations. The best-performing models correctly filter out over 90% of low-performing unseen MOFs, illustrating their applicability to other MOF datasets.
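The evaluation quantities in this abstract can be made concrete with standard definitions. The plain-Python sketch below covers three of them:

- The coefficient of determination, R2 = 1 - SS_res/SS_tot.
- The usual CO2/CH4 adsorption selectivity, S = (q_CO2/q_CH4)/(y_CO2/y_CH4). Here q is the adsorbed amount and y the bulk mole fraction.
- The fraction of truly low-performing structures that a classifier also flags as low-performing. This is one assumed reading of "filter out over 90% of low-performing MOFs".

The function names and the boolean labelling scheme are illustrative. The paper's exact metrics and performance thresholds are not given in the abstract.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


def adsorption_selectivity(q_co2, q_ch4, y_co2, y_ch4):
    """CO2/CH4 selectivity from adsorbed amounts q (same units) and bulk mole fractions y."""
    return (q_co2 / q_ch4) / (y_co2 / y_ch4)


def filter_out_rate(is_low_true, is_low_pred):
    """Share of truly low-performing MOFs that the model also flags as low-performing."""
    flagged = sum(1 for t, p in zip(is_low_true, is_low_pred) if t and p)
    return flagged / sum(1 for t in is_low_true if t)
```

For an equimolar CO2/CH4 feed, the bulk ratio term equals 1. The selectivity then reduces to the ratio of adsorbed amounts.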
Atmosphere, Journal Year: 2024, Volume and Issue: 15(6), P. 706
Published: June 13, 2024
In atmospheric chemistry, the Henry’s law constant (HLC) is crucial for understanding the distribution of organic compounds across gas, particle, and aqueous phases. Quantitative structure–property relationship (QSPR) models described in scientific research are generally tailored to specific groups or categories of substances and are often developed using a limited set of experimental data. This study developed a machine learning model using an extensive dataset of HLCs for approximately 1100 organic compounds. Molecular descriptors calculated using alvaDesc software (v 2.0) were used to train the models. A hybrid approach was adopted for feature selection, ensuring alignment with domain knowledge. Based on the root mean squared error (RMSE) of the training and test data after cross-validation, Gradient Boosting (GB) was selected as the model for predicting HLC. The hyperparameters were optimized using the automated hyperparameter optimization framework Optuna. The impact of features on the target variable was assessed using SHapley Additive exPlanations (SHAP). The model demonstrated strong performance on the training, evaluation, and test datasets, achieving coefficients of determination (R2) of 0.96, 0.78, and 0.74, respectively. The model was used to estimate the HLC of compounds associated with carbon capture and storage (CCS) emissions and secondary organic aerosols.
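The model-selection step described here can be sketched in plain Python: compare candidate regressors by cross-validated RMSE and keep the best. The study's Gradient Boosting implementation is not specified in the abstract. As a stand-in, the sketch implements squared-loss gradient boosting with one-feature decision stumps and compares it against a mean-predictor baseline. The data are a hypothetical one-descriptor set with a step-shaped target. It does not reproduce the alvaDesc descriptors, the hybrid feature selection, Optuna tuning, or the SHAP analysis. All names and data are illustrative.

```python
import math


def sse(values):
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def fit_stump(x, r):
    """Best single split of a 1D feature for residuals r, using midpoint thresholds."""
    mean_r = sum(r) / len(r)
    best_cost, best = sse(r), (math.inf, mean_r, mean_r)  # fallback: constant stump
    xs = sorted(set(x))
    for a, b in zip(xs, xs[1:]):
        t = (a + b) / 2
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        cost = sse(left) + sse(right)
        if cost < best_cost:
            best_cost = cost
            best = (t, sum(left) / len(left), sum(right) / len(right))
    return best  # (threshold, left value, right value)


def fit_gradient_boosting(x, y, n_rounds=50, learning_rate=0.1):
    """Squared-loss gradient boosting: each stump is fit to the current residuals."""
    f0 = sum(y) / len(y)
    pred = [f0] * len(y)
    stumps = []
    for _ in range(n_rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        t, lv, rv = fit_stump(x, resid)
        stumps.append((t, lv, rv))
        pred = [pi + learning_rate * (lv if xi <= t else rv) for xi, pi in zip(x, pred)]

    def predict(xq):
        return f0 + sum(learning_rate * (lv if xq <= t else rv) for t, lv, rv in stumps)

    return predict


def fit_mean(x, y):
    """Baseline that always predicts the training mean."""
    m = sum(y) / len(y)
    return lambda xq: m


def cv_rmse(fit, x, y, k=5):
    """k-fold cross-validated RMSE; fold membership is index % k."""
    sq_errors = []
    for fold in range(k):
        train = [i for i in range(len(x)) if i % k != fold]
        test = [i for i in range(len(x)) if i % k == fold]
        model = fit([x[i] for i in train], [y[i] for i in train])
        sq_errors += [(y[i] - model(x[i])) ** 2 for i in test]
    return math.sqrt(sum(sq_errors) / len(sq_errors))


# Hypothetical one-descriptor dataset with a step in the target.
x = [float(i) for i in range(20)]
y = [0.0 if xi < 10 else 10.0 for xi in x]
scores = {"mean baseline": cv_rmse(fit_mean, x, y),
          "gradient boosting": cv_rmse(fit_gradient_boosting, x, y)}
best = min(scores, key=scores.get)
```

In practice, the candidate set, the number of boosting rounds and the learning rate would themselves be tuned. The abstract reports that Optuna was used for this.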
Journal of Materials Chemistry A, Journal Year: 2024, Volume and Issue: unknown
Published: Dec. 17, 2024
This review provides an overview of machine learning (ML) workflows in MOFs. It discusses three rational design methods, focusing on future challenges and opportunities to enhance understanding and guide ML-based MOF research.
Scientific Reports, Journal Year: 2024, Volume and Issue: 14(1)
Published: Nov. 9, 2024
Thanks to their unique properties, such as ultrahigh porosity and surface area, metal-organic frameworks (MOFs) are highly regarded materials for gas adsorption applications. However, their combinatorial nature results in a vast chemical space, precluding its exploration with traditional techniques. Recently, machine learning (ML) pipelines have been established as the go-to method for large-scale screening by means of predictive models. These models are typically built in a descriptor-based manner, meaning that the structure must first be coarse-grained into a 1D fingerprint before it is fed to the ML algorithm. As such, the latter cannot fully exploit the 3D structural information, potentially resulting in a model of lower quality. In this work, we propose a descriptor-free framework called "AIdsorb", which can directly process the raw 3D structural information for predicting adsorption properties. To accomplish that, the structure is treated as a point cloud and then passed to a deep learning algorithm suitable for point cloud analysis. As a proof of concept, AIdsorb is applied
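The sketch below illustrates what "treating the structure as a point cloud" can involve. It converts fractional atomic coordinates into Cartesian points and passes them through a PointNet-style encoder. The encoder applies one shared dense layer to every point, then takes a channel-wise max over points, which makes the output independent of atom ordering. This architecture is an assumption for illustration; the abstract only says "a deep learning algorithm suitable for point cloud analysis". The weights are random rather than trained, and periodic images are ignored. The toy cell and all function names are hypothetical.

```python
import random


def frac_to_cart(frac_coords, lattice):
    """Fractional -> Cartesian coordinates; lattice rows are the cell vectors a, b, c."""
    a, b, c = lattice
    return [tuple(f1 * a[k] + f2 * b[k] + f3 * c[k] for k in range(3))
            for f1, f2, f3 in frac_coords]


def to_point_cloud(frac_coords, atomic_numbers, lattice):
    """One point per atom: (x, y, z, Z). Using Z as the extra channel is an assumption."""
    xyz = frac_to_cart(frac_coords, lattice)
    return [p + (float(z),) for p, z in zip(xyz, atomic_numbers)]


def shared_layer(point, weights, biases):
    """Dense layer + ReLU applied with the same weights to every point."""
    return [max(0.0, sum(w * p for w, p in zip(row, point)) + b)
            for row, b in zip(weights, biases)]


def global_feature(cloud, weights, biases):
    """Channel-wise max over points -> vector independent of atom ordering."""
    per_point = [shared_layer(p, weights, biases) for p in cloud]
    return [max(channel) for channel in zip(*per_point)]


# Hypothetical toy structure: 10 Å cubic cell with three atoms (Zn, O, C).
lattice = [(10.0, 0.0, 0.0), (0.0, 10.0, 0.0), (0.0, 0.0, 10.0)]
frac = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.25, 0.25, 0.25)]
cloud = to_point_cloud(frac, [30, 8, 6], lattice)

rng = random.Random(0)
weights = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(8)]  # untrained
biases = [0.0] * 8
feature = global_feature(cloud, weights, biases)

# Shuffling the atoms must not change the pooled descriptor.
shuffled = cloud[:]
rng.shuffle(shuffled)
same = global_feature(shuffled, weights, biases) == feature
```

A trainable model would stack several shared layers and add a regression head on the pooled vector. The max pooling is what removes the need for a hand-crafted, order-independent descriptor.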