Frontiers in Public Health, Journal Year: 2021, Volume and Issue: 9, Published: Feb. 17, 2021
Background: Previous studies have constructed prediction models for type 2 diabetes mellitus (T2DM), but machine learning was rarely used and few focused on genetic prediction. This study aimed to establish an effective T2DM prediction tool and to further explore the potential of genetic risk scores (GRS) via various classifiers among rural adults.
Methods: In this prospective study, the GRS for a total of 5,712 participants from the Henan Rural Cohort Study was calculated. Cox proportional hazards (CPH) regression was used to analyze the associations between GRS and T2DM. CPH, artificial neural network (ANN), random forest (RF), and gradient boosting machine (GBM) were used to establish prediction models, respectively. The area under the receiver operating characteristic curve (AUC) and the net reclassification index (NRI) were used to assess the discrimination ability of the models. A decision curve was plotted to determine the clinical utility of the models.
Results: Compared with individuals in the lowest quintile of GRS, the HR (95% CI) was 2.06 (1.40 to 3.03) for those in the highest quintile (P trend < 0.05). Based on conventional predictors, the AUCs of the prediction model were 0.815, 0.816, 0.843, and 0.851 via CPH, ANN, RF, and GBM, respectively. The changes in AUC with the integration of GRS were 0.001, 0.002, 0.018, and 0.033 for CPH, ANN, RF, and GBM, respectively. Reclassification was significantly improved for all classifiers when adding GRS (NRI: 41.2% for CPH; 41.0% for ANN; 46.4% for RF; 45.1% for GBM). Decision curve analysis indicated the clinical benefits of the model combined with GRS.
Conclusion: The prediction model combined with GRS may provide incremental predictive performance beyond conventional factors for T2DM, which demonstrates the potential clinical use of genetic markers to screen vulnerable populations.
Clinical Trial Registration: The Henan Rural Cohort Study is registered in the Chinese Clinical Trial Register (Registration number: ChiCTR-OOC-15006699). http://www.chictr.org.cn/showproj.aspx?proj=11375
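The net reclassification index mentioned in this abstract can be illustrated with a minimal sketch. It assumes the category-free (continuous) form of the index, which the abstract does not specify. The function name `continuous_nri` and the toy probabilities are hypothetical and are not taken from the study.

```python
def continuous_nri(y_true, p_old, p_new):
    """Category-free NRI (an assumed variant, for illustration only).

    NRI = [P(up | event) - P(down | event)]
        + [P(down | non-event) - P(up | non-event)]
    where "up" means the new model assigns a higher risk than the old one.
    """
    up_event = down_event = up_nonevent = down_nonevent = 0
    n_event = n_nonevent = 0
    for label, old, new in zip(y_true, p_old, p_new):
        if label == 1:
            n_event += 1
            if new > old:
                up_event += 1
            elif new < old:
                down_event += 1
        else:
            n_nonevent += 1
            if new > old:
                up_nonevent += 1
            elif new < old:
                down_nonevent += 1
    if n_event == 0 or n_nonevent == 0:
        raise ValueError("need at least one event and one non-event")
    event_part = (up_event - down_event) / n_event
    nonevent_part = (down_nonevent - up_nonevent) / n_nonevent
    return event_part + nonevent_part


# Toy data: 3 people who developed the outcome, 3 who did not.
y = [1, 1, 1, 0, 0, 0]
risk_conventional = [0.30, 0.50, 0.40, 0.20, 0.35, 0.10]
risk_with_grs = [0.45, 0.55, 0.35, 0.15, 0.40, 0.05]
print(round(continuous_nri(y, risk_conventional, risk_with_grs), 3))  # 0.667
```

A positive value means that, on balance, the extended model moves events toward higher predicted risk and non-events toward lower predicted risk.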
Abstract
Diabetes Mellitus is a severe, chronic disease that occurs when blood glucose levels rise above certain limits. In recent years, machine and deep learning techniques have been used to predict diabetes and its complications. However, researchers and developers still face two main challenges when building type 2 diabetes predictive models. First, there is considerable heterogeneity in previous studies regarding the techniques used, making it challenging to identify the optimal one. Second, there is a lack of transparency about the features used in the models, which reduces their interpretability. This systematic review aimed at providing answers to the above challenges. The review followed the PRISMA methodology primarily, enriched with the one proposed by Keele and Durham Universities. Ninety studies were included, and the type of model, complementary techniques, dataset, and performance parameters reported were extracted. Eighteen different types of models were compared, with tree-based algorithms showing top performances. Deep Neural Networks proved suboptimal, despite their ability to deal with big and dirty data. Balancing data and feature selection techniques proved helpful to increase the model’s efficiency. Models trained on tidy datasets achieved almost perfect performance.
Preventive Medicine Reports, Journal Year: 2023, Volume and Issue: 35, P. 102358, Published: Aug. 19, 2023
Diabetes is a chronic metabolic disease characterized by hyperglycemia. The follow-up management of diabetes patients takes place mostly in the community, but the relationship between key lifestyle indicators in community follow-up and diabetes risk is unclear. In order to explore the association between key life characteristic indicators and the risk of diabetes, 252,176 follow-up records of people with diabetes from 2016 to 2023 were obtained from Haizhu District, Guangzhou. According to the follow-up data, the key life characteristic indicators that affect diabetes are determined, and the optimal feature subset is obtained through feature selection technology to accurately assess the risk of diabetes. A diabetes risk assessment model based on a random forest classifier was designed, which used parameter and algorithm comparison, with an accuracy of 91.24% and an AUC of the corresponding ROC curve of 97%. In order to improve the applicability of the model in clinical and real life, a diabetes risk score card was designed and tested using the original data, achieving 95.15%, and its reliability was high. The diabetes risk prediction model based on big data mining can be used for large-scale risk screening and early warning by doctors based on patient follow-up data, further promoting diabetes prevention and control strategies, and can also be used with wearable devices or intelligent biosensors for individual self-examination, in order to reduce risk factor levels.
BMC Medical Informatics and Decision Making, Journal Year: 2022, Volume and Issue: 22(1), Published: Feb. 10, 2022
Early detection and prediction of type two diabetes mellitus incidence by baseline measurements could reduce associated complications in the future. The low incidence rate of diabetes in comparison with non-diabetes makes accurate prediction of the minority class more challenging.
Deep neural network (DNN), extremely gradient boosting (XGBoost), and random forest (RF) performance is compared in predicting the minority class in the Tehran Lipid and Glucose Study (TLGS) cohort data. The impact of changing the threshold, cost-sensitive learning, and over- and under-sampling strategies as solutions to class imbalance have been compared in improving the algorithms' performance.
DNN, with the highest accuracy in predicting diabetes (54.8%), outperformed XGBoost and RF in terms of AUROC, g-mean, and f1-measure on the original imbalanced data. Changing the threshold based on the maximum f1-measure improved the performance of the three algorithms. Repeated edited nearest neighbors (RENN) under-sampling in DNN and cost-sensitive learning in tree-based algorithms were the best solutions to tackle the imbalance issue. RENN increased ROC and Precision-Recall AUCs, g-mean, and f1-measure from 0.857, 0.603, 0.713, and 0.575 to 0.862, 0.608, 0.773, and 0.583, respectively, in DNN. Weighting improved g-mean and f1-measure from 0.667 and 0.554 to 0.776 and 0.588 in XGBoost, and from 0.659 and 0.543 to 0.775 and 0.566 in RF, respectively. Also, ROC and Precision-Recall AUCs increased from 0.840 and 0.578 to 0.846 and 0.591, respectively.
G-mean showed the largest increase across all solutions. Among efficient strategies, resampling methods are faster at handling imbalance. Among sampling strategies, under-sampling methods had better performance than others.
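The threshold-changing strategy described in this abstract can be sketched as follows. This is a minimal standard-library illustration, not the authors' code. It assumes the threshold is chosen by scanning the observed scores and keeping the one that maximizes a chosen metric (f1-measure by default, with g-mean as an alternative). The helper names and the toy scores are hypothetical.

```python
import math


def confusion(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def f1_measure(y_true, y_pred):
    tp, fp, tn, fn = confusion(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity."""
    tp, fp, tn, fn = confusion(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def best_threshold(y_true, scores, metric=f1_measure):
    """Scan each observed score as a cut-off and keep the best one."""
    best_t, best_value = 0.5, -1.0
    for t in sorted(set(scores)):
        y_pred = [1 if s >= t else 0 for s in scores]
        value = metric(y_true, y_pred)
        if value > best_value:
            best_t, best_value = t, value
    return best_t, best_value


# Toy imbalanced data: 8 negatives, 2 positives, all scores below 0.5.
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.10, 0.15, 0.20, 0.20, 0.25, 0.35, 0.30, 0.45]
print(best_threshold(y, scores))  # (0.3, 0.8)
```

With the default cut-off of 0.5, no case in the toy data would be flagged as positive, which is why moving the threshold matters when the positive class is rare.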
Journal of Personalized Medicine, Journal Year: 2022, Volume and Issue: 12(6), P. 905, Published: May 31, 2022
Early identification of individuals at high risk of diabetes is crucial for implementing early intervention strategies. However, algorithms specific to elderly Chinese adults are lacking. The aim of this study is to build effective prediction models based on machine learning (ML) for the risk of type 2 diabetes mellitus (T2DM) in the Chinese elderly. A retrospective cohort study was conducted using the health screening data of adults older than 65 years in Wuhan, China from 2018 to 2020. With a strict data filtration, 127,031 records from the eligible participants were utilized. Overall, 8,298 participants were diagnosed with incident T2DM during the 2-year follow-up (2019-2020). The dataset was randomly split into a training set (n = 101,625) and a test set (n = 25,406). We developed prediction models based on four ML algorithms: logistic regression (LR), decision tree (DT), random forest (RF), and extreme gradient boosting (XGBoost). Using LASSO regression, 21 prediction features were selected. Random under-sampling (RUS) was applied to address the class imbalance, and Shapley Additive Explanations (SHAP) was used to calculate and visualize feature importance. Model performance was evaluated by the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and accuracy. The XGBoost model achieved the best performance (AUC = 0.7805, sensitivity = 0.6452, specificity = 0.7577, and accuracy = 0.7503). Fasting plasma glucose (FPG), education, exercise, gender, and waist circumference (WC) were the top five important predictors. This study showed that the XGBoost model can be used to screen individuals at high risk of T2DM at an early phase, which has strong potential for intelligent prevention and control of diabetes. The key features could also be useful for developing targeted diabetes prevention interventions.
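Random under-sampling, as named in this abstract, can be sketched in a few lines. This is a minimal standard-library illustration, not the authors' pipeline. The function name `random_under_sample`, the seed, and the toy data are assumptions. In practice the resampling would typically be applied to the training set only, so that the test set keeps its natural class ratio.

```python
import random


def random_under_sample(rows, labels, seed=0):
    """Drop majority-class rows at random until all classes are equal in size."""
    rng = random.Random(seed)

    # Group rows by class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Every class is reduced to the size of the smallest class.
    n_min = min(len(members) for members in by_class.values())

    sampled = []
    for label, members in by_class.items():
        for row in rng.sample(members, n_min):
            sampled.append((row, label))

    # Shuffle so the classes are not stored in contiguous blocks.
    rng.shuffle(sampled)
    out_rows = [row for row, _ in sampled]
    out_labels = [label for _, label in sampled]
    return out_rows, out_labels


# Toy training set: 90 non-cases and 10 incident cases.
rows = [{"id": i} for i in range(100)]
labels = [1 if i < 10 else 0 for i in range(100)]
balanced_rows, balanced_labels = random_under_sample(rows, labels)
print(len(balanced_rows), sum(balanced_labels))  # 20 10
```

Under-sampling discards data from the majority class, so it trades some information for a balanced class ratio and a smaller training set.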
Healthcare Analytics, Journal Year: 2023, Volume and Issue: 5, P. 100297, Published: Dec. 30, 2023
Diabetes is a prevalent chronic condition that poses significant challenges to early diagnosis and identifying at-risk individuals. Machine learning plays a crucial role in diabetes detection by leveraging its ability to process large volumes of data and identify complex patterns. However, imbalanced data, where the number of diabetic cases is substantially smaller than non-diabetic cases, complicates the identification of individuals with diabetes using machine learning algorithms. This study focuses on predicting whether a person is at risk of diabetes, considering the individual's health and socio-economic conditions while mitigating the challenges posed by imbalanced data. We employ several data augmentation techniques, such as oversampling (Synthetic Minority Over-sampling Technique for Nominal data, i.e., SMOTE-N), undersampling (Edited Nearest Neighbor, i.e., ENN), and hybrid sampling techniques (SMOTE-Tomek and SMOTE-ENN), on training data before applying machine learning algorithms to minimize the impact of imbalanced data. Our study sheds light on the significance of carefully utilizing data augmentation techniques without any data leakage to enhance the effectiveness of machine learning algorithms. Moreover, it offers a complete machine learning structure for healthcare practitioners, from obtaining data to prediction, enabling them to make informed decisions.
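The point about avoiding data leakage can be sketched as: split first, then resample only the training portion. The sketch below uses plain random oversampling as a stand-in for SMOTE-N, ENN, and the hybrid methods named in the abstract, which require a dedicated library. The function name `split_then_oversample` and its parameters are hypothetical.

```python
import random


def split_then_oversample(rows, labels, test_fraction=0.2, seed=0):
    """Hold out a test set, then balance ONLY the training set.

    Random duplication of minority rows stands in for SMOTE-style
    methods here; the held-out test rows are never resampled.
    """
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)

    n_test = int(round(len(indices) * test_fraction))
    test_idx = indices[:n_test]
    train_idx = indices[n_test:]

    # Group the training indices by class label.
    by_class = {}
    for i in train_idx:
        by_class.setdefault(labels[i], []).append(i)

    # Duplicate random rows of each smaller class up to the largest class size.
    n_max = max(len(members) for members in by_class.values())
    balanced_idx = []
    for members in by_class.values():
        balanced_idx.extend(members)
        extra = n_max - len(members)
        balanced_idx.extend(rng.choice(members) for _ in range(extra))
    rng.shuffle(balanced_idx)

    train = [(rows[i], labels[i]) for i in balanced_idx]
    test = [(rows[i], labels[i]) for i in test_idx]
    return train, test


# Toy data: row values double as unique ids; 10 positives out of 100.
rows = list(range(100))
labels = [1 if i < 10 else 0 for i in range(100)]
train, test = split_then_oversample(rows, labels)
overlap = {r for r, _ in train} & {r for r, _ in test}
print(len(test), len(overlap))  # 20 0
```

If the resampling were done before the split, copies of the same minority row could land on both sides of the split and inflate the test metrics.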
Machine Learning with Applications, Journal Year: 2020, Volume and Issue: 3, P. 100011, Published: Dec. 18, 2020
Data mining (DM) is an instrument of pattern detection and retrieval of knowledge from a large quantity of data. Many robust early detection services and other health-related technologies have developed from clinical and diagnostic evidence in both the DM and healthcare sectors. Artificial Intelligence (AI) is commonly used in research and health care. Classification or predictive analytics is a key part of AI in machine learning (ML). Present analyses of new predictive models founded on ML methods demonstrate promise in the area of scientific research. Healthcare professionals need accurate predictions of the outcomes of various illnesses that patients suffer from. In addition, timing is another significant aspect that affects choices for precise predictions. In this regard, the authors reviewed numerous publications in terms of method, algorithms, and performance. This review paper summarized the documentation examined in accordance with approaches, styles, activities, and processes. The assessment techniques of the selected papers are discussed, and an appraisal of the findings is presented to conclude the article.
Statistical remedies have been scientifically examined, and the uncertainty between them has now been clarified. This study of the related papers reveals that the predictions of existing forecasting methods differ even if the same dataset is used. Predictive models are also essential, and the approaches need to be improved.