PLoS ONE,
Journal Year:
2024,
Volume and Issue:
19(11), P. e0311370 - e0311370
Published: Nov. 27, 2024
The increasing availability of massive genetic sequencing data in the clinical setting has triggered the need for appropriate tools to help fully exploit the wealth of information these data possess. GFPrint™ is a proprietary streaming algorithm designed to meet that need. By extracting the most relevant functional features, GFPrint™ transforms high-dimensional, noisy genetic sequencing data into an embedded representation, allowing unsupervised models to create data clusters that can be re-mapped to the original clinical information. Ultimately, this allows the identification of genes and pathways relevant to disease onset and progression. GFPrint™ has been tested and validated using two publicly available cancer genomic datasets. Analysis of the TCGA dataset identified panels of genes whose mutations appear to negatively influence survival in non-metastatic colorectal cancer (15 genes), epidermoid non-small cell lung cancer (167 genes) and pheochromocytoma (313 genes) patients. Likewise, analysis of the Broad Institute dataset identified 75 genes involved in pathways related to extracellular matrix reorganization whose mutations appear to dictate a worse prognosis for breast cancer patients. GFPrint™ is accessible through a secure web portal and can be used in any therapeutic area where the genetic profile of patients influences disease evolution.
PLoS ONE,
Journal Year:
2024,
Volume and Issue:
19(9), P. e0310748 - e0310748
Published: Sept. 27, 2024
Brain tumors are one of the leading diseases, imposing a huge morbidity rate across the world every year. Classifying brain tumors accurately plays a crucial role in clinical diagnosis and improves the overall healthcare process. ML techniques have shown promise in classifying brain tumors based on medical imaging data such as MRI scans. These techniques aid in detecting tumors and planning treatment early, improving patient outcomes. However, medical image datasets are frequently affected by a significant class imbalance, especially when benign tumors outnumber malignant tumors. This study presents an explainable ensemble-based pipeline for brain tumor classification that integrates a Dual-GAN mechanism with feature extraction techniques, specifically designed for highly imbalanced data. The Dual-GAN mechanism facilitates the generation of synthetic minority class samples, addressing the class imbalance issue without compromising the original quality of the data. Additionally, the integration of different feature extraction methods facilitates capturing precise and informative features. This study proposes a novel deep ensemble feature extraction (DeepEFE) framework that surpasses other benchmark ML and deep learning models with an accuracy of 98.15%. This study focuses on achieving high accuracy while prioritizing stable performance. By incorporating Grad-CAM, it enhances transparency and interpretability. This research identifies the most relevant contributing parts of the input images toward accurate outcomes, enhancing the reliability of the proposed pipeline. The significantly improved Precision, Sensitivity and F1-Score demonstrate the effectiveness of the proposed pipeline in handling class imbalance and improving accuracy. Furthermore, the integration of explainability enhances the transparency of the classification process to establish a reliable model for brain tumor classification, encouraging adoption in clinical practice and promoting trust in decision-making processes.
Cancer Informatics,
Journal Year:
2024,
Volume and Issue:
23
Published: Jan. 1, 2024
Under the classification of multicategory survival outcomes of cancer patients, it is crucial to identify biomarkers that affect specific outcome categories. The classification of multicategory survival outcomes from transcriptomic data has been thoroughly investigated in computational biology. Nevertheless, several challenges must be addressed, including the ultra-high-dimensional feature space, feature contamination, and data imbalance, all of which contribute to the instability of the diagnostic model. Furthermore, although most methods achieve accurate predicted performance for binary classification with high-dimensional data, their extension to multi-class classification is not straightforward.
Journal of Statistical Computation and Simulation,
Journal Year:
2024,
Volume and Issue:
unknown, P. 1 - 24
Published: Oct. 18, 2024
Classification in high dimensions has been highlighted for the past two decades since Fisher's linear discriminant analysis (LDA) is not optimal for a smaller sample size n compared to the number of covariates p, i.e. p>n, which is mostly due to the singularity of the sample covariance matrix. Rather than modifying how to estimate the covariance matrix and the mean vector in constructing a classifier, we build two types of high-dimensional classifiers using data splitting, i.e. single data splitting (SDS) and multiple data splitting (MDS). Moreover, we introduce a weighted version of the MDS classifier that improves classification performance as illustrated in numerical studies. Each of the split data sets has a smaller number of covariates compared to the sample size so that LDA is applicable, and the classification results can be combined with respect to minimizing the misclassification rate. We present a theoretical justification backing up our proposed methods by comparing misclassification rates in high dimension. We also conduct a wide range of simulations and analyse four microarray data sets, which demonstrates that our proposed methods outperform some existing methods or at least yield comparable performances.
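
The singularity that this abstract attributes to the p>n regime can be made concrete with a short rank argument. The sketch below is not taken from the cited paper: it assumes the standard two-class LDA setting with a pooled sample covariance, and the symbols $n_1$, $n_2$, $\bar{x}_k$, $S$ and $\delta$ are notation introduced here for illustration only.

```latex
% Assumed setting (illustrative, not the paper's notation):
% two classes with n_1 + n_2 = n observations x_i in R^p,
% class means \bar{x}_1, \bar{x}_2, pooled sample covariance S.
\[
  S \;=\; \frac{1}{n-2} \sum_{k=1}^{2} \sum_{i \in \text{class } k}
          (x_i - \bar{x}_k)(x_i - \bar{x}_k)^{\top}
  \;\in\; \mathbb{R}^{p \times p}.
\]
% Within class k the centered vectors sum to zero, so they span
% at most n_k - 1 dimensions. Adding the two classes gives
\[
  \operatorname{rank}(S) \;\le\; (n_1 - 1) + (n_2 - 1) \;=\; n - 2 \;<\; p
  \qquad \text{whenever } p > n .
\]
% Hence S is singular and the plug-in LDA rule
\[
  \delta(x) \;=\; \Bigl(x - \tfrac{1}{2}(\bar{x}_1 + \bar{x}_2)\Bigr)^{\top}
                  S^{-1} (\bar{x}_1 - \bar{x}_2)
\]
% is undefined, because S^{-1} does not exist.
```

Read this way, restricting LDA to a block of covariates smaller than the sample size makes the corresponding sub-block of $S$ potentially invertible, which is consistent with the abstract's remark that LDA becomes applicable on each split.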