Voice disorders affect a significant portion of the global population, particularly those in vocally demanding professions such as singers, actors, teachers, and lawyers.
Early detection and diagnosis of voice pathology diseases are critical to improving treatment outcomes and preventing further damage to the vocal cords. Digital processing of speech signals has emerged as a promising technique for analyzing vocal cord vibrations and identifying deformities in vocal cord function.
The cost-effective computational method in this paper involves processing the speech signal by passing it through a stack of band-pass filters, dividing the processed signal from each filter into a set of overlapped frames, applying the autocorrelation formula to every single frame, and using entropy to extract features. The method has shown promise in reliably detecting and classifying voice pathology diseases, but further research is required to confirm its efficacy and reliability.
Deep learning algorithms and Mel spectrogram feature extraction techniques are applied in the present paper to voice pathology detection. VGG16, VGG19, and ResNet50 are compared. The system demonstrated high prediction accuracy on the training and testing dataset. The system shows potential for clinical applications in voice disorder assessment and diagnosis. The system also holds promise as a telemedicine tool, enabling remote monitoring of patients' vocal health.
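The band-pass, framing, autocorrelation, and entropy steps named in this abstract can be illustrated with a minimal sketch. Everything specific here is an assumption and not taken from the paper: the FFT-masking band-pass filter, the band edges, the frame length and hop, the Shannon entropy definition, and the function names are all hypothetical choices made for illustration.

```python
import numpy as np

def bandpass_fft(signal, sr, low_hz, high_hz):
    """Crude band-pass (assumed design): zero FFT bins outside [low_hz, high_hz)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    spectrum[(freqs < low_hz) | (freqs >= high_hz)] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))

def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into frames; hop < frame_len makes them overlap."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]

def autocorr(frame):
    """Autocorrelation of one frame at non-negative lags."""
    full = np.correlate(frame, frame, mode="full")
    return full[len(frame) - 1:]

def shannon_entropy(values, eps=1e-12):
    """Entropy in bits of the magnitudes, normalized to sum to one."""
    p = np.abs(values)
    p = p / (p.sum() + eps)
    return float(-np.sum(p * np.log2(p + eps)))

def extract_features(signal, sr, bands, frame_len=400, hop=200):
    """One entropy value per (band, frame); returns shape (n_bands, n_frames)."""
    feats = []
    for low, high in bands:
        filtered = bandpass_fft(signal, sr, low, high)
        frames = frame_signal(filtered, frame_len, hop)
        feats.append([shannon_entropy(autocorr(f)) for f in frames])
    return np.array(feats)

# Synthetic stand-in for a voice recording: a 150 Hz tone plus noise.
sr = 16000
t = np.arange(sr) / sr
demo = np.sin(2 * np.pi * 150 * t) + 0.1 * np.random.default_rng(0).standard_normal(sr)
bands = [(50, 500), (500, 2000), (2000, 4000)]  # hypothetical band edges
features = extract_features(demo, sr, bands)
```

The resulting matrix has one row per filter and one column per frame, and could be flattened or summarized before being passed to a classifier.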
This paper introduces a computerized non-invasive voice pathology detection system using a deep transfer learning network (DTLN) and feature fusion. The system takes both healthy and pathological voice samples as input and converts them into mel-spectrogram visual representations. Subsequently, it employs three deep transfer learning architectures, namely (a) AlexNet, (b) ResNet-50, and (c) Inception-V3, to extract complex features from the signal's spectrograms.
As the feature vector dimensions grow due to the aggregation of these CNN models, the study employs an infinite feature selection algorithm to identify the most distinguishing features. These selected optimal features are then used to classify the speech samples as either healthy or pathological, utilizing a K-nearest neighbor (KNN) classifier.
The effectiveness of this method is evaluated on three well-established datasets, AVPD, SVD, and PdA, using metrics such as precision, specificity, sensitivity, F-measure, and accuracy. The experimental results reveal that the proposed feature fusion approach achieves accuracy rates of 97.86%, 95%, and 96.83% for AVPD, SVD, and PdA respectively.
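The fuse, select, and classify stages described in this abstract can be sketched as follows. This is not the paper's implementation: random matrices stand in for the AlexNet, ResNet-50, and Inception-V3 features, a simple Fisher score is swapped in for the infinite feature selection algorithm, and all function names and sizes are hypothetical.

```python
import numpy as np

def fuse(*feature_sets):
    """Concatenate per-model feature matrices (n_samples, d_i) along the feature axis."""
    return np.concatenate(feature_sets, axis=1)

def fisher_scores(X, y):
    """Per-feature ratio of between-class to within-class variance.
    Stand-in ranking criterion, NOT infinite feature selection."""
    overall = X.mean(axis=0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        num += len(Xc) * (Xc.mean(axis=0) - overall) ** 2
        den += len(Xc) * Xc.var(axis=0)
    return num / (den + 1e-12)

def knn_predict(X_train, y_train, X_test, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return np.array([np.bincount(v).argmax() for v in y_train[nearest]])

# Synthetic stand-ins for features from three pretrained CNNs (assumed sizes).
rng = np.random.default_rng(0)
y = np.array([0] * 30 + [1] * 30)           # 0 = healthy, 1 = pathological
def fake_cnn_features():
    X = rng.standard_normal((60, 8))
    X[:, :2] += 3.0 * y[:, None]             # only two columns carry class information
    return X
fused = fuse(fake_cnn_features(), fake_cnn_features(), fake_cnn_features())

train, test = np.arange(0, 60, 2), np.arange(1, 60, 2)
top = np.argsort(fisher_scores(fused[train], y[train]))[::-1][:6]  # keep 6 best features
pred = knn_predict(fused[train][:, top], y[train], fused[test][:, top], k=3)
accuracy = float(np.mean(pred == y[test]))
```

Ranking is computed on the training split only, so the held-out samples do not influence which features are kept.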
TURKISH JOURNAL OF ELECTRICAL ENGINEERING & COMPUTER SCIENCES, Journal Year: 2024, Volume and Issue: 32(4), P. 590 - 604. Published: July 26, 2024
Automated voice disorder systems that distinguish pathological voices from healthy ones have been developed with the aid of machine learning methods. Both clinicians and patients can benefit from these systems, as they provide many advantages compared to invasive techniques. These systems produce binary (healthy/pathological) or multi-class (healthy/selected pathologies) decisions. However, multiple disorders might exist in an individual's voice. Multi-label classification should be considered in such cases.
To date, only a single report is available on this topic, where hand-crafted features were used and a data augmentation technique was utilized to overcome class imbalances. In this study, a similar experimental setup is followed to investigate the suitability of raw signals as inputs for multi-label classification. A deep model which consists of residual blocks and a novel gating mechanism is proposed. The mechanism weighs the channels of a block's output based on both its output and the previous layer's output.
Using a SincNet filterbank which operates directly on the waveform as the initial layer, 0.99 accuracy and 0.98 F1 score were observed on natural /a/ vowels in the Saarbruecken Voice Database with time domain augmentation to balance the samples. On the other hand, reducing the number of augmented samples decreased the performance of the systems, indicating the need for a balanced dataset to avoid oversampling of underrepresented classes. The proposed architecture performed consistently better than ResNet18 connected with attention, which verified the effectiveness of the gating mechanism.
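The difference between multi-class and multi-label decisions mentioned in this abstract can be illustrated with a small sketch: each label gets its own sigmoid and threshold, so one voice sample may carry several disorders at once. The label names, logits, and the 0.5 threshold are hypothetical and are not taken from the paper, and the micro-averaged F1 here is one common choice of F1 definition, not necessarily the one the authors report.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def multilabel_decide(logits, threshold=0.5):
    """Independent yes/no decision per label, unlike a softmax that picks one class."""
    return (sigmoid(logits) >= threshold).astype(int)

def micro_f1(y_true, y_pred):
    """F1 computed from true/false positives pooled over all labels and samples."""
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

labels = ["healthy", "disorder_A", "disorder_B"]   # hypothetical label set
logits = np.array([[ 2.0, -3.0, -1.0],              # hypothetical model outputs
                   [-2.0,  1.5,  0.7],
                   [-1.0, -0.5,  3.0]])
y_true = np.array([[1, 0, 0],
                   [0, 1, 1],                        # two disorders in one voice
                   [0, 1, 1]])
y_pred = multilabel_decide(logits)
score = micro_f1(y_true, y_pred)
```

In this toy case the third sample misses one of its two disorders, which lowers recall while precision stays perfect.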
Journal of Machine and Computing, Journal Year: 2024, Volume and Issue: unknown, P. 463 - 471. Published: April 5, 2024
With the demand for better, more user-friendly HMIs, voice recognition systems have risen in prominence in recent years. The use of computer-assisted vocal pathology categorization tools allows for the accurate detection of voice diseases. By using these methods, voice disorders may be diagnosed early on and treated accordingly. An effective Deep Learning-based tool for feature extraction-based vocal pathology identification is the goal of this project.
This research presents the results of using EfficientNet, a pre-trained Convolutional Neural Network (CNN), on a speech pathology dataset in order to achieve the highest possible classification accuracy. An Artificial Rabbit Optimization Algorithm (AROA)-tuned set of parameters complements the model's mobNet building elements, which include a linear stack of divisible convolution and max-pooling layers activated by Swish.
In order to make the suggested approach applicable to a broad variety of voice disorder problems, this study also suggests a unique training method along with several training methodologies. One speech database, the Saarbrücken voice database (SVD), has been used to test the proposed technology. With up to 96% accuracy, the experimental findings demonstrate that the CNN is capable of detecting vocal pathologies. The suggested approach demonstrates great potential for use in real-world clinical settings, where it may provide classifications in as little as three seconds and expedite automated diagnosis and treatment.
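The Swish-activated convolution and max-pooling stack mentioned in this abstract can be illustrated in miniature. The sketch below assumes a plain 1-D convolution instead of the model's own convolution blocks, and the kernel, input, and function names are hypothetical. Swish itself is the standard x * sigmoid(x).

```python
import numpy as np

def swish(x):
    """Swish activation: x * sigmoid(x), written as x / (1 + exp(-x))."""
    return x / (1.0 + np.exp(-x))

def max_pool1d(x, size=2):
    """Non-overlapping max pooling; trailing samples that do not fill a window are dropped."""
    n = len(x) // size
    return x[:n * size].reshape(n, size).max(axis=1)

def conv_swish_pool(x, kernel, pool=2):
    """One illustrative layer: valid cross-correlation, then Swish, then max pooling."""
    conv = np.convolve(x, kernel[::-1], mode="valid")  # flipping the kernel gives cross-correlation
    return max_pool1d(swish(conv), pool)

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])   # toy input signal
kernel = np.array([1.0, -1.0])                 # toy difference kernel
out = conv_swish_pool(x, kernel)
```

Unlike ReLU, Swish lets small negative values through, so the constant negative slope detected by this kernel survives the activation instead of being zeroed.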