IAES International Journal of Artificial Intelligence
Journal Year: 2023
Volume and Issue: 13(1), P. 1050 - 1050
Published: Dec. 25, 2023
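<p><span>Speaker identification is a biometric technique that classifies or identifies a person among other speakers based on speech characteristics. Recently, deep learning models have outperformed conventional machine learning models in speaker identification. Spectrograms of speech have been used as input to deep learning-based speaker identification using clean speech. However, the performance of speaker identification systems degrades under noisy conditions. Cochleograms have shown better results than spectrograms in deep learning-based speaker recognition under noisy and mismatched conditions. Moreover, hybrid convolutional neural network (CNN) and recurrent neural network (RNN) variants have shown better performance than CNN or RNN variants alone in recent studies. However, there has been no attempt to use a hybrid CNN and enhanced RNN variant with cochleogram input to enhance speaker identification performance under noisy conditions. In this study, a hybrid CNN and gated recurrent unit (GRU) based speaker identification model is proposed for noisy conditions using cochleogram input. The VoxCeleb1 audio dataset with real-world noises, with white Gaussian noise (WGN), and without additive noise was employed for the experiments. The experimental results and the comparison with existing works show that the proposed model performs better than the other models in this study and in existing works</span><span>.</span></p>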
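The noise conditions above can be illustrated with a short sketch that adds white Gaussian noise to a clean waveform at a chosen signal-to-noise ratio (SNR). This is not the authors' code: the helper names `add_wgn` and `snr_db`, the 16 kHz rate, the test tone, and the SNR values are assumptions for illustration. The noise is rescaled so that the signal-to-noise power ratio equals 10^(SNR/10).

```python
import math
import random


def power(x):
    """Mean power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def add_wgn(signal, snr_db, seed=0):
    """Return `signal` plus white Gaussian noise at the requested SNR in dB.

    Hypothetical helper: noise is drawn from N(0, 1) and then rescaled so that
    power(signal) / power(noise) == 10 ** (snr_db / 10).
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    target_noise_power = power(signal) / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / power(noise))
    return [s + scale * n for s, n in zip(signal, noise)]


def snr_db(clean, noisy):
    """Measured SNR in dB, treating (noisy - clean) as the noise."""
    noise = [y - x for x, y in zip(clean, noisy)]
    return 10 * math.log10(power(clean) / power(noise))


if __name__ == "__main__":
    sr = 16000  # assumed sample rate
    clean = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]  # 1 s tone
    for target in (20, 10, 0):
        noisy = add_wgn(clean, target)
        print(target, "dB requested,", round(snr_db(clean, noisy), 6), "dB measured")
```

Rescaling the drawn noise, rather than trusting its nominal variance, makes the realized SNR match the requested value even for short clips.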
Information
Journal Year: 2025
Volume and Issue: 16(4), P. 288 - 288
Published: April 3, 2025
This paper addresses the issue of distinguishing commercially played songs from non-music audio in radio broadcasts, where automatic song identification systems are commonly employed for reporting purposes. Service call costs increase because these systems need to remain continuously active, even when music is not being broadcast. Our solution serves as a preliminary filter to determine whether an audio segment constitutes “music” and thus warrants a subsequent call to the song identification service. We collected 139 h of non-consecutive 5 s audio samples from various radio broadcasts, labeling segments from talk shows or advertisements as “non-music”. We implemented multiple data augmentation strategies, including FM-like pre-processing, trained a custom Convolutional Neural Network, and then built a live inference platform capable of monitoring web streams. The model was validated using 1360 newly collected audio samples, evaluating performance on both 5 s chunks and 15 s buffers. The system demonstrated consistently high performance on previously unseen stations, achieving an average accuracy of 96% and a maximum of 98.23%. The intensive pre-processing contributed to these results, with the added benefit of making the system inherently suitable for FM radio. The system has been incorporated into a commercial product currently used by Italian clients for royalty calculation.
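The 5 s chunk and 15 s buffer evaluation can be sketched as a gate placed in front of the identification service. The abstract does not say how chunk predictions are combined, so this sketch assumes a sliding window over the last three chunk probabilities, averaged and compared with a 0.5 threshold. The class `MusicGate` and the example scores are hypothetical, and the CNN that produces the per-chunk scores is not shown.

```python
from collections import deque

CHUNK_SECONDS = 5    # chunk length stated in the abstract
BUFFER_SECONDS = 15  # buffer length stated in the abstract
CHUNKS_PER_BUFFER = BUFFER_SECONDS // CHUNK_SECONDS  # 3 chunks per buffer


class MusicGate:
    """Hypothetical gate deciding when to call a song identification service.

    Each 5 s chunk is assumed to receive a 'music' probability from a
    classifier. The latest three scores form a 15 s buffer, labeled 'music'
    when their mean reaches `threshold` (an assumed aggregation rule).
    """

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.scores = deque(maxlen=CHUNKS_PER_BUFFER)  # oldest score drops out

    def push(self, chunk_prob):
        """Add one chunk score; return True/False once the buffer is full, else None."""
        self.scores.append(chunk_prob)
        if len(self.scores) < CHUNKS_PER_BUFFER:
            return None
        return sum(self.scores) / len(self.scores) >= self.threshold


if __name__ == "__main__":
    gate = MusicGate()
    stream = [0.9, 0.8, 0.95, 0.2, 0.1, 0.05]  # made-up chunk scores
    for p in stream:
        decision = gate.push(p)
        if decision:
            print("music: call the identification service")
        elif decision is False:
            print("non-music: skip the call")
```

Averaging over a buffer smooths isolated misclassified chunks, at the cost of reacting up to one buffer later to a change between talk and music.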
Journal of Materials Chemistry A
Journal Year: 2024
Volume and Issue: 12(33), P. 21626 - 21676
Published: Jan. 1, 2024
Reactive sulfur, oxygen and nitrogen species (reactive SON species) are important topics in redox biology, and their recognition by rhodamine-derived probes is impactful in the biomedical research field.
Applied Sciences
Journal Year: 2023
Volume and Issue: 13(15), P. 8562 - 8562
Published: July 25, 2023
Parkinson’s Disease and Adductor-type Spasmodic Dysphonia are two neurological disorders that greatly decrease the quality of life of millions of patients worldwide. Despite their wide prevalence, the related diagnoses are often performed empirically, whereas it would be valuable to rely on objective, measurable biomarkers. Among these, researchers have been considering features related to voice impairment, which can be useful indicators but can sometimes lead to confusion. Therefore, our aim here was to develop a robust Machine Learning approach for multi-class classification based on 6373 voice features extracted from a convenient dataset made of the sustained vowel /e/ and an ad hoc selected Italian sentence, performed by 111 healthy subjects, 51 Parkinson’s disease patients, and 60 dysphonic patients. Correlation, Information Gain, Gain Ratio, and Genetic Algorithm-based methodologies were compared for feature selection, to build subsets analyzed by means of Naïve Bayes, Random Forest, and Multi-Layer Perceptron classifiers, trained with 10-fold cross-validation. As a result, spectral, cepstral, prosodic, and voicing-related features were assessed as the most relevant, the Genetic Algorithm proved the most effective feature selector, and the adopted classifiers performed similarly. In particular, Genetic Algorithm + Naïve Bayes yielded one of the highest accuracies in multi-class voice analysis: 95.70% for the vowel and 99.46% for the sentence.
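The 10-fold cross-validation protocol can be sketched in plain Python. Whether the folds were stratified by class is not stated in the abstract; this sketch assumes stratification because the three groups (111, 51, and 60 subjects) are unbalanced. The function `stratified_kfold` is a hypothetical helper, not the authors' tooling, and the classifiers themselves are not shown.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for stratified k-fold cross-validation.

    Indices of each class are shuffled and dealt round-robin into k folds, so
    every fold keeps roughly the original class proportions.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    offset = 0
    for y in sorted(by_class):
        idx = by_class[y]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[(offset + j) % k].append(i)
        offset += len(idx)  # continue dealing where the previous class stopped
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


if __name__ == "__main__":
    # Group sizes from the abstract: 111 healthy, 51 Parkinson's, 60 dysphonic.
    labels = ["healthy"] * 111 + ["parkinson"] * 51 + ["dysphonia"] * 60
    for train, test in stratified_kfold(labels):
        print(len(train), "train /", len(test), "test")
```

Each sample lands in exactly one test fold, so averaging the ten fold accuracies uses every recording once for evaluation.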
Applied Sciences
Journal Year: 2023
Volume and Issue: 13(17), P. 9567 - 9567
Published: Aug. 24, 2023
The rapid momentum of deep neural networks (DNNs) in recent years has yielded state-of-the-art performance in various machine-learning tasks, including speaker identification. Speaker identification is based on speech signals and the features that can be extracted from them. In this article, we propose a speaker identification system built on DNN models. The system uses acoustic and prosodic features of the speech signal, such as pitch frequency (the vocal cord vibration rate), energy (the loudness of speech), their derivatives, and additional acoustic and prosodic features. Additionally, the article investigates existing recurrent neural network (RNN) models and adapts them to design a speaker identification system using the public YOHO LDC dataset. The average accuracy of the system was 91.93% in the best speaker identification experiment. Furthermore, the paper helps uncover the reasons for major errors by analyzing the speakers and tokens that yield them, with the aim of increasing the system’s robustness through feature selection and system tune-up.
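The frame-level features named above (pitch, energy, and their derivatives) can be illustrated with a minimal sketch. The frame sizes, the 60 to 400 Hz pitch search range, the plain autocorrelation pitch estimator, and the regression-based delta formula are common choices assumed here, not details taken from the article; all function names are hypothetical.

```python
import math


def frames(signal, frame_len, hop):
    """Split a signal into overlapping frames (a trailing partial frame is dropped)."""
    return [signal[s:s + frame_len] for s in range(0, len(signal) - frame_len + 1, hop)]


def log_energy(frame, eps=1e-10):
    """Short-time log energy of one frame, a simple loudness proxy."""
    return math.log(sum(v * v for v in frame) + eps)


def estimate_f0(frame, sr, fmin=60.0, fmax=400.0):
    """Crude pitch estimate in Hz: the lag with maximal autocorrelation."""
    lag_min = int(sr / fmax)
    lag_max = min(int(sr / fmin), len(frame) - 1)
    best_lag = max(
        range(lag_min, lag_max + 1),
        key=lambda lag: sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag)),
    )
    return sr / best_lag


def deltas(track, n=2):
    """Regression-based first derivative of a feature track.

    d_t = sum_{k=1..n} k * (c_{t+k} - c_{t-k}) / (2 * sum_{k=1..n} k^2),
    with the edge values repeated so the output keeps the input length.
    """
    denom = 2 * sum(k * k for k in range(1, n + 1))
    padded = [track[0]] * n + list(track) + [track[-1]] * n
    return [
        sum(k * (padded[t + n + k] - padded[t + n - k]) for k in range(1, n + 1)) / denom
        for t in range(len(track))
    ]


if __name__ == "__main__":
    sr = 16000  # assumed sample rate
    tone = [math.sin(2 * math.pi * 200 * t / sr) for t in range(sr)]  # 1 s, 200 Hz
    fr = frames(tone, frame_len=1040, hop=320)
    energy = [log_energy(f) for f in fr]
    print("F0 of first frame:", estimate_f0(fr[0], sr))
    print("first energy deltas:", deltas(energy)[:3])
```

Stacking the pitch and energy tracks with their deltas frame by frame gives the kind of per-frame sequence an RNN can consume.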
Applied Sciences
Journal Year: 2024
Volume and Issue: 14(23), P. 11446 - 11446
Published: Dec. 9, 2024
Reverberation and background noise are common and unavoidable real-world phenomena that hinder automatic speaker recognition systems, particularly because these systems are typically trained on noise-free data. Most models rely on fixed audio feature sets. To evaluate the dependency of features on reverberation and noise, this study proposes augmenting the commonly used mel-frequency cepstral coefficients (MFCCs) with relative spectral (RASTA) features. Performance was assessed using noisy data generated by applying pink noise to the DEMoS dataset, which includes 56 speakers. Verification models were trained on clean data using MFCCs, RASTA features, or their combination as inputs, and were then validated on augmented data with progressively increasing noise levels. The results indicate that MFCCs struggle to identify the main speaker, while the RASTA method has difficulty with the opposite class. The hybrid feature set derived from their combination demonstrates the best overall performance, offering a compromise between the two. Although MFCC is the standard and performs well on clean training data, it shows a significant tendency to misclassify speakers in real-world scenarios, a critical limitation for modern user-centric verification applications. The hybrid feature set therefore proves to be an effective and balanced solution, optimizing both sensitivity and specificity.
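The RASTA stage can be illustrated by its band-pass filter, applied independently to the time trajectory of each log spectral band, followed by frame-wise concatenation with MFCCs to form a hybrid vector. This sketch covers only the filtering step of RASTA processing (no PLP analysis), uses the pole value 0.98 from the classic RASTA formulation (some toolkits use 0.94), and assumes that "combination" means concatenation. The function names are hypothetical.

```python
import math


def rasta_filter(track, pole=0.98):
    """Apply the classic RASTA band-pass filter to one log-spectral trajectory.

    Difference equation (the z^4 advance of the textbook transfer function is
    dropped, so the output is delayed by four frames):
        y[t] = pole * y[t-1] + 0.1 * (2x[t] + x[t-1] - x[t-3] - 2x[t-4])
    Samples before the start are taken as zero.
    """
    def x(i):
        return track[i] if i >= 0 else 0.0

    out = []
    prev = 0.0
    for t in range(len(track)):
        prev = pole * prev + 0.1 * (2 * x(t) + x(t - 1) - x(t - 3) - 2 * x(t - 4))
        out.append(prev)
    return out


def rasta_features(log_bands):
    """Filter every band trajectory; `log_bands` is a list of frames x bands."""
    n_bands = len(log_bands[0])
    cols = [rasta_filter([f[b] for f in log_bands]) for b in range(n_bands)]
    return [[cols[b][t] for b in range(n_bands)] for t in range(len(log_bands))]


def hybrid(mfcc_frames, rasta_frames):
    """Frame-wise concatenation of MFCC and RASTA vectors (assumed 'hybrid' set)."""
    return [m + r for m, r in zip(mfcc_frames, rasta_frames)]


if __name__ == "__main__":
    speech_like = [math.sin(0.3 * t) for t in range(300)]  # toy log-band trajectory
    shifted = [v + 5.0 for v in speech_like]  # constant channel offset in the log domain
    a, b = rasta_filter(speech_like), rasta_filter(shifted)
    print("max late difference:", max(abs(p - q) for p, q in zip(a[-50:], b[-50:])))
```

Because the numerator coefficients (2, 1, 0, -1, -2) sum to zero, a constant offset in the log domain, such as a fixed channel response, is removed once the transient decays. This is the property that makes RASTA trajectories less sensitive to channel effects than plain MFCCs.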
International Journal of Emerging Technology and Advanced Engineering
Journal Year: 2023
Volume and Issue: 13(9), P. 25 - 35
Published: Oct. 3, 2023
One of the authentication models currently in common use is based on biometrics, such as the eye retina, fingerprints, and speech recognition. Moreover, text-independent speaker identification is one of the domains of speech recognition that has been widely studied. Short utterance duration in the identification process is one of the challenges in this field: accuracy becomes a great issue when utterances are shorter, besides the need for the system to be general enough for various languages with different dialects, each with its own characteristics depending on tribe or region. Therefore, the author of this study introduces multi-language speaker identification comprising regional languages, Indonesian, and English with short utterances. The researchers used the MFCC technique to extract voice features and a CNN as the classification model. Two kinds of datasets were used: open datasets for the regional languages and English, and a private recording for Indonesian. The private recording covers 18 persons of different genders, each of whom read a text of several paragraphs and sentences, whereas the public regional language dataset consists of 80 speakers: 41 Sundanese and 39 Javanese. For the English dataset, 126 male speakers and 125 female speakers were taken from LibriSpeech. Tests were carried out separately for each language with varying durations, including about 3 seconds and 1 second. The best accuracies obtained were 95% (regional dataset), 94% (English dataset), and 98% (private dataset).
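The short-utterance setting can be made concrete by cutting recordings into fixed-length clips and counting how many MFCC frames a CNN would receive per clip. The 25 ms window and 10 ms hop are common MFCC settings assumed here, as is applying LibriSpeech's 16 kHz rate to every dataset; `split_utterance` and `n_frames` are hypothetical helpers.

```python
SAMPLE_RATE = 16000  # LibriSpeech is distributed at 16 kHz; assumed for all sets here


def split_utterance(signal, seconds, sr=SAMPLE_RATE):
    """Cut a waveform into non-overlapping clips of `seconds` length.

    The trailing remainder shorter than one clip is discarded (an assumption).
    """
    size = int(seconds * sr)
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


def n_frames(n_samples, sr=SAMPLE_RATE, win_ms=25, hop_ms=10):
    """Number of MFCC frames for a clip, assuming 25 ms windows and a 10 ms hop."""
    win = sr * win_ms // 1000
    hop = sr * hop_ms // 1000
    return 1 + (n_samples - win) // hop if n_samples >= win else 0


if __name__ == "__main__":
    ten_seconds = [0.0] * (10 * SAMPLE_RATE)  # placeholder audio
    for dur in (3, 1):
        clips = split_utterance(ten_seconds, dur)
        print(dur, "s:", len(clips), "clips of", n_frames(len(clips[0])), "frames")
```

Under these assumptions a 3 s clip yields 298 frames and a 1 s clip only 98, which shows how much less evidence the classifier sees at the shorter duration.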