Computational bioacoustics with deep learning: a review and roadmap
PeerJ,
Journal Year:
2022,
Volume and Issue:
10, P. e13152 - e13152
Published: March 21, 2022
Animal vocalisations and natural soundscapes are fascinating objects of study, and contain valuable evidence about animal behaviours, populations and ecosystems. They are studied in bioacoustics and ecoacoustics, with signal processing and analysis an important component. Computational bioacoustics has accelerated in recent decades due to the growth of affordable digital sound recording devices, and to huge progress in informatics such as big data, signal processing and machine learning. Methods are inherited from the wider field of deep learning, including speech and image processing. However, the tasks, demands and data characteristics are often different from those addressed in speech or music analysis. There remain unsolved problems, and tasks for which evidence is surely present in many acoustic signals, but not yet realised. In this paper I perform a review of the state of the art in deep learning for computational bioacoustics, aiming to clarify key concepts and identify and analyse knowledge gaps. Based on this, I offer a subjective but principled roadmap for computational bioacoustics with deep learning: topics that the community should aim to address, in order to make the most of future developments in AI and informatics, and to use audio data in answering zoological and ecological questions.
Language: English
A review of automatic recognition technology for bird vocalizations in the deep learning era
Jiangjian Xie, Yujie Zhong, Junguo Zhang et al.
Ecological Informatics,
Journal Year:
2022,
Volume and Issue:
73, P. 101927 - 101927
Published: Nov. 25, 2022
Language: English
You Only Hear Once: A YOLO-like Algorithm for Audio Segmentation and Sound Event Detection
Applied Sciences,
Journal Year:
2022,
Volume and Issue:
12(7), P. 3293 - 3293
Published: March 24, 2022
Audio segmentation and sound event detection are crucial topics in machine listening that aim to detect acoustic classes and their respective boundaries. They are useful for audio-content analysis, speech recognition, audio-indexing, and music information retrieval. In recent years, most research articles adopt segmentation-by-classification. This technique divides audio into small frames and individually performs classification on these frames. In this paper, we present a novel approach called You Only Hear Once (YOHO), which is inspired by the YOLO algorithm popularly adopted in Computer Vision. We convert the detection of acoustic boundaries into a regression problem instead of frame-based classification. This is done by having separate output neurons to detect the presence of an audio class and to predict its start and end points. The relative improvement in F-measure of YOHO, compared to the state-of-the-art Convolutional Recurrent Neural Network, ranged from 1% to 6% across multiple datasets for audio segmentation and sound event detection. As the output of YOHO is more end-to-end and has fewer neurons to predict, the speed of inference is at least 6 times faster than segmentation-by-classification. In addition, as YOHO predicts acoustic boundaries directly, the post-processing and smoothing is about 7 times faster.
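The regression idea above can be made concrete with a small sketch. This is a hypothetical decoder, not the authors' code: it assumes the network splits a clip into fixed time bins and, for each class in each bin, outputs a `[presence, start, end]` triplet with start/end expressed as fractions of the bin; adjacent detections of the same class are then merged.

```python
# Hypothetical sketch of YOHO-style output decoding (illustrative assumptions,
# not the published implementation).

def decode_yoho(outputs, bin_duration, threshold=0.5):
    """Convert per-bin [presence, start, end] triplets into (class, onset, offset) events."""
    events = []
    for b, bin_preds in enumerate(outputs):          # one entry per time bin
        for cls, (presence, start, end) in enumerate(bin_preds):
            if presence >= threshold:
                onset = (b + start) * bin_duration   # bin fractions -> seconds
                offset = (b + end) * bin_duration
                events.append((cls, onset, offset))
    # merge events of the same class that touch or overlap
    merged = []
    for ev in sorted(events, key=lambda e: (e[0], e[1])):
        if merged and merged[-1][0] == ev[0] and ev[1] <= merged[-1][2]:
            merged[-1] = (ev[0], merged[-1][1], max(merged[-1][2], ev[2]))
        else:
            merged.append(ev)
    return merged

# Two adjacent bins detect class 0; the decoder fuses them into one event.
outputs = [
    [(0.9, 0.2, 1.0), (0.1, 0.0, 0.0)],  # bin 0: class 0 active from 20% of the bin
    [(0.8, 0.0, 0.6), (0.2, 0.0, 0.0)],  # bin 1: class 0 active until 60% of the bin
]
print(decode_yoho(outputs, bin_duration=1.0))  # one merged event for class 0
```

Because boundaries come directly from the regression outputs, no frame-wise thresholding or smoothing pass is needed, which is where the reported speed-up in post-processing comes from.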
Language: English
Unsupervised classification to improve the quality of a bird song recording dataset
Félix Michaud, Jérôme Sueur, Maxime Le Cesne et al.
Ecological Informatics,
Journal Year:
2022,
Volume and Issue:
74, P. 101952 - 101952
Published: Dec. 12, 2022
Language: English
SILIC: A cross database framework for automatically extracting robust biodiversity information from soundscape recordings based on object detection and a tiny training dataset
Shih-Hung Wu, Hsueh‐Wen Chang, Ruey‐Shing Lin et al.
Ecological Informatics,
Journal Year:
2021,
Volume and Issue:
68, P. 101534 - 101534
Published: Dec. 20, 2021
Passive
acoustic
monitoring
(PAM)
offers
many
advantages
comparing
with
other
survey
methods
and
gains
an
increasing
use
in
terrestrial
ecology,
but
the
massive
effort
needed
to
extract
species
information
from
a
large
number
of
recordings
limits
its
application.
The
convolutional
neural
network
(CNN)
has
been
demonstrated
high
performance
effectiveness
identifying
sound
sources
automatically.
However,
requiring
amount
training
data
still
constitutes
challenge.
Object
detection
is
used
detect
multiple
objects
photos
or
videos
effective
at
detecting
small
complex
context,
such
as
animal
sounds
spectrogram
shows
opportunity
build
good
model
dataset.
Therefore,
we
developed
Sound
Identification
Labeling
Intelligence
for
Creatures
(SILIC),
which
integrates
online
databases,
PAM
databases
object
detection-based
model,
extracting
on
soundscape
recordings.
We
six
owl
Taiwan
demonstrate
effectiveness,
efficiency
application
potential
SILIC
framework.
Using
only
786
labels
133
recordings,
our
successfully
identified
species'
collected
five
stations,
macro-average
AUC
0.89
mAP
0.83.
also
provided
time
frequency
information,
duration
bandwidth,
sounds.
To
best
knowledge,
this
first
that
algorithm
identify
wildlife
species.
With
sound-labeling
platform
embedded
novel
preprocessing
approach
(i.e.,
rainbow
mapping)
applied,
robust
species,
based
tiny
dataset
acquired
existing
databases.
can
help
expand
tool
evaluate
state
change
biodiversity
by,
example,
providing
temporal
resolution
continuous
presence
across
network.
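The time and frequency information mentioned above (onset, duration, bandwidth) falls out of the detection boxes almost for free. A hypothetical helper, not taken from SILIC itself, might convert a bounding box in spectrogram pixel coordinates into physical units; the hop size and frequency resolution below are illustrative values only.

```python
# Hypothetical helper (not from the SILIC codebase): map an object-detection
# bounding box on a spectrogram, given in (time-frame, frequency-bin) pixels,
# to the call's onset/duration in seconds and low frequency/bandwidth in Hz.
# hop_s and hz_per_bin are assumed example values for the STFT parameters.

def box_to_call(box, hop_s=0.01, hz_per_bin=43.07):
    """box = (x_min, y_min, x_max, y_max) in (time-frame, frequency-bin) pixels."""
    x_min, y_min, x_max, y_max = box
    return {
        "onset_s": x_min * hop_s,
        "duration_s": (x_max - x_min) * hop_s,
        "low_hz": y_min * hz_per_bin,
        "bandwidth_hz": (y_max - y_min) * hz_per_bin,
    }

call = box_to_call((120, 20, 180, 60))
print(call)  # a 0.6 s call starting at 1.2 s, roughly 0.86 kHz low edge
```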
Language: English
On the role of audio frontends in bird species recognition
Ecological Informatics,
Journal Year:
2024,
Volume and Issue:
81, P. 102573 - 102573
Published: March 26, 2024
Automatic acoustic monitoring of bird populations and their diversity is in demand for conservation planning. This requirement and recent advances in deep learning have inspired sophisticated bird species recognizers. However, there are still open challenges in creating reliable monitoring systems for natural habitats. One of many questions is whether the predominantly used audio features, like mel-filterbanks, are appropriate for such analysis, since their design follows the human perception of sound, making them susceptible to discarding fine details from other animals' vocalizations. Although research shows that different audio features work better for particular tasks and datasets, it is hard to attribute all the advantages to the input features since experimental setups vary. A general solution is a learnable frontend that extracts task-relevant features from the raw waveform, which contains all the information, instead of handcrafted features. The current paper thoroughly analyzes the role of learnable audio frontends in bird species recognition, which helped to evaluate the adequacy of traditional time-frequency representations (static frontends) in capturing relevant information. In particular, we find that the main performance gain comes from the normalization and compression operations rather than from data-driven frequency selectivity or the functional form of the filters. We observed no significant discrepancy between the frequency bands learned by the frontends and those of the static filterbanks, and we show that appropriate normalization and compression are adequate to enhance accuracy by more than 16% and to achieve results comparable to learnable frontends in bird species recognition. Ablation studies under various configurations and detailed noise robustness analyses provide evidence for these conclusions, validate the use of similar settings in prior works, and offer guidelines for designing future frontends. The code is available at https://github.com/houtan-ghaffari/bird-frontends.
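Two of the frontend ingredients discussed above can be sketched in a few lines. This is an illustrative sketch only: it assumes the common HTK-style mel formula and plain log compression, and the frontends examined in the paper may differ in detail.

```python
import math

# Illustrative frontend ingredients (assumed conventions, not the paper's code):
# the HTK-style mel scale, and the kind of range-compression operation that the
# study identifies as the main source of the performance gain.

def hz_to_mel(f_hz):
    """Map frequency in Hz to the (HTK) mel scale: mel = 2595 * log10(1 + f/700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def log_compress(energy, eps=1e-10):
    """Logarithmic compression of a filterbank energy; eps avoids log(0)."""
    return math.log(energy + eps)

# The mel scale is roughly linear below 1 kHz and logarithmic above, so an
# octave jump in Hz costs far less than an octave jump in mel at high
# frequencies -- exactly the human-centric warping questioned above.
print(round(hz_to_mel(1000)))  # close to 1000 mel
print(round(hz_to_mel(8000)))  # far less than 8000 mel
```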
Language: English
Deep Learning for Recognizing Bat Species and Bat Behavior in Audio Recordings
Published: June 1, 2023
Monitoring and mitigating the continuous decline of biodiversity is a key global challenge to preserve the existential basis of human life. Bats, as one of the most widespread species groups among terrestrial mammals, are excellent indicators for biodiversity and hence the health of an ecosystem. Typically, bats are monitored by analyzing ultrasonic sound recordings. State-of-the-art deep learning approaches for automatic bat detection and species recognition commonly rely on audio spectrogram classification models based on fixed time segments, lacking exact call boundaries. While great progress has been made using echolocation calls, little attention has been paid to bat behavior, which provides valuable additional information about bat populations. In this paper, we present a novel end-to-end approach using a neural network for object detection. In contrast to state-of-the-art approaches, the presented model provides accurate call boundaries. It recognizes 19 bat species and distinguishes between three different behaviors: orientation (echolocation calls), hunting (feeding buzzes), and social (social calls). Our experiments with two data sets show that our method clearly outperforms previous approaches in bat species recognition, achieving up to 86.2% mean average precision. It also performs very well in behavior recognition, reaching 98.4%, 98.3%, and 95.6% average precision for recognizing echolocation calls, feeding buzzes, and social calls, respectively.
Language: English
NEAL: an open-source tool for audio annotation
PeerJ,
Journal Year:
2023,
Volume and Issue:
11, P. e15913 - e15913
Published: Aug. 25, 2023
Passive acoustic monitoring is used widely in ecology, biodiversity, and conservation studies. Data sets collected via passive acoustic monitoring are often extremely large and built to be processed automatically using artificial intelligence and machine learning models, which aim to replicate the work of domain experts. These models, being supervised learning algorithms, need to be trained on high quality annotations produced by experts. Since experts are resource-limited, a cost-effective process for annotating audio is needed to get maximal use out of the data. We present an open-source interactive audio data annotation tool, NEAL (Nature+Energy Audio Labeller). Built using R and the associated Shiny framework, the tool provides a reactive environment where users can quickly annotate audio files and adjust settings that change the corresponding elements of the user interface. The app has been designed with the goal of having both expert birders and citizen scientists contribute to annotation projects. The popularity and flexibility of the R programming language in bioacoustics means the tool can be modified for other bird labelling data sets, or even generic audio labelling tasks. We demonstrate the tool using data from wind farm sites across Ireland.
Language: English
Sound Event Detection Based on Mel Spectral Envelope Estimation and Regression Detection
Maocun Tian, Ruwei Li, Weidong An et al.
2022 IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC),
Journal Year:
2023,
Volume and Issue:
10, P. 1 - 5
Published: Nov. 14, 2023
Binary metrics are employed in traditional deep learning methods of sound event detection (SED) to determine the presence or absence of an event. However, these binary activity metrics inadequately characterize the nuanced states of sound events, which limits the performance of current detection algorithms, particularly in scenarios involving event overlaps. Concurrently, conventional algorithms suffer from sluggish detection speeds, resulting in substantial temporal costs. To solve the above problems, a novel algorithm based on amplitude envelope estimation and regression detection (EERD) is proposed in this paper. In this algorithm, firstly the Mel Frequency Cepstrum Coefficient (MFCC) envelope of the audio signal is estimated, thereby obtaining enhanced information concerning sound events. Secondly, regression-based detection is introduced into the network model, so that the algorithm's reliance on post-processing is reduced and the detection speed is concomitantly improved. Empirical validation was conducted on the TUT dataset. Experiments show that the algorithm in this paper attains a superior F-measure in contrast to the benchmark, hence its heightened performance is substantiated. At the same time, the detection speed achieved is at least sixfold faster than the segmentation-by-classification approach.
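The post-processing that regression-style detectors such as EERD largely avoid can be illustrated with a sketch of what a frame-wise (segmentation-by-classification) pipeline typically does instead: median-smooth the per-frame probabilities, threshold them, and merge contiguous active runs into events. This is an illustrative baseline, not the paper's pipeline; the window size, threshold, and hop are assumed values.

```python
# Sketch of frame-wise SED post-processing (illustrative assumptions, not the
# EERD paper's pipeline): smooth, threshold, and merge frames into events.

def median_smooth(probs, win=3):
    """Median-filter per-frame probabilities to suppress isolated spikes."""
    half = win // 2
    out = []
    for i in range(len(probs)):
        window = sorted(probs[max(0, i - half): i + half + 1])
        out.append(window[len(window) // 2])
    return out

def frames_to_events(probs, threshold=0.5, hop_s=0.02):
    """Threshold smoothed frame activity and merge contiguous runs into events."""
    active = [p >= threshold for p in median_smooth(probs)]
    events, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = i
        elif not a and start is not None:
            events.append((start * hop_s, i * hop_s))
            start = None
    if start is not None:
        events.append((start * hop_s, len(active) * hop_s))
    return events

# A one-frame false positive at index 2 is removed by the median filter,
# leaving a single event covering the genuine activity at frames 5-7.
probs = [0.1, 0.1, 0.9, 0.1, 0.1, 0.8, 0.9, 0.85, 0.1, 0.1]
print(frames_to_events(probs))
```

Every one of these steps runs over every frame of every clip, which is why replacing them with direct boundary regression yields the sixfold speed-up reported above.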
Language: English