With the demand for music continuing to grow as people seek variety and personal resonance, many works focus on music generation. In this study, we propose GENPIA, a genre-conditioned piano music generation system that encompasses the Anime, R&B, Jazz, and Classical genres. To build our system, we collect labeled audio data of the various genres specifically for this research objective. A REMI representation extended with genre information is applied during pre-processing to better capture musical structure. A Transformer-XL model is implemented to learn from the extended representation and generate the desired output audio. An external dataset, called AIlabs.tw 1K7, is utilized for pre-training purposes.
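As a rough illustration of how a genre condition can extend an event-based representation, the sketch below prepends a genre token to a REMI-style event stream; the token names and vocabulary are hypothetical, and GENPIA's actual token set may differ.

```python
# Minimal sketch of a REMI-style token sequence extended with a genre token.
# The vocabulary below is an assumption for illustration, not GENPIA's actual one.
GENRES = ["Anime", "R&B", "Jazz", "Classical"]

def make_genre_conditioned_sequence(genre: str, remi_events: list[str]) -> list[str]:
    """Prepend a genre condition token to a REMI event stream."""
    assert genre in GENRES, f"unknown genre: {genre}"
    return [f"Genre_{genre}"] + remi_events

# Typical REMI event types: bar markers, beat positions, tempo, note attributes.
events = ["Bar", "Position_1/16", "Tempo_120", "Pitch_60", "Velocity_80", "Duration_1/8"]
print(make_genre_conditioned_sequence("Jazz", events))
```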
Results obtained from a listening questionnaire show that GENPIA can generate pieces conditioned on different genres, with quality comparable to prior state-of-the-art work.
IEEE Open Journal of the Communications Society, 2024, Volume 5, pp. 4691-4709. Published: Jan. 1, 2024.
Networked Music Performances (NMPs) involve geographically-displaced musicians performing together in real-time. To date, scarce research has been conducted on how to integrate NMP systems with immersive audio rendering techniques able to enrich the musicians' perception of sharing the same acoustic environment. In addition, the use of wireless technologies for NMPs has been largely overlooked. In this paper, we propose two architectures for Immersive NMPs (INMPs), which differ in the physical positions of the computing blocks constituting the 3D audio toolchain. These architectures leverage a backend specifically conceived to support remote musical practices via Software Defined Networking methods, which takes advantage of the orchestration, slicing, and Multi-access Edge Computing (MEC) capabilities of 5G. Moreover, we illustrate machine learning algorithms for network traffic prediction and packet loss concealment. Traffic predictions at multiple time scales are utilized to achieve an optimized placement of the Virtual Network Functions hosting mixing and processing functionalities within the available MEC sites, depending on the users' geographical locations and current load conditions.
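As a minimal illustration of this kind of placement logic (not the paper's actual algorithm), the sketch below greedily selects the MEC site with the lowest predicted load among those meeting a latency budget for all connected musicians; every name, value, and threshold is hypothetical.

```python
# Illustrative sketch: greedy placement of an audio-mixing VNF on the MEC site
# that minimizes predicted load while keeping per-user latency within an NMP
# budget. All names and numbers are assumptions, not the paper's algorithm.
from dataclasses import dataclass

@dataclass
class MecSite:
    name: str
    predicted_load: float           # 0..1, from a traffic-prediction model
    latency_ms: dict[str, float]    # one-way latency to each user

def place_mixing_vnf(sites: list[MecSite], users: list[str], budget_ms: float = 10.0):
    feasible = [
        s for s in sites
        if all(s.latency_ms[u] <= budget_ms for u in users) and s.predicted_load < 0.9
    ]
    if not feasible:
        return None  # no site satisfies the real-time constraint
    return min(feasible, key=lambda s: s.predicted_load)

sites = [
    MecSite("edge-A", 0.35, {"alice": 4.0, "bob": 8.5}),
    MecSite("edge-B", 0.10, {"alice": 12.0, "bob": 3.0}),
]
print(place_mixing_vnf(sites, ["alice", "bob"]))  # -> edge-A (edge-B violates the budget)
```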
An analysis of the technical requirements of INMPs is provided, along with a performance assessment carried out using simulators.
IEEE Access, 2024, Volume 12, pp. 62818-62833. Published: Jan. 1, 2024.
The use of internet-based and networking technology in computer music systems has greatly increased in the past few years. Such efforts fall within the remits of the emerging field of the Internet of Musical Things (IoMusT), the extension of the Internet of Things paradigm to the musical domain. Given the increasing importance of connected devices in this domain, it is essential to reflect on the relationship between such devices and sustainability, at both the environmental and social levels. In this paper, we address this aspect from two perspectives: 1) how to design IoMusT systems in a sustainable way, and 2) how IoMusT systems can support sustainability. To this end, we relied on three lenses, combining the literature on green IoT (lens 1), Sustainable HCI (lens 2), and the Sustainable Development Goals of the United Nations (lens 3). By applying these lenses, we developed five strategies for a sustainable IoMusT, which are extensively presented and discussed, providing critical reflections.
In this paper, we propose a deep-learning-based system for the task of deepfake audio detection. This work is part of a proposed toolchain for speech analysis in the EUCINF (EUropean Cyber and INFormation) project, a European project with multiple partners across Europe. In particular, the raw input audio is first transformed into various spectrograms using three transformation methods, Short-Time Fourier Transform (STFT), Constant-Q Transform (CQT), and Wavelet Transform (WT), combined with different auditory-based filters: Mel, Gammatone, linear filter (LF), and discrete cosine transform (DCT).
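To picture one of these front-ends, the sketch below computes an STFT-based mel spectrogram and a CQT with librosa; the file name and all parameters are illustrative assumptions, not the paper's settings.

```python
# Illustrative spectrogram front-end (parameters are assumptions):
# STFT + Mel filter bank, and a CQT, two of the variants described above.
import librosa
import numpy as np

y, sr = librosa.load("sample.wav", sr=16000)         # hypothetical input file

# STFT magnitude -> Mel-filtered, log-scaled spectrogram
stft = np.abs(librosa.stft(y, n_fft=1024, hop_length=256))
mel = librosa.feature.melspectrogram(S=stft**2, sr=sr, n_mels=128)
log_mel = librosa.power_to_db(mel)

# Constant-Q transform magnitude, log-scaled
cqt = np.abs(librosa.cqt(y, sr=sr, hop_length=256, n_bins=84))
log_cqt = librosa.amplitude_to_db(cqt)

print(log_mel.shape, log_cqt.shape)                  # (n_mels, frames), (n_bins, frames)
```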
Given the spectrograms, we evaluate a wide range of classification models based on three deep learning approaches. The first approach is to train our baseline models: a CNN-based model (CNN-baseline), an RNN-based model (RNN-baseline), and a C-RNN model (C-RNN baseline). Meanwhile, the second approach is to apply transfer learning from computer vision models such as ResNet-18, MobileNet-V3, EfficientNet-B0, DenseNet-121, ShuffleNet-V2, Swin-T, ConvNeXt-Tiny, GoogLeNet, MNASNet, and RegNet.
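A minimal sketch of this transfer-learning recipe, assuming torchvision weights and a two-class (real/fake) head on single-channel spectrogram "images"; the exact fine-tuning setup in the paper may differ.

```python
# Sketch: adapt an ImageNet-pretrained ResNet-18 to two-class spoof detection.
# Hyperparameters and input sizes are assumptions, not the paper's settings.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
# Accept 1-channel spectrograms (discards the pretrained 3-channel first layer).
model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
model.fc = nn.Linear(model.fc.in_features, 2)        # logits: real vs. fake

spec_batch = torch.randn(8, 1, 128, 256)             # (batch, channel, freq, time)
logits = model(spec_batch)
print(logits.shape)                                  # torch.Size([8, 2])
```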
In the third approach, we leverage state-of-the-art pre-trained audio models (Whisper, Seamless, SpeechBrain, Pyannote) to extract embeddings from the spectrograms. Then, the embeddings are explored by a Multilayer Perceptron (MLP) model to detect fake or real samples.
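This third branch can be pictured as below: a small MLP classifies fixed-size embeddings, with the embedding extractor (Whisper, Seamless, SpeechBrain, or Pyannote) abstracted behind a placeholder; the layer sizes and embedding dimension are assumptions.

```python
# Sketch of the embedding -> MLP branch. The random tensor stands in for an
# embedding from a pre-trained model; EMB_DIM and layer sizes are assumptions.
import torch
import torch.nn as nn

EMB_DIM = 512

mlp = nn.Sequential(
    nn.Linear(EMB_DIM, 256), nn.ReLU(), nn.Dropout(0.2),
    nn.Linear(256, 2),                    # logits: [real, fake]
)

def classify(embedding: torch.Tensor) -> torch.Tensor:
    """Return class probabilities for a batch of embeddings."""
    return mlp(embedding).softmax(dim=-1)

fake_prob = classify(torch.randn(1, EMB_DIM))[0, 1]  # placeholder embedding
print(float(fake_prob))
```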
Finally, the high-performance models from these approaches are fused to achieve the best performance. We evaluated our systems on the ASVspoof 2019 benchmark dataset. Our best ensemble achieved an Equal Error Rate (EER) of 0.03, which is highly competitive with the top-performing systems in the ASVspoofing challenge.
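For reference, the reported metric is the operating point where the false-acceptance and false-rejection rates are equal; the sketch below computes it from detection scores using scikit-learn's ROC utilities (an assumed dependency; the official ASVspoof tooling differs in detail).

```python
# Sketch: Equal Error Rate from detection scores (label 1 = bonafide).
import numpy as np
from sklearn.metrics import roc_curve

def equal_error_rate(labels: np.ndarray, scores: np.ndarray) -> float:
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))   # threshold where FPR ~= FNR
    return float((fpr[idx] + fnr[idx]) / 2)

labels = np.array([1, 1, 0, 0, 1, 0])       # toy, perfectly separable data
scores = np.array([0.9, 0.8, 0.3, 0.4, 0.7, 0.2])
print(equal_error_rate(labels, scores))     # 0.0 for this toy example
```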
Experimental results also highlight the potential of selectively combining these approaches to further enhance detection performance.
The "Musical Metaverse" (MM) promises a new dimension of musical expression, creation, and education through shared virtual environments. However, the research on the MM is in its infancy, and little work has been done to understand its capabilities and user experience. One cause can be found in the lack of technologies capable of providing high-quality audio streaming and complex enough audio interactions within virtual worlds. Two promising candidates for bridging these gaps are web technologies such as WebXR and Web Audio, whose combination can potentially allow for more accessible and interoperable networked immersive experiences. To explore this possibility, we developed two prototypes of musical playgrounds. We leveraged Networked-AFrame and Web Audio, together with Tone.js and Essentia.js, to create and test sonic experiences that conveniently run in browsers integrated into commercially available standalone Head-Mounted Displays. The first playground focuses on facilitating the creation of a multi-user application for real-time sound synthesis and binaural rendering. The second explores audio analysis and music information retrieval for creating audio-reactive experiences. A preliminary evaluation of the playgrounds is also presented, which revealed some usability issues, such as accessing URLs from within the headset, ambiguity in the ownership of tools, and the impact of analysis algorithms on perceived audio-visual latency. Finally, the paper outlines future work and discusses possible developments and applications of web-based musical experiences.
Computers in Human Behavior Reports, 2024, Volume 15, article 100451. Published: July 4, 2024.
People often use audio-only communication to connect with others. Spatialization of audio has been previously found to improve immersion, presence, and social presence during conversations. We propose that spatial audio improves connectedness between dyads. Participants engaged in three 8-min semi-structured conversations with an acquainted partner under three conditions: in-person communication, monaural audio communication, and spatial audio communication. Using Media Naturalness Theory as our theoretical framework, we examined which aspects of connectedness benefited from spatialization. While in-person communication yielded the greatest connectedness, spatial audio better facilitated connectedness than traditional monaural communication. Spatial audio improved feelings of being physically in the same room and of being on the same wavelength, and produced more nonverbal behaviors associated with rapport building.
Spatial audio technologies are becoming a fundamental requirement for guaranteeing immersive auditory experiences in various applications, such as Augmented and Virtual Reality, up to the Metaverse. With the rise of mobile edge computing, there is growing interest in exploring spatial audio algorithms and their performance on edge infrastructures. This paper presents an evaluation of two different approaches for the potential offloading of real-time spatial audio processing onto a Mobile Edge Computing (MEC) infrastructure.
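As context for what such offloading involves, the sketch below shows a core spatial audio operation an edge node might run per audio frame: binaural rendering by convolving a mono source with a pair of head-related impulse responses. The HRIR data here are random placeholders, and the frame sizes are assumptions.

```python
# Sketch of per-frame binaural rendering via HRIR convolution.
# `hrir_left`/`hrir_right` are placeholders for measured impulse responses.
import numpy as np
from scipy.signal import fftconvolve

def binauralize(mono: np.ndarray, hrir_left: np.ndarray, hrir_right: np.ndarray) -> np.ndarray:
    left = fftconvolve(mono, hrir_left, mode="same")
    right = fftconvolve(mono, hrir_right, mode="same")
    return np.stack([left, right], axis=0)    # (2, n_samples) stereo frame

frame = np.random.randn(480)                  # 10 ms frame @ 48 kHz
out = binauralize(frame, np.random.randn(256), np.random.randn(256))
print(out.shape)                              # (2, 480)
```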
The presented results were obtained through evaluations performed on a real operator network, and they demonstrate the feasibility of spatial audio computation at the network edge. No difference in terms of performance between the two approaches was observed under the assumed scenario.
Monitoring biodiversity at scale is challenging. Detecting and identifying species in fine-grained taxonomies requires highly accurate machine learning (ML) methods. Training such models requires large, high-quality data sets, and deploying these models to low-power devices requires novel compression techniques and model architectures. While species classification methods have profited from large data sets and advances in ML methods, in particular neural networks, deploying state-of-the-art models to low-power devices remains difficult. Here we present a comprehensive empirical comparison of various tinyML neural network architectures for species classification. We focus on the example of bird song detection, more concretely on a data set curated for studying the corn bunting bird species. We publish the data set along with all code and experiments of this study. In our experiments, we comparatively evaluate the predictive performance, memory, and time complexity of spectrogram-based approaches and recent approaches operating directly on the raw audio signal.
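To illustrate the raw-audio end of this comparison, the sketch below defines a 1D convolutional network with only a few thousand parameters, small enough in spirit for microcontroller-class deployment; the architecture is hypothetical, not TinyChirp's.

```python
# Hypothetical tiny 1D CNN on raw audio (not the TinyChirp architecture):
# a few thousand parameters, binary output (target species present / absent).
import torch
import torch.nn as nn

class TinyRawAudioNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=64, stride=8), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(8, 16, kernel_size=16, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),              # global pooling over time
        )
        self.classifier = nn.Linear(16, 2)

    def forward(self, x):                         # x: (batch, 1, samples)
        return self.classifier(self.features(x).squeeze(-1))

net = TinyRawAudioNet()
print(sum(p.numel() for p in net.parameters()))   # ~2.6k parameters
print(net(torch.randn(1, 1, 16000)).shape)        # 1 s @ 16 kHz -> (1, 2)
```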
Our results demonstrate that TinyChirp, our proposed approach, can robustly detect individual corn bunting songs with precisions over 0.98 and reduce energy consumption compared to the state-of-the-art, such that the lifetime of an autonomous recording unit on a single battery charge is extended from 2 weeks to 8 weeks, almost an entire season.
SoundSignature is a music application that integrates a custom OpenAI Assistant to analyze users' favorite songs. The system incorporates state-of-the-art Music Information Retrieval (MIR) Python packages to combine extracted acoustic/musical features with the assistant's extensive knowledge of artists and bands. Capitalizing on this combined knowledge, the application leverages semantic audio principles from the emerging Internet of Sounds (IoS) ecosystem, integrating MIR with AI to provide users with personalized insights into the acoustic properties of their music, akin to a musical preference personality report.
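A rough sketch of the MIR side of such a pipeline: extracting a few acoustic descriptors with librosa and packaging them as text for a chat assistant. The feature set, file name, and prompt format are assumptions for illustration, not SoundSignature's actual implementation.

```python
# Sketch (assumed feature set, not SoundSignature's pipeline): extract a few
# acoustic descriptors and format them as text for a chat assistant.
import librosa
import numpy as np

y, sr = librosa.load("favorite_song.mp3")            # hypothetical input file
tempo, _ = librosa.beat.beat_track(y=y, sr=sr)
centroid = float(np.mean(librosa.feature.spectral_centroid(y=y, sr=sr)))
rms = float(np.mean(librosa.feature.rms(y=y)))

prompt = (
    f"Tempo: {float(tempo):.0f} BPM, mean spectral centroid: {centroid:.0f} Hz, "
    f"mean RMS energy: {rms:.3f}. Describe what these suggest about the track."
)
print(prompt)   # this text would be passed to the assistant as context
```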
Users can then interact with the chatbot to explore deeper inquiries about the analyses performed and how they relate to their musical taste. This interactivity transforms the application, acting not only as an informative resource for familiar and/or favorite songs, but also as an educational platform that enables users to deepen their understanding of the acoustic features, music theory, and signal processing concepts behind their music. Beyond general usability, several well-established open-source musician-specific tools are integrated, such as a chord recognition algorithm (CREMA), a source separation model (DEMUCS), and an audio-to-MIDI converter (basic-pitch). These tools allow users without coding skills to access advanced, open-source music processing algorithms simply by interacting with the chatbot (e.g., "can you give me the stems of this song?"). In this paper, we highlight the application's innovative potential and present findings from a pilot user study that evaluates its efficacy and usability.
Neural vocoders convert time-frequency representations, such as mel-spectrograms, into the corresponding time-domain representations. They are essential for generative applications in audio (e.g., text-to-speech and text-to-audio). This paper presents a scalable vocoder architecture for small-footprint edge devices, inspired by Vocos and adapted with XiNets and PhiNets.
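To make the Vocos-style design concrete, the toy sketch below (my illustration, not the paper's model) has a small network predict STFT magnitude and phase from mel frames and recovers the waveform with an inverse STFT, rather than with transposed convolutions; all sizes are assumptions.

```python
# Toy sketch of a Vocos-style vocoder head (illustrative, not the paper's model):
# predict STFT magnitude and phase from mel frames, then reconstruct via iSTFT.
import torch
import torch.nn as nn

N_FFT, HOP, N_MELS = 1024, 256, 80
F = N_FFT // 2 + 1                               # one-sided frequency bins

class TinyVocosHead(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv1d(N_MELS, hidden, 7, padding=3), nn.GELU(),
            nn.Conv1d(hidden, hidden, 7, padding=3), nn.GELU(),
        )
        self.to_mag = nn.Conv1d(hidden, F, 1)
        self.to_phase = nn.Conv1d(hidden, F, 1)

    def forward(self, mel):                      # mel: (batch, N_MELS, frames)
        h = self.backbone(mel)
        mag = torch.exp(self.to_mag(h))          # log-magnitude -> magnitude
        spec = torch.polar(mag, self.to_phase(h))  # complex STFT estimate
        return torch.istft(spec, N_FFT, HOP, window=torch.hann_window(N_FFT))

wav = TinyVocosHead()(torch.randn(1, N_MELS, 100))   # ~25k samples of audio
print(wav.shape)
```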
We test the developed models' capabilities qualitatively and quantitatively on single-speaker and multi-speaker datasets, and benchmark inference speed and memory consumption on four microcontrollers. Additionally, we study power consumption on an ARM Cortex-M7-powered board. Our results demonstrate the feasibility of deploying neural vocoders on resource-constrained devices, potentially enabling new Internet of Sounds (IoS) and Embedded Audio scenarios. Our best-performing model achieves a MOS score of 3.95/5 while utilizing 1.5 MiB of FLASH and 517 KiB of RAM, and consuming 252 mW for a 1 s audio clip inference.