GENPIA: A Genre-Conditioned Piano Music Generation System
Quoc-Viet Nguyen, Hao-Wei Lai, Khanh-Duy Nguyen et al.

Published: Sept. 30, 2024

With the demand for music continuing to grow as people seek variety and personal resonance, many works focus on music generation. In this study, we propose GENPIA, a genre-conditioned piano music generation system. The system encompasses the Anime, R&B, Jazz, and Classical genres. To build our system, we collect and label audio data of various genres specific to the objective of this research. A REMI representation extended with genre information is applied during pre-processing to present better musical structure. A Transformer-XL model is implemented to learn knowledge about the extended representation and generate the desired output audio. An external dataset, called Ailabs.tw 1K7, is utilized for pre-training purposes. The results obtained from a listening questionnaire show that GENPIA can generate pieces conditioned on different genres, compared to prior state-of-the-art work.
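The genre-information extension to REMI can be pictured as prepending a genre token to the event sequence before the language model sees it. The sketch below illustrates that idea; the token names, vocabulary, and helper function are illustrative assumptions, not GENPIA's actual implementation.

```python
# Minimal sketch of genre-conditioned REMI-style tokenization.
# Token names below are illustrative, not the paper's vocabulary.

GENRES = ["Anime", "R&B", "Jazz", "Classical"]

def to_conditioned_sequence(genre, remi_events):
    """Prepend a genre token so the language model can condition on it."""
    if genre not in GENRES:
        raise ValueError(f"unknown genre: {genre}")
    return [f"Genre_{genre}"] + remi_events

# REMI encodes music as Bar/Position/Pitch/Duration events.
events = ["Bar", "Position_1/16", "Pitch_60", "Duration_4",
          "Position_9/16", "Pitch_64", "Duration_4"]
seq = to_conditioned_sequence("Jazz", events)
# seq[0] == "Genre_Jazz"; the remaining tokens are unchanged
```

At generation time, sampling would start from the desired genre token so every subsequent event is conditioned on it.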

Language: English

5G-Enabled Internet of Musical Things Architectures for Remote Immersive Musical Practices
Luca Turchet, Claudia Rinaldi, Carlo Centofanti et al.

IEEE Open Journal of the Communications Society, Journal Year: 2024, Volume and Issue: 5, P. 4691 - 4709

Published: Jan. 1, 2024

Networked Music Performances (NMPs) involve geographically-displaced musicians performing together in real-time. To date, scarce research has been conducted on how to integrate NMP systems with immersive audio rendering techniques able to enrich the musicians' perception of sharing the same acoustic environment. In addition, the use of wireless technologies for NMPs has been largely overlooked. In this paper, we propose two architectures for Immersive NMPs (INMPs), which differ in the physical positions of the computing blocks constituting the 3D audio toolchain. These architectures leverage a backend specifically conceived to support remote musical practices via Software Defined Networking methods, and take advantage of the orchestration, slicing, and Multi-access Edge Computing (MEC) capabilities of 5G. Moreover, we illustrate machine learning algorithms for network traffic prediction and packet loss concealment. Traffic predictions at multiple time scales are utilized to achieve an optimized placement of the Virtual Network Functions hosting mixing and processing functionalities within the available MEC sites, depending on users' geographical locations and current load conditions. An analysis of the technical requirements of INMPs using 5G is provided, along with their performance assessment on simulators.

Language: English

Citations: 12

Sustainable Internet of Musical Things: Strategies to Account for Environmental and Social Sustainability in Network-Based Interactive Music Systems
Raul Masu, Nicoló Merendino, Antonio Rodà et al.

IEEE Access, Journal Year: 2024, Volume and Issue: 12, P. 62818 - 62833

Published: Jan. 1, 2024

The use of internet-based and networking technology in computer music systems has greatly increased in the past few years. Such efforts fall within the remits of the emerging field of the Internet of Musical Things (IoMusT), an extension of the Internet of Things paradigm to the musical domain. Given the increasing importance of connected devices in this domain, it is essential to reflect on the relationship between such technologies and sustainability at the environmental and social levels. In this paper, we address this aspect from two perspectives: 1) how to design IoMusT systems in a sustainable way, and 2) how IoMusT systems can support sustainability. To this end, we relied on three lenses, combining literature on green IoT (lens 1), Sustainable HCI (lens 2), and the Sustainable Development Goals of the United Nations (lens 3). By combining these lenses, we developed five design strategies for sustainable IoMusT systems, which are extensively presented and discussed, providing critical reflections.

Language: English

Citations: 4

Deepfake Audio Detection Using Spectrogram-based Feature and Ensemble of Deep Learning Models

Lam Pham, P. C. B. Lam, Truong Thanh Nguyen et al.

Published: Sept. 30, 2024

In this paper, we propose a deep-learning-based system for the task of deepfake audio detection. This work is part of a proposed toolchain for speech analysis in the EUCINF (EUropean Cyber and INFormation) project, a European project with multiple partners across Europe. In particular, the raw input audio is first transformed into various spectrograms using three transformation methods, Short-time Fourier Transform (STFT), Constant-Q Transform (CQT), and Wavelet Transform (WT), combined with different auditory-based filters: Mel, Gammatone, linear filters (LF), and the discrete cosine transform (DCT). Given the spectrograms, we evaluate a wide range of classification models based on three deep learning approaches. The first approach is to train our baseline models: a CNN-based model (CNN-baseline), an RNN-based model (RNN-baseline), and a C-RNN model (C-RNN baseline). Meanwhile, the second approach applies transfer learning from computer vision models such as ResNet-18, MobileNet-V3, EfficientNet-B0, DenseNet-121, ShuffleNet-V2, Swin-T, ConvNeXt-Tiny, GoogLeNet, MNASNet, and RegNet. In the third approach, we leverage the state-of-the-art pre-trained audio models Whisper, Seamless, Speechbrain, and Pyannote to extract audio embeddings from the spectrograms. Then, the embeddings are explored by a Multilayer Perceptron (MLP) to detect fake or real samples. Finally, the high-performance models from these approaches are fused to achieve the best performance. We evaluated our models on the ASVspoof 2019 benchmark dataset. Our best ensemble achieved an Equal Error Rate (EER) of 0.03, highly competitive with the top-performing systems in the ASVspoofing challenge. Experimental results also highlight the potential of selectively fusing these approaches to enhance performance.
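The reported figure of merit, the Equal Error Rate, is the operating point where the false-acceptance rate (spoofed clips accepted) equals the false-rejection rate (bona fide clips rejected) as the decision threshold sweeps over the scores. A minimal NumPy sketch of that definition, assuming higher scores mean "more likely bona fide"; the official ASVspoof scoring tooling is more elaborate:

```python
import numpy as np

def equal_error_rate(scores, labels):
    """EER via a threshold sweep. labels: 1 = bona fide, 0 = fake.

    Assumes both classes are present. Minimal illustrative sketch,
    not the ASVspoof reference implementation.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    best_far, best_frr, best_gap = 1.0, 0.0, float("inf")
    for t in np.unique(scores):
        accept = scores >= t
        far = float(np.mean(accept[labels == 0]))   # fakes accepted
        frr = float(np.mean(~accept[labels == 1]))  # bona fide rejected
        if abs(far - frr) < best_gap:               # closest FAR/FRR crossing
            best_far, best_frr, best_gap = far, frr, abs(far - frr)
    return (best_far + best_frr) / 2.0

# Perfectly separable scores give an EER of 0.
eer = equal_error_rate([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
# eer == 0.0
```

An EER of 0.03 therefore means the system misclassifies about 3% of trials at the balanced operating point.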

Language: English

Citations: 3

Musical Metaverse Playgrounds: exploring the design of shared virtual sonic experiences on web browsers
Alberto Boem, Luca Turchet

Published: Oct. 26, 2023

The "Musical Metaverse" (MM) promises a new dimension of musical expression, creation, and education through shared virtual environments. However, research on the MM is in its infancy. Little work has been done to understand its capabilities and user experience. One cause can be found in the lack of technologies capable of providing high-quality audio streaming and complex enough interactions within virtual worlds. Two promising candidates for bridging these gaps are web technologies such as WebXR and Web Audio, whose combination can potentially allow more accessible and interoperable networked immersive experiences. To explore this possibility, we developed two prototypes of shared sonic playgrounds. We leveraged Networked-AFrame and Web Audio with Tone.js and Essentia.js to create and test sonic experiences that conveniently run in web browsers integrated into commercially available standalone Head-Mounted Displays. The first playground focuses on facilitating the creation of a multi-user application with real-time sound synthesis and binaural rendering. The second explores the analysis of music information retrieval for creating audio-reactive environments. A preliminary evaluation of the playgrounds is also presented, which revealed some usability issues such as: accessing URLs from the headset, ambiguity in the ownership of tools, and the impact of the algorithms on perceived audio-visual latency. Finally, the paper outlines future work and discusses possible developments and applications of web-based Musical Metaverse experiences.

Language: English

Citations: 9

Social connectedness in spatial audio calling contexts
Vanessa Y. Oviedo, Khia A. Johnson, Madeline Huberth et al.

Computers in Human Behavior Reports, Journal Year: 2024, Volume and Issue: 15, P. 100451 - 100451

Published: July 4, 2024

People often use audio-only communication to connect with others. Spatialization of audio has previously been found to improve immersion, presence, and social presence during conversations. We propose that spatial audio improves connectedness between dyads. Participants engaged in three 8-min semi-structured conversations with an acquainted partner under three conditions: in-person communication, monaural audio communication, and spatial audio communication. Using Media Naturalness Theory as our theoretical framework, we tested whether the conditions benefited aspects of connectedness. While in-person communication yielded the greatest connectedness, spatial audio better facilitated connectedness than traditional monaural communication. Spatial audio improved feelings of being physically in the same room and being on the same wavelength, and produced more nonverbal behaviors associated with rapport building.

Language: English

Citations: 2

Edge-Enabled Spatial Audio Service: Implementation and Performance Analysis on a MEC 5G Infrastructure
Federico Martusciello, Carlo Centofanti, Claudia Rinaldi et al.

Published: Oct. 26, 2023

Spatial audio technologies are becoming a fundamental requirement for guaranteeing immersive auditory experiences in various applications such as Augmented and Virtual Reality, up to the Metaverse. With the rise of mobile edge computing, there is growing interest in exploring the performance of spatial audio algorithms on edge infrastructures. This paper presents an evaluation of two different spatial audio algorithms with the potential of offloading real-time processing to a Mobile Edge Computing (MEC) infrastructure. The presented results were obtained through evaluations performed on an operator network, and they demonstrate the feasibility of spatial audio computation at the network edge. No difference in terms of performance between the two algorithms was observed under the assumed scenario.

Language: English

Citations: 5

Musical atmosphere as a (dis)tractive facet of user interfaces: An experiment on sustainable consumption decisions in eCommerce
Hui Xu, Yang Wu, Juho Hamari et al.

International Journal of Information Management, Journal Year: 2023, Volume and Issue: 75, P. 102715 - 102715

Published: Oct. 5, 2023

Language: English

Citations: 4

TinyChirp: Bird Song Recognition Using TinyML Models on Low-power Wireless Acoustic Sensors

Ziling Huang, Adrien Tousnakhoff, Polina Kozyr et al.

Published: Sept. 30, 2024

Monitoring biodiversity at scale is challenging. Detecting and identifying species in fine-grained taxonomies requires highly accurate machine learning (ML) methods. Training such models requires large, high-quality data sets. And deploying these models to low-power devices requires novel compression techniques and model architectures. While species classification methods have profited from novel data sets and advances in ML methods, in particular neural networks, deploying these state-of-the-art models on low-power devices remains difficult. Here we present a comprehensive empirical comparison of various tinyML neural network architectures for species classification. We focus on the example of bird song detection, more concretely on a data set curated for studying the corn bunting bird species. We publish the data set along with all code and experiments of this study. In our experiments we comparatively evaluate the predictive performance, memory, and time complexity of spectrogram-based methods and recent approaches operating directly on the raw audio signal. Our results demonstrate that TinyChirp, our approach, can robustly detect individual species with precisions over 0.98 and reduce energy consumption compared to the state-of-the-art, such that an autonomous recording unit's lifetime on a single battery charge is extended from 2 weeks to 8 weeks, covering almost an entire season.
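The spectrogram front end that such spectrogram-based tinyML pipelines start from can be sketched in plain NumPy: frame the waveform, window each frame, and take the FFT magnitude. The frame length and hop size below are illustrative values, not the paper's configuration.

```python
import numpy as np

def stft_magnitude(signal, frame_len=256, hop=128):
    """Hann-windowed magnitude spectrogram of a 1-D signal.

    Minimal sketch of a spectrogram feature extractor; real pipelines
    typically add mel filtering and log compression.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft keeps the frame_len // 2 + 1 non-negative frequency bins
    return np.abs(np.fft.rfft(frames, axis=1))

# A 1 kHz tone sampled at 16 kHz concentrates energy in one bin:
sr = 16000
t = np.arange(sr) / sr
spec = stft_magnitude(np.sin(2 * np.pi * 1000 * t))
peak_bin = int(spec.mean(axis=0).argmax())
# bin width = sr / frame_len = 62.5 Hz, so the tone peaks at bin 16
```

On a microcontroller the same computation would use a fixed-point FFT library, which is one reason the paper also benchmarks approaches that skip the spectrogram and operate on the raw signal.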

Language: English

Citations: 1

SoundSignature: What Type of Music do you Like?

Brandon James Carone, Pablo Ripollés

Published: Sept. 30, 2024

SoundSignature is a music application that integrates a custom OpenAI Assistant to analyze users' favorite songs. The system incorporates state-of-the-art Music Information Retrieval (MIR) Python packages to combine extracted acoustic/musical features with the assistant's extensive knowledge of the artists and bands. Capitalizing on this combined knowledge, the application leverages semantic audio principles from the emerging Internet of Sounds (IoS) ecosystem, integrating MIR with AI to provide users with personalized insights into the acoustic properties of their music, akin to a musical preference personality report. Users can then interact with the chatbot to explore deeper inquiries about the analyses performed and how they relate to their musical taste. This interactivity transforms the application, acting not only as an informative resource about familiar and/or favorite songs, but also as an educational platform that enables users to deepen their understanding of the musical features, music theory, and signal processing concepts behind their music. Beyond its general usability, the application integrates several well-established open-source musician-specific tools, such as a chord recognition algorithm (CREMA), a source separation algorithm (DEMUCS), and an audio-to-MIDI converter (basic-pitch). These tools allow users without coding skills to access advanced music processing algorithms simply by interacting with the chatbot (e.g., can you give me the stems of this song?). In this paper, we highlight the application's innovative potential and present findings from a pilot user study that evaluates its efficacy and usability.

Language: English

Citations: 1

TinyVocos: Neural Vocoders on MCUs

Stefano Ciapponi, Francesco Paissan, Alberto Ancilotto et al.

Published: Sept. 30, 2024

Neural vocoders convert time-frequency representations, such as mel-spectrograms, into the corresponding time-domain representations. They are essential for generative applications in audio (e.g. text-to-speech and text-to-audio). This paper presents a scalable vocoder architecture for small-footprint edge devices, inspired by Vocos and adapted with XiNets and PhiNets. We test the developed models' capabilities qualitatively and quantitatively on single-speaker and multi-speaker datasets and benchmark inference speed and memory consumption on four microcontrollers. Additionally, we study power consumption on an ARM Cortex-M7-powered board. Our results demonstrate the feasibility of deploying neural vocoders on resource-constrained devices, potentially enabling new Internet of Sounds (IoS) and Embedded Audio scenarios. Our best-performing model achieves a MOS score of 3.95/5 while utilizing 1.5 MiB of FLASH and 517 KiB of RAM, consuming 252 mW for a 1 s clip inference.

Language: English

Citations: 1