Robustness study of speaker recognition based on ECAPA-TDNN-CIFG
Chunli Wang,

Linming Xu,

Hongxin Zhu

et al.

Journal of Computational Methods in Sciences and Engineering, Year: 2024, Volume 24(4-5), pp. 3287-3296

Published: Aug. 14, 2024

This paper describes a study on speaker recognition using the ECAPA-TDNN architecture, which stands for Extended Context-Aware Parallel Aggregations Time-Delay Neural Network. It utilizes x-vectors, a method of extracting features by converting speech into fixed-length vectors, and introduces a squeeze-and-excitation block to model dependencies between channels. To better explore temporal relationships in context and to improve the algorithm's generalization performance in complex acoustic scenarios, this study adds input gates and forget gates, combining them with CIFG (Convolutional LSTM Input Forget Gates) modules. These are embedded in a residual structure of multi-layer aggregated features. Sub-center ArcFace, an improved loss function based on ArcFace, is used for selecting sub-centers for subclass discrimination, retaining advantageous sub-centers to enhance intra-class compactness and strengthen the robustness of the network. Experimental results demonstrate that ECAPA-TDNN-CIFG outperforms the baseline model, yielding more accurate and efficient results.
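As a rough illustration of two components named in the abstract, the sketch below shows a squeeze-and-excitation gate over a (channels, time) feature map and the logit computation of a sub-center ArcFace loss. It is not the authors' implementation: the function names, array shapes, and the `margin` and `scale` defaults are assumptions chosen for the example, and the CIFG modules are not modeled here.

```python
import numpy as np

def squeeze_excite(x, w1, b1, w2, b2):
    """Squeeze-and-excitation over x of shape (channels, time). Hypothetical helper."""
    z = x.mean(axis=1)                          # squeeze: average each channel over time
    h = np.maximum(0.0, w1 @ z + b1)            # bottleneck layer with ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))    # per-channel gate in (0, 1)
    return x * s[:, None]                       # rescale every channel by its gate

def subcenter_arcface_logits(emb, centers, labels, margin=0.2, scale=30.0):
    """emb: (batch, dim); centers: (classes, k, dim); labels: (batch,) ints.

    margin and scale are assumed values, not taken from the paper.
    """
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    centers = centers / np.linalg.norm(centers, axis=2, keepdims=True)
    cos_all = np.einsum("bd,ckd->bck", emb, centers)  # cosine to every sub-center
    cos = cos_all.max(axis=2)                          # closest sub-center per class
    theta = np.arccos(np.clip(cos, -1.0, 1.0))
    rows = np.arange(emb.shape[0])
    theta[rows, labels] += margin                      # angular margin on the true class only
    return scale * np.cos(theta)                       # logits for softmax cross-entropy
```

Taking the maximum over the `k` sub-centers lets each class absorb several clusters of training samples, while the added angular margin pushes embeddings closer to their nearest sub-center. Production implementations usually add extra handling for angles where `theta + margin` exceeds π, which this sketch omits.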

Language: English

Cited by: 0

Exploring the Effectiveness of Deep Learning in Audio Compression and Restoration

S. Pushparani,

K. Sashi Rekha,

V. M. Sivagami

et al.

Published: Mar. 22, 2024

Language: English

Cited by: 1
