Journal of Computational Methods in Sciences and Engineering, Year: 2024, Issue: 24(4-5), pp. 3287-3296
Published: Aug. 14, 2024
This paper describes a study on speaker recognition using the ECAPA-TDNN architecture (Emphasized Channel Attention, Propagation and Aggregation Time-Delay Neural Network). The architecture builds on x-vectors, a feature-extraction method that converts speech into fixed-length vectors, and introduces squeeze-and-excitation (SE) blocks to model dependencies between channels. To better explore temporal relationships in context and to improve the algorithm's generalization performance in complex acoustic scenarios, this paper adds input gates and forget gates, combining them into CIFG (Convolutional LSTM Input Forget Gates) modules. These modules are embedded in a residual structure to aggregate multi-layer features. Sub-center ArcFace, an improved loss function based on ArcFace, is used to select sub-centers for subclass discrimination, retaining the advantageous sub-centers to enhance intra-class compactness and strengthen the robustness of the network. Experimental results demonstrate that ECAPA-TDNN-CIFG outperforms the baseline model, yielding more accurate and efficient results.
Language: English
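The x-vector idea of turning variable-length speech into a fixed-length vector can be illustrated with plain statistics pooling: per-channel mean and standard deviation over time. This is a minimal NumPy sketch, assuming a `(channels, frames)` feature layout; `statistics_pooling` is a hypothetical helper, and ECAPA-TDNN itself uses an attentive variant of this pooling rather than the plain version shown.

```python
import numpy as np

def statistics_pooling(frames: np.ndarray) -> np.ndarray:
    """Map a (C, T) frame-level feature map to a fixed-length (2C,) vector.

    Concatenates the per-channel mean and standard deviation over time,
    so utterances of any length T yield vectors of the same size.
    """
    mean = frames.mean(axis=1)   # (C,) average of each channel over time
    std = frames.std(axis=1)     # (C,) spread of each channel over time
    return np.concatenate([mean, std])

rng = np.random.default_rng(1)
short = statistics_pooling(rng.standard_normal((16, 40)))    # short utterance
long_ = statistics_pooling(rng.standard_normal((16, 300)))   # longer utterance
print(short.shape, long_.shape)  # (32,) (32,)
```

Both utterances end up as 32-dimensional vectors despite having different numbers of frames, which is what lets a downstream classifier or scoring back-end compare them directly.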
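A squeeze-and-excitation block models channel dependencies by summarizing each channel over time, passing that summary through a small bottleneck, and rescaling every channel with a gate in (0, 1). The sketch below assumes a `(channels, frames)` layout, a reduction ratio `r = 4`, and random weights; `se_block` and its parameter names are illustrative, not taken from the paper.

```python
import numpy as np

def sigmoid(v: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-v))

def se_block(x, w1, b1, w2, b2):
    """Squeeze-and-excitation over channels (hypothetical helper).

    x:  (C, T) frame-level feature map (channels x time frames)
    w1: (C // r, C) and w2: (C, C // r) bottleneck weights, reduction ratio r
    """
    z = x.mean(axis=1)                 # squeeze: average each channel over time -> (C,)
    h = np.maximum(0.0, w1 @ z + b1)   # bottleneck layer + ReLU -> (C // r,)
    s = sigmoid(w2 @ h + b2)           # per-channel gates in (0, 1) -> (C,)
    return x * s[:, None], s           # excitation: rescale every frame of each channel

rng = np.random.default_rng(0)
C, T, r = 8, 50, 4
x = rng.standard_normal((C, T))
w1 = rng.standard_normal((C // r, C)) * 0.1
b1 = np.zeros(C // r)
w2 = rng.standard_normal((C, C // r)) * 0.1
b2 = np.zeros(C)
y, s = se_block(x, w1, b1, w2, b2)
print(y.shape)  # (8, 50)
```

The gates depend on the whole utterance through the time average, so a channel that carries little speaker information across the utterance can be attenuated everywhere at once.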
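The abstract does not spell out the CIFG equations. In the LSTM literature, CIFG commonly denotes the coupled input-forget gate variant, in which the forget gate is tied to the input gate as f = 1 - i. The sketch below assumes that reading, uses dense weight matrices instead of convolutions, and wraps the recurrent cell in a residual connection; `cifg_residual` and its stacked weight layout are hypothetical and are not the authors' module.

```python
import numpy as np

def sigmoid(v: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-v))

def cifg_residual(x, W, U, b):
    """Hypothetical CIFG-LSTM module with a residual connection.

    x: (C, T) frame-level features; hidden size equals C so x + h is valid.
    W, U: (3C, C) input and recurrent weights; b: (3C,) biases,
    stacked as [input gate, output gate, candidate cell].
    """
    C, T = x.shape
    h = np.zeros(C)
    c = np.zeros(C)
    out = np.empty_like(x)
    for t in range(T):
        a = W @ x[:, t] + U @ h + b
        i = sigmoid(a[:C])           # input gate
        o = sigmoid(a[C:2 * C])      # output gate
        g = np.tanh(a[2 * C:])       # candidate cell state
        f = 1.0 - i                  # forget gate coupled to the input gate (assumption)
        c = f * c + i * g
        h = o * np.tanh(c)
        out[:, t] = h
    return x + out                   # residual: the module adds a learned correction to x

rng = np.random.default_rng(0)
C, T = 4, 10
x = rng.standard_normal((C, T))
W = rng.standard_normal((3 * C, C)) * 0.1
U = rng.standard_normal((3 * C, C)) * 0.1
b = np.zeros(3 * C)
y = cifg_residual(x, W, U, b)
print(y.shape)  # (4, 10)
```

Coupling the gates removes one set of gate weights compared with a standard LSTM, and the residual path means that with all weights at zero the module reduces to the identity on `x`.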
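Sub-center ArcFace gives each class K sub-centers, scores an embedding against the closest sub-center of each class, and applies an additive angular margin to the true class before softmax cross-entropy. This is a minimal single-sample NumPy sketch; the scale `s = 30` and margin `m = 0.2` are assumed values, not settings reported in the paper, and the helper name is hypothetical.

```python
import numpy as np

def subcenter_arcface_loss(emb, centers, label, s=30.0, m=0.2):
    """Sub-center ArcFace loss for one embedding (hypothetical helper).

    emb:     (D,) speaker embedding
    centers: (N, K, D) K sub-centers for each of N speaker classes
    label:   index of the true speaker
    Returns the loss and the index of the dominant sub-center per class.
    """
    e = emb / np.linalg.norm(emb)
    w = centers / np.linalg.norm(centers, axis=-1, keepdims=True)
    cos_all = w @ e                                  # (N, K) cosine to every sub-center
    cos = cos_all.max(axis=1)                        # keep the closest sub-center per class
    theta = np.arccos(np.clip(cos, -1.0, 1.0))
    logits = s * cos
    logits[label] = s * np.cos(theta[label] + m)     # angular margin on the true class only
    logits -= logits.max()                           # numerical stability for softmax
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[label], cos_all.argmax(axis=1)

centers = np.array([
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],    # speaker 0: two sub-centers
    [[0.0, 0.0, 1.0], [-1.0, 0.0, 0.0]],   # speaker 1: two sub-centers
])
emb = np.array([1.0, 0.5, 0.5])
loss, dominant = subcenter_arcface_loss(emb, centers, label=0)
loss_no_margin, _ = subcenter_arcface_loss(emb, centers, label=0, m=0.0)
```

Taking the maximum over sub-centers lets noisy samples attach to a non-dominant sub-center instead of pulling the main one off course; the margin makes the loss larger than the no-margin case, which pushes same-speaker embeddings closer together. This sketch omits the threshold handling that full implementations use when `theta + m` exceeds pi.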