
Scientific Reports, Year: 2024, Volume 14(1)
Published: Nov. 6, 2024
Despite the end of the global Coronavirus Disease 2019 (COVID-19) pandemic, risk factors for COVID-19 severity continue to be a pivotal area of research. Specifically, studying the impact of the genomic diversity of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) on COVID-19 severity is crucial for predicting severe outcomes. Therefore, this study aimed to investigate the impact of SARS-CoV-2 genome sequence, genotype, patient age, gender, and vaccination status on the severity of COVID-19, and to develop accurate and robust prediction models. The training set (n = 12,038), primary testing set (n = 4,006), and secondary testing set (n = 2,845) consist of SARS-CoV-2 genome sequences with patient information, which were obtained from the Global Initiative on Sharing All Influenza Data (GISAID), spanning over four years. Four machine learning methods were employed to construct prediction models. By extracting genomic features, optimizing model parameters, and integrating models, the study improved prediction accuracy. Furthermore, SHapley Additive exPlanations (SHAP) was applied to analyze model interpretability and identify risk factors, providing insights for the management of severe cases. The proposed ensemble model achieved an F-score of 88.842% and an Area Under the Curve (AUC) of 0.956 on the dataset. In addition to factors such as patient age, gender, and vaccination status, 40 amino acid site mutation characteristics were identified as having a significant impact on the severity of COVID-19. This work has the potential to facilitate early identification of COVID-19 patients at high risk of severe illness, thus effectively reducing the rates of severe cases and mortality.
Language: English