Modified Decision Tree with Custom Splitting Logic Improves Generalization across Multiple Brains’ Proteomic Data Sets of Alzheimer’s Disease
Mark V. Ivanov,

Anna S. Kopeykina,

Elizaveta M. Kazakova

et al.

Journal of Proteome Research, Journal Year: 2025, Volume and Issue: unknown

Published: Feb. 21, 2025

Many factors limit the generalization of findings in discovery proteomics, including differences between patient cohorts and the variety of experimental conditions. We present a machine-learning-based workflow for proteomics data analysis that aims to improve generalizability across multiple data sets. In particular, we customized the decision tree model by introducing a new parameter, min_groups_leaf, which regulates the presence of samples from each data set inside the model's leaves. Further, we analyzed the trend of the feature importance curve as a function of the novel parameter and used it for feature selection, yielding a list of proteins with significantly improved generalization. The developed workflow was tested using five proteomic data sets obtained from post-mortem human brain samples in Alzheimer's disease. The data sets consisted of 535 LC–MS/MS acquisition files. The results were obtained with two different data processing pipelines: (1) MS1-only processing based on the DirectMS1 search engine and (2) a standard MS/MS-based one. Using the workflow, we found seven proteins with expression patterns that were unique to asymptomatic Alzheimer's patients. Two of them, Serotransferrin (TRFE) and DNA repair nuclease APEX1, may be important for explaining the lack of dementia in patients with neuritic plaques and neurofibrillary tangles.

Language: English
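
The abstract does not spell out how min_groups_leaf enters the splitting logic, so the following is only a minimal sketch of one plausible reading: a candidate split is accepted only if each resulting child contains samples from at least min_groups_leaf distinct data sets. The function names (gini, best_split), the use of Gini impurity, and the choice of observed values as thresholds are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels, groups, min_groups_leaf):
    """Find the best threshold on a single feature (hypothetical helper).

    A threshold is admissible only if both children contain samples
    from at least `min_groups_leaf` distinct groups (data sets).
    This constraint is an assumed interpretation of the parameter.
    Returns (threshold, weighted_gini), or None if no split is admissible.
    """
    n = len(values)
    best = None
    # Candidate thresholds: every observed value except the largest one.
    for t in sorted(set(values))[:-1]:
        left = [i for i in range(n) if values[i] <= t]
        right = [i for i in range(n) if values[i] > t]
        # Reject splits whose children draw on too few data sets.
        if len({groups[i] for i in left}) < min_groups_leaf:
            continue
        if len({groups[i] for i in right}) < min_groups_leaf:
            continue
        score = (
            len(left) * gini([labels[i] for i in left])
            + len(right) * gini([labels[i] for i in right])
        ) / n
        if best is None or score < best[1]:
            best = (t, score)
    return best


values = [1.0, 2.0, 3.0, 4.0]  # abundance of one protein (toy numbers)
labels = [0, 0, 1, 1]          # 0 = control, 1 = disease (toy labels)

# Both data sets are present on each side of the split: accepted.
mixed = best_split(values, labels, ["A", "B", "A", "B"], min_groups_leaf=2)

# The class boundary coincides with the data set boundary: rejected.
confounded = best_split(values, labels, ["A", "A", "B", "B"], min_groups_leaf=2)
```

In this toy case the same feature separates the classes perfectly in both calls, but under the assumed constraint the split is rejected when the separation could equally be explained by data set membership, which is the kind of non-generalizable signal the parameter is described as guarding against.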
