Estimation and validation of solubility of recombinant protein in E. coli strains via various advanced machine learning models
Wael A. Mahdi, Adel Alhowyan, Ahmad J. Obaidullah, et al.

Scientific Reports, Journal Year: 2025, Volume and Issue: 15(1)

Published: April 14, 2025

This study presents a comprehensive approach to predicting the solubility of recombinant protein in four E. coli samples by employing machine learning techniques and optimization algorithms. Various models, including AdaBoost, Decision Tree Regression (DT), Gaussian Process Regression (GPR), and K-Nearest Neighbors (KNN), are applied to capture the intricate relationships between experimental factors and solubility. The integration of these models within an AdaBoost framework, coupled with advanced hyperparameter tuning via the Firefly Algorithm (FA), demonstrates a novel strategy for improving predictive accuracy and model robustness. Key preprocessing steps such as Histogram-Based Outlier Detection (HBOD) and Z-score normalization are employed to ensure data integrity and consistency. The Firefly Algorithm, utilizing 5-fold cross-validation as its fitness function, adeptly navigates complex hyperparameter spaces, enhancing performance across diverse data partitions. AdaBoost with GPR (ADA-GPR) is established to be superior to the alternatives ADA-DT and ADA-KNN, demonstrating great performance through high R² test scores and low Mean Squared Error. With a standard deviation of 0.05188 in cross-validation, ADA-GPR demonstrated exceptional consistency and robust generalization. Using hybrid optimization, this study sheds light on critical variables influencing solubility, providing a scalable and effective solution for modeling bioprocesses.
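The tuning loop described in the abstract (Z-score normalization, then a firefly search whose fitness is a 5-fold cross-validation error) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation. It assumes synthetic data, uses a plain KNN regressor as a stand-in for the AdaBoost ensembles, and omits HBOD, GPR, and DT entirely. All function names, parameter values, and the search range for `k` are hypothetical choices made for illustration.

```python
import math
import random

def zscore_columns(rows):
    """Standardize each feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def knn_predict(train_X, train_y, x, k):
    """Predict by averaging the targets of the k nearest training points."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    return sum(t for _, t in nearest) / len(nearest)

def cv_mse(X, y, k, folds=5):
    """Mean squared error of KNN regression, averaged over `folds` folds."""
    n = len(X)
    errors = []
    for f in range(folds):
        test_idx = set(range(f, n, folds))  # every folds-th sample is held out
        tr_X = [X[i] for i in range(n) if i not in test_idx]
        tr_y = [y[i] for i in range(n) if i not in test_idx]
        sq = [(knn_predict(tr_X, tr_y, X[i], k) - y[i]) ** 2 for i in test_idx]
        errors.append(sum(sq) / len(sq))
    return sum(errors) / folds

def firefly_minimize(fitness, low, high, n_fireflies=8, n_iter=15,
                     beta0=1.0, gamma=0.01, alpha=0.5, seed=0):
    """Minimize a one-dimensional fitness function with a basic firefly search."""
    rng = random.Random(seed)
    pos = [rng.uniform(low, high) for _ in range(n_fireflies)]
    cost = [fitness(p) for p in pos]
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # firefly j is "brighter" (lower error)
                    # Attraction decays with squared distance between fireflies.
                    beta = beta0 * math.exp(-gamma * (pos[i] - pos[j]) ** 2)
                    step = beta * (pos[j] - pos[i]) + alpha * rng.uniform(-0.5, 0.5)
                    pos[i] = min(high, max(low, pos[i] + step))
                    cost[i] = fitness(pos[i])
        alpha *= 0.95  # shrink the random step over time
    best = min(range(n_fireflies), key=cost.__getitem__)
    return pos[best], cost[best]

# Synthetic stand-in data (assumption): two features on different scales.
data_rng = random.Random(42)
raw_X = [[data_rng.uniform(0, 10), data_rng.uniform(0, 100)] for _ in range(60)]
y = [0.5 * a + 0.02 * b + data_rng.gauss(0, 0.2) for a, b in raw_X]
X = zscore_columns(raw_X)

# Search the number of neighbors k in [1, 15]; positions are rounded to integers.
best_pos, best_mse = firefly_minimize(
    lambda p: cv_mse(X, y, max(1, round(p))), low=1, high=15)
best_k = max(1, round(best_pos))
print(f"selected k={best_k}, 5-fold CV MSE={best_mse:.4f}")
```

Rounding the continuous firefly position to an integer is one simple way to handle a discrete hyperparameter. A real pipeline would search several hyperparameters at once and would evaluate the chosen model on a held-out test set.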

Language: English
