E-MuLA: An Ensemble Multi-Localized Attention Feature Extraction Network for Viral Protein Subcellular Localization

Grace-Mercure Bakanina Kissanga, Hasan Zulfiqar, Shenghan Gao, et al.

Information, Journal Year: 2024, Volume and Issue: 15(3), P. 163 - 163

Published: March 13, 2024

Accurate prediction of the subcellular localization of viral proteins is crucial for understanding their functions and developing effective antiviral drugs. However, this task poses a significant challenge, especially when relying on expensive and time-consuming classical biological experiments. In this study, we introduced a computational model called E-MuLA, based on a deep learning network that combines multiple local attention modules to enhance feature extraction from protein sequences. The superior performance of E-MuLA has been demonstrated through extensive comparisons with LSTM, CNN, AdaBoost, decision trees, KNN, and other state-of-the-art methods. It is noteworthy that E-MuLA achieved an accuracy of 94.87%, a specificity of 98.81%, and a sensitivity of 84.18%, indicating its potential to become a tool for predicting viral protein localization.

Language: English
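
The accuracy, specificity, and sensitivity figures quoted in the abstract above follow the usual confusion-matrix definitions. The sketch below shows those definitions for a single binary (one-vs-rest) class; it is an illustration only, not the authors' evaluation code. The function name `binary_metrics` and the counts passed to it are hypothetical, and how the paper aggregates metrics across localization classes is not stated here.

```python
def binary_metrics(tp, tn, fp, fn):
    """Standard confusion-matrix metrics for one binary (one-vs-rest) class.

    tp/tn/fp/fn are counts of true positives, true negatives,
    false positives, and false negatives.
    """
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,     # fraction of all predictions that are correct
        "sensitivity": tp / (tp + fn),     # fraction of actual positives recovered
        "specificity": tn / (tn + fp),     # fraction of actual negatives rejected
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
m = binary_metrics(tp=80, tn=90, fp=10, fn=20)
print(m)
```

A high specificity paired with a lower sensitivity, as reported in the abstract, means negatives are rejected more reliably than positives are recovered.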

GPApred: The first computational predictor for identifying proteins with LPXTG-like motif using sequence-based optimal features
Adeel Malik, Watshara Shoombuatong, Chang-Bae Kim, et al.

International Journal of Biological Macromolecules, Journal Year: 2022, Volume and Issue: 229, P. 529 - 538

Published: Dec. 31, 2022

The cell surface proteins of gram-positive bacteria are involved in many important biological functions, including the infection of host cells. Owing to their virulent nature, these proteins are also considered strong candidates for potential drug or vaccine targets. Among the various cell surface proteins of gram-positive bacteria, LPXTG-like proteins form a major class. These proteins have a highly conserved C-terminal cell wall sorting signal, which consists of an LPXTG sequence motif, a hydrophobic domain, and a positively charged tail. They are targeted to the cell envelope by the sortase enzyme via transpeptidation. A variety of LPXTG-like proteins have been experimentally characterized; however, their number in public databases has increased owing to extensive bacterial genome sequencing without proper annotation. In the absence of experimental characterization, identifying and annotating these sequences is extremely challenging. Therefore, in this study, we developed the first machine learning-based predictor, called GPApred, which can identify LPXTG-like proteins from their primary sequences. Using a newly constructed benchmark dataset, we explored different classifiers and five feature encodings and their hybrids. Optimal features were derived using the recursive feature elimination method and then used to train a support vector machine algorithm. The performance of the models was evaluated on independent datasets, and the final model (GPApred) was selected based on its consistency during cross-validation and independent assessment. GPApred can be an effective tool for predicting LPXTG-like sequences and can be further employed for their functional characterization or drug targeting. Availability: https://procarb.org/gpapred/.

Language: English

Citations

14

Computational prediction of promotors in Agrobacterium tumefaciens strain C58 by using the machine learning technique
Hasan Zulfiqar, Zahoor Ahmed, Bakanina Kissanga Grace-Mercure, et al.

Frontiers in Microbiology, Journal Year: 2023, Volume and Issue: 14

Published: April 13, 2023

Promoters are genomic regions upstream of genes that are bound by RNA polymerase to start gene transcription. Because the promoter is the most critical element of gene expression, recognizing promoters is crucial to understanding the regulation of gene expression. This study aimed to develop a machine learning-based model to predict promoters in Agrobacterium tumefaciens (A. tumefaciens) strain C58. In the model, promoter sequences were encoded by three different kinds of feature descriptors, namely, accumulated nucleotide frequency, k-mer nucleotide composition, and binary encodings. The obtained features were optimized using correlation and an mRMR-based algorithm. These optimized features were fed into a random forest (RF) classifier to discriminate promoter sequences from non-promoter sequences. Evaluation by 10-fold cross-validation showed that the proposed model could yield an overall accuracy of 0.837. This model will help in the study of promoters in the A. tumefaciens C58 strain.

Language: English

Citations

7
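
One of the three descriptors named in the promoter-prediction abstract above is k-mer nucleotide composition. A minimal sketch of that general technique follows, assuming a plain DNA alphabet and overlapping windows; the function name `kmer_composition`, the default `k=2`, and the example sequence are illustrative choices, not details taken from the paper.

```python
from collections import Counter
from itertools import product


def kmer_composition(seq, k=2, alphabet="ACGT"):
    """Return the frequency of every possible k-mer over `alphabet` in `seq`.

    The output has len(alphabet) ** k entries in lexicographic k-mer order.
    Overlapping windows are used. Windows containing characters outside the
    alphabet (for example N) still count toward the total number of windows
    but contribute to no entry of the vector.
    """
    seq = seq.upper()
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(windows)
    total = len(windows)
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    return [counts[kmer] / total if total else 0.0 for kmer in kmers]


# Hypothetical 6-nt sequence: windows are AC, CG, GT, TA, AC.
vec = kmer_composition("ACGTAC", k=2)
print(len(vec), vec[1])  # vec[1] is the frequency of "AC"
```

A fixed-length vector like this can be passed to a classifier such as a random forest regardless of the input sequence length, which is the usual reason for choosing composition-style encodings.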

Computational prediction of protein folding rate using structural parameters and network centrality measures

Saraswathy Nithiyanandam, Vinoth Kumar Sangaraju, Balachandran Manavalan, et al.

Computers in Biology and Medicine, Journal Year: 2023, Volume and Issue: 155, P. 106436 - 106436

Published: Feb. 15, 2023

Protein folding is a complex physicochemical process whereby a polymer of amino acids samples numerous conformations in its unfolded state before settling on an essentially unique native three-dimensional (3D) structure. To understand this process, several theoretical studies have used a set of 3D structures, identified different structural parameters, and analyzed their relationships using the natural logarithmic protein folding rate (ln(kf)). Unfortunately, these structural parameters are specific to a small set of proteins and are not capable of accurately predicting ln(kf) for both two-state (TS) and non-two-state (NTS) proteins. To overcome the limitations of the statistical approach, a few machine learning (ML)-based models have been proposed using limited training data. However, none of these methods can explain plausible folding mechanisms. In this study, we evaluated the predictive capabilities of ten different ML algorithms using eight different structural parameters and five different network centrality measures based on newly constructed datasets. In comparison to the other nine regressors, the support vector machine was found to be the most appropriate for predicting ln(kf), with mean absolute differences of 1.856, 1.55, and 1.745 for the TS, NTS, and combined datasets, respectively. Furthermore, combining structural parameters and network centrality measures improves the prediction performance compared to individual parameters, indicating that multiple factors are involved in the folding process.

Language: English

Citations

7
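
The folding-rate abstract above reports its results as mean absolute differences between predicted and observed ln(kf). As a rough illustration of how such a figure is computed, here is a minimal sketch; the function name and the three value pairs are hypothetical and are not data from the paper.

```python
def mean_absolute_difference(observed, predicted):
    """Average of |observed - predicted| over paired values."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one pair of values is required")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical ln(kf) values, for illustration only.
observed_ln_kf = [2.0, -1.0, 4.5]
predicted_ln_kf = [1.0, 0.0, 4.0]
mad = mean_absolute_difference(observed_ln_kf, predicted_ln_kf)
print(mad)
```

Because the error is in the same units as ln(kf), a lower value means predictions sit closer to the measured folding rates on the log scale.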

TMSC-m7G: A transformer architecture based on multi-sense-scaled embedding features and convolutional neural network to identify RNA N7-methylguanosine sites
Shengli Zhang, Yujie Xu, Yunyun Liang, et al.

Computational and Structural Biotechnology Journal, Journal Year: 2023, Volume and Issue: 23, P. 129 - 139

Published: Dec. 1, 2023

RNA N7-methylguanosine (m7G) is a crucial chemical modification of RNA molecules, whose principal duty is to maintain RNA function and protein translation. Studying and predicting RNA m7G sites aids in comprehending the biological function of RNA and in the development of new drug therapy regimens. At present, the efficacy of computational techniques, specifically deep learning and machine learning, stands out in the prediction of RNA m7G sites, leading to improved accuracy and identification efficiency. In this study, we propose a deep learning model, called TMSC-m7G, that leverages a transformer framework integrating natural language processing and deep learning to predict m7G sites. In TMSC-m7G, a combination of multi-sense-scaled token embedding and fixed-position embedding is used to replace traditional word embedding for the extraction of contextual information from sequences. Moreover, a convolutional layer is added to the encoder to make up for the transformer's shortage of local information acquisition. The model's robustness and generalization are validated through 10-fold cross-validation and an independent dataset test. Results demonstrate outstanding performance in comparison to the most advanced models available. In particular, the accuracy of TMSC-m7G reaches 98.70% and 92.92% on the benchmark dataset and the independent test dataset, respectively. To facilitate the popularization and use of the model, we have developed an intuitive online tool, which is easily accessible for free at http://39.105.212.81/.

Language: English

Citations

7

E-MuLA: An Ensemble Multi-Localized Attention Feature Extraction Network for Viral Protein Subcellular Localization

Grace-Mercure Bakanina Kissanga, Hasan Zulfiqar, Shenghan Gao, et al.

Information, Journal Year: 2024, Volume and Issue: 15(3), P. 163 - 163

Published: March 13, 2024

Accurate prediction of the subcellular localization of viral proteins is crucial for understanding their functions and developing effective antiviral drugs. However, this task poses a significant challenge, especially when relying on expensive and time-consuming classical biological experiments. In this study, we introduced a computational model called E-MuLA, based on a deep learning network that combines multiple local attention modules to enhance feature extraction from protein sequences. The superior performance of E-MuLA has been demonstrated through extensive comparisons with LSTM, CNN, AdaBoost, decision trees, KNN, and other state-of-the-art methods. It is noteworthy that E-MuLA achieved an accuracy of 94.87%, a specificity of 98.81%, and a sensitivity of 84.18%, indicating its potential to become a tool for predicting viral protein localization.

Language: English

Citations

2