E-MuLA: An Ensemble Multi-Localized Attention Feature Extraction Network for Viral Protein Subcellular Localization DOI Creative Commons

Grace-Mercure Bakanina Kissanga, Hasan Zulfiqar, Shenghan Gao et al.

Information, Journal Year: 2024, Volume and Issue: 15(3), P. 163 - 163

Published: March 13, 2024

Accurate prediction of the subcellular localization of viral proteins is crucial for understanding their functions and developing effective antiviral drugs. However, this task poses a significant challenge, especially when relying on expensive and time-consuming classical biological experiments. In this study, we introduced a computational model called E-MuLA, based on a deep learning network that combines multiple local attention modules to enhance feature extraction from protein sequences. The superior performance of E-MuLA has been demonstrated through extensive comparisons with LSTM, CNN, AdaBoost, decision trees, KNN, and other state-of-the-art methods. It is noteworthy that E-MuLA achieved an accuracy of 94.87%, a specificity of 98.81%, and a sensitivity of 84.18%, indicating its potential to become a tool for predicting virus localization.

Language: English
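The accuracy, specificity and sensitivity quoted for E-MuLA are all derived from confusion-matrix counts. The following is a minimal sketch of those definitions; the counts are hypothetical and are not taken from the paper, and for a multi-class localization task the same formulas would presumably be applied per class (an assumption, since the listing does not describe the evaluation protocol).

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    return accuracy, sensitivity, specificity


# Hypothetical counts for a single class; illustrative only.
acc, sens, spec = classification_metrics(tp=80, fp=5, tn=395, fn=20)
print(f"accuracy={acc:.4f} sensitivity={sens:.4f} specificity={spec:.4f}")
```

A high specificity combined with a lower sensitivity, as in the reported figures, typically means that negatives outnumber positives, so accuracy alone can overstate how well the positive class is recovered.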

Accurately identifying hemagglutinin using sequence information and machine learning methods DOI Creative Commons

Xidan Zou, Liping Ren, Peiling Cai et al.

Frontiers in Medicine, Journal Year: 2023, Volume and Issue: 10

Published: Oct. 31, 2023

Hemagglutinin (HA) is responsible for facilitating viral entry and infection by promoting the fusion between the host membrane and the virus. Given its significance in the process of influenza virus infestation, HA has garnered attention as a target for drug and vaccine development. Thus, accurately identifying HA is crucial for the development of targeted drugs. However, the identification of HA using in-silico methods is still lacking. This study aims to design a computational model to identify HA.

Language: English

Citations

73

Deepm5C: A deep-learning-based hybrid framework for identifying human RNA N5-methylcytosine sites using a stacking strategy DOI Creative Commons
Md Mehedi Hasan, Sho Tsukiyama, Jae Youl Cho et al.

Molecular Therapy, Journal Year: 2022, Volume and Issue: 30(8), P. 2856 - 2867

Published: May 6, 2022

As one of the most prevalent post-transcriptional epigenetic modifications, N5-methylcytosine (m5C) plays an essential role in various cellular processes and disease pathogenesis. Therefore, it is important to accurately identify m5C modifications in order to gain a deeper understanding of other possible functional mechanisms. Although a few computational methods have been proposed, their respective models have been developed using small training datasets. Hence, their practical application is quite limited in genome-wide detection. To overcome the existing limitations, we propose Deepm5C, a bioinformatics method for identifying RNA m5C sites throughout the human genome. To develop Deepm5C, we constructed a novel benchmarking dataset and investigated a mixture of three conventional feature-encoding algorithms and a feature derived from word-embedding approaches. Afterward, four variants of deep-learning classifiers and four commonly used conventional classifiers were employed and trained with the four encodings, ultimately obtaining 32 baseline models. A stacking strategy is effectively utilized by integrating the predicted output of the optimal baseline models with a one-dimensional (1D) convolutional neural network. As a result, the Deepm5C predictor achieved excellent performance during cross-validation, with a Matthews correlation coefficient and accuracy of 0.697 and 0.855, respectively. The corresponding metrics during the independent test were 0.691 and 0.852, respectively. Overall, Deepm5C achieved a more accurate and stable performance than the baseline models and significantly outperformed the existing predictors, demonstrating the effectiveness of our proposed hybrid framework. Furthermore, Deepm5C is expected to assist community-wide efforts in identifying putative m5Cs and to formulate testable biological hypotheses.

Language: English

Citations

66
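The Deepm5C abstract reports a Matthews correlation coefficient (MCC) alongside accuracy. As a reminder of how the two relate, here is a minimal sketch of the standard binary MCC formula; the counts are hypothetical and chosen only for illustration, not taken from the paper.

```python
import math


def matthews_corrcoef(tp, fp, tn, fn):
    """Binary Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when any marginal total is zero
    return (tp * tn - fp * fn) / denom


# Hypothetical balanced test set: 100 positives and 100 negatives.
tp, fp, tn, fn = 85, 15, 85, 15
accuracy = (tp + tn) / (tp + fp + tn + fn)
mcc = matthews_corrcoef(tp, fp, tn, fn)
print(f"accuracy={accuracy:.3f} mcc={mcc:.3f}")
```

On a balanced set with symmetric errors, MCC equals 2 x accuracy - 1, which is why an accuracy near 0.85 and an MCC near 0.70 appear together.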

H2Opred: a robust and efficient hybrid deep learning model for predicting 2’-O-methylation sites in human RNA DOI Creative Commons
Nhat Truong Pham, Rajan Rakkiyapan, Jongsun Park et al.

Briefings in Bioinformatics, Journal Year: 2023, Volume and Issue: 25(1)

Published: Nov. 22, 2023

2’-O-methylation (2OM) is the most common post-transcriptional modification of RNA. It plays a crucial role in RNA splicing, RNA stability and innate immunity. Despite advances in high-throughput detection, the chemical stability of 2OM makes it difficult to detect and map in messenger RNA. Therefore, bioinformatics tools have been developed using machine learning (ML) algorithms to identify 2OM sites. These tools have made significant progress, but their performances remain unsatisfactory and need further improvement. In this study, we introduced H2Opred, a novel hybrid deep learning (HDL) model for accurately identifying 2OM sites in human RNA. Notably, this is the first application of HDL in developing four nucleotide-specific models [adenine (A2OM), cytosine (C2OM), guanine (G2OM) and uracil (U2OM)] as well as a generic model (N2OM). H2Opred incorporated both stacked 1D convolutional neural network (1D-CNN) blocks and stacked attention-based bidirectional gated recurrent unit (Bi-GRU-Att) blocks. The 1D-CNN blocks learned effective feature representations from 14 conventional descriptors, while the Bi-GRU-Att blocks learned feature representations from five natural language processing-based embeddings extracted from RNA sequences. H2Opred integrated these feature representations to make the final prediction. Rigorous cross-validation analysis demonstrated that H2Opred consistently outperforms conventional ML-based single-feature models on different datasets. Moreover, H2Opred showed remarkable performance on the training and testing datasets, significantly outperforming the existing predictor and other models. To enhance accessibility and usability, we have deployed a user-friendly web server accessible at https://balalab-skku.org/H2Opred/. This platform will serve as an invaluable tool for predicting 2OM sites within human RNA, thereby facilitating broader applications in relevant research endeavors.

Language: English

Citations

25
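The H2Opred abstract names four nucleotide-specific models (A2OM, C2OM, G2OM, U2OM) and one generic model (N2OM). One way such a family of models could be applied is to dispatch each candidate window on its central base. The sketch below illustrates that idea only; the function name, the odd-length window convention and the T-to-U normalization are assumptions, not details confirmed by the source.

```python
# Model names come from the abstract; the routing logic is hypothetical.
MODEL_BY_CENTER = {"A": "A2OM", "C": "C2OM", "G": "G2OM", "U": "U2OM"}


def select_models(window):
    """Return the model names applicable to a window centred on a candidate site."""
    if len(window) % 2 == 0:
        raise ValueError("window length must be odd so that a central base exists")
    center = window[len(window) // 2].upper().replace("T", "U")
    if center not in MODEL_BY_CENTER:
        raise ValueError(f"unexpected central base: {center!r}")
    # The generic model applies regardless of the central base.
    return [MODEL_BY_CENTER[center], "N2OM"]


print(select_models("AUGCA"))  # central base is G
```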

GenoM7GNet: An Efficient N7-Methylguanosine Site Prediction Approach Based on a Nucleotide Language Model DOI
Chuang Li, Heshi Wang, Yanhua Wen et al.

IEEE/ACM Transactions on Computational Biology and Bioinformatics, Journal Year: 2024, Volume and Issue: 21(6), P. 2258 - 2268

Published: Sept. 20, 2024

N

Language: English

Citations

13

Prediction of blood–brain barrier penetrating peptides based on data augmentation with Augur DOI Creative Commons
Zhi-Feng Gu, Yu-Duo Hao, Tianyu Wang et al.

BMC Biology, Journal Year: 2024, Volume and Issue: 22(1)

Published: April 19, 2024

Background: The blood–brain barrier serves as a critical interface between the bloodstream and brain tissue, mainly composed of pericytes, neurons, endothelial cells, and tightly connected basal membranes. It plays a pivotal role in safeguarding the brain from harmful substances, thus protecting the integrity of the nervous system and preserving overall brain homeostasis. However, this remarkable selective transmission also poses a formidable challenge in the treatment of central nervous system diseases, hindering the delivery of large-molecule drugs into the brain. In response to this challenge, many researchers have devoted themselves to developing drug delivery systems capable of breaching the blood–brain barrier. Among these, blood–brain barrier penetrating peptides have emerged as promising candidates. These peptides have the advantages of high biosafety, ease of synthesis, and exceptional penetration efficiency, making them an effective drug delivery solution. While previous studies have developed a few prediction models for blood–brain barrier penetrating peptides, their performance has often been hampered by the issue of limited positive data. Results: In this study, we present Augur, a novel prediction model using borderline-SMOTE-based data augmentation and machine learning. It extracts highly interpretable physicochemical properties of blood–brain barrier penetrating peptides while solving the issues of small sample size and the imbalance of positive and negative samples. Experimental results demonstrate the superior prediction performance of Augur, with an AUC value of 0.932 on the training set and 0.931 on the independent test set. Conclusions: The newly developed Augur demonstrates superior performance in predicting blood–brain barrier penetrating peptides, offering valuable insights for drug development targeting neurological disorders. This breakthrough may enhance the efficiency of peptide-based drug discovery and pave the way for innovative treatment strategies for central nervous system diseases.

Language: English

Citations

12
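Augur is described as relying on borderline-SMOTE-based data augmentation. The sketch below is a small, dependency-free illustration of the general borderline-SMOTE idea: minority samples whose neighbourhood is mostly, but not entirely, majority are treated as "danger" points, and new samples are interpolated between them and nearby minority samples. The parameter names (`m`, `k`, `n_new`) and the toy data are assumptions; the exact variant and settings used by Augur are not given in the listing.

```python
import math
import random


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def borderline_smote(minority, majority, m=5, k=3, n_new=1, seed=0):
    """Return (danger_points, synthetic_points) for a two-class dataset."""
    rng = random.Random(seed)
    labelled = [(p, 1) for p in minority] + [(p, 0) for p in majority]
    danger = []
    for p in minority:
        # m nearest neighbours of p among all other samples
        neigh = sorted((q for q in labelled if q[0] is not p),
                       key=lambda q: euclidean(p, q[0]))[:m]
        n_major = sum(1 for _, label in neigh if label == 0)
        # all-majority neighbourhood: noise; at least half majority: borderline
        if m / 2 <= n_major < m:
            danger.append(p)
    synthetic = []
    for p in danger:
        # k nearest minority neighbours of the borderline point
        min_neigh = sorted((q for q in minority if q is not p),
                           key=lambda q: euclidean(p, q))[:k]
        if not min_neigh:
            continue
        for _ in range(n_new):
            q = rng.choice(min_neigh)
            gap = rng.random()  # interpolation factor in [0, 1)
            synthetic.append([a + gap * (b - a) for a, b in zip(p, q)])
    return danger, synthetic


# Toy 2-D data: two minority points sit next to the majority cluster.
minority = [[0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [1.0, 1.0], [0.8, 0.8]]
majority = [[1.1, 1.0], [1.0, 1.1], [0.9, 1.0], [1.5, 1.5]]
danger, synthetic = borderline_smote(minority, majority, m=4, k=2, n_new=3, seed=1)
print(danger)
print(len(synthetic))
```

Only the two minority points near the class boundary are oversampled; the three points far from the majority cluster are considered safe and left alone.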

ac4C-AFL: A high-precision identification of human mRNA N4-acetylcytidine sites based on adaptive feature representation learning DOI Creative Commons
Nhat Truong Pham, Annie Terrina Terrance, Young-Jun Jeon et al.

Molecular Therapy — Nucleic Acids, Journal Year: 2024, Volume and Issue: 35(2), P. 102192 - 102192

Published: April 24, 2024

RNA N4-acetylcytidine (ac4C) is a highly conserved RNA modification that plays a crucial role in controlling mRNA stability, processing, and translation. Consequently, accurate identification of ac4C sites across the genome is critical for understanding gene expression regulation mechanisms. In this study, we have developed ac4C-AFL, a bioinformatics tool that precisely identifies ac4C sites from primary RNA sequences. In ac4C-AFL, we identified the optimal sequence length for model building and implemented an adaptive feature representation strategy capable of extracting the most representative features from RNA. To identify the most relevant features, we proposed a novel ensemble feature importance scoring strategy to rank features effectively. We then used this information to conduct a sequential forward search, which individually determined the optimal feature set for each of the 16 sequence-derived feature descriptors. Utilizing these optimal feature descriptors, we constructed 176 baseline models using 11 popular classifiers. The most efficient baseline models were identified using a two-step feature selection approach, and their predicted scores were integrated and trained with an appropriate classifier to develop the final prediction model. Our rigorous cross-validations and independent tests demonstrate that ac4C-AFL surpasses contemporary tools in predicting ac4C sites. Moreover, we have developed a publicly accessible web server at https://balalab-skku.org/ac4C-AFL/.

Language: English

Citations

12
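The ac4C-AFL abstract describes ranking features by importance and then running a sequential forward search to pick the optimal subset for each descriptor. A minimal sketch of that pattern follows, assuming the search adds features in rank order and keeps the best-scoring prefix; `score_fn` is a hypothetical stand-in for a cross-validated performance estimate, and the toy gains are invented. The 176 baseline models mentioned in the abstract are consistent with one model per descriptor-classifier pair (16 x 11).

```python
def sequential_forward_search(ranked_features, score_fn):
    """Add features in rank order and return the best-scoring prefix and its score."""
    best_subset, best_score = [], float("-inf")
    current = []
    for feature in ranked_features:
        current = current + [feature]
        score = score_fn(current)
        if score > best_score:
            best_subset, best_score = list(current), score
    return best_subset, best_score


# Toy scoring function: each feature contributes a fixed, made-up gain.
GAINS = {"f1": 0.60, "f2": 0.10, "f3": -0.05, "f4": 0.02}


def toy_score(subset):
    return sum(GAINS[f] for f in subset)


subset, score = sequential_forward_search(["f1", "f2", "f3", "f4"], toy_score)
print(subset, round(score, 2))
```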

MIBPred: Ensemble Learning-Based Metal Ion-Binding Protein Classifier DOI Creative Commons
Hongqi Zhang, Shanghua Liu, Rui Li et al.

ACS Omega, Journal Year: 2024, Volume and Issue: unknown

Published: Feb. 8, 2024

In biological organisms, metal ion-binding proteins participate in numerous metabolic activities and are closely associated with various diseases. To accurately predict whether a protein binds to metal ions and the type of metal ion-binding protein, this study proposed a classifier named MIBPred. The classifier incorporated advanced Word2Vec technology from the field of natural language processing to extract semantic features of the protein sequence and combined them with position-specific score matrix (PSSM) features. Furthermore, an ensemble learning model was employed for the classification task. In the model, we independently trained XGBoost, LightGBM, and CatBoost algorithms and integrated the output results through an SVM voting mechanism. This innovative combination has led to a significant breakthrough in the predictive performance of our model. As a result, we achieved accuracies of 95.13% and 85.19%, respectively, in predicting metal ion-binding proteins and their types. Our research not only confirms the effectiveness of Word2Vec in extracting semantic information from protein sequences but also highlights the outstanding performance of MIBPred in the problem of predicting metal ion-binding protein types. This study provides a reliable tool and method for the in-depth exploration of the structure and function of metal ion-binding proteins.

Language: English

Citations

9
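MIBPred is described as applying Word2Vec to protein sequences. Word-embedding methods need a sequence to be split into "words" first; a common choice in sequence analysis is overlapping k-mers. The sketch below shows only that tokenization step. The use of overlapping 3-mers is an assumption for illustration, since the listing does not say how MIBPred tokenizes sequences, and the example sequence is made up.

```python
def kmer_tokens(sequence, k=3):
    """Split a protein sequence into overlapping k-mer 'words'."""
    seq = sequence.strip().upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# Hypothetical short sequence; each token would act as one word in a "sentence".
print(kmer_tokens("MKVLA"))
```

The resulting token lists could then be passed as sentences to any Word2Vec implementation to learn one embedding per k-mer.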

N7-methylguanosine modification in cancers: from mechanisms to therapeutic potential DOI Creative Commons
Qihui Wu, Xiaodan Fu, Guoqian Liu et al.

Journal of Hematology & Oncology, Journal Year: 2025, Volume and Issue: 18(1)

Published: Jan. 29, 2025

N7-methylguanosine (m7G) is an important RNA modification involved in epigenetic regulation that is commonly observed in both prokaryotic and eukaryotic organisms. Its influence on the synthesis and processing of messenger RNA, ribosomal RNA, and transfer RNA allows m7G modifications to affect diverse cellular, physiological, and pathological processes. m7G modifications are pivotal in human diseases, particularly in cancer progression. On one hand, m7G modification-associated molecules modulate tumour progression and malignant biological characteristics, including sustained proliferation signalling, resistance to cell death, activation of invasion and metastasis, reprogramming of energy metabolism, genome instability, and immune evasion. This suggests that they may be novel therapeutic targets for cancer treatment. On the other hand, the aberrant expression of m7G modification-associated molecules is linked to clinicopathological staging, lymph node metastasis, and unfavourable prognoses in patients with cancer, indicating their potential as tumour biomarkers. This review consolidates the discovery, identification, detection methodologies, and functional roles of m7G modification, analysing the mechanisms by which m7G modifications contribute to tumour development and exploring their clinical applications in cancer diagnostics and therapy, thereby providing innovative strategies for tumour identification and targeted therapy.

Language: English

Citations

1

Advancing the accuracy of SARS-CoV-2 phosphorylation site detection via meta-learning approach DOI Creative Commons
Nhat Truong Pham, Le Thi Phan, Ji-Min Seo et al.

Briefings in Bioinformatics, Journal Year: 2023, Volume and Issue: 25(1)

Published: Nov. 22, 2023

The worldwide appearance of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has generated significant concern and posed a considerable challenge to global health. Phosphorylation is a common post-translational modification that affects many vital cellular functions and is closely associated with SARS-CoV-2 infection. Precise identification of phosphorylation sites could provide more in-depth insight into the processes underlying SARS-CoV-2 infection and help alleviate the continuing COVID-19 crisis. Currently, available computational tools for predicting these sites lack accuracy and effectiveness. In this study, we designed an innovative meta-learning model, Meta-Learning for Serine/Threonine Phosphorylation (MeL-STPhos), to precisely identify protein phosphorylation sites. We initially performed a comprehensive assessment of 29 unique sequence-derived features, establishing prediction models for each using 14 renowned machine learning methods, ranging from traditional classifiers to advanced deep learning algorithms. We then selected the most effective model for each feature by integrating the predicted values. Rigorous feature selection strategies were employed to identify the optimal base models and classifier(s) for each cell-specific dataset. To the best of our knowledge, this is the first study to report two cell-specific models and a generic model for phosphorylation site prediction utilizing an extensive range of sequence-derived features and machine learning algorithms. Extensive cross-validation and independent testing revealed that MeL-STPhos surpasses existing state-of-the-art tools for phosphorylation site prediction. We also developed a publicly accessible platform at https://balalab-skku.org/MeL-STPhos. We believe that MeL-STPhos will serve as a valuable tool for accelerating the discovery of serine/threonine phosphorylation sites and elucidating their role in post-translational regulation.

Language: English

Citations

20
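The MeL-STPhos abstract outlines a two-level design: many feature-classifier models are trained, the most effective model is kept for each feature, and the kept models' predicted values feed a meta-learner. The sketch below illustrates only the selection and stacking of predictions with plain dictionaries. The feature names (`AAC`, `DPC`), classifier names and scores are hypothetical placeholders, and the selection criterion (highest cross-validation score) is an assumption.

```python
def best_model_per_feature(cv_scores):
    """cv_scores maps feature -> {classifier: score}; return feature -> (classifier, score)."""
    return {
        feature: max(by_clf.items(), key=lambda item: item[1])
        for feature, by_clf in cv_scores.items()
    }


def meta_features(selected, predictions):
    """Build one row per sample from the predicted values of the selected base models."""
    columns = [predictions[(feature, clf)] for feature, (clf, _) in selected.items()]
    return [list(row) for row in zip(*columns)]


# Hypothetical cross-validation scores and per-sample predicted probabilities.
cv_scores = {
    "AAC": {"RF": 0.81, "SVM": 0.78},
    "DPC": {"RF": 0.74, "SVM": 0.79},
}
predictions = {
    ("AAC", "RF"): [0.9, 0.2, 0.6],
    ("AAC", "SVM"): [0.7, 0.3, 0.5],
    ("DPC", "RF"): [0.6, 0.4, 0.5],
    ("DPC", "SVM"): [0.8, 0.1, 0.4],
}
selected = best_model_per_feature(cv_scores)
print(selected)
print(meta_features(selected, predictions))
```

Each row of the resulting matrix would be the input to the meta-classifier for one sample.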

Computational prediction and interpretation of druggable proteins using a stacked ensemble-learning framework DOI Creative Commons
Phasit Charoenkwan, Nalini Schaduangrat, Pietro Lió et al.

iScience, Journal Year: 2022, Volume and Issue: 25(9), P. 104883 - 104883

Published: Aug. 5, 2022

Discovery of potential drugs requires rapid and precise identification of drug targets. Although traditional experimental methodologies can accurately identify drug targets, they are time-consuming and inappropriate for high-throughput screening. Computational approaches based on machine learning (ML) algorithms can expedite the prediction of druggable proteins; however, the performance of existing computational methods remains unsatisfactory. This study proposes a computational tool, SPIDER, to enhance the accurate prediction of druggable proteins. SPIDER employs various feature descriptors pertaining to several aspects, including physicochemical properties, compositional information, and composition-transition-distribution information, coupled with well-known ML algorithms to facilitate the construction of the final meta-predictor. The results showed that SPIDER enabled more accurate and robust prediction of druggable proteins than the baseline models and existing methods on the independent test dataset. An online web server was established and made freely available.

Language: English

Citations

23
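Among the descriptor families the SPIDER abstract lists, "compositional information" is the simplest to illustrate. The sketch below computes the standard amino acid composition (the fraction of each of the 20 standard residues in a sequence). Whether SPIDER uses exactly this descriptor is not stated in the listing, so treat it as a generic example of a compositional feature; the input sequence is made up.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def amino_acid_composition(sequence):
    """Return a 20-dimensional vector of residue frequencies."""
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return [residues.count(aa) / len(residues) for aa in AMINO_ACIDS]


vector = amino_acid_composition("AACD")  # hypothetical toy sequence
print(vector[:3])
```

The vector has a fixed length regardless of sequence length, which is what lets conventional ML classifiers consume proteins of different sizes.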