Research progress of reduced amino acid alphabets in protein analysis and prediction

Yuchao Liang, Siqi Yang, Lei Zheng, et al.

Computational and Structural Biotechnology Journal, 2022, Vol. 20, pp. 3503-3510

Published: Jan. 1, 2022

Proteins are the executors of cellular physiological activities, and accurate elucidation of their structure and function is crucial for the refined mapping of proteins. As a feature engineering method, reduction of amino acid composition is not only an important method for protein structure analysis, but also opens a broad horizon in the complex field of machine learning. Representing sequences with fewer amino acid types greatly reduces the complexity and noise of traditional high-dimensional features, and provides more interpretable predictive models for machine learning to capture key features. In this paper, we systematically reviewed the strategies and studies of reduced amino acid (RAA) alphabets, and summarized their main research in sequence alignment, functional classification, and prediction of protein properties, respectively. In the end, we gave a comprehensive analysis of 672 RAA alphabets from 74 reduction methods.

Language: English
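The following minimal Python sketch illustrates the idea of a reduced alphabet. Each of the 20 standard residues is mapped onto r groups, and features are then computed over the group symbols. This shrinks a k-mer feature space from 20^k to r^k, for example from 400 to 4 for dipeptides when r = 2. The two-group hydrophobic/polar (HP) split and all function names below are hypothetical. They were chosen for illustration and are not one of the 672 alphabets analysed in the review.

```python
from collections import Counter

# Hypothetical 2-letter hydrophobic (H) / polar (P) grouping, for illustration only.
HP_GROUPS = {"H": "AVLIMFWC", "P": "GSTYNQDEKRHP"}

def build_mapping(groups):
    """Map each standard amino acid to its reduced-alphabet symbol."""
    mapping = {}
    for symbol, residues in groups.items():
        for aa in residues:
            mapping[aa] = symbol
    return mapping

def reduce_sequence(seq, mapping):
    """Rewrite a protein sequence in the reduced alphabet; unknown residues are dropped."""
    return "".join(mapping[aa] for aa in seq.upper() if aa in mapping)

def reduced_composition(seq, groups):
    """Fraction of each reduced symbol: an r-dimensional vector instead of the 20-D AAC."""
    reduced = reduce_sequence(seq, build_mapping(groups))
    counts = Counter(reduced)
    n = len(reduced) or 1  # avoid division by zero on empty input
    return {symbol: counts[symbol] / n for symbol in groups}
```

For example, `reduce_sequence("MKTAYIAK", build_mapping(HP_GROUPS))` yields `"HPPHPHHP"`, whose reduced composition is `{"H": 0.5, "P": 0.5}`.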

A First Computational Frame for Recognizing Heparin-Binding Protein
Wen Zhu, Shi-Shi Yuan, Jian Li, et al.

Diagnostics, 2023, Vol. 13(14), p. 2465

Published: July 24, 2023

Heparin-binding protein (HBP) is a cationic antibacterial protein derived from multinuclear neutrophils and an important biomarker of infectious diseases. The correct identification of HBP is of great significance to the study of infectious diseases. This work provides the first HBP recognition framework based on machine learning to accurately identify HBP. By using four sequence descriptors, HBP and non-HBP samples were represented by discrete numbers. These features were input into support vector machine (SVM) and random forest (RF) algorithms, and the prediction performances of the methods were compared on training data and independent test data. The SVM-based classifier was found to have the greatest potential: the model could produce an auROC of 0.981 ± 0.028 on 10-fold cross-validation and an overall accuracy of 95.0% on independent test data. As the first computational framework for HBP recognition, it will provide some help for the study of infectious diseases and stimulate further research in related fields.

Language: English

Cited by: 69
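The "0.981 ± 0.028 on 10-fold cross-validation" figure summarizes one auROC per held-out fold as a mean and a spread. The minimal, standard-library sketch below shows how such a number can be produced. It computes auROC via the Mann-Whitney statistic, splits the data into stratified folds, and reports a mean ± standard deviation. The function names are hypothetical. The use of the sample (rather than population) standard deviation is an assumption, since the paper's exact convention is not stated here. The classifier itself (SVM or RF in the paper) is left out.

```python
import random
import statistics

def auroc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds, keeping the class ratio roughly equal."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

def summarize(fold_aucs):
    """Per-fold auROCs as mean and sample standard deviation (assumed convention)."""
    return statistics.mean(fold_aucs), statistics.stdev(fold_aucs)
```

In a full pipeline, each fold serves once as the test set. A classifier is trained on the remaining nine folds, its decision scores on the held-out fold are passed to `auroc`, and the ten values are then summarized.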

pAtbP-EnC: Identifying Anti-Tubercular Peptides Using Multi-Feature Representation and Genetic Algorithm-Based Deep Ensemble Model
Shahid Akbar, Ali Raza, Tamara Al Shloul, et al.

IEEE Access, 2023, Vol. 11, pp. 137099-137114

Published: Jan. 1, 2023

Mycobacterium tuberculosis, a highly perilous pathogen in humans, serves as the causative agent of tuberculosis (TB), affecting nearly 33% of the global population. With the increasing prevalence of multidrug-resistant TB, there is a need for novel and efficacious alternative therapies. Peptide therapies have emerged as a favorable option due to their remarkable specificity in targeting affected cells without affecting healthy cells. However, experimental identification of anti-tubercular peptides (AtbPs) is labor-intensive and costly. Therefore, accurate prediction of AtbPs has become challenging given the large number of peptide samples. In this paper, we propose an ensemble learning model to enhance prediction outcomes by addressing the limitations of individual models. We formulate training samples utilizing four distinct representation methods to numerically encode the peptides: AAindex, Composition/Transition/Distribution, Dipeptide Deviation from Expected Mean, and Enhanced Grouped Amino Acid Composition. The feature vectors extracted by these methods are fused to develop a compact feature vector. We evaluate prediction rates using three different classification models, employing both individual and heterogeneous feature vectors. Furthermore, we enhance the capabilities of the proposed model by combining the predicted labels of the classifiers, implementing a deep ensemble via a genetic algorithm. Through evaluation on training and independent datasets, our ensemble learner achieves accuracies of 97.80%, 95.13%, 93.91%, and 94.17% on the RD and MD training and independent datasets. Our findings demonstrate that pAtbP-EnC outperforms existing predictors, reporting approximately 11% higher accuracy. We conclude that the proposed predictor will be a considerable tool in pharmaceutical design and in academic research. The source code used is publicly available at https://github.com/Intelligent-models/pAtbP-EnC2023.

Language: English

Cited by: 54
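One of the four encodings named above is Dipeptide Deviation from Expected Mean (DDE). The minimal Python sketch below uses the formulation commonly given for DDE. The observed dipeptide composition is compared with a theoretical mean derived from codon counts in the standard genetic code (61 sense codons). The difference is then scaled by the theoretical standard deviation. The function names are hypothetical, and this is not the authors' released code. Two further simplifications are assumptions of this sketch: dropping non-standard residues before pairing, and fusing features by plain concatenation.

```python
import math
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]

# Number of codons encoding each amino acid in the standard genetic code.
CODONS = {
    "A": 4, "C": 2, "D": 2, "E": 2, "F": 2, "G": 4, "H": 2, "I": 3, "K": 2, "L": 6,
    "M": 1, "N": 2, "P": 4, "Q": 2, "R": 6, "S": 6, "T": 4, "V": 4, "W": 1, "Y": 2,
}
TOTAL_CODONS = sum(CODONS.values())  # 61 sense codons

def dde(seq):
    """400-D Dipeptide Deviation from Expected Mean vector for one peptide."""
    # Simplification: non-standard residues are removed before counting pairs.
    seq = "".join(aa for aa in seq.upper() if aa in CODONS)
    n_pairs = len(seq) - 1
    if n_pairs < 1:
        raise ValueError("sequence needs at least two standard residues")
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(n_pairs):
        counts[seq[i:i + 2]] += 1
    vector = []
    for dp in DIPEPTIDES:
        dc = counts[dp] / n_pairs  # observed dipeptide composition
        tm = (CODONS[dp[0]] / TOTAL_CODONS) * (CODONS[dp[1]] / TOTAL_CODONS)  # theoretical mean
        tv = tm * (1 - tm) / n_pairs  # theoretical variance
        vector.append((dc - tm) / math.sqrt(tv))
    return vector

def fuse(*vectors):
    """Concatenate several feature vectors into one (assumed fusion strategy)."""
    return [x for v in vectors for x in v]
```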

Immune cell infiltration-based signature for prognosis and immunogenomic analysis in breast cancer
Shiyuan Wang, Qi Zhang, Chunlu Yu, et al.

Briefings in Bioinformatics, 2020, Vol. 22(2), pp. 2020-2031

Published: Feb. 17, 2020

Breast cancer is one of the most common human malignant diseases and the leading cause of cancer-related death in the world. However, the prognostic and therapeutic benefits for breast cancer patients cannot be predicted accurately by the current stratifying system. In this study, an immune-related score was established in 22 cohorts with a total of 6415 samples. An extensive immunogenomic analysis was conducted to explore the relationships between the immune score, prognostic significance, infiltrating immune cells, genotypes and potential immune escape mechanisms. Our study revealed that the immune score was a promising biomarker for estimating overall survival in breast cancer. The score was associated with important immunophenotypic factors, such as mutation load. Further analysis showed that patients with high immune scores exhibited benefits from chemotherapy and immunotherapy. Based on these results, we conclude that the immune score may be a useful tool for prognosis prediction and treatment guidance in breast cancer.

Language: English

Cited by: 130

Deep-AntiFP: Prediction of antifungal peptides using distanct multi-informative features incorporating with deep neural networks
Ashfaq Ahmad, Shahid Akbar, Salman Khan, et al.

Chemometrics and Intelligent Laboratory Systems, 2020, Vol. 208, p. 104214

Published: Dec. 1, 2020

Language: English

Cited by: 85

Computational identification of N6-methyladenosine sites in multiple tissues of mammals
Fanny Dao, Hao Lv, Yuhe R. Yang, et al.

Computational and Structural Biotechnology Journal, 2020, Vol. 18, pp. 1084-1091

Published: Jan. 1, 2020

N6-methyladenosine (m6A) is the methylation of adenosine at the nitrogen-6 position. It is the most abundant RNA modification and is involved in a series of important biological processes. Accurate genome-wide identification of m6A sites is invaluable for better understanding their biological functions. In this work, an ensemble predictor named iRNA-m6A was established to identify m6A sites in multiple tissues of human, mouse and rat, based on data from high-throughput sequencing techniques. In the proposed predictor, RNA sequences were encoded by a physical-chemical property matrix, mono-nucleotide binary encoding and nucleotide chemical property. Subsequently, these features were optimized using the minimum Redundancy Maximum Relevance (mRMR) feature selection method. Based on the optimal feature subset, the best classification models were trained by Support Vector Machine (SVM) with a 5-fold cross-validation test. Prediction results on an independent dataset showed that our method has excellent generalization ability. We also established a user-friendly webserver called iRNA-m6A, which is freely accessible at http://lin-group.cn/server/iRNA-m6A. This tool will provide more convenience to users studying m6A modification in different tissues.

Language: English

Cited by: 84
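Two of the encodings named in the iRNA-m6A abstract can be sketched in a few lines of Python. These are mono-nucleotide binary (one-hot) encoding and nucleotide chemical property (NCP). In this sketch, NCP uses the three-bit scheme common in the literature: ring structure (purine = 1), functional group (amino = 1) and hydrogen bonding (weak = 1). We assume, without confirmation from the paper, that iRNA-m6A uses the same values. The A, C, G, U one-hot column order, the per-position concatenation, and the function name `encode_rna` are likewise assumptions made for illustration.

```python
# Mono-nucleotide binary (one-hot) encoding; the A, C, G, U column order is assumed.
BINARY = {"A": (1, 0, 0, 0), "C": (0, 1, 0, 0), "G": (0, 0, 1, 0), "U": (0, 0, 0, 1)}

# Nucleotide chemical property bits: (ring structure, functional group, hydrogen bonding),
# with purine = 1, amino group = 1 and weak hydrogen bonding = 1.
NCP = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "U": (0, 0, 1)}

def encode_rna(seq):
    """Concatenate binary + NCP bits position by position (7 features per nucleotide)."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    features = []
    for position, nt in enumerate(seq):
        if nt not in BINARY:
            raise ValueError(f"unexpected nucleotide {nt!r} at position {position}")
        features.extend(BINARY[nt])
        features.extend(NCP[nt])
    return features
```

A fixed-length window centred on a candidate adenosine would then give a fixed-length vector, ready for feature selection (mRMR in the paper) and an SVM.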

Deep-4mCW2V: A sequence-based predictor to identify N4-methylcytosine sites in Escherichia coli
Hasan Zulfiqar, Zi‐Jie Sun, Qin-Lai Huang, et al.

Methods, 2021, Vol. 203, pp. 558-563

Published: Aug. 3, 2021

Language: English

Cited by: 54

Application of Machine Learning for Drug–Target Interaction Prediction

Lei Xu, Xiaoqing Ru, Rong Song, et al.

Frontiers in Genetics, 2021, Vol. 12

Published: June 21, 2021

Exploring drug–target interactions by biomedical experiments requires a lot of human, financial, and material resources. To save time and cost and to meet the needs of the present generation, machine learning methods have been introduced into the prediction of drug–target interactions. Several factors have made machine learning the mainstream method for drug–target interaction research: the large amount of drug and target data available in existing databases, evolving and innovative computer technologies, and the inherent characteristics of various types of machine learning. In this review, the specific applications of machine learning are summarized, each algorithm is analyzed, and the issues that need to be further addressed and explored in future research are discussed. The aim of this review is to provide a sound basis for the construction of high-performance models.

Language: English

Cited by: 50

Identification of cyclin protein using gradient boost decision tree algorithm
Hasan Zulfiqar, Shi-Shi Yuan, Qin-Lai Huang, et al.

Computational and Structural Biotechnology Journal, 2021, Vol. 19, pp. 4123-4131

Published: Jan. 1, 2021

Cyclin proteins are capable of regulating the cell cycle by forming a complex with cyclin-dependent kinases to activate the cycle. Correct recognition of cyclin proteins could provide key clues for studying their functions. However, cyclin sequences share low similarity, which results in poor prediction by sequence similarity-based methods. Thus, it is urgent to construct a machine learning model to identify cyclin proteins. This study aimed to develop a computational model to discriminate cyclins from non-cyclin proteins. In our model, protein sequences were encoded by seven kinds of features: amino acid composition, composition of k-spaced amino acid pairs, tripeptide composition, pseudo amino acid composition, Geary correlation, normalized Moreau-Broto autocorrelation and composition/transition/distribution. Afterward, these features were optimized using analysis of variance (ANOVA) and minimum redundancy maximum relevance (mRMR) with the incremental feature selection (IFS) technique. A gradient boost decision tree (GBDT) classifier was trained on the optimal features. Five-fold cross-validation showed that our model could identify cyclins with an accuracy of 93.06% and an AUC value of 0.971, higher than two recent studies on the same data.

Language: English

Cited by: 50
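Among the seven cyclin features, the composition of k-spaced amino acid pairs (CKSAAP) is the least self-explanatory. For each gap k, it counts residue pairs separated by k positions and normalizes by the number of such pairs. This gives 400 features per gap value. The minimal Python sketch below follows that common definition. The maximum gap `k_max=3` and the function name `cksaap` are hypothetical choices, since the gap range used in the paper is not stated in the abstract.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]

def cksaap(seq, k_max=3):
    """Composition of k-spaced amino acid pairs for gaps k = 0..k_max (400 features each)."""
    seq = "".join(aa for aa in seq.upper() if aa in AMINO_ACIDS)
    features = []
    for k in range(k_max + 1):
        total = len(seq) - k - 1  # number of pairs with k residues between them
        if total <= 0:
            raise ValueError(f"sequence too short for gap k={k}")
        counts = dict.fromkeys(PAIRS, 0)
        for i in range(total):
            counts[seq[i] + seq[i + k + 1]] += 1
        features.extend(counts[p] / total for p in PAIRS)
    return features
```

With `k_max=1`, the sequence `"ACDA"` contributes the pairs AC, CD and DA for k = 0, and AD and CA for k = 1, giving an 800-dimensional vector.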

RFhy-m2G: Identification of RNA N2-methylguanosine modification sites based on random forest and hybrid features
Chunyan Ao, Quan Zou, Liang Yu, et al.

Methods, 2021, Vol. 203, pp. 32-39

Published: May 24, 2021

Language: English

Cited by: 43

Towards a better prediction of subcellular location of long non-coding RNA
Zhao‐Yue Zhang, Zi‐Jie Sun, Yuhe Yang, et al.

Frontiers of Computer Science, 2022, Vol. 16(5)

Published: Jan. 4, 2022

Language: English

Cited by: 35