Machine learning in RNA structure prediction: Advances and challenges
Sicheng Zhang, Jun Li, Shi‐Jie Chen, et al.

Biophysical Journal, Journal Year: 2024, Volume and Issue: 123(17), P. 2647 - 2657

Published: Jan. 30, 2024

Language: English

Ankh ☥: Optimized Protein Language Model Unlocks General-Purpose Modelling
Ahmed Elnaggar, Hazem Essam, Wafaa Salah-Eldin, et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown

Published: Jan. 18, 2023

Abstract: As opposed to scaling up protein language models (PLMs), we seek to improve performance via protein-specific optimization. Although the proportionality between the language model size and the richness of its learned representations is validated, we prioritize accessibility and pursue a path of data-efficient, cost-reduced, and knowledge-guided optimization. Through over twenty experiments ranging from masking, architecture, and pre-training data, we derive insights from protein-specific experimentation into building a model that interprets the language of life, optimally. We present Ankh, the first general-purpose PLM trained on Google's TPU-v4, surpassing the state-of-the-art performance with fewer parameters (<10% for pre-training, <7% for inference, and <30% for the embedding dimension). We provide a representative range of structure and function benchmarks where Ankh excels. We further provide a protein variant generation analysis on High-N and One-N input data scales where Ankh succeeds in learning protein evolutionary conservation-mutation trends and introducing functional diversity while retaining key structural-functional characteristics. We dedicate our work to promoting accessibility to research innovation via attainable resources.

Language: English

Citations: 78

Recent Progress in the Discovery and Design of Antimicrobial Peptides Using Traditional Machine Learning and Deep Learning
Jielu Yan, Jianxiu Cai, Bob Zhang, et al.

Antibiotics, Journal Year: 2022, Volume and Issue: 11(10), P. 1451 - 1451

Published: Oct. 21, 2022

Antimicrobial resistance has become a critical global health problem due to the abuse of conventional antibiotics and the rise of multi-drug-resistant microbes. Antimicrobial peptides (AMPs) are a group of natural peptides that show promise as next-generation antibiotics due to their low toxicity to the host, their broad spectrum of biological activity (including antibacterial, antifungal, antiviral, and anti-parasitic activities), and their great therapeutic potential, such as anticancer and anti-inflammatory effects. Most importantly, AMPs kill bacteria by damaging cell membranes using multiple mechanisms of action rather than targeting a single molecule or pathway, making it difficult for bacterial drug resistance to develop. However, the experimental approaches used to discover and design new AMPs are very expensive and time-consuming. In recent years, there has been considerable interest in using in silico methods, including traditional machine learning (ML) and deep learning (DL) approaches, for drug discovery. While there are a few papers summarizing computational AMP prediction methods, none of them focused on DL methods. In this review, we aim to survey the latest AMP prediction methods achieved by DL approaches. First, the biology background of AMPs is introduced, then various feature encoding methods used to represent the features of peptide sequences are presented. We explain the most popular DL techniques and highlight the recent works based on them to classify AMPs and design novel peptide sequences. Finally, we discuss the limitations and challenges of AMP prediction.

Language: English

Citations: 71

NLP techniques for automating responses to customer queries: a systematic review
Peter Adebowale Olujimi, Abejide Ade-Ibijola

Discover Artificial Intelligence, Journal Year: 2023, Volume and Issue: 3(1)

Published: May 15, 2023

Abstract: The demand for automated customer support approaches in customer-centric environments has increased significantly in the past few years. Natural Language Processing (NLP) advancement has enabled conversational AI to comprehend human language and respond to enquiries from customers automatically, independent of the intervention of humans. Customers can now access prompt responses from NLP chatbots without interacting with human agents. This application has been implemented in numerous business sectors, including banking, manufacturing, education, law, and healthcare, among others. This study reviewed earlier studies on automating customer queries using NLP approaches. Using a systematic review methodology, 73 articles were analysed from reputable digital resources. The evaluated result offers an in-depth review of prior studies investigating the use of NLP techniques for automating customer service responses, including details of existing studies, benefits, and potential future study topics and applications. The implications of the results were discussed, and recommendations were made.

Language: English

Citations: 58

Guiding questions to avoid data leakage in biological machine learning applications
Judith Bernett, David B. Blumenthal, Dominik G. Grimm, et al.

Nature Methods, Journal Year: 2024, Volume and Issue: 21(8), P. 1444 - 1453

Published: Aug. 1, 2024

Language: English

Citations: 25

Evaluating the advancements in protein language models for encoding strategies in protein function prediction: a comprehensive review
Jiaying Chen, Jingfu Wang, Yue Hu, et al.

Frontiers in Bioengineering and Biotechnology, Journal Year: 2025, Volume and Issue: 13

Published: Jan. 21, 2025

Protein function prediction is crucial in several key areas such as bioinformatics and drug design. With the rapid progress of deep learning technology, applying protein language models has become a research focus. These models utilize the increasing amount of large-scale protein sequence data to deeply mine its intrinsic semantic information, which can effectively improve the accuracy of protein function prediction. This review comprehensively summarizes the current status of applying the latest protein language models in protein function prediction. It provides an exhaustive performance comparison with traditional prediction methods. Through an in-depth analysis of experimental results, the significant advantages of protein language models in enhancing the accuracy and depth of protein function prediction tasks are fully demonstrated.

Language: English

Citations: 3

A novel antibacterial peptide recognition algorithm based on BERT
Yue Zhang, Jianyuan Lin, L.M. Zhao, et al.

Briefings in Bioinformatics, Journal Year: 2021, Volume and Issue: 22(6)

Published: May 5, 2021

Abstract: As the best substitute for antibiotics, antimicrobial peptides (AMPs) have important research significance. Due to the high cost and difficulty of experimental methods for identifying AMPs, more and more research is focused on using computational methods to solve this problem. Most of the existing computational methods can identify AMPs through the sequence itself, but there is still room for improvement in recognition accuracy, and there is a problem that the constructed model cannot be universal across datasets. The pre-training strategy has been applied to many tasks in natural language processing (NLP) and has achieved gratifying results. It also has great application prospects in the field of AMP recognition and prediction. In this paper, we apply the pre-training strategy to the training of AMP classifiers and propose a novel recognition algorithm. Our model is based on the BERT model, pre-trained with protein data from UniProt, and then fine-tuned and evaluated on six AMP datasets with large differences. Our model is superior to the existing methods and achieves the goal of accurate identification for datasets with a small sample size. We try different word segmentation methods for peptide chains and demonstrate the influence of pre-training steps and of balancing datasets on the recognition effect. We find that pre-training on a large number of diverse AMP data, followed by fine-tuning on new data, is beneficial for capturing both the new data's specific features and the common features between AMP sequences. Finally, we construct a new AMP dataset, on which we train a general AMP recognition model.

Language: English

Citations: 75

Transformer models used for text-based question answering systems
Khalid Nassiri, Moulay A. Akhloufi

Applied Intelligence, Journal Year: 2022, Volume and Issue: 53(9), P. 10602 - 10635

Published: Aug. 20, 2022

Language: English

Citations: 70

TMbed: transmembrane proteins predicted through language model embeddings
Michael Bernhofer, Burkhard Rost

BMC Bioinformatics, Journal Year: 2022, Volume and Issue: 23(1)

Published: Aug. 8, 2022

Despite the immense importance of transmembrane proteins (TMPs) for molecular biology and medicine, experimental 3D structures for TMPs remain about 4-5 times underrepresented compared to non-TMPs. Today's top methods such as AlphaFold2 accurately predict 3D structures for many TMPs, but annotating transmembrane regions remains a limiting step for proteome-wide predictions.

Language: English

Citations: 57

Novel machine learning approaches revolutionize protein knowledge
Nicola Bordin, Christian Dallago, Michael Heinzinger, et al.

Trends in Biochemical Sciences, Journal Year: 2022, Volume and Issue: 48(4), P. 345 - 359

Published: Dec. 9, 2022

Breakthrough methods in machine learning (ML), protein structure prediction, and novel ultrafast structural aligners are revolutionizing structural biology. Obtaining accurate models of proteins and annotating their functions on a large scale is no longer limited by time and resources. The most recent method to be top ranked by the Critical Assessment of Structure Prediction (CASP) assessment, AlphaFold 2 (AF2), is capable of building structural models with an accuracy comparable to that of experimental structures. Annotations of 3D models are keeping pace with the deposition of the structures due to advancements in protein language models (pLMs) and structural aligners that help validate these transferred annotations. In this review we describe how recent developments in ML for protein science are making large-scale structural bioinformatics available to the general scientific community.

Language: English

Citations: 49

Multiple sequence alignment-based RNA language model and its application to structural inference
Yikun Zhang, Mei Lang, Jiuhong Jiang, et al.

Nucleic Acids Research, Journal Year: 2023, Volume and Issue: 52(1), P. e3 - e3

Published: Nov. 6, 2023

Compared with proteins, DNA and RNA are more difficult languages to interpret because four-letter coded DNA/RNA sequences have less information content than 20-letter coded protein sequences. While BERT (Bidirectional Encoder Representations from Transformers)-like language models have been developed for RNA, they are ineffective at capturing the evolutionary information from homologous sequences because, unlike proteins, RNA sequences are less conserved. Here, we have developed an unsupervised multiple sequence alignment-based RNA language model (RNA-MSM) by utilizing homologous sequences from an automatic pipeline, RNAcmap, as it can provide significantly more homologous sequences than manually annotated Rfam. We demonstrate that the resulting unsupervised, two-dimensional attention maps and one-dimensional embeddings from RNA-MSM contain structural information. In fact, they can be directly mapped with high accuracy to 2D base pairing probabilities and 1D solvent accessibilities, respectively. Further fine-tuning led to significantly improved performance on these two downstream tasks compared with existing state-of-the-art techniques including SPOT-RNA2 and RNAsnap2. By comparison, RNA-FM, a BERT-based RNA language model, performs worse than one-hot encoding with its embedding in base pair and solvent-accessible surface area prediction. We anticipate that the pre-trained RNA-MSM model can be fine-tuned on many other tasks related to RNA structure and function.

Language: English

Citations: 38