Biophysical Journal, Journal Year: 2024, Volume and Issue: 123(17), P. 2647 - 2657
Published: Jan. 30, 2024
Language: English
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown
Published: Jan. 18, 2023
Abstract: As opposed to scaling up protein language models (PLMs), we seek to improve performance via protein-specific optimization. Although the proportionality between the language model size and the richness of its learned representations is validated, we prioritize accessibility and pursue a path of data-efficient, cost-reduced, and knowledge-guided optimization. Through over twenty experiments ranging from masking and architecture to pre-training data, we derive insights from protein-specific experimentation into building a model that interprets the language of life, optimally. We present Ankh, the first general-purpose PLM trained on Google’s TPU-v4, surpassing the state-of-the-art performance with fewer parameters (<10% for pre-training, <7% for inference, and <30% for the embedding dimension). We provide a representative range of structure and function benchmarks where Ankh excels. We further provide a protein variant generation analysis on High-N and One-N input data scales where Ankh succeeds in learning protein evolutionary conservation-mutation trends and introducing functional diversity while retaining key structural-functional characteristics. We dedicate our work to promoting accessibility to research innovation via attainable resources.
Language: English
Citations: 78
Antibiotics, Journal Year: 2022, Volume and Issue: 11(10), P. 1451 - 1451
Published: Oct. 21, 2022
Antimicrobial resistance has become a critical global health problem due to the abuse of conventional antibiotics and the rise of multi-drug-resistant microbes. Antimicrobial peptides (AMPs) are a group of natural peptides that show promise as next-generation antibiotics due to their low toxicity to the host, their broad spectrum of biological activity (including antibacterial, antifungal, antiviral, and anti-parasitic activities), and their great therapeutic potential, such as anticancer and anti-inflammatory effects. Most importantly, AMPs kill bacteria by damaging cell membranes using multiple mechanisms of action rather than targeting a single molecule or pathway, making it difficult for bacterial drug resistance to develop. However, the experimental approaches used to discover and design new AMPs are very expensive and time-consuming. In recent years, there has been considerable interest in using in silico methods, including traditional machine learning (ML) and deep learning (DL) approaches, for drug discovery. While there are a few papers summarizing computational AMP prediction methods, none of them focused on DL methods. In this review, we aim to survey the latest AMP prediction methods achieved by DL approaches. First, the biology background of AMPs is introduced, then various feature encoding methods used to represent the features of peptide sequences are presented. We explain the most popular DL techniques and highlight the recent works based on them to classify AMPs and design novel peptide sequences. Finally, we discuss the limitations and challenges of AMP prediction.
Language: English
Citations: 71
Discover Artificial Intelligence, Journal Year: 2023, Volume and Issue: 3(1)
Published: May 15, 2023
Abstract: The demand for automated customer support approaches in customer-centric environments has increased significantly in the past few years. Advances in Natural Language Processing (NLP) have enabled conversational AI to comprehend human language and respond to enquiries from customers automatically, independent of human intervention. Customers can now access prompt responses from NLP chatbots without interacting with human agents. This application has been implemented in numerous business sectors, including banking, manufacturing, education, law, and healthcare, among others. This study reviewed earlier studies on automating customer queries using NLP approaches. Using a systematic review methodology, 73 articles were analysed from reputable digital resources. The evaluated result offers an in-depth review of prior studies investigating the use of NLP techniques for automated customer service responses, including details on existing studies, benefits, and potential future study topics and applications. The implications of the results were discussed and recommendations made.
Language: English
Citations: 58
Nature Methods, Journal Year: 2024, Volume and Issue: 21(8), P. 1444 - 1453
Published: Aug. 1, 2024
Language: English
Citations: 25
Frontiers in Bioengineering and Biotechnology, Journal Year: 2025, Volume and Issue: 13
Published: Jan. 21, 2025
Protein function prediction is crucial in several key areas such as bioinformatics and drug design. With the rapid progress of deep learning technology, applying protein language models has become a research focus. These models utilize the increasing amount of large-scale protein sequence data to deeply mine its intrinsic semantic information, which can effectively improve the accuracy of protein function prediction. This review comprehensively covers the current status of applying the latest protein language models to protein function prediction. It provides an exhaustive performance comparison with traditional prediction methods. Through in-depth analysis of experimental results, the significant advantages of protein language models in enhancing the accuracy and depth of protein function prediction tasks are fully demonstrated.
Language: English
Citations: 3
Briefings in Bioinformatics, Journal Year: 2021, Volume and Issue: 22(6)
Published: May 5, 2021
Abstract: As the best substitute for antibiotics, antimicrobial peptides (AMPs) have important research significance. Due to the high cost and difficulty of experimental methods for identifying AMPs, more and more research is focused on using computational methods to solve this problem. Most existing computational methods can identify AMPs through the sequence itself, but there is still room for improvement in recognition accuracy, and there is a problem that the constructed model cannot be universal across datasets. The pre-training strategy has been applied to many tasks in natural language processing (NLP) and has achieved gratifying results. It also has great application prospects in the field of AMP recognition and prediction. In this paper, we apply the pre-training strategy to the training of AMP classifiers and propose a novel recognition algorithm. Our model is based on the BERT model, pre-trained with protein data from UniProt, and then fine-tuned and evaluated on six AMP datasets with large differences. Our model is superior to existing methods and achieves the goal of accurate identification for datasets with a small sample size. We try different word segmentation methods for peptide chains and examine the influence of pre-training steps and of balancing datasets on the recognition effect. We find that pre-training on a large number of diverse AMP data, followed by fine-tuning on new data, is beneficial for capturing both the new data’s specific features and the common features between AMP sequences. Finally, we construct a new AMP dataset, on which we train a general AMP recognition model.
Language: English
Citations: 75
Applied Intelligence, Journal Year: 2022, Volume and Issue: 53(9), P. 10602 - 10635
Published: Aug. 20, 2022
Language: English
Citations: 70
BMC Bioinformatics, Journal Year: 2022, Volume and Issue: 23(1)
Published: Aug. 8, 2022
Despite the immense importance of transmembrane proteins (TMPs) for molecular biology and medicine, experimental 3D structures for TMPs remain about 4-5 times underrepresented compared to non-TMPs. Today's top methods such as AlphaFold2 accurately predict 3D structures for many TMPs, but annotating transmembrane regions remains a limiting step for proteome-wide predictions.
Language: English
Citations: 57
Trends in Biochemical Sciences, Journal Year: 2022, Volume and Issue: 48(4), P. 345 - 359
Published: Dec. 9, 2022
Breakthrough methods in machine learning (ML), protein structure prediction, and novel ultrafast structural aligners are revolutionizing structural biology. Obtaining accurate models of proteins and annotating their functions on a large scale is no longer limited by time and resources. The most recent method to be top ranked by the Critical Assessment of Structure Prediction (CASP) assessment, AlphaFold 2 (AF2), is capable of building structural models with an accuracy comparable to that of experimental structures. Annotations of 3D models are keeping pace with the deposition of the structures due to advancements in protein language models (pLMs) and structural aligners that help validate these transferred annotations. In this review we describe how recent developments in ML for protein science are making large-scale structural bioinformatics available to the general scientific community.
Language: English
Citations: 49
Nucleic Acids Research, Journal Year: 2023, Volume and Issue: 52(1), P. e3 - e3
Published: Nov. 6, 2023
Compared with proteins, DNA and RNA are more difficult languages to interpret because four-letter coded DNA/RNA sequences have less information content than 20-letter coded protein sequences. While BERT (Bidirectional Encoder Representations from Transformers)-like language models have been developed for RNA, they are ineffective at capturing the evolutionary information from homologous sequences because, unlike proteins, RNA sequences are less conserved. Here, we have developed an unsupervised multiple sequence alignment-based RNA language model (RNA-MSM) by utilizing homologous sequences from an automatic pipeline, RNAcmap, as it can provide significantly more homologous sequences than manually annotated Rfam. We demonstrate that the resulting unsupervised, two-dimensional attention maps and one-dimensional embeddings from RNA-MSM contain structural information. In fact, they can be directly mapped with high accuracy to 2D base pairing probabilities and 1D solvent accessibilities, respectively. Further fine-tuning led to improved performance on these two downstream tasks compared with existing state-of-the-art techniques including SPOT-RNA2 and RNAsnap2. By comparison, RNA-FM, a BERT-based RNA language model, performs worse than one-hot encoding with its embedding in base pair and solvent-accessible surface area prediction. We anticipate that the pre-trained RNA-MSM model can be fine-tuned on many other tasks related to RNA structure and function.
Language: English
Citations: 38