Biophysical Journal, Year: 2024, Vol. 123(17), pp. 2647-2657
Published: Jan. 30, 2024
Language: English
Nucleic Acids Research, Year: 2023, Vol. 52(1), p. e3
Published: Nov. 6, 2023
Compared with proteins, DNA and RNA are more difficult languages to interpret because four-letter coded DNA/RNA sequences have less information content than 20-letter coded protein sequences. While BERT (Bidirectional Encoder Representations from Transformers)-like language models have been developed for RNA, they are ineffective at capturing the evolutionary information from homologous sequences because, unlike proteins, RNA sequences are less conserved. Here, we have developed an unsupervised multiple sequence alignment-based RNA language model (RNA-MSM) by utilizing homologous sequences from an automatic pipeline, RNAcmap, as it can provide significantly more homologous sequences than manually annotated Rfam. We demonstrate that the resulting unsupervised two-dimensional attention maps and one-dimensional embeddings from RNA-MSM contain structural information. In fact, they can be directly mapped with high accuracy to 2D base pairing probabilities and 1D solvent accessibilities, respectively. Further fine-tuning led to improved performance on these two downstream tasks compared with existing state-of-the-art techniques, including SPOT-RNA2 and RNAsnap2. By comparison, RNA-FM, a BERT-based RNA language model, performs worse than one-hot encoding with its embedding in base pair and solvent-accessible surface area prediction. We anticipate that the pre-trained RNA-MSM model can be fine-tuned on many other tasks related to RNA structure and function.
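The abstract states that RNA-MSM's 2D attention maps can be mapped directly to base-pairing probabilities. The paper's exact procedure is not given here. One common way to read contacts out of MSA-based language models is sketched below and is an assumption, not the authors' method:

1. Symmetrize each attention map.
2. Apply average product correction (APC).
3. Combine the per-head values with a logistic regression.

`pair_probabilities`, `weights` and `bias` are hypothetical names. In practice, the weights would be fit on RNAs with known secondary structure.

```python
import math
from typing import List

Matrix = List[List[float]]


def symmetrize(a: Matrix) -> Matrix:
    # Base pairing is symmetric, so combine attention i->j with j->i.
    n = len(a)
    return [[a[i][j] + a[j][i] for j in range(n)] for i in range(n)]


def apc(a: Matrix) -> Matrix:
    # Average product correction: F_ij - F_i * F_j / F,
    # where F_i are row sums, F_j column sums and F the total sum.
    n = len(a)
    row = [sum(r) for r in a]
    col = [sum(a[i][j] for i in range(n)) for j in range(n)]
    total = sum(row)
    if total == 0:
        return [r[:] for r in a]
    return [[a[i][j] - row[i] * col[j] / total for j in range(n)] for i in range(n)]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def pair_probabilities(maps: List[Matrix], weights: List[float], bias: float) -> Matrix:
    # Hypothetical logistic read-out: one weight per attention map (layer/head).
    n = len(maps[0])
    feats = [apc(symmetrize(m)) for m in maps]
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            z = bias + sum(w * f[i][j] for w, f in zip(weights, feats))
            out[i][j] = sigmoid(z)
    return out
```

For example, `pair_probabilities(maps, weights, bias)` returns a symmetric L x L matrix of values in (0, 1) for maps of shape L x L. Symmetrizing before APC keeps the output symmetric, matching the symmetry of base pairing.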
Language: English
Citations: 38
PROTEOMICS, Year: 2023, Vol. 23(23-24)
Published: June 29, 2023
Abstract: In recent years, the rapid growth of biological data has increased interest in using bioinformatics to analyze and interpret this data. Proteomics, which studies the structure, function, and interactions of proteins, is a crucial area of bioinformatics. Using natural language processing (NLP) techniques in proteomics is an emerging field that combines machine learning and text mining to analyze biological data. Recently, transformer-based NLP models have gained significant attention for their ability to process variable-length input sequences in parallel, using self-attention mechanisms to capture long-range dependencies. In this review paper, we discuss recent advancements in transformer-based NLP models in proteome bioinformatics and examine their advantages, limitations, and potential applications to improve the accuracy and efficiency of various tasks. Additionally, we highlight the challenges and future directions of using these models in proteome bioinformatics research. Overall, this review provides valuable insights into the potential of transformer-based NLP models to revolutionize...
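The abstract credits transformers with processing variable-length sequences in parallel and capturing long-range dependencies through self-attention. A minimal single-head sketch of scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, in plain Python shows both properties:

- Every position attends to every other position in one step.
- The same projection matrices apply to a sequence of any length.

The projection matrices and toy embeddings are hypothetical. Real protein language models add multiple heads, positional encodings and learned weights.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain matrix product: (n x m) @ (m x p) -> (n x p).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    # Subtract the max for numerical stability.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Tuple[Matrix, Matrix]:
    # x: one embedding row per residue; length n may vary between calls.
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_k = len(wk[0])
    # Every query is scored against every key, so distant residues interact directly.
    scores = [
        [sum(qi * ki for qi, ki in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in k]
        for q_row in q
    ]
    weights = [softmax(r) for r in scores]  # each row sums to 1
    return matmul(weights, v), weights
```

The same `self_attention` call works for a 3-residue or a 500-residue input without changing `wq`, `wk` or `wv`. This is the length independence the abstract refers to.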
Language: English
Citations: 37
Nature Machine Intelligence, Year: 2023, Vol. 5(5), pp. 485-496
Published: Apr. 6, 2023
Language: English
Citations: 35
Nature Communications, Year: 2024, Vol. 15(1)
Published: May 22, 2024
Abstract: Phages are increasingly considered promising alternatives to target drug-resistant bacterial pathogens. However, their often-narrow host range can make it challenging to find matching phages against bacteria of interest. Current computational tools do not accurately predict interactions at the strain level in a way that is relevant and properly evaluated for practical use. We present PhageHostLearn, a machine learning system that predicts strain-level interactions between receptor-binding proteins and bacterial receptors for Klebsiella phage-bacteria pairs. We evaluate this system both in silico and in the laboratory, in the clinically relevant setting of finding matching phages for bacterial strains. PhageHostLearn reaches a cross-validated ROC AUC of up to 81.8% and maintains this performance in laboratory validation. Our approach provides a framework for developing and evaluating phage-host prediction methods that are useful in practice, which we believe to be a meaningful contribution to machine-learning-guided development of phage therapeutics and diagnostics.
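PhageHostLearn's headline metric is ROC AUC. This metric is the probability that a randomly chosen interacting phage-bacterium pair is scored above a randomly chosen non-interacting pair, with ties counted as half. The quadratic-time sketch below computes it directly from that definition. The labels and scores are toy values, not data from the paper. In a cross-validated setting, the function would typically be applied to each held-out fold; how the paper aggregates folds is not stated here.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    # labels: 1 = interacting pair, 0 = non-interacting pair.
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    # Count positive/negative pairs ranked correctly; ties contribute 0.5.
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])` returns 0.75, since 3 of the 4 positive/negative pairs are ordered correctly. The pairwise form is O(P x N) and fine for illustration; a rank-based implementation scales better.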
Language: English
Citations: 16
Biophysical Journal, Year: 2024, Vol. 123(17), pp. 2647-2657
Published: Jan. 30, 2024
Language: English
Citations: 10