Machine learning in RNA structure prediction: Advances and challenges
Sicheng Zhang, Jun Li, Shi‐Jie Chen

et al.

Biophysical Journal, Journal year: 2024, Volume 123(17), pp. 2647 - 2657

Published: Jan. 30, 2024

Language: English

Multiple sequence alignment-based RNA language model and its application to structural inference
Yikun Zhang, Mei Lang, Jiuhong Jiang

et al.

Nucleic Acids Research, Journal year: 2023, Volume 52(1), pp. e3 - e3

Published: Nov. 6, 2023

Compared with proteins, DNA and RNA are more difficult languages to interpret because four-letter coded DNA/RNA sequences have less information content than 20-letter protein sequences. While BERT (Bidirectional Encoder Representations from Transformers)-like language models have been developed for RNA, they are ineffective at capturing the evolutionary information from homologous sequences because, unlike proteins, RNA sequences are less conserved. Here, we have developed an unsupervised multiple sequence alignment-based RNA language model (RNA-MSM) by utilizing homologous sequences from an automatic pipeline, RNAcmap, as it can provide significantly more homologous sequences than manually annotated Rfam. We demonstrate that the resulting unsupervised, two-dimensional attention maps and one-dimensional embeddings from RNA-MSM contain structural information. In fact, they can be directly mapped with high accuracy to 2D base pairing probabilities and 1D solvent accessibilities, respectively. Further fine-tuning led to improved performance on these two downstream tasks compared with existing state-of-the-art techniques including SPOT-RNA2 and RNAsnap2. By comparison, RNA-FM, a BERT-based RNA language model, performs worse than one-hot encoding with its embedding in base pair and solvent-accessible surface area prediction. We anticipate that the pre-trained RNA-MSM model can be fine-tuned on many other tasks related to RNA structure and function.

Language: English

Cited by: 38

Leveraging transformers‐based language models in proteome bioinformatics
Nguyen Quoc Khanh Le

PROTEOMICS, Journal year: 2023, Volume 23(23-24)

Published: June 29, 2023

Abstract: In recent years, the rapid growth of biological data has increased interest in using bioinformatics to analyze and interpret this data. Proteomics, which studies the structure, function, and interactions of proteins, is a crucial area of bioinformatics. Using natural language processing (NLP) techniques in proteomics is an emerging field that combines machine learning and text mining techniques. Recently, transformer‐based NLP models have gained significant attention for their ability to process variable‐length input sequences in parallel, using self‐attention mechanisms to capture long‐range dependencies. In this review paper, we discuss the recent advancements in transformer‐based NLP models in proteome bioinformatics and examine their advantages, limitations, and potential applications to improve the accuracy and efficiency of various tasks. Additionally, we highlight the challenges and future directions of using these models in proteome bioinformatics research. Overall, this review provides valuable insights into the potential of transformer‐based NLP models to revolutionize proteome bioinformatics.

Language: English

Cited by: 37

Linguistically inspired roadmap for building biologically reliable protein language models
Mai Ha Vu, Rahmad Akbar, Philippe A. Robert

et al.

Nature Machine Intelligence, Journal year: 2023, Volume 5(5), pp. 485 - 496

Published: April 6, 2023

Language: English

Cited by: 35

Prediction of Klebsiella phage-host specificity at the strain level
Dimitri Boeckaerts, Michiel Stock, Celia Ferriol-González

et al.

Nature Communications, Journal year: 2024, Volume 15(1)

Published: May 22, 2024

Abstract: Phages are increasingly considered promising alternatives to target drug-resistant bacterial pathogens. However, their often-narrow host range can make it challenging to find matching phages against bacteria of interest. Current computational tools do not accurately predict interactions at the strain level in a way that is relevant and properly evaluated for practical use. We present PhageHostLearn, a machine learning system that predicts strain-level interactions between receptor-binding proteins and bacterial receptors for Klebsiella phage-bacteria pairs. We evaluate this system both in silico and in the laboratory, in the clinically relevant setting of finding matching phages against bacterial strains. PhageHostLearn reaches a cross-validated ROC AUC of up to 81.8% in silico and maintains this performance in laboratory validation. Our approach provides a framework for developing and evaluating phage-host prediction methods that are useful in practice, which we believe to be a meaningful contribution to the machine-learning-guided development of phage therapeutics and diagnostics.

Language: English

Cited by: 16

Machine learning in RNA structure prediction: Advances and challenges
Sicheng Zhang, Jun Li, Shi‐Jie Chen

et al.

Biophysical Journal, Journal year: 2024, Volume 123(17), pp. 2647 - 2657

Published: Jan. 30, 2024

Language: English

Cited by: 10