
Large language models improve annotation of prokaryotic viral proteins
Zachary Flamholz, Steven J. Biller, Libusha Kelly

et al.

Nature Microbiology, Journal year: 2024, Issue 9(2), pp. 537-549

Published: Jan. 29, 2024

Language: English

Cited by

34

Multimodal Large Language Models in Healthcare: Applications, Challenges, and Future Outlook (Preprint)
Rawan AlSaad, Alaa Abd‐Alrazaq, Sabri Boughorbel

et al.

Journal of Medical Internet Research, Journal year: 2024, Issue 26, p. e59505

Published: Aug. 20, 2024

In the complex and multidimensional field of medicine, multimodal data are prevalent and crucial for informed clinical decisions. Multimodal data span a broad spectrum of data types, including medical images (eg, MRI and CT scans), time-series data (eg, sensor data from wearable devices and electronic health records), audio recordings (eg, heart and respiratory sounds and patient interviews), text (eg, clinical notes and research articles), videos (eg, surgical procedures), and omics data (eg, genomics and proteomics). While advancements in large language models (LLMs) have enabled new applications for knowledge retrieval and processing in the medical field, most LLMs remain limited to unimodal data, typically text-based content, and often overlook the importance of integrating the diverse data modalities encountered in clinical practice. This paper aims to present a detailed, practical, and solution-oriented perspective on the use of multimodal LLMs (M-LLMs) in the medical field. Our investigation spanned M-LLM foundational principles, current and potential applications, technical and ethical challenges, and future research directions. By connecting these elements, we aimed to provide a comprehensive framework that links the diverse aspects of M-LLMs, offering a unified vision for their role in health care. This approach can guide both future research and practical implementations of M-LLMs in health care, positioning them as a paradigm shift toward integrated, multimodal, data-driven medicine. We anticipate that this work will spark further discussion and inspire the development of innovative approaches for the next generation of medical M-LLM systems.

Language: English

Cited by

31

Rapid in silico directed evolution by a protein language model with EVOLVEpro
Kaiyi Jiang, Zhaoqing Yan, Matteo Di Bernardo

et al.

Science, Journal year: 2024, Issue: unknown

Published: Nov. 21, 2024

Directed protein evolution is central to biomedical applications but faces challenges like experimental complexity, inefficient multi-property optimization, and local maxima traps. While in silico methods using protein language models (PLMs) can provide modeled fitness landscape guidance, they struggle to generalize across diverse protein families and to map sequence to activity. We present EVOLVEpro, a few-shot active learning framework that combines PLMs and regression models to rapidly improve protein activity. EVOLVEpro surpasses current methods, yielding up to 100-fold improvements in desired properties. We demonstrate its effectiveness across six proteins in RNA production, genome editing, and antibody binding applications. These results highlight the advantages of active learning with minimal experimental data over zero-shot predictions. EVOLVEpro opens new possibilities for AI-guided protein engineering in biology and medicine.
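The few-shot loop this abstract describes (PLM features plus a small regression head, refit each round to pick the next variants to assay) can be sketched in a few lines. This is a minimal illustration, not the EVOLVEpro implementation: `embed` is a deterministic random-vector placeholder for real PLM representations, and the variant strings and activity values are invented.

```python
import zlib
import numpy as np

def embed(sequences, dim=16):
    # Placeholder for PLM embeddings: deterministic pseudo-random
    # vectors keyed on the sequence string, so the sketch runs
    # without downloading a protein language model.
    return np.stack([
        np.random.default_rng(zlib.crc32(s.encode())).standard_normal(dim)
        for s in sequences
    ])

def ridge_fit(X, y, lam=1.0):
    # Closed-form ridge regression: w = (X'X + lam*I)^{-1} X'y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def propose_next_round(measured, activities, pool, top_k=3):
    # One active-learning round: fit a small regression head on the
    # few assayed variants, then rank the unmeasured pool by predicted
    # activity and return the top candidates to test next.
    w = ridge_fit(embed(measured), np.asarray(activities, dtype=float))
    scores = embed(pool) @ w
    order = np.argsort(scores)[::-1]
    return [pool[i] for i in order[:top_k]]

# Toy usage: 4 assayed variants guide selection from a pool of 6.
measured = ["MKVLA", "MKVLG", "MAVLA", "MKILA"]
activities = [1.2, 0.4, 0.9, 1.5]
pool = ["MKVIA", "MKVLS", "MRVLA", "MKWLA", "MKVTA", "MQVLA"]
picks = propose_next_round(measured, activities, pool)
```

In practice the regression head would be refit after each round's wet-lab measurements, shrinking the gap between modeled and true fitness with every iteration.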

Language: English

Cited by

29

Feature Reuse and Scaling: Understanding Transfer Learning with Protein Language Models
Francesca-Zhoufan Li, Ava P. Amini, Yisong Yue

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal year: 2024, Issue: unknown

Published: Feb. 8, 2024

Abstract Large pretrained protein language models (PLMs) have improved property and structure prediction from sequences via transfer learning, in which the weights and representations of PLMs are repurposed for downstream tasks. Although PLMs have shown great promise, currently there is little understanding of how the features learned by pretraining relate to and are useful for downstream tasks. We perform a systematic analysis of transfer learning using PLMs, conducting 370 experiments across a comprehensive suite of factors including different downstream tasks, architectures, model sizes, model depths, and pretraining time. We observe that while almost all downstream tasks do benefit from pretrained models compared with naive sequence representations, for the majority of tasks performance does not scale with pretraining, and instead relies on low-level features learned early in pretraining. Our results point to a mismatch between current PLM pretraining paradigms and most applications of these models, indicating a need for better pretraining methods.
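The transfer-learning setup implied here (PLM weights frozen, only a small downstream head trained on the extracted features) is commonly evaluated with a linear probe. A minimal numpy-only sketch under stated assumptions: the synthetic features stand in for PLM representations, and held-out R^2 is one possible downstream score.

```python
import numpy as np

def probe_r2(features, labels, train_frac=0.8, lam=1e-2):
    """Ridge 'probe' on frozen features: only this small head is
    trained, mirroring transfer learning where the pretrained model's
    weights stay fixed. Returns held-out R^2 as the task score."""
    n = len(labels)
    n_tr = int(n * train_frac)
    Xtr, ytr = features[:n_tr], labels[:n_tr]
    Xte, yte = features[n_tr:], labels[n_tr:]
    w = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(Xtr.shape[1]),
                        Xtr.T @ ytr)
    pred = Xte @ w
    return 1.0 - np.sum((yte - pred) ** 2) / np.sum((yte - yte.mean()) ** 2)

# Synthetic check: informative features linearly predict the label;
# row-shuffled features destroy the pairing, so the probe score drops.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 8))
y = X @ rng.standard_normal(8) + 0.05 * rng.standard_normal(200)
r2_good = probe_r2(X, y)
r2_bad = probe_r2(rng.permutation(X), y)
```

Running the same probe against features taken from different layers or pretraining checkpoints is one way to test whether downstream performance actually tracks pretraining scale, as the paper investigates.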

Language: English

Cited by

27

Computational scoring and experimental evaluation of enzymes generated by neural networks
Sean R. Johnson, Xiaozhi Fu, Sandra Viknander

et al.

Nature Biotechnology, Journal year: 2024, Issue: unknown

Published: April 23, 2024

In recent years, generative protein sequence models have been developed to sample novel sequences. However, predicting whether generated proteins will fold and function remains challenging. We evaluate a set of 20 diverse computational metrics to assess the quality of enzyme sequences produced by three contrasting generative models: ancestral sequence reconstruction, a generative adversarial network, and a protein language model. Focusing on two enzyme families, we expressed and purified over 500 natural and generated sequences with 70-90% identity to the most similar natural sequences to benchmark computational metrics for predicting in vitro enzyme activity. Over three rounds of experiments, we developed a computational filter that improved the rate of experimental success by 50-150%. The proposed metrics and models will drive protein engineering research by serving as a benchmark for generative protein sequence models and helping to select active variants for experimental testing.
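The kind of computational filter the abstract mentions (keep only generated sequences that clear every metric threshold before committing to wet-lab expression) can be sketched generically. The two metrics below are toy stand-ins invented for illustration; the paper's actual 20 metrics are not reproduced here.

```python
def fraction_hydrophobic(seq):
    # Toy metric: share of hydrophobic residues. Real filters would
    # use far richer scores (e.g. model likelihoods, structure checks).
    hydrophobic = set("AVILMFWY")
    return sum(aa in hydrophobic for aa in seq) / len(seq)

def passes_filter(seq, metrics, thresholds):
    # A sequence survives only if every metric clears its threshold.
    return all(fn(seq) >= thresholds[name] for name, fn in metrics.items())

def filter_candidates(candidates, metrics, thresholds):
    # Keep generated sequences that pass all computational checks,
    # shrinking the set sent for expression and purification.
    return [s for s in candidates if passes_filter(s, metrics, thresholds)]

metrics = {"hydrophobic_fraction": fraction_hydrophobic, "min_length": len}
thresholds = {"hydrophobic_fraction": 0.3, "min_length": 5}
candidates = ["MKVLAVIF", "GGSGGS", "MAIL", "MWVLFAYI"]
kept = filter_candidates(candidates, metrics, thresholds)
# kept == ["MKVLAVIF", "MWVLFAYI"]
```

The experimental-success gain reported in the abstract comes from exactly this pre-screening step: fewer non-functional candidates reach the bench, so a larger fraction of those tested are active.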

Language: English

Cited by

27