Unlocking the Capabilities of Large Language Models for Accelerating Drug Development

Wes Anderson, Ian Braun, Roopal Bhatnagar

et al.

Clinical Pharmacology & Therapeutics, Year: 2024, Vol. 116(1), pp. 38-41

Published: April 22, 2024

Recent breakthroughs in natural language processing (NLP), particularly in large language models (LLMs), offer substantial advantages in model-informed drug development (MIDD). With billions of parameters and comprehensive pre-training on diverse data, these models effectively extract information from unstructured and structured data throughout the drug development lifecycle. This perspective envisions LLMs supporting MIDD, enhancing drug development, and emphasizes C-Path's strategic use of LLM innovations for actionable real-world evidence from real-world data (RWD). Previously, our work has succinctly summarized potential opportunities for the implementation of NLP in drug development, emphasizing areas of improvement in smaller scale language models.1 Recent advancements in LLMs showcase unique abilities that are not present in smaller scale models (Figure 1).2 One is known as prompt engineering, referring to the ability of an LLM to complete a task through a response based on a prompt without further training, in both zero- or few-shot settings. In the zero-shot setting, the model learns an entirely new task without being given specific examples, while few-shot learning (or in-context learning [ICL]) involves presenting the model with a few demonstrations ("shots"). The model can then produce the anticipated output for test instances by completing the input text sequence without the need for gradient updates. Prompting methods that involve providing the intermediate reasoning steps that make up a path depicting how to solve a problem (i.e., chain-of-thought [COT] prompting), as well as multiple reasoning steps in a combinatorial space (i.e., Tree of Thought [TOT] prompting), have also been developed to improve performance in the prompting setting. Considerations such as clarity, specificity, domain understanding, context of the topic, and iterative evaluation and adjustment are crucial in prompt engineering. Using techniques of effective prompt engineering principles, along with ICL, COT, TOT, and similar methods, can enable a single LLM to be reused across various tasks (including in MIDD) with minimal adaptation or retraining needed for different tasks, enabling rapid adaptation without extensive task-specific datasets. C-Path has explored the utility of LLMs in data curation pipelines, especially for information relevant to downstream decisions in free-text and semi-structured fields.
Unlike traditional biomedical NLP pipelines that extract entities and relationships from clinical notes by fine-tuning on large volumes of annotated data, LLMs can complete zero-or-few shot question answering with configurable templated prompts without tuning.3 C-Path is leveraging this property with simple prompt templates and target data structures to perform arbitrary data cleaning and knowledge extraction tasks. Recently, this approach was demonstrated in defining and automating data curation pipelines "on-the-fly,"4 redirecting time from more code-intensive data-wrangling and mapping approaches to manual review and revision of LLM-enabled automated processes. Leveraging prompt engineering to define LLM-assisted data curation with dynamically defined schemas also extends to the knowledge discovery process. Large language models excel in translating natural language inquiries into queries that conform to defined schemas in a zero-shot setting to retrieve data from knowledge graphs and tabular databases. We anticipate uses of this technique in drug development (e.g., identifying relevant datasets for specific research questions). Within C-Path, another opportunity is synthetic textual data generation, where GPT-3.5 can strategically generate representative content in the structure of the desired end point (e.g., with ICL in several settings, including weak supervision5). This enhances the effectiveness of smaller scale models (including in MIDD settings), reducing the burden of human annotation and data collection for supervised fine-tuning.3, 5 The future of LLMs within C-Path related to the drug development lifecycle (e.g., biomarker named entity recognition and assertion status detection using EHR data and LLMs) (Figure 2) presents significant opportunities. The massive size of many popular LLMs prohibits their operation on desktop computers. Furthermore, the proprietary nature of LLMs accessible only via a paid API can hinder reproducibility, incur costs, and raise concerns about patient privacy and security. There is, however, the ability to utilize open-source, commercially nonrestricted LLMs stored on and readily available for download from platforms such as HuggingFace (https://huggingface.co/). Such models allow for flexibility in usage, in the form of fine-tuning, and do not require a paid API. The ability to fine-tune is important, as adapting generic LLMs or existing domain-specific LLMs into experts can be done through fine-tuning.3 Full fine-tuning demands storage and memory allocation for each task, incurring computational costs. However, recent parameter-efficient fine tuning (PEFT) methodologies, such as Low-Rank Adaptation (LoRA)7 of LLMs, mitigate these demands.
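This includes freezing the pre-trained model weights and the introduction of trainable rank decomposition matrices into the layers of the transformer architecture. This action diminishes the count of trainable parameters during the training phase, with the capacity to train and incorporate LoRAs tailored to distinct tasks for inference, all with a single base LLM. Creating impactful prompts is essential for zero or few-shot learning without requiring training. This involves crafting task descriptions with clear, decomposable goals, and offering model-friendly examples. Essential components are ensuring verification, utilizing external tools when needed, sequencing prompts thoughtfully, and assigning roles (e.g., to the model or to humans, like a clinician). Notably, prompt design lacks consistency across models, and subtle syntax variations, often nonintuitive, can result in performance changes. As prompt engineering progresses, incorporating these insights will enhance the use of LLMs in MIDD. Evaluation of LLMs has continued to expand since the onset of LLMs (e.g., GPT-3.5). The performance of an LLM is dependent on the method of evaluation, the task, and the domain of interest. Within the scope of drug development, rigorous evaluations of LLMs must take place to determine the level at which they can assist the field. Existing benchmarks combine general and stratified assessments. Manual evaluation relies on human review, introducing high variance and stability challenges due to cultural and individual differences among reviewers.8 Similarly, automated evaluation employs standard and scenario-based metrics (e.g., accuracy, F1-score, ROUGE, BLEU), with the limitations of static benchmarks necessitating holistic evaluations such as those of Liang et al.9 Evaluations of LLMs in drug development will benefit from diverse datasets, not just publicly available datasets (e.g., MIMIC-III). With its position as a leader in data acquisition and harmonization, C-Path can contribute to this work. In the same sense, C-Path furthers the explicit definition of the scenarios in which the emergent capabilities of LLMs that achieve the purported benefits occur.10 Finally, we need the combination of details (e.g., prompt schema, amount of data used, and other details) to get the reported performance, as it demonstrates that LLMs (clinical LLMs specifically) exhibit improved sample efficiency.10 Obtaining a proper understanding of where LLMs are thriving and successful helps in warning against the bias and inaccuracies that may be present in certain tasks. C-Path recognizes that challenges exist around addressing the ethical considerations of LLMs. For example, challenges with synthetic data include noise in the generated data and difficulty in the representativeness of its distribution compared with actual data.6 Addressing these challenges is crucial for optimizing the use of LLMs in such scenarios.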
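The presence of hallucinations, false or misleading information generated by an LLM, leads to inaccurate outputs. In the drug development space, there is an increased risk to patients and decision making (depending on the task addressed by the LLM). Although methods such as retrieval augmented generation aim to limit hallucinations, further improvements are needed. Concerns also exist about data sharing and open science outside of C-Path. Interpretability of LLMs is challenging due to their complexity, making them opaque, and bias introduces an additional layer of complexity, potentially skewing the insights derived from the models. As LLMs evolve alongside their role in drug discovery, it is crucial to address the dual challenges of poor interpretability and bias. Developing methodologies to unravel the intricate decision-making process of LLMs enhances transparency, mitigating bias and advancing the application of LLMs in drug development. The advancement of LLMs is key for organizations working in drug development. C-Path will continue implementing LLMs while addressing challenges such as prompt design, lack of interpretability, and determining evaluation methods head-on, to use this technology to its full potential. Continued research with respect to increasing the use of LLMs in MIDD has identified prompt engineering strategies, weak supervision for information extraction, and computationally efficient fine-tuning methodologies. Continued progress in harnessing LLMs, when they are used in appropriate settings, will be successful. AI tools were employed for drafting assistance, aiding in the creation of figures and tables, and improving the overall readability and language. Critical Path Institute is supported by the Food and Drug Administration (FDA) of the Department of Health and Human Services (HHS) and is 55% funded by the FDA/HHS, totalling $17,612,250, and 45% funded by non-government source(s), totalling $14,203,111. The contents are those of the author(s) and do not necessarily represent the official views of, nor an endorsement by, FDA/HHS or the US Government. The authors declared no competing interests.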

Language: English

Ontology engineering with Large Language Models

Patricia Mateiu, Adrian Groza

Published: Sep. 11, 2023

We tackle the task of enriching ontologies by automatically translating natural language (NL) into Description Logic (DL). Since Large Language Models (LLMs) are the best tools for translations, we fine-tuned a GPT-3 model to convert NL into OWL Functional Syntax. For fine-tuning, we designed pairs of sentences in NL and the corresponding translations. These training pairs cover various aspects of ontology engineering: instances, class subsumption, domain and range of relations, object properties relationships, disjoint classes, complements, or cardinality restrictions. The resulting axioms are used to enrich an ontology in a human-supervised manner. The developed tool is publicly provided as a Protégé plugin.

Language: English

Cited by: 15

Planteome 2024 Update: Reference Ontologies and Knowledgebase for Plant Biology
Laurel Cooper, Justin Elser, Marie‐Angélique Laporte

et al.

Nucleic Acids Research, Year: 2023, Vol. 52(D1), pp. D1548-D1555

Published: Dec. 6, 2023

The Planteome project (https://planteome.org/) provides a suite of reference and crop-specific ontologies and an integrated knowledgebase of plant genomics data. The plant genomics data in the Planteome has been obtained through manual and automated curation and sourced from more than 40 partner databases and resources. Here, we report on updates to the Planteome reference ontologies, namely, the Plant Ontology (PO), the Trait Ontology (TO), and the Plant Experimental Conditions Ontology (PECO), and on the integration of species/crop-specific vocabularies from our partners, the Crop Ontology (CO), into the TO ontology graph. Currently, 11 CO vocabularies are integrated into the Planteome, with the addition of yam, sorghum, and potato since 2018. In addition, the size of the annotation database has increased by 34%, and the number of bioentities (genes, proteins, etc.) from 125 plant taxa has increased by 72%. We developed new tools to facilitate user requests for improvements to the reference vocabularies, and to allow fast searching and browsing of PO terms and definitions. These enhancements, and future changes to automate the TO-CO mappings and knowledge discovery tools, ensure that the Planteome will continue to be a valuable resource for plant biology.

Language: English

Cited by: 13

Towards Next-Generation Urban Decision Support Systems through AI-Powered Construction of Scientific Ontology Using Large Language Models—A Case in Optimizing Intermodal Freight Transportation
Jose Tupayachi, Haowen Xu, Olufemi A. Omitaomu

et al.

Smart Cities, Year: 2024, Vol. 7(5), pp. 2392-2421

Published: Aug. 31, 2024

The incorporation of Artificial Intelligence (AI) models into various optimization systems is on the rise. However, addressing complex urban and environmental management challenges often demands deep expertise in domain science and informatics. This expertise is essential for deriving data and simulation-driven insights that support informed decision-making. In this context, we investigate the potential of leveraging pre-trained Large Language Models (LLMs) to create knowledge representations for supporting operations research. By adopting the ChatGPT-4 API as the reasoning core, we outline an applied workflow that encompasses natural language processing, Methontology-based prompt tuning, and Generative Pre-trained Transformer (GPT), to automate the construction of scenario-based ontologies using existing research articles and technical manuals of urban datasets and simulations. From these ontologies, knowledge graphs can be derived using widely adopted formats and protocols, guiding various tasks towards data-informed decision support. The performance of our methodology is evaluated through a comparative analysis that contrasts our AI-generated ontology with the widely recognized pizza ontology, commonly used in tutorials for popular ontology software. We conclude with a real-world case study on optimizing the complex system of multi-modal freight transportation. Our approach advances urban decision support systems by enhancing data and metadata modeling, improving data integration and simulation coupling, and guiding the development of decision support strategies and essential software components.

Language: English

Cited by: 4

Generative artificial intelligence in the agri-food value chain - overview, potential, and research challenges
Christian Krupitzer

Frontiers in Food Science and Technology, Year: 2024, Vol. 4

Published: Sep. 25, 2024

ChatGPT uses a so-called Large Language Model (LLM) to provide a textual output of analyzed data. LLMs are one example of Generative Artificial Intelligence (AI), which focuses on creating new content, e.g., text, images, or music, based on learned patterns. Recently, applications in the food industry and agriculture have started to apply Generative AI. This mini review provides an overview of Generative AI applications in the agri-food supply chain and discusses open research challenges, also in combination with digital twins.

Language: English

Cited by: 4

From text to insight: large language models for chemical data extraction
Mara Schilling-Wilhelmi, Martiño Ríos-García, Sherjeel Shabih

et al.

Chemical Society Reviews, Year: 2024, Vol. unknown

Published: Dec. 20, 2024

Large language models (LLMs) allow for the extraction of structured data from unstructured sources, such as scientific papers, with unprecedented accuracy and performance.

Language: English

Cited by: 4

Connecting AI: Merging Large Language Models and Knowledge Graph
Mladjan Jovanovic, Mark Campbell

Computer, Year: 2023, Vol. 56(11), pp. 103-108

Published: Oct. 16, 2023

Combining the generative abilities of large language models with the logical and factual coherence of knowledge graphs using a connected artificial intelligence architecture minimizes each system's shortcomings and amplifies their strengths across many real-world domains.

Language: English

Cited by: 10

Ten quick tips to build a Model Life Cycle
Timothée Poisot, Daniel Becker, Cole B. Brookson

et al.

PLoS Computational Biology, Year: 2025, Vol. 21(2), p. e1012731

Published: Feb. 3, 2025

Language: English

Cited by: 0

Enhancing Knowledge Graph Construction: Evaluating with Emphasis on Hallucination, Omission, and Graph Similarity Metrics

Hussam Ghanem, Christophe Cruz

Lecture Notes in Computer Science, Year: 2025, Vol. unknown, pp. 32-46

Published: Jan. 1, 2025

Language: English

Cited by: 0

Computational tools and data integration to accelerate vaccine development: challenges, opportunities, and future directions
Lindsey Anderson, Charles Tapley Hoyt, Jeremy Zucker

et al.

Frontiers in Immunology, Year: 2025, Vol. 16

Published: March 7, 2025

The development of effective vaccines is crucial for combating current and emerging pathogens. Despite significant advances in the field of vaccine development, there remain numerous challenges, including the lack of standardized data reporting and curation practices, making it difficult to determine correlates of protection from experimental and clinical studies. Significant gaps in data and knowledge integration can hinder vaccine development, which relies on a comprehensive understanding of the interplay between pathogens and the host immune system. In this review, we explore the current landscape of vaccine development, highlighting the computational challenges, limitations, and opportunities associated with integrating diverse data types and leveraging artificial intelligence (AI) and machine learning (ML) techniques in vaccine design. We discuss the role of natural language processing, semantic integration, and causal inference in extracting valuable insights from published literature and unstructured data sources, as well as the computational modeling of immune responses. Furthermore, we highlight specific challenges in uncertainty quantification and emphasize the importance of establishing standardized data formats and ontologies to facilitate the integration and analysis of heterogeneous data. Through data harmonization and integration, the development of safe and effective vaccines can be accelerated to improve public health outcomes. Looking to the future, we highlight the need for collaborative efforts among researchers, data scientists, and public health experts to realize the full potential of AI-assisted vaccine design and streamline the vaccine development process.

Language: English

Cited by: 0

Unlocking ancient wisdom with modern tools: A new approach to the revitalization of ancient texts based on generative artificial intelligence
Yi Zhao, Wenjie Zhou

The Journal of Academic Librarianship, Year: 2025, Vol. 51(3), p. 103055

Published: April 12, 2025

Language: English

Cited by: 0