Generative AI and Large Language Models in Reducing Medication Related Harm and Adverse Drug Events – A Scoping Review
Jasmine Chiat Ling Ong, Chen Michael, Ning Ng, et al.

medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Sept. 14, 2024

Abstract. Background: Medication-related harm has a significant impact on global healthcare costs and patient outcomes, accounting for deaths in 4.3 per 1000 patients. Generative artificial intelligence (GenAI) has emerged as a promising tool for mitigating the risks of medication-related harm. In particular, large language models (LLMs) and well-developed generative adversarial networks (GANs) are showing promise in related tasks. This review aims to explore the scope and effectiveness of generative AI in reducing medication-related harm, identifying existing developments and challenges in research. Methods: We searched for peer-reviewed articles in PubMed, Web of Science, Embase, and Scopus for literature published from January 2012 to February 2024. We included studies focusing on the development or application of generative AI in mitigating medication-related risk during the entire medication use process. We excluded studies using traditional AI methods only, those unrelated to healthcare settings, and those concerning non-prescribed medication uses such as supplements. Extracted variables included study characteristics, AI model specifics and performance, and any outcome evaluated. Findings: A total of 2203 articles were identified, and 14 met the criteria for inclusion into the final review. We found that generative AI and LLMs were used in a few key applications: drug-drug interaction identification and prediction; clinical decision support; and pharmacovigilance. While the performance and utility of these models varied, they generally showed promise in areas like early identification and classification of adverse drug events and supporting decision-making in medication management. However, no studies tested these models prospectively, suggesting the need for further investigation into the integration of these tools in real-world settings to improve medication safety outcomes effectively. Interpretation: Generative AI shows promise in reducing medication-related harms, but there are gaps in research rigor and ethical considerations. Future research should focus on the creation of high-quality, task-specific benchmarking datasets for medication safety and on real-world implementation outcomes.

Language: English

Evaluating Accuracy and Reproducibility of Large Language Model Performance in Pharmacy Education
Amoreena Most, Mengxuan Hu, Huibo Yang, et al.

medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: March 24, 2024

Abstract. The purpose of this study was to compare the performance of ChatGPT (GPT-3.5), ChatGPT (GPT-4), Claude2, Llama2-7b, and Llama2-13b on 219 multiple-choice questions focusing on critical care pharmacotherapy. To further assess the ability of prompt engineering to improve the reasoning abilities and performance of LLMs, we examined responses with a zero-shot Chain-of-Thought (CoT) approach, CoT prompting, and a custom-built GPT (PharmacyGPT). A set of questions focused on critical care pharmacotherapy topics used in the Doctor of Pharmacy curricula of two accredited colleges of pharmacy was compiled for this study. A total of five LLMs were evaluated: ChatGPT (GPT-3.5), ChatGPT (GPT-4), Claude2, Llama2-7b, and Llama2-13b. The primary outcome was response accuracy. Of the LLMs tested, GPT-4 showed the highest average accuracy rate, at 71.6%. A larger variance indicates lower consistency and reduced confidence in a model's answers. The model with the lowest variance (0.070) of all LLMs performed with an average accuracy of 41.5%. Following the analysis of overall accuracy, performance on knowledge-based vs. skill-based questions was assessed. All LLMs demonstrated higher accuracy on knowledge-based questions than on skill-based questions. Accuracy rates of 87% and 67% were reported for knowledge-based and skill-based questions, respectively. Response accuracy of LLMs in the domain of clinical pharmacotherapy can be improved by using prompt engineering techniques.

Language: English

Assessing the potential of ChatGPT-4 to accurately identify drug-drug interactions and provide clinical pharmacotherapy recommendations
Amoreena Most, Aaron Chase, Andrea Sikora, et al.

medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: June 30, 2024

Abstract. Background: Large language models (LLMs) such as ChatGPT have emerged as promising artificial intelligence tools to support clinical decision making. The ability of LLMs to evaluate medication regimens, identify drug-drug interactions (DDIs), and provide clinical recommendations is unknown. The purpose of this study was to examine the performance of GPT-4 in identifying clinically relevant DDIs and to assess the accuracy of the recommendations provided. Methods: A total of 15 medication regimens were created containing commonly encountered DDIs that were considered either clinically significant or clinically unimportant. Two separate prompts were developed for medication regimen evaluation. The primary outcome was whether GPT-4 identified the most clinically relevant DDI within each regimen. Secondary outcomes included ratings of GPT-4's interaction rationale, clinical relevance ranking, and overall clinical recommendations. Interrater reliability was determined using the kappa statistic. Results: GPT-4 identified the intended DDI in 90% of the responses provided (27/30). GPT-4 categorized 86% of DDIs as highly clinically relevant, compared with 53% being categorized as highly clinically relevant by expert opinion. Inappropriate clinical recommendations potentially causing patient harm were present in 14% of responses (2/14), and 63% of responses contained accurate information but were incomplete (19/30). Conclusions: While GPT-4 demonstrated promise in its ability to identify clinically relevant DDIs, its application to clinical cases remains an area for further investigation. Findings from this study may assist the future development and refinement of LLMs for drug-drug interaction queries and clinical decision-making.

Language: English

Large language models management of complex medication regimens: a case-based evaluation
Steven Xu, Amoreena Most, Aaron Chase, et al.

medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: July 8, 2024

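Abstract. Background: Large language models (LLMs) have shown capability in diagnosing complex medical cases and passing medical licensing exams, but to date, only limited evaluations have studied how LLMs interpret, analyze, and optimize complex medication regimens. The purpose of this evaluation was to test the ability of four LLMs to identify medication errors and appropriate medication interventions in complex patient cases from the intensive care unit (ICU). Methods: A series of eight patient cases was developed by critical care pharmacists, including history of present illness, laboratory values, vital signs, and medication regimens. Then, four LLMs (ChatGPT (GPT-3.5), ChatGPT (GPT-4), Claude2, and Llama2-7b) were prompted to develop a medication regimen for the patient. The LLM-generated regimens were then reviewed by a panel of seven critical care pharmacists to assess the presence of medication errors and their clinical relevance. For each medication recommended by an LLM, clinicians were asked whether they would continue the medication, to identify perceived medication errors among the medications recommended and any life-threatening medication choices, and to rank their overall agreement on a 5-point Likert scale. Results: Clinicians rated the LLM-recommended therapies as ones they would continue between 55.8% and 67.9% of the time. Clinicians identified 1.57 to 4.29 medication errors per LLM-generated regimen, and life-threatening recommendations were present 15.0% to 55.3% of the time. Levels of agreement ranged from 1.85 to 2.67 across the LLMs. Conclusions: LLMs demonstrated potential to serve as clinical decision support for the management of complex medication regimens with further domain-specific training; however, caution should be used when employing LLMs for medication management given their current capabilities.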

Language: English
