Cited by Architecting Utopias: How AI in Healthcare Envisions Societal Ideals and Human Flourishing

A future role for health applications of large language models depends on regulators enforcing safety standards DOI

Oscar Freyer, Isabella C. Wiest, Jakob Nikolas Kather

et al.

The Lancet Digital Health, Journal Year: 2024, Volume and Issue: 6(9), P. e662 - e672

Published: Aug. 23, 2024

Among the rapid integration of artificial intelligence in clinical settings, large language models (LLMs), such as Generative Pre-trained Transformer-4, have emerged multifaceted tools that potential for health-care delivery, diagnosis, and patient care. However, deployment LLMs raises substantial regulatory safety concerns. Due to their high output variability, poor inherent explainability, risk so-called AI hallucinations, LLM-based applications serve a medical purpose face challenges approval devices under US EU laws, including recently passed Artificial Intelligence Act. Despite unaddressed risks patients, misdiagnosis unverified advice, are available on market. The ambiguity surrounding these creates an urgent need frameworks accommodate unique capabilities limitations. Alongside development frameworks, existing regulations should be enforced. If regulators fear enforcing market dominated by supply or technology companies, consequences layperson harm will force belated action, damaging potentiality advice.

Language: Английский

Citations

The TRIPOD-LLM reporting guideline for studies using large language models DOI

Jack Gallifant,

Majid Afshar,

Saleem Ameen

et al.

Nature Medicine, Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 8, 2025

Language: Английский

Citations

GPT-4 assistance for improvement of physician performance on patient care tasks: a randomized controlled trial DOI

Ethan Goh, Robert J. Gallo, Eric Strong

et al.

Nature Medicine, Journal Year: 2025, Volume and Issue: unknown

Published: Feb. 5, 2025

Language: Английский

Citations

Development of a Preliminary Patient Safety Classification System for Generative AI DOI

Bat‐Zion Hose,

Jessica L Handley,

Joshua Biro

et al.

BMJ Quality & Safety, Journal Year: 2025, Volume and Issue: unknown, P. bmjqs - 017918

Published: Jan. 3, 2025

Generative artificial intelligence (AI) technologies have the potential to revolutionise healthcare delivery but require classification and monitoring of patient safety risks. To address this need, we developed evaluated a preliminary system for categorising generative AI errors. Our is organised around two stages (input output) with specific error types by stage. We applied our applications assess its effectiveness in issues: patient-facing conversational large language models (LLMs) an ambient digital scribe (ADS) clinical documentation. In LLM analysis, identified 45 errors across 27 medical queries, omission being most common (42% errors). Of errors, 50% were categorised as low significance, 25% moderate significance high significance. Similarly, ADS simulation, 66 11 visits, (83% 55% 45% These findings demonstrate system’s utility output from different applications, providing starting point developing robust process better understand AI-enabled

Language: Английский

Citations

Can Large Language Models Aid Caregivers of Pediatric Cancer Patients in Information Seeking? A Cross‐Sectional Investigation DOI

Emre Sezgın, D Jackson, A. Baki Kocaballı

et al.

Cancer Medicine, Journal Year: 2025, Volume and Issue: 14(1)

Published: Jan. 1, 2025

ABSTRACT Purpose Caregivers in pediatric oncology need accurate and understandable information about their child's condition, treatment, side effects. This study assesses the performance of publicly accessible large language model (LLM)‐supported tools providing valuable reliable to caregivers children with cancer. Methods In this cross‐sectional study, we evaluated four LLM‐supported tools—ChatGPT (GPT‐4), Google Bard (Gemini Pro), Microsoft Bing Chat, SGE—against a set frequently asked questions (FAQs) derived from Children's Oncology Group Family Handbook expert input (In total, 26 FAQs 104 generated responses). Five experts assessed LLM responses using measures including accuracy, clarity, inclusivity, completeness, clinical utility, overall rating. Additionally, content quality was readability, AI disclosure, source credibility, resource matching, originality. We used descriptive analysis statistical tests Shapiro–Wilk, Levene's, Kruskal–Wallis H ‐tests, Dunn's post hoc for pairwise comparisons. Results ChatGPT shows high when by experts. also performed well, especially accuracy clarity responses, whereas Chat SGE had lower scores. Regarding disclosure being AI, it observed less which may have affected maintained balance between response clarity. most readable answered complexity. varied significantly ( p < 0.001) across all evaluations except inclusivity. Through our thematic free‐text comments, emotional tone empathy emerged as unique theme mixed feedback on expectations be empathetic. Conclusion can enhance caregivers' knowledge oncology. Each has strengths areas improvement, indicating careful selection based specific contexts. Further research is required explore application other medical specialties patient demographics, assessing broader applicability long‐term impacts.

Language: Английский

Citations

Large Language Model–Based Responses to Patients’ In-Basket Messages DOI

William Small, Batia M. Wiesenfeld, Beatrix Brandfield-Harvey

et al.

JAMA Network Open, Journal Year: 2024, Volume and Issue: 7(7), P. e2422399 - e2422399

Published: July 16, 2024

Importance Virtual patient-physician communications have increased since 2020 and negatively impacted primary care physician (PCP) well-being. Generative artificial intelligence (GenAI) drafts of patient messages could potentially reduce health professional (HCP) workload improve communication quality, but only if the are considered useful. Objectives To assess PCPs’ perceptions GenAI to examine linguistic characteristics associated with equity perceived empathy. Design, Setting, Participants This cross-sectional quality improvement study tested hypothesis that ratings (created using electronic record [EHR] standard prompts) would be equivalent HCP-generated responses on 3 dimensions. The was conducted at NYU Langone Health private patient-HCP internal medicine practices piloting GenAI. Exposures Randomly assigned coupled either an HCP message or draft response. Main Outcomes Measures PCPs rated responses’ information content (eg, relevance), a Likert scale, verbosity), whether they use start anew (usable vs unusable). Branching logic further probed for empathy, personalization, professionalism responses. Computational linguistics methods assessed differences in responses, focusing Results A total 16 (8 [50.0%] female) reviewed 344 (175 drafted; 169 drafted). Both were favorably. higher style than (mean [SD], 3.70 [1.15] 3.38 [1.20]; P = .01, U 12 568.5) similar HCPs 3.53 [1.26] 3.41 [1.27]; .37; 13 981.0) usable proportion 0.69 [0.48] 0.65 [0.47], .49, t −0.6842). Usable more empathetic (32 86 [37.2%] 79 [16.5%]; difference, 125.5%), possibly attributable subjective 0.54 [0.16] 0.31 [0.23]; &lt; .001; 74.2%) positive [SD] polarity, 0.21 [0.14] 0.13 [0.25]; .02; 61.5%) language; also numerically longer word count, 90.5 [32.0] 65.4 [62.6]; 38.4%), difference not statistically significant ( .07) linguistically complex score, 125.2 [47.8] 95.4 [58.8]; .002; 31.2%). Conclusions In this PCP EHR-integrated chatbot, found communicate better empathy HCPs, highlighting its potential enhance communication. However, less readable HCPs’, concern patients low English literacy.

Language: Английский

Citations

A Call for Artificial Intelligence Implementation Science Centers to Evaluate Clinical Effectiveness DOI

Chris Longhurst, Karandeep Singh, Aneesh Chopra

et al.

NEJM AI, Journal Year: 2024, Volume and Issue: 1(8)

Published: July 10, 2024

Language: Английский

Citations

The TRIPOD-LLM Statement: A Targeted Guideline For Reporting Large Language Models Use DOI

Jack Gallifant, Majid Afshar,

Saleem Ameen

et al.

medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: July 25, 2024

Large Language Models (LLMs) are rapidly being adopted in healthcare, necessitating standardized reporting guidelines. We present TRIPOD-LLM, an extension of the TRIPOD+AI statement, addressing unique challenges LLMs biomedical applications. TRIPOD-LLM provides a comprehensive checklist 19 main items and 50 subitems, covering key aspects from title to discussion. The guidelines introduce modular format accommodating various LLM research designs tasks, with 14 32 subitems applicable across all categories. Developed through expedited Delphi process expert consensus, emphasizes transparency, human oversight, task-specific performance reporting. also interactive website ( https://tripod-llm.vercel.app/ ) facilitating easy guideline completion PDF generation for submission. As living document, will evolve field, aiming enhance quality, reproducibility, clinical applicability healthcare

Language: Английский

Citations

Bridging the gap: a practical step-by-step approach to warrant safe implementation of large language models in healthcare DOI

Jessica D. Workum, Davy van de Sande, Diederik Gommers

et al.

Frontiers in Artificial Intelligence, Journal Year: 2025, Volume and Issue: 8

Published: Jan. 27, 2025

Large Language Models (LLMs) offer considerable potential to enhance various aspects of healthcare, from aiding with administrative tasks clinical decision support. However, despite the growing use LLMs in a critical gap persists clear, actionable guidelines available healthcare organizations and providers ensure their responsible safe implementation. In this paper, we propose practical step-by-step approach bridge support warranting implementation into healthcare. The recommendations manuscript include protecting patient privacy, adapting models healthcare-specific needs, adjusting hyperparameters appropriately, ensuring proper medical prompt engineering, distinguishing between (CDS) non-CDS applications, systematically evaluating LLM outputs using structured approach, implementing solid model governance structure. We furthermore ACUTE mnemonic; for assessing responses based on Accuracy, Consistency, semantically Unaltered outputs, Traceability, Ethical considerations. Together, these aim provide clear pathway practice.

Language: Английский

Citations

Integrating Artificial Intelligence Support in Patient Care While Respecting Ethical Principles DOI

Marianne Sharko, Curtis L. Cole

JAMA Network Open, Journal Year: 2025, Volume and Issue: 8(3), P. e250462 - e250462

Published: March 11, 2025

Joanna S. Cavalier, MD; Benjamin A. Goldstein, PhD; Vardit Ravitsky, Jean-Christophe Bélisle-Pipon, Armando Bedoya, MD, MMCi; Jennifer Maddocks, PT, Sam Klotman, MPH; Matthew Roman, MHA, Jessica Sperling, Chun Xu, MB; Eric G. Poon, Anand Chowdhury, MMCi

Language: Английский

Citations