Evaluating the effectiveness of large language models in patient education for conjunctivitis
Jingyuan Wang, Runhan Shi, Qihua Le et al.

British Journal of Ophthalmology, Journal Year: 2024, Volume and Issue: unknown, P. bjo-325599

Published: Aug. 30, 2024

To evaluate the quality of responses from large language models (LLMs) to patient-generated conjunctivitis questions.

Language: English

Utility of artificial intelligence‐based large language models in ophthalmic care
Sayantan Biswas, Leon N. Davies, Amy L. Sheppard et al.

Ophthalmic and Physiological Optics, Journal Year: 2024, Volume and Issue: 44(3), P. 641 - 671

Published: Feb. 25, 2024

With the introduction of ChatGPT, artificial intelligence (AI)-based large language models (LLMs) are rapidly becoming popular within the scientific community. They use natural language processing to generate human-like responses to queries. However, the application of LLMs and the comparison of their abilities, both among different models and against their human counterparts, in ophthalmic care remain under-reported.

Language: English

Citations: 31

A Preliminary Checklist (METRICS) to Standardize the Design and Reporting of Studies on Generative Artificial Intelligence–Based Models in Health Care Education and Practice: Development Study Involving a Literature Review
Malik Sallam, Muna Barakat, Mohammed Sallam et al.

Interactive Journal of Medical Research, Journal Year: 2024, Volume and Issue: 13, P. e54704 - e54704

Published: Jan. 26, 2024

Background: Adherence to evidence-based practice is indispensable in health care. Recently, the utility of generative artificial intelligence (AI) models in health care has been evaluated extensively. However, the lack of consensus guidelines on the design and reporting of findings of these studies poses a challenge for the interpretation and synthesis of evidence. Objective: This study aimed to develop a preliminary checklist to standardize the design and reporting of generative AI-based studies in health care education and practice. Methods: A literature review was conducted in Scopus, PubMed, and Google Scholar. Published records with "ChatGPT," "Bing," or "Bard" in the title were retrieved. Careful examination of the methodologies employed in the included records was conducted to identify common pertinent themes and possible gaps in reporting. A panel discussion was held to establish a unified and thorough checklist for the reporting of AI studies in health care. The finalized checklist was used to evaluate the included records by 2 independent raters, with Cohen κ as the method to determine interrater reliability. Results: The final data set that formed the basis for theme identification and analysis comprised a total of 34 records. The 9 identified themes were collectively referred to as METRICS (Model, Evaluation, Timing, Range/Randomization, Individual factors, Count, and Specificity of prompts and language). Their details are as follows: (1) Model used and its exact settings; (2) Evaluation approach for the generated content; (3) Timing of testing the model; (4) Transparency of the data source; (5) Range of tested topics; (6) Randomization of selecting the queries; (7) Individual factors in selecting the queries and interrater reliability; (8) Count of queries executed to test the model; and (9) Specificity of the prompts and language used. The overall mean METRICS score was 3.0 (SD 0.58). The interrater reliability was acceptable, with a κ range of 0.558 to 0.962 (P<.001 for all items). With classification per item, the highest average scores were recorded for the "Model" item, followed by the "Specificity" item, while the lowest scores were recorded for the "Randomization" item (classified as suboptimal) and the "Individual factors" item (classified as satisfactory). Conclusions: The METRICS checklist can facilitate the design of studies and guide researchers toward best practices in reporting results. The findings highlight the need for standardized reporting of generative AI-based studies in health care, considering the variability observed in methodologies and reporting. The proposed METRICS checklist could be a helpful base toward a universally accepted approach to standardize the design and reporting of generative AI-based studies in health care, which is a swiftly evolving research topic.
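The checklist evaluation above reports Cohen κ for interrater reliability. As a minimal illustrative sketch of how that agreement statistic is computed for two raters (the rating vectors below are invented for illustration, not the study's data):

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    # kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    # agreement and p_e is the agreement expected by chance from the
    # marginal label frequencies of each rater.
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical per-item ratings from two independent raters
# (0 = suboptimal, 1 = satisfactory, 2 = optimal); values invented.
rater_1 = [2, 1, 2, 0, 1, 2, 1, 0, 2]
rater_2 = [2, 1, 2, 1, 1, 2, 1, 0, 2]
print(f"kappa = {cohen_kappa(rater_1, rater_2):.3f}")  # ~0.82
```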

Language: English

Citations: 24

Systematic Review of Large Language Models for Patient Care: Current Applications and Challenges
Felix Busch, Lena Hoffmann, Christopher Rueger et al.

medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: March 5, 2024

Abstract: The introduction of large language models (LLMs) into clinical practice promises to improve patient education and empowerment, thereby personalizing medical care and broadening access to medical knowledge. Despite the popularity of LLMs, there is a significant gap in systematized information on their use in patient care. Therefore, this systematic review aims to synthesize the current applications and limitations of LLMs in patient care using a data-driven convergent synthesis approach. We searched 5 databases for qualitative, quantitative, and mixed methods articles published between 2022 and 2023. From 4,349 initial records, 89 studies across 29 medical specialties were included, primarily examining LLMs based on the GPT-3.5 (53.2%, n=66 of 124 different LLMs examined per study) and GPT-4 (26.6%, n=33/124) architectures for question answering, followed by information generation, including text summarization or translation, and documentation. Our analysis delineates two primary domains of LLM limitations: design and output. Design limitations included 6 second-order and 12 third-order codes, such as lack of medical domain optimization, data transparency, and accessibility issues, while output limitations included 9 second-order and 32 third-order codes, for example, non-reproducibility, non-comprehensiveness, incorrectness, unsafety, and bias. In conclusion, this study is the first to systematically map LLM applications and limitations in patient care, providing a foundational framework and taxonomy for their implementation and evaluation in healthcare settings.

Language: English

Citations: 18

Current applications and challenges in large language models for patient care: a systematic review
Felix Busch, Lena Hoffmann, Christopher Rueger et al.

Communications Medicine, Journal Year: 2025, Volume and Issue: 5(1)

Published: Jan. 21, 2025

Abstract Background: The introduction of large language models (LLMs) into clinical practice promises to improve patient education and empowerment, thereby personalizing medical care and broadening access to medical knowledge. Despite the popularity of LLMs, there is a significant gap in systematized information on their use in patient care. Therefore, this systematic review aims to synthesize the current applications and limitations of LLMs in patient care. Methods: We systematically searched 5 databases for qualitative, quantitative, and mixed methods articles published between 2022 and 2023. From 4349 initial records, 89 studies across 29 medical specialties were included. Quality assessment was performed using the Mixed Methods Appraisal Tool 2018. A data-driven convergent synthesis approach was applied for thematic syntheses of LLM limitations using free line-by-line coding in Dedoose. Results: We show that most studies investigate Generative Pre-trained Transformers (GPT)-3.5 (53.2%, n = 66 of 124 different LLMs examined) and GPT-4 (26.6%, n = 33/124) in answering medical questions, followed by information generation, including text summarization or translation, and documentation. Our analysis delineates two primary domains of LLM limitations: design and output. Design limitations include 6 second-order and 12 third-order codes, such as lack of medical domain optimization, data transparency, and accessibility issues, while output limitations include 9 second-order and 32 third-order codes, for example, non-reproducibility, non-comprehensiveness, incorrectness, unsafety, and bias. Conclusions: This review systematically maps LLM applications and limitations in patient care, providing a foundational framework and taxonomy for their implementation and evaluation in healthcare settings.

Language: English

Citations: 6

Large Language Models for Chatbot Health Advice Studies
Bright Huo, Amy Boyle, Nana Marfo et al.

JAMA Network Open, Journal Year: 2025, Volume and Issue: 8(2), P. e2457879 - e2457879

Published: Feb. 4, 2025

Importance: There is much interest in the clinical integration of large language models (LLMs) in health care. Many studies have assessed the ability of LLMs to provide health advice, but the quality of their reporting is uncertain. Objective: To perform a systematic review to examine the reporting variability among peer-reviewed studies evaluating the performance of generative artificial intelligence (AI)–driven chatbots for summarizing evidence and providing health advice, to inform the development of the Chatbot Assessment Reporting Tool (CHART). Evidence Review: A search of MEDLINE via Ovid, Embase via Elsevier, and Web of Science from inception to October 27, 2023, was conducted with the help of a health sciences librarian to yield 7752 articles. Two reviewers screened articles by title and abstract, followed by full-text review, to identify primary studies evaluating the accuracy of AI-driven chatbots (chatbot studies). Two reviewers then performed data extraction for the 137 eligible studies. Findings: A total of 137 studies were included. Studies examined topics in surgery (55 [40.1%]), medicine (51 [37.2%]), and primary care (13 [9.5%]). Many studies focused on treatment (91 [66.4%]), diagnosis (60 [43.8%]), or disease prevention (29 [21.2%]). Most studies (136 [99.3%]) evaluated inaccessible, closed-source LLMs and did not provide enough information to identify the version of the LLM under evaluation. All studies lacked a sufficient description of LLM characteristics, including temperature, token length, fine-tuning availability, layers, and other details. Most studies did not describe a prompt engineering phase in their study. The date of LLM querying was reported in 54 (39.4%) studies. Most studies (89 [65.0%]) used subjective means to define the successful performance of the chatbot, while less than one-third addressed the ethical, regulatory, and patient safety implications of the clinical integration of LLMs. Conclusions and Relevance: In this systematic review of chatbot studies, the reporting quality was heterogeneous and may inform the development of the CHART standards. Ethical, regulatory, and patient safety considerations are crucial as interest in the clinical integration of LLMs grows.

Language: English

Citations: 6

Utility of ChatGPT for Automated Creation of Patient Education Handouts: An Application in Neuro-Ophthalmology
Brendan Tao, Armin Handzic, Nicholas J. Hua et al.

Journal of Neuro-Ophthalmology, Journal Year: 2024, Volume and Issue: unknown

Published: Jan. 4, 2024

Background: Patient education in ophthalmology poses a challenge for physicians because of time and resource limitations. ChatGPT (OpenAI, San Francisco) may assist with automating the production of patient handouts on common neuro-ophthalmic diseases. Methods: We queried ChatGPT-3.5 to generate 51 patient education handouts across 17 conditions. We devised the "Quality of Generated Language Outputs for Patients" (QGLOP) tool to assess the domains of accuracy/comprehensiveness, bias, currency, and tone, each scored out of 4 for a total of 16. A fellowship-trained neuro-ophthalmologist scored each passage. Handout readability was assessed using the Simple Measure of Gobbledygook (SMOG), which estimates the years of education required to understand a text. Results: The mean QGLOP scores for accuracy, bias, currency, and tone were found to be 2.43, 3, 3.43, and 3.02, respectively. The mean overall QGLOP score was 11.9 [95% CI 8.98, 14.8] out of 16 points, indicating a performance of 74.4% [95% CI 56.1%, 92.5%]. The mean SMOG across responses was 10.9 [95% CI 9.36, 12.4] years of education. Conclusions: This study suggests that an ophthalmologist may have an at least moderate level of satisfaction with the write-up quality conferred by ChatGPT, which still requires a final review and editing before dissemination. Comparatively, the rarer 5% of responses collectively at either extreme would require either very mild or extensive revision. Also, the SMOG scores exceeded the accepted upper limit of a grade 8 reading level for health-related patient handouts. In its current iteration, ChatGPT should be used as an efficiency tool to generate an initial draft for the neuro-ophthalmologist, who may then refine its accuracy and readability for a lay readership.
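Since handout readability above is reported as a SMOG grade, a minimal sketch of McLaughlin's published SMOG formula may be useful; the sample counts below are invented for illustration, not taken from the study:

```python
import math

def smog_grade(polysyllable_count: int, sentence_count: int) -> float:
    # McLaughlin's SMOG formula:
    # grade = 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291,
    # where polysyllables counts words of 3+ syllables in the sample.
    return 1.0430 * math.sqrt(polysyllable_count * 30 / sentence_count) + 3.1291

# Invented example: 55 polysyllabic words across a 30-sentence handout
# lands near the ~10.9-grade level reported above.
print(f"SMOG grade: {smog_grade(55, 30):.1f}")  # 10.9
```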

Language: English

Citations: 14

Assessing the Accuracy and Reliability of AI-Generated Responses to Patient Questions Regarding Spine Surgery
Viknesh Kasthuri, Jacob Glueck, Han Pham et al.

Journal of Bone and Joint Surgery, Journal Year: 2024, Volume and Issue: 106(12), P. 1136 - 1142

Published: Feb. 9, 2024

Background: In today’s digital age, patients increasingly rely on online search engines for medical information. The integration of large language models such as GPT-4 into search engines such as Bing raises concerns over the potential transmission of misinformation when patients search for information regarding spine surgery. Methods: SearchResponse.io, a database that archives People Also Ask (PAA) data from Google, was utilized to determine the most popular patient questions regarding 4 specific spine surgery topics, including anterior cervical discectomy and fusion, lumbar laminectomy, and spinal deformity. Bing’s responses to these questions, along with the cited sources, were recorded for analysis. Two fellowship-trained spine surgeons assessed the accuracy of the answers on a 6-point scale and their completeness on a 3-point scale. Inaccurate answers were re-queried 2 weeks later. Cited sources were categorized and evaluated against the Journal of the American Medical Association (JAMA) benchmark criteria. Interrater reliability was measured with use of the kappa statistic. A linear regression analysis was used to explore the relationship between answer accuracy and the type of source, number of sources, and mean JAMA benchmark score. Results: In total, 71 PAA questions were analyzed. The average completeness score was 2.03 (standard deviation [SD], 0.36), and the average accuracy score was 4.49 (SD, 1.10). Among the question topics, spinal deformity had the lowest mean accuracy. Re-querying the questions with initially low accuracy scores resulted in improved accuracy. Commercial sources were the most prevalent. The mean JAMA benchmark score across all cited sources averaged 2.63. Government sources had the highest mean JAMA benchmark score (3.30), whereas social media sources had the lowest (1.75). Conclusions: Bing’s answers were generally accurate and adequately complete, and incorrect responses were rectified upon re-querying. A plurality of the answers were sourced from commercial websites. Answer accuracy was not significantly correlated with source type, number of sources, or mean JAMA benchmark score. These findings underscore the importance of ongoing evaluation and improvement of LLMs to ensure reliable and informative results for patients seeking medical information amid the evolving online search experience.
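As an illustrative sketch of the kind of linear regression described above (ordinary least squares of answer accuracy on source features; all values are invented, not the study's data):

```python
import numpy as np

# Hypothetical per-question data: accuracy (6-point scale), number of
# cited sources, and mean JAMA benchmark score (0-4); values invented.
accuracy = np.array([4.5, 5.0, 3.5, 4.0, 5.5, 4.5, 3.0, 5.0])
n_sources = np.array([3.0, 5.0, 2.0, 4.0, 6.0, 3.0, 2.0, 5.0])
jama = np.array([2.5, 3.0, 2.0, 2.8, 3.3, 2.6, 1.8, 3.1])

# Ordinary least squares: accuracy ~ intercept + n_sources + jama
X = np.column_stack([np.ones_like(accuracy), n_sources, jama])
coef, *_ = np.linalg.lstsq(X, accuracy, rcond=None)
print(f"intercept={coef[0]:.2f}, n_sources={coef[1]:.2f}, jama={coef[2]:.2f}")
```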

Language: English

Citations: 14

ChatGPT enters the room: what it means for patient counseling, physician education, academics, and disease management
Bita Momenaei, Hana A. Mansour, Ajay E. Kuriyan et al.

Current Opinion in Ophthalmology, Journal Year: 2024, Volume and Issue: 35(3), P. 205 - 209

Published: Feb. 7, 2024

Purpose of review: This review seeks to provide a summary of the most recent research findings regarding the utilization of ChatGPT, an artificial intelligence (AI)-powered chatbot, in the field of ophthalmology, in addition to exploring the limitations and ethical considerations associated with its application. Recent findings: ChatGPT has gained widespread recognition and demonstrated potential in enhancing patient and physician education, boosting research productivity, and streamlining administrative tasks. In various studies examining its utility in ophthalmology, ChatGPT has exhibited fair to good accuracy, with its most recent iteration showcasing superior performance in providing ophthalmic recommendations across disorders such as corneal diseases, orbital disorders, vitreoretinal diseases, uveitis, neuro-ophthalmology, and glaucoma. It proves beneficial for patients in accessing information and aids physicians in triaging as well as in formulating differential diagnoses. Despite these benefits, ChatGPT has limitations that require acknowledgment, including the risk of offering inaccurate or harmful information, dependence on outdated data, the necessity of a high level of education for data comprehension, and concerns about patient privacy within the medical domain. Summary: ChatGPT is a promising new tool that could contribute to ophthalmic healthcare and research, potentially reducing work burdens. However, its current limitations necessitate a complementary role alongside human expert oversight.

Language: English

Citations: 14

Applications of artificial intelligence-enabled robots and chatbots in ophthalmology: recent advances and future trends
Yeganeh Madadi, Mohammad Delsoz, Albert S Khouri et al.

Current Opinion in Ophthalmology, Journal Year: 2024, Volume and Issue: 35(3), P. 238 - 243

Published: Jan. 22, 2024

Purpose of review: Recent advances in artificial intelligence (AI), robotics, and chatbots have brought these technologies to the forefront of medicine, particularly ophthalmology. These technologies have been applied to diagnosis, prognosis, surgical operations, and patient-specific care. It is thus both timely and pertinent to assess the existing landscape, recent advances, and trajectory of trends of AI, AI-enabled robots, and chatbots in ophthalmology. Recent findings: Some recent developments have integrated AI-enabled robotics with surgical procedures. More recently, large language models (LLMs) like ChatGPT have shown promise in augmenting research capabilities and diagnosing ophthalmic diseases. These developments may portend a new era of doctor-patient-machine collaboration. Summary: Ophthalmology is undergoing revolutionary change in research, clinical practice, and interventions. Ophthalmic AI-enabled robotics and chatbot technologies based on LLMs are converging to create a new era of digital ophthalmology. Collectively, these advances portend a future in which conventional ophthalmic knowledge will be seamlessly integrated with AI to improve the patient experience and enhance therapeutic outcomes.

Language: English

Citations: 11

Analysis of ChatGPT Responses to Ophthalmic Cases: Can ChatGPT Think Like an Ophthalmologist?
Jimmy Chen, Akshay J Reddy, Eman Al-Sharif et al.

Ophthalmology Science, Journal Year: 2024, Volume and Issue: 5(1), P. 100600 - 100600

Published: Aug. 23, 2024

Language: English

Citations: 9