Current Urology Reports, Journal Year: 2024, Volume and Issue: 25(10), P. 261 - 265
Published: June 18, 2024
Language: English
Prostate Cancer and Prostatic Diseases, Journal Year: 2024, Volume and Issue: unknown
Published: May 14, 2024
Abstract Background: Generative Pretrained Model (GPT) chatbots have gained popularity since the public release of ChatGPT. Studies have evaluated the ability of different GPT models to provide information about medical conditions. To date, no study has assessed the quality of ChatGPT outputs to prostate cancer related questions from both the physician and public perspective while optimizing outputs for patient consumption. Methods: Nine prostate cancer-related questions, identified through Google Trends (Global), were categorized into diagnosis, treatment, and postoperative follow-up. These questions were processed using ChatGPT 3.5 and the responses were recorded. Subsequently, these responses were re-inputted into ChatGPT to create simplified summaries understandable at a sixth-grade level. The readability of the original outputs and the layperson summaries was validated using readability tools. A survey was conducted among urology providers (urologists and urologists in training) to rate the accuracy, completeness, and clarity of the responses on a 5-point Likert scale. Furthermore, two independent reviewers evaluated the responses for correctness and decision-making sufficiency. A public assessment of the summaries' understandability was carried out on Amazon Mechanical Turk (MTurk); participants rated the summaries and demonstrated their understanding through a multiple-choice question. Results: GPT-generated output was deemed correct by 71.7% to 94.3% of raters (36 urologists, 17 residents) across the 9 scenarios. The independent reviewers rated the output as accurate in 8 (88.9%) scenarios and as sufficient to make a decision. The layperson summaries were significantly more readable than the original outputs ([original vs. simplified ChatGPT output, mean (SD), p-value] Flesch Reading Ease: 36.5 (9.1) vs. 70.2 (11.2), p < 0.0001; Gunning Fog: 15.8 (1.7) vs. 9.5 (2.0), p < 0.0001; Flesch-Kincaid Grade Level: 12.8 (1.2) vs. 7.4 (1.7), p < 0.0001; Coleman-Liau: 13.7 (2.1) vs. 8.6 (2.4), p = 0.0002; SMOG Index: 11.8 (1.2) vs. 6.7 (1.8), p < 0.0001; Automated Readability Index: 13.1 (1.4) vs. 7.5 (2.1), p = 0.0001). MTurk workers (n = 514) rated the summaries as understandable (89.5–95.7%) and correctly answered the comprehension questions (63.0–87.4%). Conclusion: ChatGPT shows promise for generating patient education content, but the technology is not designed for delivering medical information to patients. Prompting the model to respond with simplified, layperson-level language may enhance its utility in patient-facing GPT-powered chatbots.
Language: English
Citations: 20
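The two-step workflow in the abstract above (generate an answer, re-prompt for a sixth-grade summary, then score both versions for readability) is simple to reproduce. Below is a minimal Python sketch, assuming the openai client and the textstat package; the model name and prompt wording are illustrative assumptions, not taken from the paper.

```python
from openai import OpenAI
import textstat

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(prompt: str) -> str:
    """Send a single prompt in a fresh chat and return the reply text."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed stand-in for "ChatGPT 3.5"
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

question = "What are the treatment options for localized prostate cancer?"
original = ask(question)
# Re-input the response to produce a layperson summary, as in the study.
summary = ask(
    "Rewrite the following text so it is understandable at a "
    f"sixth-grade reading level:\n\n{original}"
)

# Score both versions with the six readability indices reported above.
for label, text in [("original", original), ("summary", summary)]:
    print(label,
          textstat.flesch_reading_ease(text),
          textstat.gunning_fog(text),
          textstat.flesch_kincaid_grade(text),
          textstat.coleman_liau_index(text),
          textstat.smog_index(text),
          textstat.automated_readability_index(text))
```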
Nature Medicine, Journal Year: 2025, Volume and Issue: unknown
Published: Jan. 8, 2025
Language: English
Citations: 13
JAMA Network Open, Journal Year: 2025, Volume and Issue: 8(2), P. e2457879 - e2457879
Published: Feb. 4, 2025
Importance: There is much interest in the clinical integration of large language models (LLMs) in health care. Many studies have assessed the ability of LLMs to provide health advice, but the quality of their reporting is uncertain. Objective: To perform a systematic review to examine the reporting variability among peer-reviewed studies evaluating the performance of generative artificial intelligence (AI)-driven chatbots for summarizing evidence and providing health advice, and to inform the development of the Chatbot Assessment Reporting Tool (CHART). Evidence Review: A search of MEDLINE via Ovid, Embase via Elsevier, and Web of Science from inception to October 27, 2023, was conducted with the help of a health sciences librarian and yielded 7752 articles. Two reviewers screened articles by title and abstract, followed by full-text review, to identify primary studies evaluating the accuracy of AI-driven chatbots (chatbot studies). Two reviewers then performed data extraction for 137 eligible studies. Findings: A total of 137 studies were included. Studies examined topics in surgery (55 [40.1%]), medicine (51 [37.2%]), and primary care (13 [9.5%]). Most studies focused on treatment (91 [66.4%]), diagnosis (60 [43.8%]), or disease prevention (29 [21.2%]). Most (136 [99.3%]) evaluated inaccessible, closed-source LLMs and did not provide enough information on the version of the LLM under evaluation. All studies lacked a sufficient description of LLM characteristics, including temperature, token length, fine-tuning availability, layers, and other details. Few described the prompt engineering phase of the study. The date of querying was reported in 54 (39.4%) studies. Most studies (89 [65.0%]) used subjective means to define the successful performance of the chatbot, while less than one-third addressed the ethical, regulatory, and patient safety implications of LLMs. Conclusions and Relevance: In this systematic review of chatbot studies, reporting was heterogeneous and may be improved by CHART reporting standards. Ethical, regulatory, and patient safety considerations are crucial as interest in the clinical integration of LLMs grows.
Language: English
Citations: 5
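The reporting gaps flagged in the review above (LLM version, temperature, token length, date of querying) are straightforward to capture at query time. A minimal sketch, assuming the openai Python client; the parameter values and record fields are illustrative assumptions, not the CHART checklist itself.

```python
import json
from datetime import datetime, timezone
from openai import OpenAI

client = OpenAI()

# Settings that the review found under-reported: pin them once, log them always.
PARAMS = {"model": "gpt-4o-2024-08-06", "temperature": 0.0, "max_tokens": 512}

def query_with_provenance(prompt: str) -> dict:
    """Return the chatbot reply together with the settings used to obtain it."""
    resp = client.chat.completions.create(
        messages=[{"role": "user", "content": prompt}], **PARAMS
    )
    return {
        "prompt": prompt,
        "response": resp.choices[0].message.content,
        "query_date": datetime.now(timezone.utc).isoformat(),
        **PARAMS,  # records model version, temperature, and token limit
    }

record = query_with_provenance("Is antibiotic prophylaxis needed before surgery?")
print(json.dumps(record, indent=2))
```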
medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: July 25, 2024
Large Language Models (LLMs) are rapidly being adopted in healthcare, necessitating standardized reporting guidelines. We present TRIPOD-LLM, an extension of the TRIPOD+AI statement, addressing the unique challenges of LLMs in biomedical applications. TRIPOD-LLM provides a comprehensive checklist of 19 main items and 50 subitems, covering key aspects from title to discussion. The guidelines introduce a modular format accommodating various LLM research designs and tasks, with 14 main items and 32 subitems applicable across all categories. Developed through an expedited Delphi process and expert consensus, TRIPOD-LLM emphasizes transparency, human oversight, and task-specific performance reporting. We also introduce an interactive website ( https://tripod-llm.vercel.app/ ) facilitating easy completion of the guideline and PDF generation for submission. As a living document, TRIPOD-LLM will evolve with the field, aiming to enhance the quality, reproducibility, and clinical applicability of LLM research in healthcare.
Language: English
Citations: 10
Surgical Endoscopy, Journal Year: 2024, Volume and Issue: 38(5), P. 2320 - 2330
Published: April 17, 2024
Language: English
Citations: 8
Helicobacter, Journal Year: 2024, Volume and Issue: 29(1)
Published: Jan. 1, 2024
Abstract Background: Large language models (LLMs) are promising medical counseling tools, but the reliability of their responses remains unclear. We aimed to assess the feasibility of three popular LLMs as counseling tools for Helicobacter pylori infection in different languages. Materials and Methods: This study was conducted between November 20 and December 1, 2023. Three large language models (ChatGPT 4.0 [LLM1], ChatGPT 3.5 [LLM2], and ERNIE Bot [LLM3]) were each given 15 H. pylori related questions, once in English and once in Chinese. Each question was posed in a new chat using the “New Chat” function to avoid bias from correlation interference. Responses were recorded and blindly assigned to reviewers for scoring on established Likert scales: accuracy (1–6 points), completeness (1–3 points), and comprehensibility (1–3 points). The acceptable thresholds of the scales were set at a minimum of 4, 2, and 2 points, respectively. Final scores were compared across models and between the two languages. Results: The overall mean (SD) accuracy score was 4.80 (1.02), while the completeness score was 1.82 (0.78) and the comprehensibility score was 2.90 (0.36). The proportions of acceptable ratings for accuracy, completeness, and comprehensibility were 90%, 45.6%, and 100%, respectively. Several interlanguage comparisons favored English over Chinese responses (p = 0.034; for LLM3, p = 0.0055; for LLM1, p = 0.0257; for comprehensibility, p = 0.0496). No differences were found among the three LLMs. Conclusions: The LLMs responded satisfactorily to questions related to H. pylori infection, but further improving reliability, along with considering language nuances, is crucial for optimizing their performance.
Language: English
Citations: 7
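The acceptability analysis in the abstract above reduces to comparing each Likert rating against its scale's minimum threshold (4 of 6 for accuracy, 2 of 3 for completeness and comprehensibility). A minimal Python sketch with invented ratings, since the paper's raw scores are not reproduced here.

```python
# Scale thresholds from the abstract: a response is "acceptable"
# when its rating meets or exceeds the scale's minimum.
THRESHOLDS = {"accuracy": 4, "completeness": 2, "comprehensibility": 2}

# Hypothetical reviewer ratings per response
# (accuracy 1-6, completeness 1-3, comprehensibility 1-3).
scores = [
    {"accuracy": 5, "completeness": 2, "comprehensibility": 3},
    {"accuracy": 6, "completeness": 1, "comprehensibility": 3},
    {"accuracy": 3, "completeness": 2, "comprehensibility": 3},
]

# Report the proportion of acceptable responses on each scale.
for scale, cutoff in THRESHOLDS.items():
    acceptable = sum(s[scale] >= cutoff for s in scores)
    print(f"{scale}: {acceptable / len(scores):.1%} acceptable")
```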
International Journal of Impotence Research, Journal Year: 2024, Volume and Issue: 36(7), P. 796 - 797
Published: March 11, 2024
Language: English
Citations: 5
The Lancet Digital Health, Journal Year: 2024, Volume and Issue: 6(7), P. e441 - e443
Published: June 19, 2024
Language: English
Citations: 5
Journal of Medical Internet Research, Journal Year: 2024, Volume and Issue: 26, P. e60291 - e60291
Published: Sept. 12, 2024
Recent surveys indicate that 48% of consumers actively use generative artificial intelligence (AI) for health-related inquiries. Despite widespread adoption and the potential to improve health care access, scant research has examined the performance of AI chatbot responses regarding emergency care advice.
Language: English
Citations: 5
Annals of the Rheumatic Diseases, Journal Year: 2025, Volume and Issue: unknown
Published: Feb. 1, 2025
Language: English
Citations: 0