Competency of Large Language Models in Evaluating Appropriate Responses to Suicidal Ideation: Comparative Study (Preprint)
Ryan McBain, Jonathan Cantor, Li Ang Zhang

et al.

Published: Oct 23, 2024

BACKGROUND: With suicide rates in the United States at an all-time high, individuals experiencing suicidal ideation are increasingly turning to large language models (LLMs) for guidance and support. OBJECTIVE: The objective of this study was to assess the competency of 3 widely used LLMs in distinguishing appropriate versus inappropriate responses when engaging with individuals who exhibit suicidal ideation. METHODS: This observational, cross-sectional study evaluated responses to the revised Suicidal Ideation Response Inventory (SIRI-2) generated by ChatGPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro. Data collection and analyses were conducted in July 2024. A common training module for mental health professionals, the SIRI-2 provides 24 hypothetical scenarios in which a patient exhibits depressive symptoms and suicidal ideation, each followed by two clinician responses. Clinician responses were scored from –3 (highly inappropriate) to +3 (highly appropriate). All LLMs were provided with a standardized set of instructions to rate the clinician responses. We compared LLM ratings with those of expert suicidologists, conducting linear regression analyses and converting rating differences to z scores to identify outliers (z score >1.96 or <–1.96; P<.05). Furthermore, we compared the final SIRI-2 scores with those produced by mental health professionals in prior studies. RESULTS: All 3 LLMs rated clinician responses as more appropriate than the ratings provided by expert suicidologists. The item-level mean difference was 0.86 for ChatGPT (95% CI 0.61-1.12; P<.001), 0.61 for Claude (95% CI 0.41-0.81; P<.001), and 0.73 for Gemini (95% CI 0.35-1.11; P<.001). In terms of z scores, 19% (9 of 48) of ChatGPT's ratings were outliers. Similarly, 11% (5 of 48) of Claude's ratings were outliers. Additionally, 36% (17 of 48) of Gemini's ratings were outliers. ChatGPT produced a final SIRI-2 score of 45.7, roughly equivalent to that of master's-level counselors; Claude produced a score of 36.7, exceeding the performance of professionals after suicide intervention skills training; and Gemini produced a score of 54.5, roughly equivalent to that of untrained K-12 school staff. CONCLUSIONS: Current versions of 3 major LLMs demonstrated an upward bias in their evaluations of responses to suicidal ideation; however, 2 of the 3 performed at a level that met or exceeded that of mental health professionals.
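
The outlier criterion described in the Methods is a simple standardized-difference cutoff. As a rough illustration, the Python sketch below (with invented ratings; the study's analysis code is not published, and all variable names here are our own) converts item-level LLM-minus-expert differences to z scores, flags values outside ±1.96, and computes a normal-approximation 95% CI for the mean difference:

import math
import statistics

# Illustrative sketch only; ratings are invented for demonstration.
# Assumes one LLM rating and one expert-consensus rating per clinician
# response, on the SIRI-2 scale of -3 (highly inappropriate) to +3
# (highly appropriate). The real inventory has 48 responses
# (24 scenarios x 2 responses each).
llm_ratings    = [2.0, -1.5, 3.0, 0.5, -2.0, 2.5, 1.0, -3.0]
expert_ratings = [1.0, -2.0, 1.5, 0.0, -2.5, 1.0, 1.5, -3.0]

# Item-level differences (LLM minus expert); a positive mean indicates
# the kind of upward bias reported in the Results.
diffs = [l - e for l, e in zip(llm_ratings, expert_ratings)]
n = len(diffs)
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)

# Normal-approximation 95% CI for the mean difference.
se = sd_diff / math.sqrt(n)
ci_low, ci_high = mean_diff - 1.96 * se, mean_diff + 1.96 * se

# Standardize each difference and flag outliers at the two-tailed 5%
# level (z > 1.96 or z < -1.96), mirroring the paper's criterion.
z_scores = [(d - mean_diff) / sd_diff for d in diffs]
outliers = [(i, round(z, 2)) for i, z in enumerate(z_scores) if abs(z) > 1.96]

print(f"mean difference: {mean_diff:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
print(f"outlier items: {outliers}")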

Language: English

Clinical Applications and Limitations of Large Language Models in Nephrology: A Systematic Review

Zsuzsa Unger, Shelly Soffer, Orly Efros

et al.

medRxiv (Cold Spring Harbor Laboratory), Journal year: 2024, Issue: unknown

Published: Nov 1, 2024

Abstract. Background: Large language models (LLMs) are emerging as promising tools in healthcare. This systematic review examines LLMs' potential applications in nephrology, highlighting their benefits and limitations. Methods: We conducted a literature search in PubMed and Web of Science, selecting studies based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. The review focuses on the latest advancements of LLMs in nephrology from 2020 to 2024. PROSPERO registration number: CRD42024550169. Results: Fourteen studies met the inclusion criteria and were categorized into five key areas of nephrology: streamlining clinical workflow, disease prediction and prognosis, laboratory data interpretation and management, renal dietary management, and patient education. LLMs showed high performance in various clinical tasks, including managing continuous renal replacement therapy (CRRT) alarms (GPT-4 accuracy of 90%-94%), thereby reducing intensive care unit (ICU) alarm fatigue, and predicting chronic kidney disease (CKD) progression (positive predictive value improved from 6.7% to 20.9%). In patient education, GPT-4 excelled at simplifying medical information by improving readability and reducing complexity, and at accurately translating transplant resources. Gemini provided the most accurate responses to frequently asked questions (FAQs) about CKD. Conclusions: While the incorporation of LLMs shows promise across levels of care, broad implementation is still premature. Further research is required to validate these tools in terms of accuracy, performance on rare and critical conditions, and real-world performance.
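
As a point of reference for the CKD-progression result, positive predictive value (PPV) is the share of positive predictions that are true positives: PPV = TP / (TP + FP). A minimal Python sketch with invented counts (the review does not report the underlying confusion matrices, so these numbers are for illustration only):

def positive_predictive_value(tp: int, fp: int) -> float:
    """PPV (precision): fraction of positive predictions that are correct."""
    return tp / (tp + fp)

# Invented counts: of 1,000 patients a model flags as likely CKD
# progressors, 209 actually progress -> PPV = 0.209, i.e., 20.9%.
print(positive_predictive_value(tp=209, fp=791))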

Language: English

Cited: 0
