Transforming healthcare with chatbots: Uses and applications—A scoping review
Marina Gutiérrez,

David Cantarero-Prieto,

Daniel Coca

et al.

Digital Health, Journal year: 2025, Issue 11

Published: Jan. 1, 2025

Purpose: The COVID-19 pandemic has intensified the demand for and use of healthcare resources, prompting a search for efficient solutions under budgetary constraints. In this context, the increasing use of artificial intelligence and telemedicine has emerged as a key strategy for optimizing the delivery of healthcare resources. Consequently, chatbots have become innovative tools in various fields, such as mental health and patient monitoring, offering therapeutic conversations and early interventions. This review aims to explore the current state of chatbots in the healthcare sector, meticulously evaluating their effectiveness, practical applications, and potential benefits. Methods: The review was conducted following PRISMA guidelines, utilizing three databases (PubMed, Web of Science, and Scopus) to identify relevant studies published over the past 5 years. Results: Several articles were identified through the database search (n = 31), and the chatbot interventions were categorized by similar types. The studies reviewed highlight diverse applications of chatbots in healthcare, including mental health support, medical information, appointment management, education, and lifestyle changes, demonstrating significant benefits across these areas. Conclusion: There remain challenges regarding the implementation of chatbots, their compatibility with other systems, and ethical considerations that may arise in different settings. Addressing these issues will be essential to maximize the benefits, mitigate the risks, and ensure equitable access to these innovations.

Language: English

The application of large language models in medicine: A scoping review
Xiangbin Meng,

Xiangyu Yan,

Kuo Zhang

et al.

iScience, Journal year: 2024, Issue 27(5), P. 109713 - 109713

Published: April 23, 2024

This study systematically reviewed the application of large language models (LLMs) in medicine, analyzing 550 selected studies from a vast literature search. LLMs like ChatGPT have transformed healthcare by enhancing diagnostics, medical writing, education, and project management. They have assisted in drafting documents, creating training simulations, and streamlining research processes. Despite their growing utility in diagnosis and in improving doctor-patient communication, challenges persist, including limitations in contextual understanding and the risk of over-reliance. The surge in LLM-related research indicates a focus on patient care but highlights the need for careful integration, considering validation, ethical concerns, and the balance with traditional medical practice. Future directions suggest multimodal LLMs, deeper algorithmic understanding, and ensuring responsible, effective use in healthcare.

Language: English

Cited by

60

Bias in medical AI: Implications for clinical decision-making
James M. Cross,

Michael A. Choma,

John A. Onofrey

et al.

PLOS Digital Health, Journal year: 2024, Issue 3(11), P. e0000651 - e0000651

Published: Nov. 7, 2024

Biases in medical artificial intelligence (AI) arise and compound throughout the AI lifecycle. These biases can have significant clinical consequences, especially in applications that involve clinical decision-making. Left unaddressed, biased medical AI can lead to substandard clinical decisions and the perpetuation and exacerbation of longstanding healthcare disparities. We discuss potential biases that can arise at different stages of the AI development pipeline and how they affect AI algorithms and clinical decision-making. Bias can occur in data features and labels, model development and evaluation, deployment, and publication. Insufficient sample sizes for certain patient groups can result in suboptimal performance, algorithm underestimation, and clinically unmeaningful predictions. Missing patient findings can also produce biased model behavior, including capturable but nonrandomly missing data, such as diagnosis codes, and data that is not usually or not easily captured, such as social determinants of health. Expertly annotated labels used to train supervised learning models may reflect implicit cognitive biases or disparate care practices. Overreliance on performance metrics during model development may obscure bias and diminish a model's clinical utility. When applied to data outside the training cohort, model performance can deteriorate from previous validation and can do so differentially across subgroups. How end users interact with deployed solutions can introduce bias. Finally, where models are developed and published, and by whom, impacts the trajectories and priorities of future medical AI development. Solutions to mitigate bias must be implemented with care, and they include the collection of large and diverse data sets, statistical debiasing methods, thorough model evaluation, emphasis on model interpretability, and standardized bias reporting and transparency requirements. Prior to real-world implementation in clinical settings, rigorous validation through clinical trials is critical to demonstrate unbiased application. Addressing biases across model development stages is crucial for ensuring that all patients benefit equitably from medical AI.

Language: English

Cited by

22

The sociolinguistic foundations of language modeling
Jack Grieve,

Sara Bartl,

Matteo Fuoli

et al.

Frontiers in Artificial Intelligence, Journal year: 2025, Issue 7

Published: Jan. 13, 2025

In this article, we introduce a sociolinguistic perspective on language modeling. We claim that language models in general are inherently modeling varieties of language, and we consider how this insight can inform the development and deployment of language models. We begin by presenting a technical definition of the concept of a variety of language as developed in sociolinguistics. We then discuss how this perspective could help us better understand five basic challenges in language modeling: social bias, domain adaptation, alignment, language change, and scale. We argue that to maximize the performance and societal value of language models it is important to carefully compile training corpora that accurately represent the specific varieties of language being modeled, drawing on theories, methods, and descriptions from the field of sociolinguistics.

Language: English

Cited by

2

Benchmarking the Hallucination Tendency of Google Gemini and Moonshot Kimi

Ruoxi Shan,

Qiang Ming,

Guang Hong

и другие.

Published: May 22, 2024

Evaluating the hallucination tendencies of state-of-the-art language models is crucial for improving their reliability and applicability across various domains. This article presents a comprehensive evaluation of Google Gemini and Moonshot Kimi using the HaluEval benchmark, focusing on key performance metrics such as accuracy, relevance, coherence, and hallucination rate. Gemini demonstrated superior performance, particularly in maintaining low hallucination rates and high contextual accuracy, while Kimi, though robust, showed areas needing further refinement. The study highlights the importance of advanced training techniques and optimization in enhancing model efficiency and accuracy. Practical recommendations for future development are provided, emphasizing the need for continuous improvement and rigorous evaluation to achieve reliable and efficient language models.
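The per-model metrics this kind of benchmark reports can be aggregated from judged responses in a straightforward way. The sketch below is illustrative only: the judged samples are fabricated toy data, not HaluEval outputs, and the field names are assumptions.

```python
# Hypothetical aggregation of per-model accuracy and hallucination rate from
# judged responses. The sample judgments are invented for illustration and
# are not drawn from the HaluEval benchmark itself.

def score_model(judgments):
    """judgments: list of dicts with boolean 'correct' and 'hallucinated' flags."""
    n = len(judgments)
    accuracy = sum(j["correct"] for j in judgments) / n
    halluc_rate = sum(j["hallucinated"] for j in judgments) / n
    return {"accuracy": round(accuracy, 3), "hallucination_rate": round(halluc_rate, 3)}

# Toy judged outputs for two hypothetical models:
gemini = [{"correct": True, "hallucinated": False}] * 9 + \
         [{"correct": False, "hallucinated": True}]
kimi = [{"correct": True, "hallucinated": False}] * 8 + \
       [{"correct": False, "hallucinated": True}] * 2

print(score_model(gemini))
print(score_model(kimi))
```

In practice each judgment would come from a human rater or an automatic judge comparing the model response against the benchmark's reference answer.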

Language: English

Cited by

9

Measuring the Visual Hallucination in ChatGPT on Visually Deceptive Images

Linzhi Ping,

Yue Gu,

Liefeng Feng

и другие.

Published: May 28, 2024

The evaluation of visual hallucinations in multimodal AI models is novel and significant because it addresses a critical gap in understanding how these systems interpret deceptive inputs. The study systematically assessed ChatGPT's performance on a synthetic dataset of visually deceptive and non-deceptive images, employing both quantitative and qualitative analysis. Results revealed that while ChatGPT achieved high accuracy on standard recognition tasks, its performance diminished when faced with visually deceptive images, highlighting areas for further improvement. The analysis provided insights into the model's underlying mechanisms, such as extensive pretraining and sophisticated multimodal integration capabilities, which contribute to its robustness against deceptions. The study's findings have important implications for the development of more reliable and robust multimodal AI technologies, offering a benchmark for future evaluations and practical guidelines for enhancing these systems.

Language: English

Cited by

9

Evaluating Diagnostic Accuracy and Treatment Efficacy in Mental Health: A Comparative Analysis of Large Language Model Tools and Mental Health Professionals
Inbar Levkovich

European Journal of Investigation in Health Psychology and Education, Journal year: 2025, Issue 15(1), P. 9 - 9

Published: Jan. 18, 2025

Large language models (LLMs) offer promising possibilities in mental health, yet their ability to assess disorders and recommend treatments remains underexplored. This quantitative cross-sectional study evaluated four LLMs (Gemini (Gemini 2.0 Flash Experimental), Claude (Claude 3.5 Sonnet), ChatGPT-3.5, and ChatGPT-4) using text vignettes representing conditions such as depression, suicidal ideation, early and chronic schizophrenia, social phobia, and PTSD. Each model's diagnostic accuracy, treatment recommendations, and predicted outcomes were compared with norms established by mental health professionals. Findings indicated that for certain conditions, including depression and PTSD, models like ChatGPT-4 achieved higher diagnostic accuracy than human professionals. However, in more complex cases, LLM performance varied, with some models achieving only 55% accuracy, while the professionals performed better. The LLMs tended to suggest a broader range of proactive treatments, whereas the professionals recommended targeted psychiatric consultations and specific medications. In terms of outcome predictions, the professionals were generally optimistic regarding full recovery, especially with treatment, and predicted lower recovery rates and higher partial-recovery rates particularly in untreated cases. While the LLMs suggested a broad treatment range, their more conservative predictions highlight the need for professional oversight. LLMs can provide valuable support in diagnostics and treatment planning but cannot replace professional clinical discretion.

Language: English

Cited by

1

Urinary Bladder Acute Inflammations and Nephritis of the Renal Pelvis: Diagnosis Using Fine-Tuned Large Language Models

Mohammad Khaleel Sallam Ma’aitah,

Abdulkader Helwan, Abdelrahman Radwan

et al.

Journal of Personalized Medicine, Journal year: 2025, Issue 15(2), P. 45 - 45

Published: Jan. 24, 2025

Background: Large language models (LLMs) have seen a significant boost recently in the field of natural language processing (NLP) due to their capabilities in analyzing text. These models prove robust in classification tasks where texts need to be analyzed and classified. Objectives: In this paper, we explore the power of base LLMs such as Generative Pre-trained Transformer 2 (GPT-2), Bidirectional Encoder Representations from Transformers (BERT), Distill-BERT, and TinyBERT in diagnosing acute inflammations of the urinary bladder and nephritis of the renal pelvis. Materials and Methods: The models were trained and tested using supervised fine-tuning (SFT) on a dataset of 120 examples that include symptoms that may indicate the occurrence of these two conditions. Results: By employing this method with carefully crafted prompts to present the data, we demonstrate the feasibility of using minimal training data to achieve reasonable diagnostic performance, with overall testing accuracies of 100%, 94%, and 79% for GPT-2, BERT, and TinyBERT, respectively.
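A key step the abstract describes is rendering tabular symptom records as text prompts for supervised fine-tuning. The sketch below shows one plausible way to do this; the symptom field names mirror the UCI Acute Inflammations dataset (120 instances, the same size reported above), but the prompt template itself is an assumption, not the authors' actual prompt.

```python
# Plausible conversion of tabular symptom records into text prompts for SFT.
# Field names follow the UCI Acute Inflammations dataset; the template is a
# hypothetical illustration, not the paper's actual prompt design.

SYMPTOMS = ["nausea", "lumbar pain", "urine pushing",
            "micturition pains", "burning of urethra"]

def make_prompt(record):
    """record: dict with a 'temperature' float and yes/no symptom flags."""
    present = [s for s in SYMPTOMS if record.get(s) == "yes"]
    absent = [s for s in SYMPTOMS if record.get(s) == "no"]
    parts = [f"Patient temperature is {record['temperature']} C."]
    if present:
        parts.append("Reported symptoms: " + ", ".join(present) + ".")
    if absent:
        parts.append("Denied symptoms: " + ", ".join(absent) + ".")
    parts.append("Diagnosis:")  # the model is fine-tuned to complete this
    return " ".join(parts)

example = {"temperature": 40.0, "nausea": "yes", "lumbar pain": "yes",
           "urine pushing": "no", "micturition pains": "no",
           "burning of urethra": "yes"}
print(make_prompt(example))
```

Each prompt would then be paired with its gold label (bladder inflammation, nephritis, both, or neither) to form the fine-tuning examples.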

Language: English

Cited by

1

Self-supervised learning for graph-structured data in healthcare applications: A comprehensive review
Safa Ben Atitallah, Chaima Ben Rabah, Maha Driss

et al.

Computers in Biology and Medicine, Journal year: 2025, Issue 188, P. 109874 - 109874

Published: Feb. 24, 2025

Language: English

Cited by

1

Management of Dupuytren’s Disease: A Multi-Centric Comparative Analysis Between Experienced Hand Surgeons Versus Artificial Intelligence
Ishith Seth, Gianluca Marcaccini, Kaiyang Lim

et al.

Diagnostics, Journal year: 2025, Issue 15(5), P. 587 - 587

Published: Feb. 28, 2025

Background: Dupuytren's disease, a fibroproliferative disease affecting the hand's palmar fascia, leads to progressive finger contractures and functional limitations. Management of this condition relies heavily on the expertise of hand surgeons, who tailor interventions based on clinical assessment. With growing interest in artificial intelligence (AI) in medical decision-making, this study aims to evaluate the feasibility of integrating AI into the management of Dupuytren's disease by comparing AI-generated recommendations with those of expert hand surgeons. Methods: This multicentric comparative study involved three experienced hand surgeons and five AI systems (ChatGPT, Gemini, Perplexity, DeepSeek, and Copilot). Twenty-two standardized prompts representing various clinical scenarios were used to assess decision-making. Surgeons and AI systems provided recommendations, which were analyzed for concordance, rationale, and predicted outcomes. Key metrics included union accuracy, surgeon agreement, precision, recall, and F1 scores. The study also evaluated performance in unanimous versus non-unanimous cases and inter-AI agreements. Results: Gemini and ChatGPT demonstrated the highest accuracy (86.4% and 81.8%, respectively), while Copilot showed the lowest (40.9%). Surgeon agreement was highest for Gemini (45.5%) and ChatGPT (42.4%). The AI systems performed better in unanimous cases (accuracy up to 92.0%) than in non-unanimous cases (as low as 35.0%). Inter-AI agreements ranged from 75.0% (ChatGPT-Gemini) to 48.0% (DeepSeek-Copilot). Precision, recall, and F1 scores were consistently higher for Gemini and ChatGPT than for the other systems. Conclusions: AI systems, particularly Gemini and ChatGPT, show promise in aligning with expert surgical recommendations, especially in straightforward cases. However, significant variability exists, particularly in complex scenarios. AI should be viewed as complementary to expert judgment, requiring further refinement and validation for integration into clinical practice.

Language: English

Cited by

1

A systematic review of large language model (LLM) evaluations in clinical medicine
Sina Shool,

Sara Adimi,

Reza Saboori Amleshi

et al.

BMC Medical Informatics and Decision Making, Journal year: 2025, Issue 25(1)

Published: March 7, 2025

Large Language Models (LLMs), advanced AI tools based on transformer architectures, demonstrate significant potential in clinical medicine by enhancing decision support, diagnostics, and medical education. However, their integration into clinical workflows requires rigorous evaluation to ensure reliability, safety, and ethical alignment. This systematic review examines the evaluation parameters and methodologies applied to LLMs in clinical medicine, highlighting their capabilities, limitations, and application trends. A comprehensive review of the literature was conducted across the PubMed, Scopus, Web of Science, IEEE Xplore, and arXiv databases, encompassing both peer-reviewed and preprint studies. Studies were screened against predefined inclusion and exclusion criteria to identify original research evaluating LLM performance in clinical contexts. The results reveal a growing interest in leveraging LLMs in clinical settings, with 761 studies meeting the inclusion criteria. While general-domain LLMs, particularly ChatGPT and GPT-4, dominated evaluations (93.55%), medical-domain LLMs accounted for only 6.45%. Accuracy emerged as the most commonly assessed parameter (21.78%). Despite these advancements, the evidence base highlights certain limitations and biases in the included studies, emphasizing the need for careful interpretation and robust evaluation frameworks. The exponential growth of LLM research underscores its transformative potential for healthcare. However, addressing challenges such as ethical risks, performance variability, and the underrepresentation of critical specialties will be essential. Future efforts should prioritize standardized evaluation frameworks to ensure safe, effective, and equitable LLM integration into clinical practice.

Language: English

Cited by

1