PICOT questions and search strategies formulation: A novel approach using artificial intelligence automation
Lucija Gosak, Gregor Štiglic, Lisiane Pruinelli

et al.

Journal of Nursing Scholarship, Journal year: 2024, Issue: unknown

Published: Nov. 24, 2024

Abstract. Aim: The aim of this study was to evaluate and compare artificial intelligence (AI)‐based large language models (LLMs) (ChatGPT‐3.5, Bing, Bard) with human‐based formulations in generating relevant clinical queries, using comprehensive methodological evaluations. Methods: To interact with the major LLMs ChatGPT‐3.5, Bing Chat, and Google Bard, scripted prompts were designed to formulate PICOT (population, intervention, comparison, outcome, time) questions and search strategies. The quality of the responses was assessed using a descriptive approach and independent assessment by two researchers. To determine the number of hits, the search strings generated by the three LLMs and an additional one by a human expert were run separately, without restrictions, in PubMed, Web of Science, Cochrane Library, and CINAHL Ultimate, and the results were imported. Hits from one of the scenarios were also exported for relevance evaluation; the use of a single scenario was chosen to provide a focused analysis. Cronbach's alpha and the intraclass correlation coefficient (ICC) were calculated. Results: Across five different scenarios, ChatGPT‐3.5 retrieved 11,859 hits, Bing 1,376,854, Bard 16,583, and the expert 5919. We then used the first scenario to assess the relevance of the obtained results. The human‐formulated search strategy resulted in 65.22% (56/105) relevant articles. The most accurate AI‐based LLM achieved 70.79% (63/89), followed by the other two models with 21.05% (12/45) and 13.29% (42/316). Based on the evaluators' assessment, the highest‐rated formulation received the highest score (M = 48.50; SD = 0.71). The ICC showed a high level of agreement between the evaluators. Although some strategies yielded a lower percentage of relevant hits than others, this reflects nuanced evaluation criteria, in which the subjective assessment prioritized contextual accuracy and quality over mere relevance. Conclusion: This study provides valuable insights into the ability of LLMs to formulate clinical queries; such models demonstrate significant potential for augmenting workflows, improving query development, and supporting search strategy formulation. However, the findings highlight limitations that necessitate further refinement and continued human oversight. Clinical Relevance: AI could assist nurses in formulating PICOT questions and offer support to healthcare professionals in structuring and enhancing search strategies, thereby significantly increasing the efficiency of information retrieval.
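
As a sketch of the reliability analysis described above: Cronbach's alpha for a two-rater quality assessment can be computed directly from an (items × raters) score matrix. The scores below are hypothetical placeholders, not the study's data; the ICC could be obtained analogously, for example with the pingouin package's intraclass_corr.

```python
import numpy as np

def cronbach_alpha(ratings: np.ndarray) -> float:
    """Cronbach's alpha for an (n_items, n_raters) score matrix:
    alpha = k/(k-1) * (1 - sum(per-rater variances) / variance(row totals)).
    """
    ratings = np.asarray(ratings, dtype=float)
    k = ratings.shape[1]                          # number of raters
    rater_vars = ratings.var(axis=0, ddof=1)      # each rater's score variance
    total_var = ratings.sum(axis=1).var(ddof=1)   # variance of per-item totals
    return k / (k - 1) * (1 - rater_vars.sum() / total_var)

# Hypothetical quality scores from two independent reviewers for ten
# generated search strategies (illustrative values only).
scores = np.array([
    [48, 49], [47, 48], [50, 49], [46, 47], [49, 49],
    [45, 46], [48, 48], [47, 46], [49, 50], [46, 45],
])
print(f"Cronbach's alpha = {cronbach_alpha(scores):.3f}")
```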

Language: English

Accuracy of latest large language models in answering multiple choice questions in dentistry: A comparative study

Huy Cong Nguyen, Hai Dang, Thuy Linh Nguyen

et al.

PLoS ONE, Journal year: 2025, Issue: 20(1), pp. e0317423

Published: Jan. 29, 2025

This study aims to evaluate the performance of the latest large language models (LLMs) in answering dental multiple choice questions (MCQs), including both text-based and image-based questions. A total of 1490 MCQs from two board review books for the United States National Board Dental Examination were selected. Six LLMs available as of August 2024 were evaluated: ChatGPT 4.0 omni (OpenAI), Gemini Advanced 1.5 Pro (Google), Copilot with GPT-4 Turbo (Microsoft), Claude 3.5 Sonnet (Anthropic), Mistral Large 2 (Mistral AI), and Llama 3.1 405b (Meta). χ2 tests were performed to determine whether there were significant differences in the percentages of correct answers among the LLMs, both for the entire sample and for each discipline (p < 0.05). Significant differences were observed in the percentage of accurate answers across both text-based and image-based questions (p < 0.001). For the entire sample, the three best-performing models (85.5%, 84.0%, and 83.8%) demonstrated the highest accuracy, followed by two models at 78.3% and 77.1%, with the lowest at 72.4%. Newer versions of these models demonstrated superior accuracy compared with earlier versions. Copilot and Claude achieved high accuracy on text-based questions but low accuracy on image-based ones, indicating that current LLMs are capable of handling image-based questions only to a limited extent. Clinicians and students should prioritize the most up-to-date LLMs when using them to support their learning, clinical practice, and research.
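
A minimal sketch of the omnibus χ2 test reported above, assuming a models × (correct, incorrect) contingency table. The counts are back-calculated from the reported percentages on 1490 questions purely for illustration, and the model labels are anonymized because the abstract does not tie each percentage to a name.

```python
import numpy as np
from scipy.stats import chi2_contingency

N = 1490  # total MCQs answered by each model

# Correct-answer counts reconstructed from the reported accuracy
# percentages (85.5, 84.0, 83.8, 78.3, 77.1, 72.4); labels are placeholders.
accuracy = {"model_1": 0.855, "model_2": 0.840, "model_3": 0.838,
            "model_4": 0.783, "model_5": 0.771, "model_6": 0.724}
correct = np.array([round(p * N) for p in accuracy.values()])
table = np.column_stack([correct, N - correct])  # rows: models

chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.2e}")
```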

Language: English

Cited

1

Performance of artificial intelligence on Turkish dental specialization exam: can ChatGPT-4.0 and Gemini Advanced achieve comparable results to humans?
Soner Şişmanoğlu, Belen Şirinoğlu Çapan

BMC Medical Education, Journal year: 2025, Issue: 25(1)

Published: Feb. 10, 2025

AI-powered chatbots have spread to various fields, including dental education, clinical assistance, and treatment planning. The aim of this study is to assess and compare the performance of leading chatbots on the dental specialization exam (DUS) administered in Turkey and to compare it with that of the best human performer of each year. DUS questions for 2020 and 2021 were directed to ChatGPT-4.0 and Gemini Advanced individually. The questions were manually entered in their original form, in Turkish. The results were compared with each other and with those of each year's top performers. Candidates who score at least 45 points on this centralized exam are deemed to have passed and are eligible to select their preferred department and institution. The data were statistically analyzed using Pearson's chi-squared test (p < 0.05). ChatGPT-4.0 received an 83.3% correct response rate on the 2020 exam, while Gemini Advanced received a 65% rate. On the 2021 exam, ChatGPT-4.0 received an 80.5% correct response rate, whereas Gemini Advanced received 60.2%. ChatGPT-4.0 outperformed Gemini Advanced on both exams, and both chatbots performed worse overall (for 2020: ChatGPT-4.0, 65.5; Gemini Advanced, 50.1; for 2021: ChatGPT-4.0, 65.6; Gemini Advanced, 48.6) when their scores were compared with those of the top performer of each year (68.5 for 2020 and 72.3 for 2021). This poorer performance also extends to the basic sciences sections (p < 0.001). Additionally, periodontology was the specialty in which the chatbots achieved the best results, while the lowest scores were determined in endodontics and orthodontics. Both chatbots, namely ChatGPT-4.0 and Gemini Advanced, passed the exam by exceeding the threshold of 45 points. However, they still lagged behind the top performers of each year, particularly in the basic sciences score, and exhibited lower performance in some specialties such as endodontics and orthodontics.
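
The Pearson chi-squared comparison of two chatbots' correct-response rates can be sketched as a 2 × 2 contingency test. The exam length of 120 questions is an assumption (the abstract does not state it), and the counts are derived from the reported 2020 rates (83.3% vs. 65%) purely for illustration.

```python
from scipy.stats import chi2_contingency

n_questions = 120  # assumed exam length, not stated in the abstract

# Correct/incorrect counts derived from the reported 2020 rates
# (83.3% vs. 65%); purely illustrative.
a_correct = round(0.833 * n_questions)  # 100
b_correct = round(0.650 * n_questions)  # 78
table = [
    [a_correct, n_questions - a_correct],
    [b_correct, n_questions - b_correct],
]
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```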

Language: English

Cited

1

Performance of Artificial Intelligence Chatbots in Responding to Patient Queries Related to Traumatic Dental Injuries: A Comparative Study
Yeliz Güven, Omer Tarik Ozdemir, Melis Yazır Kavan

et al.

Dental Traumatology, Journal year: 2024, Issue: unknown

Published: Nov. 22, 2024

ABSTRACT. Background/Aim: Artificial intelligence (AI) chatbots have become increasingly prevalent in recent years as potential sources of online healthcare information for patients when making medical/dental decisions. This study assessed the readability, quality, and accuracy of responses provided by three AI chatbots to questions related to traumatic dental injuries (TDIs), either retrieved from popular question‐answer sites or manually created based on hypothetical case scenarios. Materials and Methods: A total of 59 traumatic dental injury queries were directed at ChatGPT 3.5, ChatGPT 4.0, and Google Gemini. Readability was evaluated using the Flesch Reading Ease (FRE) and Flesch–Kincaid Grade Level (FKGL) scores. To assess response quality and accuracy, the DISCERN tool, the Global Quality Score (GQS), and misinformation scores were used. The understandability and actionability of the responses were analyzed using the Patient Education Materials Assessment Tool for Printable Materials (PEMAT‐P). Statistical analysis included the Kruskal–Wallis test with Dunn's post hoc test for non‐normal variables and one‐way ANOVA with Tukey's post hoc test for normal variables (p < 0.05). Results: The mean FKGL and FRE scores for ChatGPT 3.5, ChatGPT 4.0, and Google Gemini were 11.2 and 49.25, 11.8 and 46.42, and 10.1 and 51.91, respectively, indicating that the responses were difficult to read and required a college‐level reading ability. ChatGPT 3.5 had the lowest PEMAT‐P scores among the three chatbots (p < 0.001). ChatGPT 4.0 and Google Gemini were rated higher for quality (GQS score of 5) compared with ChatGPT 3.5. Conclusions: In this study, although widely used, ChatGPT 3.5 provided some misleading and inaccurate responses about TDIs. In contrast, ChatGPT 4.0 and Google Gemini generated more accurate and comprehensive answers, making them more reliable auxiliary information sources. However, for complex issues like TDIs, no chatbot can replace a dentist for diagnosis, treatment, and follow‐up care.
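
The readability metrics and the nonparametric comparison in this design can be sketched as follows, assuming the textstat package for FRE/FKGL and SciPy for the Kruskal–Wallis test; the response text and per-chatbot FKGL lists are hypothetical placeholders, not the study's data.

```python
import textstat
from scipy.stats import kruskal

# Compute readability for one (placeholder) chatbot response.
response = ("Rinse the avulsed tooth gently with saline, avoid touching "
            "the root, and replant it immediately if possible.")
print("FRE :", textstat.flesch_reading_ease(response))
print("FKGL:", textstat.flesch_kincaid_grade(response))

# Hypothetical per-response FKGL scores for three chatbots across the
# same query set (illustrative values only).
fkgl_a = [11.4, 10.9, 11.8, 12.0, 11.1]
fkgl_b = [12.1, 11.5, 11.9, 12.4, 11.7]
fkgl_c = [10.2, 9.8, 10.5, 10.0, 10.3]
h, p = kruskal(fkgl_a, fkgl_b, fkgl_c)
print(f"Kruskal-Wallis H = {h:.2f}, p = {p:.4f}")
# A significant result would be followed by Dunn's post hoc test
# (e.g., scikit-posthocs' posthoc_dunn) to locate pairwise differences.
```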

Language: English

Cited

5

Performance of Four AI Chatbots in Answering Endodontic Questions
Saleem Abdulrab, Hisham Abada, Mohammed Mashyakhy

et al.

Journal of Endodontics, Journal year: 2025, Issue: unknown

Published: Jan. 1, 2025

Language: English

Cited

0

Large Language Models in peri-implant disease: How well do they perform?
Vasiliki P. Koidou, Georgios S. Chatzopoulos, Lazaros Tsalikis

et al.

Journal of Prosthetic Dentistry, Journal year: 2025, Issue: unknown

Published: Mar. 1, 2025

Language: English

Cited

0

Bridging the Gap in Neonatal Care: Evaluating AI Chatbots for Chronic Neonatal Lung Disease and Home Oxygen Therapy Management

Weiqin Liu, Hong Wei, Lingling Xiang

et al.

Pediatric Pulmonology, Journal year: 2025, Issue: 60(3)

Published: Mar. 1, 2025

To evaluate the accuracy and comprehensiveness of eight free, publicly available large language model (LLM) chatbots in addressing common questions related to chronic neonatal lung disease (CNLD) and home oxygen therapy (HOT). Twenty CNLD and HOT-related questions were curated across nine domains. Responses from ChatGPT-3.5, Google Bard, Bing Chat, Claude 3.5 Sonnet, ERNIE Bot 3.5, and GLM-4 were generated and evaluated by three experienced neonatologists using Likert scales for accuracy and comprehensiveness. Updated LLM models (ChatGPT-4o mini and Gemini 2.0 Flash Experimental) were incorporated to assess rapid technological advancement. Statistical analyses included ANOVA, Kruskal-Wallis tests, and intraclass correlation coefficients. Bing Chat and Claude 3.5 Sonnet demonstrated superior performance, with the highest mean accuracy scores (5.78 ± 0.48 and 5.75 ± 0.54, respectively) and competence scores (2.65 ± 0.58 and 2.80 ± 0.41, respectively). In subsequent testing, Gemini 2.0 Flash Experimental and ChatGPT-4o mini achieved comparably high performance. Performance varied across domains, with all models excelling in "equipment safety protocols" and "caregiver support." The LLMs showed self-correction capabilities when prompted. Overall, LLMs show promise in delivering accurate CNLD/HOT information. However, performance variability and the risk of misinformation necessitate expert oversight and continued refinement before widespread clinical implementation.
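
A sketch of the intraclass correlation analysis for multi-rater Likert scores, assuming the pingouin package; the long-format ratings below (three raters scoring the same five responses) are hypothetical placeholders, not the study's data.

```python
import pandas as pd
import pingouin as pg

# Hypothetical long-format ratings: three neonatologists scoring the same
# five chatbot responses on a 6-point accuracy scale (illustrative values).
data = pd.DataFrame({
    "response": [1, 2, 3, 4, 5] * 3,
    "rater": ["R1"] * 5 + ["R2"] * 5 + ["R3"] * 5,
    "score": [6, 5, 6, 4, 5,
              6, 5, 5, 4, 5,
              5, 5, 6, 4, 6],
})

icc = pg.intraclass_corr(data=data, targets="response",
                         raters="rater", ratings="score")
# ICC2 (two-way random effects, absolute agreement, single rater) is a
# common choice when a fixed panel rates every target.
print(icc[icc["Type"] == "ICC2"])
```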

Language: English

Cited

0

Chatbots for Conducting Systematic Reviews in Pediatric Dentistry
Rata Rokhshad, Fateme Doost Mohammad, Mahsa Nomani

et al.

Journal of Dentistry, Journal year: 2025, Issue: unknown, pp. 105733

Published: Apr. 1, 2025

Language: English

Cited

0

Accuracy of Large Language Models for Infective Endocarditis Prophylaxis in Dental Procedures
Paak Rewthamrongsris, Jirayu Burapacheep, Vorapat Trachoo

et al.

International Dental Journal, Journal year: 2024, Issue: unknown

Published: Oct. 1, 2024

Infective endocarditis (IE) is a serious, life-threatening condition requiring antibiotic prophylaxis for high-risk individuals undergoing invasive dental procedures. As LLMs are rapidly adopted by dental professionals for their efficiency and accessibility, assessing their accuracy in answering critical questions about antibiotic prophylaxis for IE prevention is crucial.

Language: English

Cited

4

Efficacy and Empathy of AI Chatbots in Answering Frequently Asked Questions on Oral Oncology
Rata Rokhshad, Zaid H. Khoury, Hossein Mohammad‐Rahimi

et al.

Oral Surgery Oral Medicine Oral Pathology and Oral Radiology, Journal year: 2025, Issue: unknown

Published: Jan. 1, 2025

Language: English

Cited

0

Evaluation of Information Provided by ChatGPT Versions on Traumatic Dental Injuries for Dental Students and Professionals
Zeynep Öztürk, Cenkhan Bal, Beyza Nur Çelikkaya

et al.

Dental Traumatology, Journal year: 2025, Issue: unknown

Published: Jan. 23, 2025

ABSTRACT. Background/Aim: The use of AI‐driven chatbots for accessing medical information is increasingly popular among educators and students. This study aims to assess two different ChatGPT models (ChatGPT 3.5 and ChatGPT 4.0) regarding their responses to queries about traumatic dental injuries posed specifically for dental students and professionals. Material and Methods: A total of 40 questions were prepared, divided equally between those concerning definitions and diagnosis and those on treatment and follow‐up. The responses from both versions were evaluated against several criteria: quality, reliability, similarity, and readability. These evaluations were conducted using the Global Quality Scale (GQS), a Reliability Scoring System (adapted from DISCERN), the Flesch Reading Ease Score (FRES), the Flesch–Kincaid Reading Grade Level (FKRGL), and the Similarity Index. Normality was checked with the Shapiro–Wilk test, and variance homogeneity was assessed with the Levene test. Results: The analysis revealed that ChatGPT 3.5 provided more original responses compared with ChatGPT 4.0. According to FRES scores, both versions were challenging to read, with ChatGPT 3.5 having a higher FRES score (39.732 ± 9.713) than ChatGPT 4.0 (34.813 ± 9.356), indicating relatively better readability. There were no significant differences between the versions regarding GQS, DISCERN, or FKRGL scores. However, in the definitions and diagnosis section, ChatGPT 4.0 had statistically significantly higher quality scores than ChatGPT 3.5. In contrast, ChatGPT 3.5 had better answers in the treatment and follow‐up section. For ChatGPT 4.0, readability and similarity rates differed by section, whereas no significant differences were observed in ChatGPT 3.5's FRES, FKRGL, and similarity index measurements by topic. Conclusions: Both versions offer high‐quality information, though they present challenges in terms of readability and reliability. They are valuable resources for students and professionals but should be used in conjunction with additional sources for a comprehensive understanding.
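
The assumption checks described above translate directly into SciPy calls. Below is a minimal sketch with hypothetical GQS scores for the two versions; the follow-up comparison is chosen according to the checks, as the abstract does not specify which test the authors used.

```python
from scipy.stats import shapiro, levene, ttest_ind, mannwhitneyu

# Hypothetical GQS scores for the two ChatGPT versions (illustrative).
gqs_v35 = [4, 5, 4, 3, 5, 4, 4, 5, 3, 4]
gqs_v40 = [5, 5, 4, 4, 5, 5, 4, 5, 4, 5]

# Shapiro-Wilk checks normality per group; Levene checks variance
# homogeneity across groups.
normal = all(shapiro(g).pvalue > 0.05 for g in (gqs_v35, gqs_v40))
equal_var = levene(gqs_v35, gqs_v40).pvalue > 0.05

# Parametric t-test if both assumptions hold, otherwise Mann-Whitney U.
if normal and equal_var:
    result = ttest_ind(gqs_v35, gqs_v40)
else:
    result = mannwhitneyu(gqs_v35, gqs_v40)
print(result)
```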

Language: English

Cited

0