Are artificial intelligence based chatbots reliable sources for patients regarding orthodontics? DOI Open Access
Tuğba Haliloğlu Özkan, Ahmet Hüseyin Acar, Enes Özkan

et al.

APOS Trends in Orthodontics, Journal Year: 2025, Volume and Issue: 0, P. 1 - 6

Published: Jan. 6, 2025

Objectives: The objective of this study was to conduct a comprehensive and patient-centered evaluation of chatbot responses within the field of orthodontics, comparing three prominent platforms: ChatGPT-4, Microsoft Copilot, and Google Gemini. Material and Methods: Twenty orthodontic-related queries were presented to ChatGPT-4, Microsoft Copilot, and Google Gemini by ten orthodontic experts. To assess the accuracy and completeness of the responses, a Likert scale (LS) was employed, while clarity was evaluated using the Global Quality Scale (GQS). Statistical analyses included one-way analysis of variance and post-hoc Tukey tests for the data, and the Pearson correlation test was used to determine the relationship between variables. Results: The results indicated that ChatGPT-4 (1.69 ± 0.10) and Microsoft Copilot (1.68) achieved significantly higher LS scores compared with Google Gemini (2.27 ± 0.53) (P < 0.05). However, the GQS scores, which were 4.01 ± 0.31 for ChatGPT-4, 3.92 ± 0.60 for Gemini, and 4.09 ± 0.15 for Copilot, showed no significant differences among the chatbots (P > 0.05). Conclusion: While these chatbots generally handle basic questions well, they show limitations in complex scenarios. ChatGPT-4 and Copilot outperform Gemini in accurately addressing scenario-based questions, highlighting the importance of strong language comprehension, knowledge access, and advanced algorithms. This underscores the need for continued improvements in chatbot technology.
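As a rough illustration of the statistical workflow this abstract describes (one-way ANOVA, post-hoc Tukey tests, and Pearson correlation), a minimal Python sketch follows; the rating arrays, group sizes, and the SciPy/statsmodels routines chosen here are assumptions for illustration, not the study's data or scripts.

```python
# Sketch of the analysis named in the abstract: one-way ANOVA with
# post-hoc Tukey tests on expert Likert scores, plus a Pearson correlation.
# All rating arrays below are hypothetical, not the study's data.
import numpy as np
from scipy.stats import f_oneway, pearsonr
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(0)
# Hypothetical mean Likert scores from ten experts, one value per expert.
chatgpt4 = rng.normal(1.69, 0.10, 10)
copilot  = rng.normal(1.68, 0.10, 10)
gemini   = rng.normal(2.27, 0.53, 10)

# One-way ANOVA across the three chatbots.
f_stat, p_val = f_oneway(chatgpt4, copilot, gemini)
print(f"ANOVA: F={f_stat:.2f}, p={p_val:.4f}")

# Post-hoc Tukey HSD to locate which pairs of chatbots differ.
scores = np.concatenate([chatgpt4, copilot, gemini])
groups = ["ChatGPT-4"] * 10 + ["Copilot"] * 10 + ["Gemini"] * 10
print(pairwise_tukeyhsd(scores, groups))

# Pearson correlation between two rating variables (e.g., LS vs. GQS).
gqs = rng.normal(4.0, 0.3, 30)  # hypothetical paired GQS ratings
r, p = pearsonr(scores, gqs)
print(f"Pearson r={r:.2f}, p={p:.3f}")
```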

Language: English

Effectiveness of ChatGPT in clinical pharmacy and the role of artificial intelligence in medication therapy management DOI Creative Commons
Don Roosan, Pauline Padua, R.K. Rabeeha Khan

et al.

Journal of the American Pharmacists Association, Journal Year: 2023, Volume and Issue: 64(2), P. 422 - 428.e8

Published: Dec. 2, 2023

Abstract

Background

The use of artificial intelligence (AI) to optimize medication therapy management (MTM) by identifying drug interactions may potentially improve MTM efficiency. ChatGPT, an AI language model, may be applied to identify drug interactions and interventions by integrating patient data and drug databases. ChatGPT has been shown to be effective in other areas of clinical medicine, from diagnosis to management. However, little is known about ChatGPT's ability to manage MTM-related activities.

Objectives

To evaluate the effectiveness of ChatGPT in providing MTM services for simple, complex, and very complex cases and to understand its potential contributions to MTM.

Methods

Two pharmacists rated and validated the difficulty of the cases as simple, complex, or very complex. Each ChatGPT response was assessed based on 3 criteria: identification of drug interactions, precision in recommending alternatives, and appropriateness in devising care plans. The accuracy of the responses was determined by comparing them with the actual answers for each complexity level.

Results

ChatGPT 4.0 accurately solved 39 out of 39 (100%) cases. It successfully identified interactions, provided recommendations, and formulated general plans, but it did not recommend specific dosages. The results suggest that ChatGPT can assist in formulating care plans overall.

Conclusion

The application of ChatGPT has the potential to enhance medication safety and patient involvement, lower healthcare costs, and assist providers with drug interactions. Future studies could utilize AI models such as ChatGPT in patient care. The future of the pharmacy profession will depend on how the field responds to the changing need for care optimized by automation.

Language: English

Citations

42

A Preliminary Checklist (METRICS) to Standardize the Design and Reporting of Studies on Generative Artificial Intelligence–Based Models in Health Care Education and Practice: Development Study Involving a Literature Review DOI Creative Commons
Malik Sallam, Muna Barakat, Mohammed Sallam

et al.

Interactive Journal of Medical Research, Journal Year: 2024, Volume and Issue: 13, P. e54704 - e54704

Published: Jan. 26, 2024

Background: Adherence to evidence-based practice is indispensable in health care. Recently, the utility of generative artificial intelligence (AI) models in health care has been evaluated extensively. However, the lack of consensus guidelines on the design and reporting of findings of these studies poses a challenge for the interpretation and synthesis of evidence. Objective: This study aimed to develop a preliminary checklist to standardize the design and reporting of studies on generative AI-based models in health care education and practice. Methods: A literature review was conducted in Scopus, PubMed, and Google Scholar. Published records with "ChatGPT," "Bing," or "Bard" in the title were retrieved. Careful examination of the methodologies employed in the included records was conducted to identify common pertinent themes and possible gaps in reporting. A panel discussion was held to establish a unified and thorough checklist for the reporting of AI studies in health care. The finalized checklist was used to evaluate the included records by 2 independent raters. Cohen κ was used as the method to evaluate the interrater reliability. Results: The final data set that formed the basis for theme identification and analysis comprised a total of 34 records. The 9 identified themes were collectively referred to as METRICS (Model, Evaluation, Timing, Range/Randomization, Individual factors, Count, and Specificity of prompts and language). Their details are as follows: (1) Model used and its exact settings; (2) Evaluation approach for the generated content; (3) Timing of testing the model; (4) Transparency of the data source; (5) Range of tested topics; (6) Randomization of selecting the queries; (7) Individual factors in selecting the queries and the interrater reliability; (8) Count of queries executed to test the model; and (9) Specificity of the prompts and language used. The overall mean METRICS score was 3.0 (SD 0.58). The interrater reliability was acceptable, with a κ range of 0.558 to 0.962 (P<.001 for all items). With classification per item, the highest average scores were recorded for the "Model" item, followed by the "Specificity" item, while the lowest scores were recorded for the "Randomization" item (classified as suboptimal) and the "Individual factors" item (classified as satisfactory). Conclusions: The METRICS checklist can facilitate the design of studies by guiding researchers toward best practices in reporting results. The findings highlight the need for standardized reporting algorithms for generative AI-based studies in health care, considering the variability observed in methodologies and reporting. The proposed checklist could be a helpful base for universally accepted reporting guidelines, which is a swiftly evolving research topic.
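A minimal sketch of the interrater-reliability check named in the abstract (Cohen κ between 2 independent raters); the `cohen_kappa` helper and the rating lists below are illustrative assumptions, not the study's code or data.

```python
# Cohen's kappa for two raters, as used in the abstract to check interrater
# reliability of checklist scores. The rating lists are hypothetical.
from collections import Counter

def cohen_kappa(rater1, rater2):
    """Plain two-rater Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    assert len(rater1) == len(rater2)
    n = len(rater1)
    # Observed agreement: fraction of items both raters scored identically.
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement from each rater's marginal label frequencies.
    c1, c2 = Counter(rater1), Counter(rater2)
    labels = set(c1) | set(c2)
    p_e = sum((c1[l] / n) * (c2[l] / n) for l in labels)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical 1-5 item scores given by two raters to the same records.
r1 = [5, 4, 4, 3, 5, 2, 4, 3, 5, 4]
r2 = [5, 4, 3, 3, 5, 2, 4, 4, 5, 4]
print(f"kappa = {cohen_kappa(r1, r2):.3f}")
```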

Language: English

Citations

24

A Study on ChatGPT-4 as an Innovative Approach to Enhancing English as a Foreign Language Writing Learning DOI
Azzeddine Boudouaia, Samia Mouas, Bochra Kouider

et al.

Journal of Educational Computing Research, Journal Year: 2024, Volume and Issue: 62(6), P. 1509 - 1537

Published: April 17, 2024

The field of computer-assisted language learning has recently brought about a notable change in English as a Foreign Language (EFL) writing. Starting from October 2022, students across different academic fields have increasingly depended on ChatGPT-4 as a helpful resource for addressing particular challenges in EFL writing. This study aimed to investigate the use and acceptance of ChatGPT-4 for students' EFL writing. To this end, an experiment was conducted with 76 undergraduate students at a private school in Algeria. The participants were randomly allocated into two groups: an experimental group (n = 37) and a control group (n = 39). Additionally, a questionnaire was administered. The results showed that the experimental group (EG) outperformed the control group (CG). Besides, the findings revealed that the EG's post-test scores outperformed their pre-test scores. The results also showed substantial improvements in the EG's views of perceived usefulness, ease of use, attitudes, and behavioral intention. According to the results, ChatGPT-4 helped boost students' writing skills, which ultimately led to its acceptance. Students appear particularly interested in ChatGPT-4 because of its potential usefulness in putting what they learn into practice. Some suggestions and recommendations are provided.
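The abstract reports between-group and pre-/post-test comparisons without naming the exact tests; one plausible analysis, sketched below with hypothetical scores, uses SciPy's independent and paired t tests.

```python
# Sketch of the group comparisons implied by the abstract: an independent
# t test for EG vs. CG post-test scores and a paired t test for the EG's
# pre- vs. post-test gain. All score arrays are hypothetical.
import numpy as np
from scipy.stats import ttest_ind, ttest_rel

rng = np.random.default_rng(1)
eg_pre  = rng.normal(60, 8, 37)           # experimental group, n = 37
eg_post = eg_pre + rng.normal(10, 4, 37)  # hypothetical post-test gain
cg_post = rng.normal(62, 8, 39)           # control group, n = 39

t_between, p_between = ttest_ind(eg_post, cg_post)
t_within, p_within = ttest_rel(eg_post, eg_pre)
print(f"EG vs. CG post-test: t={t_between:.2f}, p={p_between:.4f}")
print(f"EG pre vs. post:     t={t_within:.2f}, p={p_within:.4f}")
```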

Language: English

Citations

21

Comparison of Large Language Models in Answering Immuno-Oncology Questions: A Cross-Sectional Study DOI Creative Commons
Giovanni Maria Iannantuono, Dara Bracken-Clarke, Fatima Karzai

et al.

The Oncologist, Journal Year: 2024, Volume and Issue: 29(5), P. 407 - 414

Published: Feb. 3, 2024

Abstract Background: The capability of large language models (LLMs) to understand and generate human-readable text has prompted the investigation of their potential as educational and management tools for patients with cancer and healthcare providers. Materials and Methods: We conducted a cross-sectional study aimed at evaluating the ability of ChatGPT-4, ChatGPT-3.5, and Google Bard to answer questions related to 4 domains of immuno-oncology (Mechanisms, Indications, Toxicities, and Prognosis). We generated 60 open-ended questions (15 for each section). Questions were manually submitted to the LLMs, and responses were collected on June 30, 2023. Two reviewers evaluated the answers independently. Results: ChatGPT-4 and ChatGPT-3.5 answered all questions, whereas Google Bard answered only 53.3% of them (P < .0001). The number of reproducible answers was higher for ChatGPT-4 (95%) and ChatGPT-3.5 (88.3%) than for Google Bard (50%). In terms of accuracy, answers were deemed fully correct in 75.4%, 58.5%, and 43.8% of cases for ChatGPT-4, ChatGPT-3.5, and Google Bard, respectively (P = .03). Furthermore, answers were deemed highly relevant in 71.9% and 77.4% of cases for ChatGPT-4 and ChatGPT-3.5, respectively, with a lower rate for Google Bard (P = .04). Regarding readability, ChatGPT-4 (98.1%) and ChatGPT-3.5 (100%) answers were deemed readable more often than Google Bard answers (87.5%) (P = .02). Conclusion: ChatGPT-4 and ChatGPT-3.5 are potentially powerful tools in immuno-oncology, whereas Google Bard demonstrated relatively poorer performance. However, the risk of inaccuracy or incompleteness was evident in all 3 LLMs, highlighting the importance of expert-driven verification of the outputs returned by these technologies.
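One plausible way to compare the reported "fully correct" rates across the three LLMs is a chi-square test on correct/incorrect counts; the sketch below uses illustrative counts, since the abstract gives only percentages and the denominators here are assumed.

```python
# Compare "fully correct" rates across three LLMs with a chi-square test
# of independence. The contingency counts are hypothetical reconstructions
# from the reported percentages, for illustration only.
from scipy.stats import chi2_contingency

# rows: ChatGPT-4, ChatGPT-3.5, Google Bard; cols: correct, not correct
table = [
    [43, 14],  # ~75.4% correct (hypothetical denominator)
    [31, 22],  # ~58.5% correct
    [14, 18],  # ~43.8% correct of the questions Bard answered
]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, dof={dof}, p={p:.4f}")
```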

Language: English

Citations

18

Personalized Medicine in Urolithiasis: AI Chatbot-Assisted Dietary Management of Oxalate for Kidney Stone Prevention DOI Open Access
Noppawit Aiumtrakul, Charat Thongprayoon, Chinnawat Arayangkool

et al.

Journal of Personalized Medicine, Journal Year: 2024, Volume and Issue: 14(1), P. 107 - 107

Published: Jan. 18, 2024

Accurate information regarding oxalate levels in foods is essential for managing patients with hyperoxaluria, oxalate nephropathy, or those susceptible to calcium oxalate stones. This study aimed to assess the reliability of AI chatbots in categorizing foods based on their oxalate content. We assessed the accuracy of ChatGPT-3.5, ChatGPT-4, Bard AI, and Bing Chat in classifying dietary oxalate content per serving into low (<5 mg), moderate (5–8 mg), and high (>8 mg) categories. A total of 539 food items were processed through each chatbot. The accuracy was compared between chatbots and stratified by oxalate category. Bard AI had the highest accuracy of 84%, followed by Bing (60%), GPT-4 (52%), and GPT-3.5 (49%) (p < 0.001). There was a significant pairwise difference between the chatbots, except between GPT-4 and GPT-3.5 (p = 0.30). The accuracy of all chatbots decreased in the higher-oxalate categories, but Bard remained the one having the highest accuracy, regardless of category. There was considerable variation in classifying foods, and Bard AI consistently showed higher accuracy than Bing Chat, GPT-4, and GPT-3.5. These results underline the potential of AI chatbots in the dietary management of at-risk patient groups and the need for enhancements in chatbot algorithms for clinical accuracy.
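A minimal sketch of the accuracy computation implied by the abstract: each chatbot's category label is checked against a reference oxalate category, overall and per category. The food items, labels, and column names below are hypothetical.

```python
# Check each chatbot's oxalate category against a reference label and
# report accuracy overall and per reference category. Data are hypothetical.
import pandas as pd

# Hypothetical classifications for a few of the 539 food items.
df = pd.DataFrame({
    "food":      ["spinach", "rice", "almonds", "apple"],
    "reference": ["high", "low", "high", "low"],
    "bard":      ["high", "low", "high", "low"],
    "gpt4":      ["high", "low", "moderate", "low"],
})

for bot in ["bard", "gpt4"]:
    df[f"{bot}_correct"] = df[bot] == df["reference"]
    overall = df[f"{bot}_correct"].mean()
    per_cat = df.groupby("reference")[f"{bot}_correct"].mean()
    print(f"{bot}: overall accuracy {overall:.0%}")
    print(per_cat, "\n")
```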

Language: English

Citations

17

Large Language Models for Chatbot Health Advice Studies DOI Creative Commons
Bright Huo, Amy Boyle, Nana Marfo

et al.

JAMA Network Open, Journal Year: 2025, Volume and Issue: 8(2), P. e2457879 - e2457879

Published: Feb. 4, 2025

Importance: There is much interest in the clinical integration of large language models (LLMs) in health care. Many studies have assessed the ability of LLMs to provide health advice, but the quality of their reporting is uncertain. Objective: To perform a systematic review to examine the reporting variability among peer-reviewed studies evaluating the performance of generative artificial intelligence (AI)–driven chatbots for summarizing evidence and providing health advice, to inform the development of the Chatbot Assessment Reporting Tool (CHART). Evidence Review: A search of MEDLINE via Ovid, Embase via Elsevier, and Web of Science from inception to October 27, 2023, was conducted with the help of a health sciences librarian to yield 7752 articles. Two reviewers screened articles by title and abstract, followed by full-text review, to identify primary studies evaluating the accuracy of AI-driven chatbots (chatbot studies). Two reviewers then performed data extraction for 137 eligible studies. Findings: A total of 137 studies were included. Studies examined topics in surgery (55 [40.1%]), medicine (51 [37.2%]), and primary care (13 [9.5%]). Most studies focused on treatment (91 [66.4%]), diagnosis (60 [43.8%]), or disease prevention (29 [21.2%]). Most studies (136 [99.3%]) evaluated inaccessible, closed-source LLMs and did not provide enough information about the version of the LLM under evaluation. All studies lacked a sufficient description of LLM characteristics, including temperature, token length, fine-tuning availability, layers, and other details. Few studies described the prompt engineering phase of the study. The date of LLM querying was reported in 54 (39.4%) studies. Most studies (89 [65.0%]) used subjective means to define the successful performance of the chatbot, while less than one-third addressed the ethical, regulatory, and patient safety implications of the clinical integration of LLMs. Conclusions and Relevance: In this systematic review of 137 chatbot studies, the reporting quality was heterogeneous and may inform the development of the CHART reporting standards. Ethical, regulatory, and patient safety considerations are crucial as interest in the clinical integration of LLMs grows.

Language: English

Citations

6

Chat GPT vs. Clinical Decision Support Systems in the Analysis of Drug–Drug Interactions DOI Creative Commons
Thorsten Bischof, Valentin al Jalali, Markus Zeitlinger

et al.

Clinical Pharmacology & Therapeutics, Journal Year: 2025, Volume and Issue: unknown

Published: Feb. 11, 2025

The current standard method for the analysis of potential drug–drug interactions (pDDIs) is time‐consuming and includes the use of multiple clinical decision support systems (CDSSs) and interpretation by healthcare professionals. With the emergence of large language models developed with artificial intelligence, an interesting alternative arose. This retrospective study included 30 patients with polypharmacy, who underwent a pDDI analysis between October 2022 and August 2023, and compared the performance of Chat GPT with established CDSSs (MediQ®, Lexicomp®, Micromedex®) in detecting pDDIs. A multidisciplinary team interpreted the obtained results, decided upon clinical relevance, and assigned severity grades using three categories: (i) contraindicated, (ii) severe, and (iii) moderate. The expert review identified a total of 280 clinically relevant pDDIs (3 contraindications, 13 severe, 264 moderate) with the CDSSs, compared with 80 (2 contraindications, 5 severe, 73 moderate) with Chat GPT. Chat GPT almost entirely neglected the risk of QTc prolongation (85 vs. 8), which could also not be sufficiently improved by a specific prompt. To assess the consistency of the results provided by Chat GPT, we repeated each query and found inconsistent results in 90% of cases. In contrast, Chat GPT provided acceptable and comprehensible recommendations for questions on side effects. The use of Chat GPT for the identification of pDDIs cannot be recommended currently, because relevant pDDIs were not detected, there were obvious errors, and the results were inconsistent. However, if these limitations are addressed accordingly, it could be a promising platform for the future.
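A minimal sketch of the consistency check the abstract describes (repeating each query and comparing the returned interaction lists); the normalization scheme and the example drug pairs are hypothetical.

```python
# Sketch of the repeat-query consistency check: the same pDDI query is run
# twice and the two returned interaction lists are compared as sets of
# unordered drug pairs. Normalization and example lists are hypothetical.

def normalize(pddi: str) -> frozenset:
    """Treat 'A <-> B' and 'B <-> A' as the same interaction pair."""
    return frozenset(drug.strip().lower() for drug in pddi.split("<->"))

def consistent(run1: list[str], run2: list[str]) -> bool:
    """Two runs are consistent if they flag the same set of drug pairs."""
    return {normalize(p) for p in run1} == {normalize(p) for p in run2}

first  = ["amiodarone <-> ciprofloxacin", "warfarin <-> ibuprofen"]
second = ["ibuprofen <-> warfarin"]  # repeat run missed one interaction
print(consistent(first, second))     # False -> inconsistent result
```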

Language: English

Citations

2

Pilot Testing of a Tool to Standardize the Assessment of the Quality of Health Information Generated by Artificial Intelligence-Based Models DOI Open Access
Malik Sallam, Muna Barakat, Mohammed Sallam

et al.

Cureus, Journal Year: 2023, Volume and Issue: unknown

Published: Nov. 24, 2023

Background: Artificial intelligence (AI)-based conversational models, such as Chat Generative Pre-trained Transformer (ChatGPT), Microsoft Bing, and Google Bard, have emerged as valuable sources of health information for lay individuals. However, the accuracy of the information provided by these AI models remains a significant concern. This pilot study aimed to test a new tool with key themes for inclusion as follows: Completeness of content, Lack of false information, Evidence supporting the content, Appropriateness, and Relevance, referred to as "CLEAR", designed to assess the quality of health information delivered by AI-based models. Methods: Tool development involved a literature review on health information quality, followed by the initial establishment of the CLEAR tool, which comprised five items addressing the following: completeness, lack of false information, evidence support, appropriateness, and relevance. Each item was scored on a five-point Likert scale from excellent to poor. Content validity was checked by expert review. Pilot testing involved 32 healthcare professionals using the tool to assess content on eight different health topics of deliberately varying qualities. The internal consistency was checked with Cronbach's alpha (α). Feedback resulted in language modifications to improve the clarity of the items. The final CLEAR tool was used to assess content generated on four distinct health topics. The content was generated by ChatGPT 3.5, ChatGPT 4, Microsoft Bing, and Google Bard and scored by two independent raters, with Cohen's kappa (κ) used to assess inter-rater agreement. Results: The final CLEAR items were: (1) Is the content sufficient?; (2) Is the content accurate?; (3) Is the content evidence-based?; (4) Is the content clear, concise, and easy to understand?; and (5) Is the content free from irrelevant information? Pilot testing revealed acceptable internal consistency, with an α range of 0.669-0.981. Use of the final CLEAR tool yielded the following average scores: Microsoft Bing (mean=24.4±0.42), ChatGPT-4 (mean=23.6±0.96), Google Bard (mean=21.2±1.79), and ChatGPT-3.5 (mean=20.6±5.20). The inter-rater agreement yielded the following Cohen κ values across the four models: κ=0.875 (P<.001), κ=0.780 (P<.001), κ=0.348 (P=.037), and κ=0.749 (P<.001). Conclusions: The CLEAR tool is a brief yet helpful tool that can aid in standardizing the testing of the quality of health information generated by AI-based models. Future studies are recommended to validate the utility of the CLEAR tool in the assessment of AI-generated health-related content using larger samples across various complex health topics.
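A minimal sketch of the internal-consistency computation named in the abstract (Cronbach's α over the five CLEAR items), using a hypothetical respondents-by-items score matrix; the helper function is illustrative, not the study's code.

```python
# Cronbach's alpha for the internal consistency of the five CLEAR items,
# computed from a respondents-by-items score matrix. Scores are hypothetical.
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """scores: rows = respondents, cols = items (k items).
    alpha = k/(k-1) * (1 - sum(item variances) / variance of total score)."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Hypothetical 5-point Likert scores: 6 respondents x 5 CLEAR items.
scores = np.array([
    [5, 4, 5, 4, 5],
    [4, 4, 4, 3, 4],
    [5, 5, 5, 5, 5],
    [3, 3, 2, 3, 3],
    [4, 3, 4, 4, 4],
    [2, 2, 3, 2, 2],
])
print(f"alpha = {cronbach_alpha(scores):.3f}")
```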

Language: English

Citations

38

Comparison of the Audiological Knowledge of Three Chatbots: ChatGPT, Bing Chat, and Bard DOI
W. Wiktor Jędrzejczak, Krzysztof Kochanek

Audiology and Neurotology, Journal Year: 2024, Volume and Issue: unknown, P. 1 - 7

Published: May 6, 2024

Introduction: The purpose of this study was to evaluate three chatbots – OpenAI ChatGPT, Microsoft Bing Chat (currently Copilot), and Google Bard (currently Gemini) – in terms of their responses to a defined set of audiological questions. Methods: Each chatbot was presented with the same 10 questions, and the authors rated the responses on a Likert scale ranging from 1 to 5. Additional features, such as the number of inaccuracies or errors and the provision of references, were also examined. Results: Most responses given by all chatbots were satisfactory or better. However, all chatbots generated at least a few inaccuracies or errors. ChatGPT achieved the highest overall score, while Bard scored the worst. Bard was also the only chatbot unable to provide a response to one of the questions, and ChatGPT was the only one that did not provide information about its sources. Conclusions: Chatbots are an intriguing tool that can be used to access basic information in a specialized area like audiology. Nevertheless, one needs to be careful, as correct information is not infrequently mixed with errors that are hard to pick up unless the user is well versed in the field.

Language: English

Citations

13

Search Engines and Generative Artificial Intelligence Integration: Public Health Risks and Recommendations to Safeguard Consumers Online DOI Creative Commons
Amir Reza Ashraf, Tim K. Mackey, András Fittler

et al.

JMIR Public Health and Surveillance, Journal Year: 2024, Volume and Issue: 10, P. e53086 - e53086

Published: Jan. 4, 2024

The online pharmacy market is growing, with legitimate online pharmacies offering advantages such as convenience and accessibility. However, this increased demand has attracted malicious actors into the space, leading to the proliferation of illegal vendors that use deceptive techniques to rank higher in search results and pose serious public health risks by dispensing substandard or falsified medicines. Search engine providers have started integrating generative artificial intelligence (AI) into search interfaces, which could revolutionize search by delivering more personalized results through a user-friendly experience. However, improper integration of these new technologies carries the potential to further exacerbate the risks posed by illicit vendors by inadvertently directing users to illegal vendors.

Language: English

Citations

10