Research Square (Research Square), Journal Year: 2025, Volume and Issue: unknown
Published: April 16, 2025
Language: English
Journal of Evaluation in Clinical Practice, Journal Year: 2024, Volume and Issue: 30(8), P. 1556 - 1564
Published: July 3, 2024
Artificial Intelligence (AI) large language models (LLMs) are tools capable of generating human-like text responses to user queries across topics. The use of these tools in various medical contexts is currently being studied. However, their performance and the quality of their content have not been evaluated in specific fields.
Language: English
Citations: 5
Cureus, Journal Year: 2024, Volume and Issue: unknown
Published: Aug. 28, 2024
Purpose: Artificial intelligence (AI) has rapidly gained popularity with the growth of ChatGPT (OpenAI, San Francisco, USA) and other large-language-model chatbots, and these programs have tremendous potential to impact medicine. One important area of consequence in medicine and public health is that patients may use AI chatbots to search for answers to medical questions. Despite increased utilization of AI chatbots by the public, there is little research assessing their reliability when queried for medical information. This study seeks to elucidate the accuracy and readability of AI chatbots in answering patient questions regarding urology. As vasectomy is one of the most common urologic procedures, this study investigates AI-generated responses to frequently asked vasectomy-related questions. Five popular free-to-access platforms were utilized to undertake the investigation. Methods: Fifteen questions were individually queried on each platform from November-December 2023: ChatGPT (OpenAI, San Francisco, USA), Bard (Google Inc., Mountain View, USA), Bing (Microsoft, Redmond, USA), Perplexity (Perplexity AI), and Claude (Anthropic, USA). Responses from each platform were graded by two attending urologists, urology faculty, and a urological resident physician using a Likert (1-6) scale (1 = completely inaccurate, 6 = completely accurate) based on comparison with existing American Urological Association guidelines. Flesch-Kincaid Grade Levels (FKGL) and Flesch Reading Ease Scores (FRES) (1-100) were calculated for each response. To assess differences in Likert, FRES, and FKGL scores, Kruskal-Wallis tests were performed in GraphPad Prism V10.1.0 (GraphPad, San Diego, USA), with alpha set at 0.05. Results: Analysis shows ChatGPT provided the most accurate responses, with an average score of 5.04 on the Likert scale; Microsoft Bing (4.91), Anthropic Claude (4.65), Google Bard (4.43), and Perplexity (4.41) followed. All chatbots were found to score, on average, higher than 4.41, corresponding to at least "somewhat accurate." The platform with the highest FRES (49.67) also had the lowest grade level (10.1) compared with the other chatbots; the remaining platforms scored 46.7 FRES with 10.55 FKGL, 45.57 with 11.56, 36.4 with 13.29, and 30.4 FRES with 14.2 FKGL. Conclusion: As AI becomes more prevalent in medicine, and specifically in urology, it helps to determine whether these chatbots can be reliable sources of freely available medical information. All platforms were able to achieve at least "somewhat accurate" ratings on the 6-point scale. In terms of readability, all responses scored less than 50 on the FRES and above a 10th-grade reading level. In this small-scale study, several significant readability differences were identified between chatbots; however, no significant differences were found among their accuracies. Thus, our study suggests that major AI chatbots perform similarly in their ability to provide correct information but differ in the ease with which they can be comprehended by the general public.
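For context, the FRES and FKGL values reported in this and several of the following abstracts follow the standard Flesch formulas; the definitions below are the conventional ones, not quoted from the study itself:

\[
\mathrm{FRES} = 206.835 - 1.015\left(\frac{\text{total words}}{\text{total sentences}}\right) - 84.6\left(\frac{\text{total syllables}}{\text{total words}}\right)
\]
\[
\mathrm{FKGL} = 0.39\left(\frac{\text{total words}}{\text{total sentences}}\right) + 11.8\left(\frac{\text{total syllables}}{\text{total words}}\right) - 15.59
\]

On these scales, an FRES below 50 combined with an FKGL above 10 corresponds to fairly difficult text at roughly a 10th-grade reading level or higher, which is how the readability conclusions above should be read.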
Language: English
Citations: 4
BMC Oral Health, Journal Year: 2025, Volume and Issue: 25(1)
Published: April 15, 2025
Artificial intelligence (AI) chatbots are increasingly used in healthcare to address patient questions by providing personalized responses. Evaluating their performance is essential to ensure reliability. This study aimed to assess the performance of three AI chatbots in responding to frequently asked questions (FAQs) from patients regarding dental prostheses. Thirty-one questions were collected from accredited organizations' websites and the "People Also Ask" feature of Google, focusing on removable and fixed prosthodontics. Two board-certified prosthodontists evaluated response quality using a modified Global Quality Score (GQS) on a 5-point Likert scale. Inter-examiner agreement was assessed with weighted kappa. Readability was measured with the Flesch-Kincaid Grade Level (FKGL) and Flesch Reading Ease (FRE) indices. Statistical analyses were performed with repeated measures ANOVA and the Friedman test, with Bonferroni correction for pairwise comparisons (α = 0.05). Inter-examiner agreement was good. Among the chatbots, Google Gemini had the highest quality score (4.58 ± 0.50), significantly outperforming Microsoft Copilot (3.87 ± 0.89) (P = .004). Readability analysis showed that ChatGPT (10.45 ± 1.26) produced more complex responses than the other two chatbots (7.82 ± 1.19 and 8.38 ± 1.59) (P < .001). FRE scores indicated that ChatGPT's responses were categorized as fairly difficult (53.05 ± 7.16), while Gemini's were in plain English (64.94 ± 7.29), a significant difference between them. AI chatbots show great potential in answering patient inquiries about dental prostheses; however, improvements are needed to enhance their effectiveness as patient education tools.
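As a reference for the agreement statistic mentioned above (standard definition, not taken from the study): for two examiners rating the same responses on an ordinal scale such as the 5-point GQS, the weighted kappa is

\[
\kappa_w = 1 - \frac{\sum_{i,j} w_{ij}\, p_{ij}}{\sum_{i,j} w_{ij}\, p_{i\cdot}\, p_{\cdot j}},
\qquad w_{ij} = \frac{(i-j)^2}{(k-1)^2}\ \text{(quadratic weights)},
\]

where \(p_{ij}\) are the observed joint rating proportions, \(p_{i\cdot}\) and \(p_{\cdot j}\) the marginal proportions, and \(k\) the number of rating categories; by common convention, values around 0.6-0.8 are read as good agreement.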
Language: English
Citations: 0
BMC Oral Health, Journal Year: 2025, Volume and Issue: 25(1)
Published: Jan. 11, 2025
Language: English
Citations: 0
Transplantation Proceedings, Journal Year: 2025, Volume and Issue: unknown
Published: Jan. 1, 2025
Language: English
Citations: 0
Dental Traumatology, Journal Year: 2025, Volume and Issue: unknown
Published: Jan. 24, 2025
ABSTRACT Background: This study assessed the accuracy and consistency of responses provided by six Artificial Intelligence (AI) applications, ChatGPT version 3.5 (OpenAI), versions 4 and 4.0, Perplexity (Perplexity.AI), Gemini (Google), and Copilot (Bing), to questions related to the emergency management of avulsed teeth. Materials and Methods: Two pediatric dentists developed 18 true-or-false questions regarding dental avulsion and posed them to the publicly available chatbots for 3 days. The responses were recorded and compared with the correct answers. The SPSS program was used to calculate the obtained accuracies and their consistency. Results: The highest accuracy rate achieved was 95.6% over the entire time frame, while Perplexity (Perplexity.AI) had the lowest at 67.2%. ChatGPT (OpenAI) was the only AI that showed perfect agreement with the real answers, except at noon on day 1. The weakest consistency was observed (6 times). Conclusions: With the exception of ChatGPT's paid version, 4.0, the AI chatbots do not seem ready for use as a main resource in managing avulsed teeth during emergencies. It might prove beneficial to incorporate International Association of Dental Traumatology (IADT) guidelines into chatbot databases, enhancing their accuracy.
Language: English
Citations: 0
Techniques in Coloproctology, Journal Year: 2025, Volume and Issue: 29(1)
Published: Jan. 26, 2025
Language: English
Citations: 0
Cureus, Journal Year: 2025, Volume and Issue: unknown
Published: Jan. 31, 2025
The rise of artificial intelligence (AI), including generative chatbots like ChatGPT (OpenAI, San Francisco, CA, USA), has revolutionized many fields, including healthcare. Patients have gained the ability to prompt chatbots to generate purportedly accurate and individualized healthcare content. This study analyzed the readability and quality of answers to Achilles tendon rupture questions from six AI chatbots to evaluate and distinguish their potential as patient education resources. The models used were ChatGPT 3.5, ChatGPT 4, Gemini 1.0 (previously Bard; Google, Mountain View, CA, USA), Gemini 1.5 Pro, Claude (Anthropic, USA), and Grok (xAI, Palo Alto, CA, USA), without prior prompting. Each model was asked 10 common questions about Achilles tendon rupture, determined by five orthopaedic surgeons. The readability of responses was measured using the Flesch-Kincaid Reading Grade Level, Gunning Fog, and SMOG (Simple Measure of Gobbledygook) indices. Response quality was subsequently graded against the DISCERN criteria in a blinded fashion. One model generated statistically significantly better reading ease (closest to the average American reading level) than Claude. Additionally, mean DISCERN scores demonstrated significantly higher quality for two of the models (63.0 ± 5.1 and, for ChatGPT 4, 63.8 ± 6.2) than for ChatGPT 3.5 (53.8 ± 3.8) and the remaining models (55.0 ± 3.8 and 54.2 ± 4.8). However, the overall quality (question 16, DISCERN) of each model averaged at an above-average level (range, 3.4-4.4). Our results indicate that AI chatbots can potentially serve as patient education resources alongside physicians. Although some models lacked sufficient content, all performed above average in quality. With the lowest readability scores and the highest DISCERN scores, one chatbot outperformed ChatGPT and Claude and emerged as the simplest and most reliable chatbot regarding the management of Achilles tendon rupture.
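For context, the two additional readability indices used in this abstract are conventionally defined as follows (standard formulas, not quoted from the study):

\[
\text{Gunning Fog} = 0.4\left(\frac{\text{words}}{\text{sentences}} + 100\,\frac{\text{complex words}}{\text{words}}\right)
\]
\[
\text{SMOG} = 1.0430\sqrt{\text{polysyllable count}\times\frac{30}{\text{sentences}}} + 3.1291
\]

Here complex and polysyllabic words are those with three or more syllables; both indices estimate the years of schooling needed to understand a text on first reading, so lower scores indicate simpler prose.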
Language: English
Citations: 0
Pediatric Pulmonology, Journal Year: 2025, Volume and Issue: 60(3)
Published: March 1, 2025
To evaluate the accuracy and comprehensiveness of eight free, publicly available large language model (LLM) chatbots in addressing common questions related to chronic neonatal lung disease (CNLD) and home oxygen therapy (HOT). Twenty CNLD and HOT-related questions were curated across nine domains. Responses from ChatGPT-3.5, Google Bard, Bing Chat, Claude 3.5 Sonnet, ERNIE Bot 3.5, and GLM-4 were generated and evaluated by three experienced neonatologists using Likert scales for accuracy and comprehensiveness. Updated LLM models (ChatGPT-4o mini and Gemini 2.0 Flash Experimental) were incorporated to assess rapid technological advancement. Statistical analyses included ANOVA, Kruskal-Wallis tests, and intraclass correlation coefficients. Bing Chat and Claude 3.5 Sonnet demonstrated superior performance, with the highest mean scores (5.78 ± 0.48 and 5.75 ± 0.54, respectively) and competence (2.65 ± 0.58 and 2.80 ± 0.41, respectively). In subsequent testing, Gemini 2.0 Flash Experimental and ChatGPT-4o mini achieved comparably high performance. Performance varied across domains, with all models excelling in "equipment safety protocols" and "caregiver support." Models showed self-correction capabilities when prompted. LLMs show promise in providing accurate CNLD/HOT information; however, performance variability and the risk of misinformation necessitate expert oversight and continued refinement before widespread clinical implementation.
Language: English
Citations: 0
International Journal of Medical Informatics, Journal Year: 2025, Volume and Issue: unknown, P. 105871 - 105871
Published: March 1, 2025
Language: English
Citations: 0