The Effect of English and Turkish Language Variation on Artificial Intelligence Chatbot Performance in Oculofacial Plastic and Orbital Surgery: A Study of ChatGPT-3.5, Copilot, and Gemini DOI
Eyüpcan Şensoy, Mehmet Çıtırık

OSMANGAZİ JOURNAL OF MEDICINE, Journal Year: 2024, Volume and Issue: 46(5)

Published: Sept. 3, 2024

This study aimed to investigate the effect of administering the same oculofacial plastic and orbital surgery questions in different languages on the performance of the freely accessible artificial intelligence chatbots ChatGPT-3.5, Copilot, and Gemini. English and Turkish versions of 30 oculofacial questions were administered. The chatbots' answers were compared with the answer key at the back of the book and grouped as correct or incorrect, and the programs' relative performance was compared statistically. ChatGPT-3.5 answered 43.3% of the English questions correctly versus 23.3% of the Turkish questions (p=0.07); Copilot answered 73.3% versus 63.3% (p=0.375); and Gemini answered 46.7% versus 33.3% (p=0.344). Copilot showed higher performance than the other programs in answering the questions (p<0.05). In addition to improving chatbots' knowledge levels, their performance across languages also needs to be examined and improved. Correcting these disadvantages of chatbots will pave the way for their widespread and reliable use.
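The abstract above reports paired English/Turkish accuracy on the same 30 items without naming the statistical test; a common choice for paired correct/incorrect outcomes is McNemar's test. Below is a minimal sketch of that kind of comparison, assuming statsmodels is available; the response vectors are hypothetical placeholders, not the study's data.

```python
# Sketch: paired comparison of one chatbot's correctness on the same 30
# questions asked in English vs. Turkish. McNemar's test and the simulated
# responses below are illustrative assumptions, not the authors' method.
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

rng = np.random.default_rng(0)
english_correct = rng.random(30) < 0.433  # placeholder: ~43.3% correct
turkish_correct = rng.random(30) < 0.233  # placeholder: ~23.3% correct

# 2x2 table of paired outcomes: rows = English correct/incorrect,
# columns = Turkish correct/incorrect.
table = np.array([
    [np.sum(english_correct & turkish_correct),
     np.sum(english_correct & ~turkish_correct)],
    [np.sum(~english_correct & turkish_correct),
     np.sum(~english_correct & ~turkish_correct)],
])

result = mcnemar(table, exact=True)  # exact binomial test on discordant pairs
print(f"McNemar p-value: {result.pvalue:.3f}")
```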

A Systematic Review of Generative AI for Teaching and Learning Practice DOI Creative Commons
Bayode Ogunleye, Kudirat Ibilola Zakariyyah, Oluwaseun Ajao

et al.

Education Sciences, Journal Year: 2024, Volume and Issue: 14(6), P. 636 - 636

Published: June 13, 2024

The use of generative artificial intelligence (GenAI) in academia is a subjective and hotly debated topic. Currently, there are no agreed guidelines towards the usage of GenAI systems in higher education (HE) and, thus, it is still unclear how to make effective use of the technology for teaching and learning practice. This paper provides an overview of the current state of research on GenAI for teaching and learning in HE. To this end, this study conducted a systematic review of relevant studies indexed by Scopus, using the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines. The search criteria revealed a total of 625 research papers, of which 355 met the final inclusion criteria. The findings showed current and future trends in documents, citations, document sources/authors, keywords, and co-authorship. The research gaps identified suggest that while some authors have looked at understanding the detection of AI-generated text, it may be beneficial to understand how GenAI can be incorporated into supporting the educational curriculum for assessments, teaching, and learning delivery. Furthermore, there is a need for additional interdisciplinary, multidimensional studies in HE through collaboration. This will strengthen the awareness of students, tutors, and other stakeholders, and will be instrumental in formulating guidelines, frameworks, and policies for GenAI usage.

Language: English

Citations

25

Performance of Google’s Artificial Intelligence Chatbot “Bard” (Now “Gemini”) on Ophthalmology Board Exam Practice Questions DOI Open Access

Monica Botross, Seyed Omid Mohammadi, Kendall Montgomery

et al.

Cureus, Journal Year: 2024, Volume and Issue: unknown

Published: March 31, 2024

Purpose: To assess the performance of "Bard," one of ChatGPT's competitors, in answering practice questions for the ophthalmology board certification exam. Methods: In December 2023, 250 multiple-choice questions from the "BoardVitals" exam question bank were randomly selected and entered into Bard to assess the artificial intelligence chatbot's ability to comprehend, process, and answer complex scientific and clinical ophthalmic questions. A random mix of text-only and image-and-text questions was drawn from 10 subsections. Each subsection included 25 questions. The percentage of correct responses was calculated per section, and an overall assessment score was determined. Results: On average, Bard answered 62.4% (156/250) of the questions correctly. It performed worst, at 24% (6/25), on the topic "Retina and Vitreous," and best on "Oculoplastics," with a score of 84% (21/25). While the majority of questions posed minimal difficulty, not all could be processed by Bard. This was particularly an issue for questions that contained human images or multiple visual files. Some vignette-style questions were also not understood and were therefore omitted. Future investigations will focus on having more questions to increase the available data points. Conclusions: While Bard correctly answered most questions and is capable of analyzing vast amounts of medical data, it ultimately lacks the holistic understanding and experience-informed knowledge of an ophthalmologist. An ophthalmologist's ability to synthesize diverse pieces of information and draw on experience in standardized examinations is at present irreplaceable, but artificial intelligence, in its current form, can be employed as a valuable tool for supplementing clinicians' study methods.

Language: English

Citations

16

Performance of GPT-4V in Answering the Japanese Otolaryngology Board Certification Examination Questions: Evaluation Study DOI Creative Commons
Masao Noda, Takayoshi Ueno, Ryota Koshu

et al.

JMIR Medical Education, Journal Year: 2024, Volume and Issue: 10, P. e57054 - e57054

Published: March 9, 2024

Artificial intelligence models can learn from medical literature and clinical cases and generate answers that rival those of human experts. However, challenges remain in the analysis of complex data containing images and diagrams.

Language: English

Citations

15

Understanding natural language: Potential application of large language models to ophthalmology DOI Creative Commons
Zefeng Yang, Biao Wang, Fengqi Zhou

et al.

Asia-Pacific Journal of Ophthalmology, Journal Year: 2024, Volume and Issue: 13(4), P. 100085 - 100085

Published: July 1, 2024

Large language models (LLMs), a natural language processing technology based on deep learning, are currently in the spotlight. These models closely mimic natural language comprehension and generation. Their evolution has undergone several waves of innovation similar to those of convolutional neural networks. The transformer architecture and the advancement of generative artificial intelligence mark a monumental leap beyond early-stage pattern recognition via supervised learning. With the expansion of parameters and training data (terabytes), LLMs unveil remarkable human interactivity, encompassing capabilities such as memory retention and comprehension. These advances make LLMs particularly well-suited for roles in healthcare communication between medical practitioners and patients. In this comprehensive review, we discuss the trajectory of LLMs and their potential implications for clinicians and patient care. For clinicians, LLMs can be used for automated medical documentation and, given better inputs and extensive validation, may be able to autonomously diagnose and treat in the future. For patient care, LLMs can be used for triage suggestions, summarization of medical documents, explanation of a patient's condition, and customizing patient education materials tailored to the patient's comprehension level. The limitations of LLMs and possible solutions for real-world use are also presented. Given the rapid advancements in this area, this review attempts to briefly cover the many roles that LLMs may play in the ophthalmic space, with a focus on improving the quality of healthcare delivery.

Language: English

Citations

7

Using large language models (ChatGPT, Copilot, PaLM, Bard, and Gemini) in Gross Anatomy course: Comparative analysis DOI Open Access
Volodymyr Mavrych, Paul Ganguly, Olena Bolgova

et al.

Clinical Anatomy, Journal Year: 2024, Volume and Issue: unknown

Published: Nov. 21, 2024

The increasing application of generative artificial intelligence large language models (LLMs) in various fields, including medical education, raises questions about their accuracy. The primary aim of our study was to undertake a detailed comparative analysis of the proficiencies and accuracies of six different LLMs (ChatGPT-4, ChatGPT-3.5-turbo, ChatGPT-3.5, Copilot, PaLM, Bard, and Gemini) in responding to multiple-choice questions (MCQs) and in generating clinical scenarios and MCQs for upper limb topics in a Gross Anatomy course for medical students. The selected chatbots were tested, answering 50 USMLE-style MCQs. The questions were randomly selected from the course exam database for medical students and reviewed by three independent experts. The results of five successive attempts to answer each set of questions were evaluated in terms of accuracy, relevance, and comprehensiveness. The best result was provided by ChatGPT-4, which answered 60.5% ± 1.9% of the questions accurately, then Copilot (42.0% ± 0.0%) and ChatGPT-3.5 (41.0% ± 5.3%), followed by ChatGPT-3.5-turbo (38.5% ± 5.7%). Google PaLM 2 (34.5% ± 4.4%) and Bard (33.5% ± 3.0%) gave the poorest results. The overall performance of GPT-4 was statistically superior (p < 0.05) to those of GPT-3.5, Copilot, GPT-Turbo, PaLM 2, and Bard, by 18.6%, 19.5%, 22%, 26%, and 27%, respectively. Each chatbot was then asked to generate a clinical scenario for each of three topics - the anatomical snuffbox, supracondylar fracture of the humerus, and the cubital fossa - and three related anatomical MCQs with five options each, and to indicate the correct answers. Two independent experts analyzed and graded the 216 records received (on a 0-5 scale). The best result was recorded for Gemini, while PaLM 2 had the lowest grade. Technological progress notwithstanding, LLMs have yet to mature sufficiently to take over the role of a teacher or facilitator completely within a Gross Anatomy course; however, they can be valuable tools for medical educators.

Language: English

Citations

6
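The Gross Anatomy study above reports accuracy as a mean ± SD over five successive attempts at the same 50 MCQs. A minimal sketch of that aggregation, using hypothetical per-attempt scores rather than the study's raw data:

```python
# Sketch: turning five successive attempts (50 MCQs each) into the
# mean ± SD accuracy figures reported above. Scores are hypothetical.
from statistics import mean, stdev

attempts = {  # correct answers out of 50 per attempt (illustrative values)
    "ChatGPT-4": [30, 31, 30, 30, 30],
    "Copilot": [21, 21, 21, 21, 21],   # identical attempts -> SD of 0.0
    "ChatGPT-3.5": [22, 20, 18, 21, 21],
}

for model, correct in attempts.items():
    pct = [100 * c / 50 for c in correct]
    print(f"{model}: {mean(pct):.1f}% ± {stdev(pct):.1f}%")
```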

The Performance of Artificial Intelligence-based Large Language Models on Ophthalmology-related Questions in Swedish Proficiency Test for Medicine: ChatGPT-4 omni vs Gemini 1.5 Pro DOI Creative Commons
Mehmet Cem Sabaner, Arzu Seyhan Karatepe, Kemal Mert Mutibayraktaroglu

et al.

Deleted Journal, Journal Year: 2024, Volume and Issue: unknown, P. 100070 - 100070

Published: Sept. 1, 2024

Language: English

Citations

4

Evaluating the Accuracy of Advanced Language Learning Models in Ophthalmology: A Comparative Study of ChatGPT-4o and Meta AI’s Llama 3.1 DOI Creative Commons
Trevor Lin, Ryan T.K. Lin, Rahul Mhaskar

et al.

Advances in Ophthalmology Practice and Research, Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 1, 2025

Language: English

Citations

0

Evaluating the Effect of Language Differences on Artificial Intelligence Chatbot Performance in Ophthalmic Pathologies and Intraocular Tumors: A Study of ChatGPT-3.5, Copilot, and Gemini DOI Open Access
Eyüpcan Şensoy, Mehmet Çıtırık

Harran Üniversitesi Tıp Fakültesi Dergisi, Journal Year: 2025, Volume and Issue: 22(1), P. 61 - 64

Published: March 11, 2025

Objective: To investigate the effect of language differences on the success of the ChatGPT-3.5, Copilot, and Gemini artificial intelligence chatbots on multiple-choice questions related to ophthalmic pathologies and intraocular tumors. Materials and Methods: Thirty-six English questions testing knowledge of ophthalmic pathologies and intraocular tumors were included in the study. After Turkish translations were produced by a certified translator (native speaker), the questions were posed to the chatbots in both English and Turkish. The answers given were compared with the answer key and grouped as correct or incorrect. Results: The chatbots answered the English questions correctly at rates of 75%, 66.7%, and 63.9%, respectively, and the Turkish versions at rates of 63.9% and 69.4%. Although the chatbots answered the English and Turkish versions of the questions at different rates, no statistically significant difference was detected (p>0.05). Conclusion: In addition to expanding the knowledge base of artificial intelligence chatbots, their abilities to understand and translate different languages and to generate ideas need to be improved so that they can form the same perception across languages and converge on a single correct answer.

Citations

0

Comparative Evaluation of Large Language Models for Medical Education: Performance Analysis in Urinary System Histology. DOI Creative Commons
Anikó Szabó, Ghasem Dolatkhah Laein

Research Square (Research Square), Journal Year: 2025, Volume and Issue: unknown

Published: March 13, 2025

Abstract Large language models (LLMs) show potential for medical education, but their domain-specific capabilities need systematic evaluation. This study presents a comparative assessment of thirteen LLMs in urinary system histology education. Using a multi-dimensional framework, we evaluated the models across two tasks: answering 65 validated multiple-choice questions (MCQs) and generating clinical scenarios with assessment items. For MCQ performance, we assessed accuracy along with explanation quality through relevance and comprehensiveness metrics. For scenario generation, we evaluated Quality, Complexity, Relevance, Correctness, and Variety dimensions. Performance varied substantially across tasks, with ChatGPT-o1 achieving the highest accuracy (96.31 ± 17.85%) and Claude-3.5 demonstrating superior scenario generation (91.4% of the maximum possible score). All models significantly outperformed random guessing with large effect sizes. Statistical analyses revealed significant differences in consistency across multiple attempts and in dimensional performance, with most models showing higher Correctness than Quality scores in scenario generation. Term frequency analysis revealed content imbalances across all models, with overemphasis of certain anatomical structures and complete omission of others. Our findings demonstrate that while LLMs show considerable promise, reliable implementation requires matching specific models to appropriate educational tasks, implementing verification mechanisms, and recognizing current limitations in generating pedagogically balanced content.

Language: English

Citations

0
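The histology evaluation above flags content imbalance via term frequency analysis of model-generated scenarios, with some structures overemphasized and others omitted entirely. A minimal sketch of such a check; the term list and generated texts are hypothetical stand-ins, not the study's corpus:

```python
# Sketch: term-frequency check for content balance in generated scenarios.
# Expected terms and texts are invented for illustration only.
from collections import Counter
import re

expected_terms = ["glomerulus", "proximal tubule", "loop of henle",
                  "collecting duct", "podocyte"]

generated_texts = [
    "The glomerulus filters plasma; podocyte foot processes cover it.",
    "Injury to the glomerulus reduces filtration in this scenario.",
]

corpus = " ".join(generated_texts).lower()
counts = Counter({term: len(re.findall(re.escape(term), corpus))
                  for term in expected_terms})

omitted = [term for term, n in counts.items() if n == 0]
print("term frequencies:", dict(counts))
print("omitted entirely:", omitted)  # flags content-coverage gaps
```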

Can off-the-shelf visual large language models detect and diagnose ocular diseases from retinal photographs? DOI Creative Commons

Sahana Srinivasan, Hongwei Ji, David Z. Chen

et al.

BMJ Open Ophthalmology, Journal Year: 2025, Volume and Issue: 10(1), P. e002076 - e002076

Published: April 1, 2025

Background The advent of generative artificial intelligence has led to the emergence of multiple vision large language models (VLLMs). This study aimed to evaluate the capabilities of commonly available VLLMs, such as OpenAI's GPT-4V and Google's Gemini, in detecting and diagnosing ocular diseases from retinal photographs. Methods and analysis From the Singapore Epidemiology of Eye Diseases (SEED) study, we selected 44 representative retinal photographs, including 10 healthy images and 34 representing six eye diseases (age-related macular degeneration, diabetic retinopathy, glaucoma, visually significant cataract, myopic macular degeneration, and retinal vein occlusion). GPT-4V (both default and data analyst modes) and Google Gemini were prompted with each image to determine whether the retina was normal or abnormal and to provide diagnostic descriptions for images deemed abnormal. The outputs of the VLLMs were evaluated for accuracy by three attending-level ophthalmologists using a three-point scale (poor, borderline, good). Results GPT-4V's default mode demonstrated the highest detection rate, correctly identifying 33 out of 34 abnormal images (97.1%), outperforming its data analyst mode (61.8%) and Gemini (41.2%). Despite the relatively high detection rates, the quality of the diagnostic descriptions was generally suboptimal, with only 21.2% of GPT-4V's (default) responses, 4.8% of GPT-4V's (data analyst) responses, and 28.6% of Gemini's responses rated as good. Conclusions Although GPT-4V showed high sensitivity in abnormality detection, all VLLMs were inadequate in providing accurate diagnoses of ocular diseases. These findings emphasise the need for domain-customised models and suggest continued human oversight in clinical ophthalmology.

Language: English

Citations

0
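The SEED retinal-image study above separates two measures: how often each model flags a truly abnormal image, and how often its diagnostic description is rated good by ophthalmologists. A minimal sketch of both computations over hypothetical per-image records:

```python
# Sketch: detection rate and share of "good"-rated descriptions from
# per-image records. The records below are hypothetical, not SEED data.
records = [  # (model_said_abnormal, consensus_grade) for abnormal images
    (True, "good"), (True, "borderline"), (True, "poor"),
    (False, None), (True, "good"),
]

detected = [grade for said_abnormal, grade in records if said_abnormal]
detection_rate = 100 * len(detected) / len(records)
good_share = 100 * detected.count("good") / len(detected)

print(f"detection rate: {detection_rate:.1f}%")    # flagged / all abnormal
print(f"'good' descriptions: {good_share:.1f}%")   # among flagged images
```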