Vision language models in ophthalmology DOI
Gilbert Lim, Kabilan Elangovan, Liyuan Jin

et al.

Current Opinion in Ophthalmology, Journal Year: 2024, Volume and Issue: unknown

Published: Aug. 27, 2024

Purpose of review: Vision language models are an emerging paradigm in artificial intelligence that offers the potential to natively analyze both image and textual data simultaneously, within a single model. The fusion of these two modalities is of particular relevance to ophthalmology, which has historically involved specialized imaging techniques such as angiography, optical coherence tomography, and fundus photography, while also interfacing with electronic health records that include free-text descriptions. This review surveys this fast-evolving field as it applies to current ophthalmologic research and practice. Recent findings: Although models incorporating imaging and text data have a long provenance in ophthalmology, effective multimodal models are a recent development that exploits advances in technologies such as transformer and autoencoder models. Summary: Vision language models offer the potential to assist and streamline the existing clinical workflow, whether previsit, during the visit, or post-visit. There are, however, important challenges to be overcome, particularly regarding patient privacy and the explainability of model recommendations.

Language: English

Generative artificial intelligence in ophthalmology: current innovations, future applications and challenges DOI Creative Commons
Sadi Can Sönmez, Mertcan Sevgi, Fares Antaki

et al.

British Journal of Ophthalmology, Journal Year: 2024, Volume and Issue: 108(10), P. 1335 - 1340

Published: June 26, 2024

The rapid advancements in generative artificial intelligence are set to significantly influence the medical sector, particularly ophthalmology. Generative adversarial networks and diffusion models enable the creation of synthetic images, aiding the development of deep learning models tailored for specific imaging tasks. Additionally, the advent of multimodal foundational models, capable of generating text and videos, presents a broad spectrum of applications within ophthalmology. These range from enhancing diagnostic accuracy to improving patient education and training healthcare professionals. Despite the promising potential, this area of technology is still in its infancy, and there are several challenges to be addressed, including data bias, safety concerns, and the practical implementation of these technologies in clinical settings.

Language: English

Citations

7

Why we need to be careful with LLMs in medicine DOI Creative Commons
Jean‐Christophe Bélisle‐Pipon

Frontiers in Medicine, Journal Year: 2024, Volume and Issue: 11

Published: Dec. 4, 2024

Large language models (LLMs), the core of many generative AI (genAI) tools, are gaining attention for their potential applications in healthcare. These applications are wide-ranging, including tasks such as assisting with diagnostic processes, streamlining patient communication, and providing decision support to healthcare professionals. Their ability to process and generate large volumes of text makes them promising tools for managing medical documentation and enhancing the efficiency of clinical workflows (Harrer, 2023). LLMs offer a distinct advantage in that they are relatively straightforward to use, particularly since the introduction of ChatGPT-3.5, and they exhibit notable alignment with human communication patterns, facilitating more natural interactions (Ayers et al., 2023) and acceptance of LLMs' conclusions (Shekar, 2024). They operate by predicting the next word in a sequence based on statistical correlations identified in training datasets (Patil, 2021; Schubert). However, while these models are effective at producing text that appears coherent and contextually appropriate, they do so without genuine understanding of meaning or context. This limitation is significant in healthcare, where accuracy is critical. Unlike human cognition, which is driven by a complex array of goals and behaviors, LLMs are narrowly focused on text generation. This focus can lead to the production of plausible-sounding but inaccurate information, a phenomenon referred to as "AI hallucination" (OpenAI). In high-stakes environments like prediction, triaging, diagnosis, monitoring, and care, such inaccuracies can have serious consequences.

While numerous articles across various Frontiers journals discuss LLMs, few treat hallucinations as a central issue. For example, Jin et al. (2023) in Frontiers in Medicine note that "While ChatGPT holds tremendous promise in ophthalmology, addressing the challenges of hallucination and misinformation is paramount." Similarly, Giorgino, writing in Frontiers in Surgery, emphasizes that the responsible use of this tool must be accompanied by an awareness of its limitations and biases, foremost among them the dangerous concept of hallucination. Beyond the medical realm, Williams (2024) in Frontiers in Education observes that the term gained widespread use around 2022, coinciding with the rise of ChatGPT, as users noticed that chatbots often generated random falsehoods in their responses, seemingly indifferent to relevance or accuracy. Williams continues by stressing that the term has been criticized for its anthropomorphic connotations, since it likens model behavior to human perception. Such critical discussions remain sparse compared with articles praising LLMs in medicine, an imbalance that highlights the need for greater emphasis on mitigating the risks posed by these models.

Building on this concern, Hicks, Humphries, and Slater challenge conventional thinking in their paper "ChatGPT is Bullshit." They assert that the falsehoods produced by LLMs should not simply be labeled "hallucinations," but rather "bullshit," a term drawn from philosopher Harry Frankfurt's (2009) work. According to this perspective, "bullshit" reflects a disregard for truth or accuracy, which poses a distinct challenge for genAI in medicine. By reconceptualizing LLM errors as "bullshitting" instead of "hallucinating," this paper aims to provide a clearer perspective on the risks these tools pose in medical applications. It explores practical solutions such as layered LLM architectures and improved XAI methods, and it emphasizes the urgency of implementing tailored oversight mechanisms to counterbalance the political and industry push for deregulation in sensitive domains such as medicine.

LLMs are trained on vast datasets. While they produce human-like text, they do not inherently understand or verify what they generate, acting as "prop-oriented make-believe tools" (Mallory). Their errors do not result from technical glitches that can be resolved with better data or refined algorithms; they stem from the models' fundamental nature: they do not evaluate evidence or reason in any meaningful sense. The distinction between statistical processing and genuine reasoning leads to misconceptions when LLMs are portrayed as, or perceived to be, capable of cognition. They can produce accurate and relevant outputs, but from correlations rather than comprehension. As Bender (2021) famously argued, they reproduce sequences learned from data, functioning as "stochastic parrots." By contrast, reasoning involves deeper cognitive processes of understanding, thinking, and interpretation. Some, such as Downes (2024), contest this view, suggesting that LLMs give sensible answers by leveraging higher-level structural information inherent in their design; the fact remains that they are fundamentally agnostic to empirical reality. Recognizing this is crucial: predictions made by these models, no matter how convincing, should not be equated with the deliberate, evidence-based judgments of a reasoning mind. When these systems make mistakes, it is not because they are malfunctioning in a way that can be fixed by tweaking the algorithms; they never arbitrated truth in the first place. As Hicks and colleagues point out, these systems are not trying to communicate something they believe or perceive, and their inaccuracy is not due to misperception or hallucination; they are not trying to convey information at all, "they are bullshitting."

This indifference to truth is especially concerning in healthcare, where interpretability, liability, and accuracy are paramount. Consider the implications of using LLMs for medical advice or to assist in diagnosing patients: if the nature of these tools is misunderstood, trusting their potentially flawed outputs could lead to misdiagnoses and improper treatments, with serious consequences for patient care. As Harrer (2023) warned, health buyers should beware: this is experimental technology not yet ready for primetime.

Recognizing LLM falsehoods as "bullshit" rather than "hallucinations" calls for a more cautious and skeptical approach, according to Hicks and colleagues. Titus convincingly argues that attributing semantic understanding to these systems is not warranted, and that doing so raises social and ethical concerns related to anthropomorphizing them and over-trusting their outputs as meaningful and truthful. For the health sector, this implies that medical professionals should be wary of LLM outputs and avoid using them as standalone sources of advice (Cohen). Instead, they should serve as supplementary tools, and all outputs should be rigorously validated by experts before being applied in any clinical setting.

The implications for medicine are significant. If LLMs are indifferent to truth, there is a heightened responsibility for developers and users to ensure they do not cause harm. This means not only improving the models but also clearly communicating their limitations to users. As Hicks et al. note, "Calling chatbot inaccuracies 'hallucinations' feeds into overblown hype about their abilities among technology cheerleaders, and could lead to unnecessary consternation among the general public. It also suggests solutions to the inaccuracy problems which might not work, and could lead to misguided efforts at AI alignment amongst specialists." Given this, expert validation is needed both at the design stage and prior to deployment (Bélisle-Pipon, 2021; Cohen, 2023).

Ensuring trustworthiness therefore requires shared responsibility: creating transparent systems and critically assessing their outputs (Amann, 2020; Díaz-Rodríguez, 2023; Siala & Wang, 2022; Smith, 2021). Medical professionals must be trained to recognize that AI-generated content may sound convincing but is not always reliable. Developers should prioritize interfaces that highlight uncertainty and encourage critical evaluation of outputs. Disclaimers and confidence scores can help users assess the reliability of the information provided (Gallifant), which is basically what the Notice and Explanation section of the White House's Blueprint for an AI Bill of Rights (2022) requires: users should be told clearly that the system is not a standalone source of medical advice but a supplementary tool whose outputs must be validated before use in any clinical setting. Disclosure, however, is not enough in itself, nor is it conducive to solving the underlying problems, since it shifts the burden onto users. Such disclosures must be accessible and understandable, and must not reproduce the pattern of consumer products' Terms and Conditions, which are ridiculously long and which nobody reads (Solove, 2024).

Employing multiple layers of LLMs can mitigate individual model errors, but it does not solve the issues raised previously. Work is currently underway in this area (Farquhar). Usually it entails enabling one model to cross-validate another in order to identify and correct inaccuracies, thereby reducing the incidence of errors, with different models assigned specialized roles such as fact-checking and contextual validation to enhance robustness (Springer). This methodology introduces complexity and a risk of error propagation associated with coordinating models. Furthermore, this strategy, which Verspoor likens to "fighting fire with fire," may incrementally improve outputs but fails to address the foundational issue: the lack of true understanding. Over-reliance on layering may yield diminishing returns, and the added complexity may introduce novel problems that negate the anticipated benefits of enhanced accuracy. Additionally, the approach risks fostering overdependence on automated systems (Levinstein and Herrmann, 2024), undermining the role of human expertise in tasks requiring nuanced decision-making.

LLMs can still make valuable contributions to medical practice if used wisely, for instance in administrative tasks, documentation, and preliminary exploration of topics. They may even be useful in defending patients' interests in insurance claims (Rosenbluth). But they must be designed with safeguards to prevent harm. One way to improve their utility is not to rely solely on them, but to implement verification against reliable databases (not just web-scraping). Even then, concerns about "bullshit" remain. Connecting an LLM to a trusted database so that it provides cross-referenced sources, and incorporating a mechanism for arbitrating between pieces of evidence, would further ensure a certain level of trustworthiness. Such integration must be implemented carefully to avoid introducing new forms of error or inadvertently embedding values inconsistent with the context of deployment (2021).

Explainable AI (XAI) methods can increase the transparency of decision-making, including for LLMs. Techniques such as post-hoc explanations of what a model generates are used in many fields, but they share a limitation: they depend on the model itself (Titus). Moreover, techniques for tracing outputs back to underlying data can fail to expose the epistemic problem, namely the models' inability to weigh evidence. The explanations, therefore, reflect patterns rather than reasons. Regulatory frameworks, such as the European Union's AI regulation (2024) and the US Blueprint for an AI Bill of Rights (The White House, 2022), establish standards for transparency, safety, and accountability, but they must be adapted to meet the challenge of opaque LLM decision-making. Experts argue for refining current approaches and developing new paradigms, such as neurosymbolic AI, which combines neural networks with logical reasoning to fill these gaps.

Neurosymbolic AI offers an alternative, integrating the adaptability of neural networks with the precision of symbolic reasoning to enable more robust systems (Hamilton, 2024; Wan), with the key advantage of offering interpretability. As Vivek Wadhwa suggests, LLMs may be nearing a developmental ceiling with diminishing investment returns, and regulators and investors should explore approaches that could drive the next generation of innovation while ensuring more trustworthy reasoning. Despite its promise, neurosymbolic AI is no panacea: it faces challenges of scalability and of handling real-world data (Marra), and its reliance on symbolic structures may not fully capture the nuances of the probabilistic and ambiguous situations common in medicine. It thus represents an incremental advance; oversight, multidisciplinary collaboration, and continued innovation remain essential for AI's responsible use in healthcare.

A deep, critical examination of these systems is crucial for safety and integrity. Their fluent proficiency conceals a troubling reality: their responses are not necessarily grounded in verified facts or consistent logic. In a field where decision-making is paramount, relying on systems with such flaws presents risks. At their core, they predict from patterns in training data, a mechanism that, though powerful for generating text, is not a mechanism for truth. Their goal is the most statistically likely response, not the most appropriate one, and such outputs are already infiltrating clinical workflows. Responsible implementation and continuous monitoring are needed to harness the benefits while minimizing the risks. A further concern is reproducibility: unlike traditional software systems, where identical inputs yield identical outputs, an LLM may answer the same question differently on different occasions. This unpredictability undermines the consistency needed in clinical settings for delivering safe care. Medicine, as an evidence-based discipline, cannot afford to embrace "epistemic insouciance", an indifference to the validity of knowledge, which is particularly problematic given that in many cases output is not anchored in factual reality but merely sounds plausible. Using "hallucination" to describe factually incorrect statements trivializes the severity of the problem: in medicine, explicitly evidence-based since the 1990s, this flaw means that adopting unreliable systems could compromise the integrity of care.

The standard disclaimer in ChatGPT, which warns that it "can make mistakes. Check important info," is insufficient for clinical settings. As Harrer points out, in defence of OpenAI, the system was never advertised as a medical advisor but rather as a crowdsourced refinement experiment, and the acknowledged need for mitigation in genAI has sparked growing caution amid internet-level hype. The stakes for the health sector are significant: users (especially healthcare professionals) do not have the time to verify every piece of output in high-pressure situations where the margin for error is slim. Entrusting fact-checking to users without giving them the resources or assurances to do so exposes the field to what is arguably ethics dumping, offloading the burden downstream (Victor). Casual use, particularly where errors can have life-threatening consequences, reflects complacency. Transparency, then, is not a luxury but a necessity. Healthcare requires systems that can explain why they arrived at their conclusions. Explainability is key to building trust and to making informed decisions based on a model's output. Without it, LLMs remain "black boxes" that offer no accountability or justification, an untenable situation in clinical decision-making.

The problem is amplified by the current political climate, particularly in the United States. The incoming Trump administration is expected to push for the removal of "unnecessary" regulations to accelerate AI development (Chalfant), and lobbying by influential tech organizations such as BSA | The Software Alliance, which represents companies including OpenAI and Microsoft, advocates policies that reduce regulatory constraints and promote adoption. The group acknowledges the importance of international governance standards while prioritizing the removal of barriers and deprioritizing safeguards (such as government-imposed oversight mechanisms). President-elect Trump's plans to undo measures of the previous administration, including the risk management framework intended to foster accountability, signal a shift toward deregulation (Verma and Vynck), perhaps even a regulation winter. Such a move would weaken the safeguards around deploying LLMs in high-stakes fields like healthcare.

Given this context, developers, policymakers, and healthcare institutions must collaborate to uphold standards for responsible deployment, regardless of the regulatory environment. Without such efforts, deregulation could exacerbate the tendency of these systems to produce misleading outputs. Trustworthiness cannot be treated as a secondary consideration when patient outcomes and lives are directly at stake. Reframing "hallucinations" as "bullshit" is not merely a matter of terminology: it recognizes that these are not small, occasional mistakes but a reflection of how the systems operate. Policymakers, providers, and developers must recognize that the stakes are high; without rigorous safeguards, these tools could erode the quality of care.
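
The layered cross-validation and database-grounding ideas discussed in the opinion above can be illustrated with a minimal, hedged sketch (not the author's method): a drafting model answers a question, a second model checks the draft only against passages retrieved from a vetted corpus, and the result is returned with an explicit disclaimer. The `draft_model` and `review_model` callables and the `TrustedIndex` retriever are hypothetical placeholders, not any specific vendor API.

```python
# Illustrative sketch of a "layered" LLM pipeline with retrieval grounding.
# Everything here is a placeholder abstraction; it is not tied to a real model API.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class VerifiedAnswer:
    text: str
    supported: bool      # did the reviewer judge the draft consistent with the sources?
    sources: List[str]   # passages the reviewer was shown
    disclaimer: str


class TrustedIndex:
    """Placeholder for retrieval over a vetted corpus (e.g. clinical guidelines)."""

    def __init__(self, passages: List[str]):
        self.passages = passages

    def search(self, query: str, k: int = 3) -> List[str]:
        # Naive keyword-overlap ranking; a real system would use a proper retriever.
        scored = sorted(
            self.passages,
            key=lambda p: len(set(query.lower().split()) & set(p.lower().split())),
            reverse=True,
        )
        return scored[:k]


def layered_answer(
    question: str,
    draft_model: Callable[[str], str],
    review_model: Callable[[str], str],
    index: TrustedIndex,
) -> VerifiedAnswer:
    draft = draft_model(question)
    sources = index.search(question)
    review_prompt = (
        "Check the draft answer ONLY against the sources. "
        "Reply 'SUPPORTED' or 'UNSUPPORTED' with a one-line reason.\n"
        f"Question: {question}\nDraft: {draft}\nSources:\n- " + "\n- ".join(sources)
    )
    verdict = review_model(review_prompt)
    supported = verdict.strip().upper().startswith("SUPPORTED")
    return VerifiedAnswer(
        text=draft,
        supported=supported,
        sources=sources,
        disclaimer=(
            "Automatically generated; not a standalone source of medical advice. "
            "A qualified professional must validate this before any clinical use."
        ),
    )
```

Note that even a "supported" verdict is itself a model output, so the epistemic limits discussed above still apply; a design like this narrows, rather than removes, the need for expert validation.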

Language: English

Citations

7

Investigating the role of large language models on questions about refractive surgery DOI

Süleyman Demir

International Journal of Medical Informatics, Journal Year: 2025, Volume and Issue: 195, P. 105787 - 105787

Published: Jan. 7, 2025

Language: English

Citations

0

Speech-Driven Medical Emergency Decision-Support Systems in Constrained Environments DOI

Sergey K. Aityan,

Abdolreza Mosaddegh, Rolando Herrero

et al.

Published: Jan. 1, 2025

Language: English

Citations

0

Evaluating the Efficacy of Large Language Models in Guiding Treatment Decisions for Pediatric Refractive Error DOI Creative Commons

Daohuan Kang,

Hongkang Wu,

Lu Yuan

et al.

Ophthalmology and Therapy, Journal Year: 2025, Volume and Issue: unknown

Published: Feb. 22, 2025

Effective management of pediatric myopia, which includes treatments like corrective lenses and low-dose atropine, requires accurate clinical decisions. However, the complexity of refractive data, such as variations in visual acuity, axial length, and patient-specific factors, poses challenges to determining the optimal treatment. This study aims to evaluate the performance of three large language models in analyzing these data. A dataset of 100 records, including parameters such as visual acuity, was analyzed with ChatGPT-3.5, ChatGPT-4o, and Wenxin Yiyan. Each model was tasked with determining whether an intervention was needed and subsequently with recommending a treatment (eyeglasses, orthokeratology lens, or atropine). The recommendations were compared with the professional optometrists' consensus, rated on a 1–5 Global Quality Score (GQS) scale, and evaluated for safety using a three-tier accuracy assessment. ChatGPT-4o outperformed both ChatGPT-3.5 and Wenxin Yiyan in identifying intervention needs, with an accuracy of 90%, significantly higher than the other models (p < 0.05). It also achieved the highest GQS of 4.4 ± 0.55, surpassing the other models (p < 0.001), with 85% of its responses rated "good", ahead of the other two models (82% and 74%). It made only eight errors in recommending interventions, compared with 12 and 15 for the other two models. Additionally, it performed better on incomplete and abnormal data, maintaining high quality scores. It also showed good safety, making it a promising tool for decision support in ophthalmology, although expert oversight is still necessary.
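
A hedged sketch of the comparison protocol this abstract describes: each model recommendation is checked against the optometrists' consensus, and a 1–5 Global Quality Score (GQS) is averaged per model. The field names and toy records below are illustrative assumptions, not the study's data.

```python
# Toy illustration of comparing LLM treatment recommendations against an
# optometrist consensus and averaging a 1-5 GQS rating per model.
from statistics import mean

records = [
    # (model, recommended_treatment, consensus_treatment, gqs_rating) -- invented values
    ("ChatGPT-4o", "orthokeratology lens", "orthokeratology lens", 5),
    ("ChatGPT-4o", "eyeglasses", "eyeglasses", 4),
    ("ChatGPT-3.5", "eyeglasses", "atropine", 3),
    ("Wenxin Yiyan", "atropine", "atropine", 4),
]

def summarize(model: str) -> dict:
    rows = [r for r in records if r[0] == model]
    # Fraction of recommendations matching the consensus, plus mean GQS.
    agreement = mean(1.0 if rec == gold else 0.0 for _, rec, gold, _ in rows)
    return {"model": model, "agreement": agreement, "mean_gqs": mean(r[3] for r in rows)}

for m in ("ChatGPT-4o", "ChatGPT-3.5", "Wenxin Yiyan"):
    print(summarize(m))
```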

Language: English

Citations

0

ChatGPT-4 for addressing patient-centred frequently asked questions in age-related macular degeneration clinical practice DOI Creative Commons
Henrietta Wang,

Amanda Ie,

Thomas Chan

et al.

Eye, Journal Year: 2025, Volume and Issue: unknown

Published: April 15, 2025

Abstract. Purpose: Large language models have shown promise in answering questions related to medical conditions. This study evaluated the responses of ChatGPT-4 to patient-centred frequently asked questions (FAQs) relevant to age-related macular degeneration (AMD). Methods: Ten experts were recruited across a range of clinical, education and research practices in optometry and ophthalmology. Over 200 patient-centric FAQs from authoritative professional society, hospital and advocacy websites were condensed into 37 questions across four themes: definition; causes and risk factors; symptoms and detection; and treatment and follow-up. The questions were individually input into ChatGPT-4 to generate responses. The responses were graded by the experts using a 5-point Likert scale (1 = strongly disagree; 5 = strongly agree) across four domains: coherency, factuality, comprehensiveness, and safety. Results: Across all themes and domains, the median scores were 4 ("agree"). Comprehensiveness had the lowest score of the domains (mean 3.8 ± 0.8), followed by factuality (3.9), safety (4.1 ± 0.8) and coherency (4.3 ± 0.7). Examination of individual questions showed that 14% for coherency, 21 (57%) for factuality, 23 (62%) for comprehensiveness and 9 (24%) for safety had average scores below 4 (below "agree"). Free-text comments highlighted issues with superseded or older technologies and with techniques that are not routinely used in clinical practice, such as genetic testing. Conclusions: ChatGPT-4 responses to AMD FAQs were generally agreeable in terms of coherency, factuality, comprehensiveness and safety. However, areas of weakness were identified, precluding a recommendation for routine use to provide patients with tailored counselling in AMD.
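
The grading scheme described above (ten experts, a 5-point Likert scale, four domains, and flagging of questions whose mean falls below "agree") can be illustrated with a small aggregation sketch; all scores below are invented for illustration only.

```python
# Toy aggregation of Likert grades: per-domain mean +/- SD across all expert
# ratings, plus flagging of questions whose mean score falls below 4 ("agree").
from statistics import mean, stdev

# ratings[question_id][domain] -> list of ten expert scores (invented values)
ratings = {
    "Q1": {"coherency": [5, 4, 4, 5, 4, 4, 5, 4, 4, 5],
           "comprehensiveness": [3, 4, 3, 4, 3, 4, 3, 3, 4, 3]},
    "Q2": {"coherency": [4, 4, 5, 4, 4, 4, 5, 4, 4, 4],
           "comprehensiveness": [4, 4, 4, 5, 4, 4, 4, 4, 5, 4]},
}

for domain in ("coherency", "comprehensiveness"):
    scores = [s for q in ratings.values() for s in q[domain]]
    flagged = [qid for qid, q in ratings.items() if mean(q[domain]) < 4]
    print(f"{domain}: {mean(scores):.1f} ± {stdev(scores):.1f}, "
          f"questions below 'agree': {flagged}")
```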

Language: English

Citations

0

Evaluation of error detection and treatment recommendations in nucleic acid test reports using ChatGPT models DOI
Wenzheng Han, Chao Wan, Rui Shan

et al.

Clinical Chemistry and Laboratory Medicine (CCLM), Journal Year: 2025, Volume and Issue: unknown

Published: April 18, 2025

Abstract. Objectives: Accurate medical laboratory reports are essential for delivering high-quality healthcare. Recently, advanced artificial intelligence models, such as those in the ChatGPT series, have shown considerable promise in this domain. This study assessed the performance of specific GPT models, namely GPT-4o, o1, and o1 mini, in identifying errors within laboratory reports and in providing treatment recommendations. Methods: In this retrospective study, 86 nucleic acid test reports covering seven upper respiratory tract pathogens were compiled. A total of 285 errors from four common error categories were intentionally and randomly introduced into the reports to generate incorrect reports. The models were tasked with detecting these errors, with three senior medical laboratory scientists (SMLS) and three medical laboratory interns (MLI) as control groups. Additionally, the models were tasked with generating accurate and reliable treatment recommendations following positive test outcomes based on the corrected reports. χ2 tests, Kruskal-Wallis tests and Wilcoxon tests were used for statistical analysis where appropriate. Results: In comparison with SMLS or MLI, the models accurately detected three error types, with average detection rates of 88.9% (omission), 91.6% (time sequence), and 91.7% (the same individual acting as both inspector and reviewer). However, the detection rate for result input format errors was only 51.9%, indicating relatively poor performance in this aspect. The models exhibited substantial to almost perfect agreement with SMLS in total error detection (kappa [min, max]: 0.778, 0.837); agreement with MLI was moderately lower (0.632, 0.696). In reading all reports, the models showed markedly reduced reading time compared with SMLS or MLI (all p<0.001). Notably, the GPT-o1 mini model had better consistency in error identification than the GPT-o1 model, which in turn was better than the GPT-4o model. Pairwise comparisons of each model's outputs across repeated runs showed high agreement (kappa: 0.912, 0.996). GPT-o1 significantly outperformed the other models (all p<0.0001). Conclusions: The accuracy and reliability of some GPT models in detecting report errors and providing treatment recommendations were competent, potentially reducing work hours and enhancing clinical decision-making.
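
The agreement statistics reported above rely on Cohen's kappa between raters; a minimal sketch of that computation for per-error detection outcomes is shown below, with invented labels standing in for the study's data.

```python
# Sketch of the agreement analysis: per-error detection outcomes (1 = detected,
# 0 = missed) from a model and a reference rater, compared with Cohen's kappa.
from sklearn.metrics import cohen_kappa_score

# One entry per seeded error (invented values, not the study's data).
model_detections = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
smls_detections  = [1, 1, 1, 1, 1, 1, 0, 1, 0, 1]

kappa = cohen_kappa_score(model_detections, smls_detections)
# By the common Landis-Koch convention, 0.61-0.80 is "substantial" and
# above 0.80 "almost perfect" agreement.
print(f"Cohen's kappa vs. SMLS: {kappa:.3f}")
```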

Language: English

Citations

0

Assessing large language models’ accuracy in providing patient support for choroidal melanoma DOI
Rodrigo Anguita,

Catriona Downie,

Lorenzo Ferro Desideri

et al.

Eye, Journal Year: 2024, Volume and Issue: 38(16), P. 3113 - 3117

Published: July 13, 2024

Language: English

Citations

3

Assessment of Large Language Models in Cataract Care Information Provision: A Quantitative Comparison DOI Creative Commons
Zichang Su, Kai Jin,

Hongkang Wu

et al.

Ophthalmology and Therapy, Journal Year: 2024, Volume and Issue: unknown

Published: Nov. 8, 2024

Cataracts are a significant cause of blindness. While individuals frequently turn to the Internet for medical advice, distinguishing reliable information can be challenging. Large language models (LLMs) have attracted attention for generating accurate, human-like responses that may be used for medical consultation. However, a comprehensive assessment of LLMs' accuracy within specific medical domains is still lacking.

Language: English

Citations

3

A Performance Evaluation of Large Language Models in Keratoconus: A Comparative Study of ChatGPT-3.5, ChatGPT-4.0, Gemini, Copilot, Chatsonic, and Perplexity DOI Open Access
Ali Hakim Reyhan,

Çağrı Mutaf,

İrfan UZUN

et al.

Journal of Clinical Medicine, Journal Year: 2024, Volume and Issue: 13(21), P. 6512 - 6512

Published: Oct. 30, 2024

This study evaluates the ability of six popular chatbots (ChatGPT-3.5, ChatGPT-4.0, Gemini, Copilot, Chatsonic, and Perplexity) to provide reliable answers to questions concerning keratoconus.

Language: English

Citations

2