Foundation models in ophthalmology: opportunities and challenges

Mertcan Sevgi,

Eden Ruffell,

Fares Antaki

et al.

Current Opinion in Ophthalmology, Year: 2024, Issue: unknown

Published: Sep. 26, 2024

Purpose of review: Last year marked the development of the first foundation model in ophthalmology, RETFound, setting the stage for generalizable medical artificial intelligence (GMAI) that can adapt to novel tasks. Additionally, rapid advancements in large language model (LLM) technology, including models such as GPT-4 and Gemini, have been tailored for medical specialization and evaluated on clinical scenarios with promising results. This review explores the opportunities and challenges in further developing these technologies.

Recent findings: RETFound outperforms traditional deep learning models on specific tasks, even when fine-tuned on only small datasets. Specialized approaches such as Med-Gemini and Medprompt perform better than out-of-the-box models on ophthalmology tasks. However, there is still a significant deficiency in ophthalmology-specific multimodal models. This gap is primarily due to the substantial computational resources required for training and the limited availability of high-quality data.

Summary: Overall, foundation models present opportunities but face challenges, particularly the need for high-quality, standardized datasets for training and specialization. Although work to date has focused on vision models, the greatest opportunities lie in advancing multimodal models, which more closely mimic the capabilities of clinicians.
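As a rough illustration of the adaptation step described in the findings above (fine-tuning a pretrained foundation model on a small labeled dataset), the sketch below uses a generic ImageNet-pretrained vision transformer from timm as a stand-in for RETFound; the model name, data directory, class count, and hyperparameters are placeholders rather than the authors' actual pipeline.

```python
# Minimal sketch: adapting a pretrained vision transformer to a small
# labeled retinal-image dataset, in the spirit of fine-tuning a foundation
# model such as RETFound. Checkpoint and paths are placeholders.
import timm
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Generic pretrained ViT as a stand-in for an ophthalmic foundation model.
model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=5)

# Freeze the backbone and train only the classification head, a common
# strategy when the downstream dataset is small.
for name, param in model.named_parameters():
    if "head" not in name:
        param.requires_grad = False

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5] * 3, std=[0.5] * 3),
])
train_set = datasets.ImageFolder("data/retina/train", transform=preprocess)  # hypothetical path
loader = DataLoader(train_set, batch_size=16, shuffle=True)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
optimizer = torch.optim.AdamW(
    filter(lambda p: p.requires_grad, model.parameters()), lr=1e-4
)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(5):
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```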

Language: English

Clinical Text Datasets for Medical Artificial Intelligence and Large Language Models — A Systematic Review
Jiageng Wu, Xiaocong Liu, Minghui Li

et al.

NEJM AI, Year: 2024, Issue: 1(6)

Published: May 17, 2024

Privacy and ethical considerations limit access to large-scale clinical datasets, particularly text data, which contain extensive and diverse information and serve as the foundation for building large language models (LLMs). The limited accessibility of clinical data impedes the development of medical artificial intelligence systems and hampers research participation from resource-poor regions and medical institutions, thereby exacerbating health care disparities. In this review, we conduct a global survey to identify publicly available clinical text datasets and elaborate on their accessibility, diversity, and usability for LLMs. We screened 3962 papers across medical (PubMed MEDLINE) and computational linguistics databases (the Association for Computational Linguistics Anthology), as well as 239 tasks from prevalent natural language processing (NLP) challenges, such as the National NLP Clinical Challenges (n2c2). We identified 192 unique datasets that claimed to be publicly available. Following an institutional review board–approved data-requesting pipeline, access was granted to fewer than half (91 [47.4%]), with an additional 14 (7.3%) being regulated and 87 (45.3%) remaining inaccessible. The datasets cover nine languages and contain over 10 million records; most (88 [95.7%]) originated from the Americas, Europe, and Asia, with none originating from Oceania or Africa, leaving these regions significantly underrepresented. Distribution differences were also evident in the clinical contexts and supported tasks, with intensive care unit (18 [16.8%]), respiratory disease (13 [12.1%]), and cardiovascular disease (11 [10.3%]) datasets gaining significant attention. Named entity recognition (23 [21.7%]), classification (22 [20.8%]), and event extraction (12 [11.3%]) were the most explored tasks in these datasets. To our knowledge, this is the first systematic review to characterize clinical text datasets for LLMs, highlighting the difficulty of data access, the underrepresentation of languages and regions, and the challenges posed by restrictive data governance. Sharing diversified clinical text data, with appropriate privacy protection, is necessary to promote equitable research.

Language: English

Cited by

24

FFA-GPT: an automated pipeline for fundus fluorescein angiography interpretation and question-answer

Xiaolan Chen,

Weiyi Zhang, Pusheng Xu

et al.

npj Digital Medicine, Year: 2024, Issue: 7(1)

Published: May 3, 2024

Fundus fluorescein angiography (FFA) is a crucial diagnostic tool for chorioretinal diseases, but its interpretation requires significant expertise and time. Prior studies have used artificial intelligence (AI)-based systems to assist FFA interpretation, but these lack user interaction and comprehensive evaluation by ophthalmologists. Here, we use large language models (LLMs) to develop an automated pipeline for both report generation and medical question-answering (QA) for FFA images. The pipeline comprises two parts: an image-text alignment module (Bootstrapping Language-Image Pre-training) for report generation and an LLM (Llama 2) for interactive QA. The model was developed using 654,343 images with 9392 reports. It was evaluated both automatically, using language-based and classification-based metrics, and manually by three experienced ophthalmologists. The automatic evaluation of the generated reports demonstrated that the system can generate coherent and comprehensible free-text reports, achieving a BERTScore of 0.70 and F1 scores ranging from 0.64 to 0.82 for detecting the top-5 retinal conditions. The manual evaluation revealed acceptable accuracy (68.3%, Kappa 0.746) and completeness (62.3%, Kappa 0.739). The free-form answers were also evaluated manually, with the majority meeting ophthalmologists' criteria (error-free: 70.7%, complete: 84.0%, harmless: 93.7%, satisfied: 65.3%, Kappa: 0.762–0.834). This study introduces an innovative framework that combines multi-modal transformers and LLMs, enhancing ophthalmic image interpretation and facilitating interactive communication during consultation.
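The two-stage design described above (an image-text alignment module drafts a report, an LLM then answers questions about it) can be sketched with public, general-purpose checkpoints standing in for the paper's FFA-trained weights. The BLIP checkpoint, Llama 2 chat checkpoint (which requires separate access), image path, and question below are illustrative assumptions, not the authors' actual components.

```python
# Minimal sketch of a caption-then-QA pipeline: an image-captioning model
# drafts a free-text description of an FFA image, and a chat LLM answers a
# question grounded in that draft. Checkpoints and file names are stand-ins.
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration, pipeline

# Stage 1: image-to-text (BLIP) produces a draft report.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
captioner = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base"
)

image = Image.open("ffa_example.png").convert("RGB")  # hypothetical FFA frame
inputs = processor(images=image, return_tensors="pt")
draft_ids = captioner.generate(**inputs, max_new_tokens=60)
draft_report = processor.decode(draft_ids[0], skip_special_tokens=True)

# Stage 2: an instruction-tuned LLM answers a question using the draft report
# as context; any local instruction-tuned model can be substituted.
qa_llm = pipeline("text-generation", model="meta-llama/Llama-2-7b-chat-hf")
prompt = (
    "You are assisting with fundus fluorescein angiography interpretation.\n"
    f"Draft report: {draft_report}\n"
    "Question: Are there signs of leakage in the late phase?\nAnswer:"
)
answer = qa_llm(prompt, max_new_tokens=120)[0]["generated_text"]
print(draft_report)
print(answer)
```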

Language: English

Cited by

10

Large language models leverage external knowledge to extend clinical insight beyond language boundaries
Jiageng Wu, Xian Wu, Zhaopeng Qiu

et al.

Journal of the American Medical Informatics Association, Year: 2024, Issue: 31(9), pp. 2054-2064

Published: April 29, 2024

Large language models (LLMs) such as ChatGPT and Med-PaLM have excelled in various medical question-answering tasks. However, these English-centric models encounter challenges in non-English clinical settings, primarily due to limited medical knowledge in the respective languages, a consequence of imbalanced training corpora. We systematically evaluate LLMs in the Chinese medical context and develop a novel in-context learning framework to enhance their performance.
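The general idea of knowledge-augmented in-context learning for a non-English clinical question can be sketched as below: retrieve a few external reference snippets and worked exemplars, pack them into the prompt, then query a general-purpose model. The knowledge base, exemplars, toy keyword retrieval, and model name are illustrative assumptions and do not reproduce the paper's actual framework or sources.

```python
# Minimal sketch of knowledge-augmented in-context prompting for a Chinese
# clinical question. Retrieval here is a toy keyword lookup.
from openai import OpenAI

KNOWLEDGE_BASE = {  # hypothetical external reference snippets
    "胸痛": "Chest pain radiating to the left arm with diaphoresis suggests acute coronary syndrome.",
    "多饮多尿": "Polydipsia and polyuria with weight loss are classic features of diabetes mellitus.",
}

FEW_SHOT = [  # hypothetical exemplar (question, answer) pairs
    ("患者发热伴咳嗽三天，最可能的诊断是什么？",
     "最可能为急性呼吸道感染，需结合胸片与血常规进一步评估。"),
]

def build_prompt(question: str) -> str:
    # Include any knowledge snippet whose keyword appears in the question,
    # followed by the exemplars and the new question.
    facts = [v for k, v in KNOWLEDGE_BASE.items() if k in question]
    lines = ["Reference knowledge:"] + [f"- {f}" for f in facts]
    for q, a in FEW_SHOT:
        lines += [f"问题: {q}", f"回答: {a}"]
    lines += [f"问题: {question}", "回答:"]
    return "\n".join(lines)

client = OpenAI()  # assumes OPENAI_API_KEY is set; model name is illustrative
question = "患者出现胸痛并向左臂放射，应考虑什么诊断？"
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": build_prompt(question)}],
)
print(response.choices[0].message.content)
```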

Language: English

Cited by

9

Understanding natural language: Potential application of large language models to ophthalmology
Zefeng Yang, Biao Wang, Fengqi Zhou

et al.

Asia-Pacific Journal of Ophthalmology, Year: 2024, Issue: 13(4), article 100085

Published: July 1, 2024

Large language models (LLMs), a natural language processing technology based on deep learning, are currently in the spotlight. These models closely mimic natural language comprehension and generation. Their evolution has undergone several waves of innovation, similar to convolutional neural networks. The transformer architecture and the advancement of generative artificial intelligence mark a monumental leap beyond early-stage pattern recognition via supervised learning. With the expansion of parameters and training data (terabytes), LLMs unveil remarkable human interactivity, encompassing capabilities such as memory retention and comprehension. These advances make LLMs particularly well-suited for roles in healthcare communication between medical practitioners and patients. In this comprehensive review, we discuss the developmental trajectory of LLMs and their potential implications for clinicians and patients. For clinicians, LLMs can be used for automated documentation and, given better inputs and extensive validation, may be able to autonomously diagnose and treat in the future. For patient care, LLMs can provide triage suggestions, summarization of medical documents, explanation of the patient's condition, and customized education materials tailored to the patient's comprehension level. The limitations of LLMs and possible solutions for real-world use are also presented. Given the rapid advancements in this area, this review attempts to briefly cover the many roles LLMs may play in the ophthalmic space, with a focus on improving the quality of healthcare delivery.

Language: English

Cited by

7

Human versus Artificial Intelligence: ChatGPT-4 Outperforming Bing, Bard, ChatGPT-3.5 and Humans in Clinical Chemistry Multiple-Choice Questions
Malik Sallam,

Khaled Al‐Salahat,

Huda Eid

et al.

Advances in Medical Education and Practice, Year: 2024, Volume 15, pp. 857-871

Published: Sep. 1, 2024

Artificial intelligence (AI) chatbots excel in language understanding and generation. These models can transform healthcare education and practice. However, it is important to assess the performance of such AI models on various topics to highlight their strengths and possible limitations. This study aimed to evaluate ChatGPT (GPT-3.5 and GPT-4), Bing, and Bard compared with human students at a postgraduate master's level in Medical Laboratory Sciences.

Language: English

Cited by

6

Evaluation of Prompts to Simplify Cardiovascular Disease Information Using a Large Language Model: Cross-Sectional Study (Preprint)
Vishala Mishra, Ashish Sarraju, Neil Kalwani

et al.

Journal of Medical Internet Research, Year: 2024, Issue: 26, article e55388

Published: Jan. 31, 2024

In this cross-sectional study, we evaluated the completeness, readability, and syntactic complexity of cardiovascular disease prevention information produced by GPT-4 in response to 4 kinds of prompts.
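The readability side of this kind of comparison can be sketched with standard readability formulas applied to model outputs under different prompt styles. The responses and prompt labels below are placeholders, not the study's actual GPT-4 outputs, and textstat is just one of several possible scoring libraries.

```python
# Minimal sketch: score placeholder model outputs for four prompt styles
# with Flesch reading ease and Flesch-Kincaid grade level.
import textstat

responses = {  # hypothetical outputs for four prompt styles
    "plain": "High blood pressure makes your heart work harder. Eating less salt can help.",
    "6th-grade": "Too much salt can raise your blood pressure. Try to eat fresh foods instead.",
    "clinical": "Hypertension increases myocardial workload; sodium restriction reduces systolic pressure.",
    "bulleted": "Cut back on salt. Walk 30 minutes a day. Take your medicines as prescribed.",
}

for prompt_style, text in responses.items():
    ease = textstat.flesch_reading_ease(text)    # higher = easier to read
    grade = textstat.flesch_kincaid_grade(text)  # approximate US grade level
    print(f"{prompt_style:>10}: reading ease {ease:5.1f}, grade level {grade:4.1f}")
```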

Language: English

Cited by

5

Consistent Performance of GPT-4o in Rare Disease Diagnosis Across Nine Languages and 4967 Cases
Leonardo Chimirri, J. Harry Caufield, Yasemin Bridges

et al.

medRxiv (Cold Spring Harbor Laboratory), Year: 2025, Issue: unknown

Published: Feb. 28, 2025

Large language models (LLMs) are increasingly used in the medical field for diverse applications, including differential diagnostic support. The training data used to create LLMs such as the Generative Pretrained Transformer (GPT) predominantly consist of English-language texts, but these models could support diagnostics across the globe if language barriers could be overcome. Initial pilot studies on the utility of LLMs for diagnosis in languages other than English have shown promise, but a large-scale assessment of their relative performance across a variety of European and non-European languages on a comprehensive corpus of challenging rare-disease cases is lacking. We created 4967 clinical vignettes using structured phenotypic data captured with Human Phenotype Ontology (HPO) terms in the Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema. These cases span a total of 378 distinct genetic diseases and 2618 associated phenotypic features. We used translations together with language-specific templates to generate prompts in English, Chinese, Czech, Dutch, German, Italian, Japanese, Spanish, and Turkish. We applied GPT-4o, version gpt-4o-2024-08-06, to the task of delivering a ranked differential diagnosis using a zero-shot prompt. An ontology-based approach with the Mondo disease ontology was used to map synonyms and disease subtypes to the correct diagnoses in order to automate the evaluation of LLM responses. For English, GPT-4o placed the correct diagnosis at the first rank 19.8% of the time and within the top-3 ranks 27.0% of the time. In comparison, for the eight non-English languages tested here, the correct diagnosis was placed at rank 1 in between 16.9% and 20.5% of cases, and within the top-3 ranks in between 25.3% and 27.7% of cases. Performance was consistent across the nine languages tested. This suggests that GPT-4o may be useful for diagnostic support in multilingual settings. Funding: NHGRI 5U24HG011449 and 5RM1HG010860. P.N.R. was supported by a Professorship of the Alexander von Humboldt Foundation; P.L. was supported by a National Grant (PMP21/00063 ONTOPREC-ISCIII, Fondos FEDER).
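The ranked-diagnosis scoring reported above (correct diagnosis at rank 1 vs. within the top 3) reduces to a simple top-k accuracy computation, sketched below. The cases and helper names are hypothetical, and the ontology-based normalization of synonyms and subtypes (via Mondo) that the study uses is omitted here.

```python
# Minimal sketch of top-k accuracy over ranked differential diagnoses.
from dataclasses import dataclass

@dataclass
class Case:
    true_diagnosis: str
    ranked_predictions: list[str]  # model's differential, best guess first

cases = [  # hypothetical examples
    Case("Marfan syndrome", ["Marfan syndrome", "Ehlers-Danlos syndrome", "Homocystinuria"]),
    Case("Fabry disease", ["Gaucher disease", "Fabry disease", "Pompe disease"]),
    Case("Rett syndrome", ["Angelman syndrome", "Pitt-Hopkins syndrome", "Autism spectrum disorder"]),
]

def top_k_accuracy(cases: list[Case], k: int) -> float:
    # A case counts as a hit if the true diagnosis appears in the first k predictions.
    hits = sum(c.true_diagnosis in c.ranked_predictions[:k] for c in cases)
    return hits / len(cases)

print(f"top-1 accuracy: {top_k_accuracy(cases, 1):.1%}")
print(f"top-3 accuracy: {top_k_accuracy(cases, 3):.1%}")
```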

Language: English

Cited by

0

Language discrepancies in the performance of generative artificial intelligence models: an examination of infectious disease queries in English and Arabic
Malik Sallam,

Kholoud Al-Mahzoum,

Omaima Alshuaib

et al.

BMC Infectious Diseases, Year: 2024, Issue: 24(1)

Published: Aug. 8, 2024

Assessment of artificial intelligence (AI)-based models across languages is crucial to ensure equitable access to, and accuracy of, information in multilingual contexts. This study aimed to compare AI model performance in English and Arabic for infectious disease queries.

Language: English

Cited by

4

The performance of OpenAI ChatGPT-4 and Google Gemini in virology multiple-choice questions: a comparative analysis of English and Arabic responses
Malik Sallam,

Kholoud Al-Mahzoum,

Rawan Ahmad Almutawaa

et al.

BMC Research Notes, Year: 2024, Issue: 17(1)

Published: Sep. 3, 2024

Language: English

Cited by

4

ChatGPT usage and attitudes are driven by perceptions of usefulness, ease of use, risks, and psycho-social impact: a study among university students in the UAE
Malik Sallam, Walid El‐Sayed, Muhammad Y. Al‐Shorbagy

et al.

Frontiers in Education, Year: 2024, Issue: 9

Published: Aug. 7, 2024

Background: The use of ChatGPT among university students has gained recent popularity. The current study aimed to assess the factors driving the attitude towards and usage of ChatGPT, as an example of generative artificial intelligence (genAI), in the United Arab Emirates (UAE). Methods: This cross-sectional study was based on a previously validated Technology Acceptance Model (TAM)-based survey instrument termed TAME-ChatGPT. The self-administered e-survey was distributed by email to students enrolled in UAE universities during September–December 2023 using a convenience-based approach. Assessment of the roles of demographic and academic variables and the TAME-ChatGPT constructs was conducted with univariate followed by multivariate analyses. Results: The final sample comprised 608 participants, 91.0% of whom had heard of ChatGPT, while 85.4% had used ChatGPT before the study. Univariate analysis indicated that a positive attitude towards ChatGPT was associated with three constructs, namely lower perceived risks, lower anxiety, and higher scores on the technology/social influence construct. For ChatGPT usage, the associated factors were being male, nationality, grade point average (GPA), as well as four constructs: perceived usefulness, perceived risks of use, the behavior/cognitive construct, and the ease-of-use construct. In multivariate analysis, only the TAME-ChatGPT constructs explained the variance in attitude towards ChatGPT (80.8%) and its usage (76.9%). Conclusion: The findings indicate that ChatGPT use is commonplace among university students in the UAE. Its determinants included cognitive and behavioral factors and perceived ease of use. These should be considered in understanding the motivators for successful adoption of genAI, including ChatGPT, in education.

Language: English

Cited by

4