Evaluating Microsoft Bing with ChatGPT-4 for the assessment of abdominal computed tomography and magnetic resonance images
Alperen Elek, Duygu Doğa Ekizalioğlu, Ezgi Güler, et al.

Diagnostic and Interventional Radiology, Journal Year: 2024, Volume and Issue: unknown

Published: Aug. 19, 2024

To evaluate the performance of Microsoft Bing with ChatGPT-4 technology in analyzing abdominal computed tomography (CT) and magnetic resonance images (MRI).

Language: English

Prompt engineering in consistency and reliability with the evidence-based guideline for LLMs
Wang Li, Xi Chen, Xiangwen Deng, et al.

npj Digital Medicine, Journal Year: 2024, Volume and Issue: 7(1)

Published: Feb. 20, 2024

Abstract The use of large language models (LLMs) in clinical medicine is currently thriving. Effectively transferring LLMs' pertinent theoretical knowledge from computer science to their clinical application is crucial. Prompt engineering has shown potential as an effective method in this regard. To explore prompt engineering for LLMs and examine the reliability of LLMs, prompts in different styles were designed and used to ask about agreement with the American Academy of Orthopaedic Surgeons (AAOS) osteoarthritis (OA) evidence-based guidelines. Each question was asked 5 times. We compared the consistency of the findings with the guidelines across evidence levels for the different prompts and assessed reliability by asking the same question 5 times. gpt-4-Web with ROT prompting had the highest overall consistency (62.9%) and a significant performance for strong recommendations, with a total consistency of 77.5%. Reliability was not stable (Fleiss kappa ranged from −0.002 to 0.984). This study revealed that prompts had variable effects across various models, and gpt-4-Web with ROT prompting was the most consistent. An appropriate prompt could improve the accuracy of responses to professional medical questions.
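The study's reliability measure, Fleiss' kappa over repeated runs of the same question, can be computed directly from a subjects-by-categories count matrix. A minimal sketch (the two-category setup and run counts below are illustrative, not taken from the paper):

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a subjects-by-categories count matrix.

    counts[i][j] = number of raters (here: repeated LLM runs) that
    assigned subject i to category j; every row sums to n raters.
    """
    N = len(counts)            # number of subjects (questions)
    n = sum(counts[0])         # raters (runs) per subject
    # Mean observed per-subject agreement
    P_bar = sum((sum(c * c for c in row) - n) / (n * (n - 1))
                for row in counts) / N
    # Chance agreement from the category marginals
    total = N * n
    p = [sum(row[j] for row in counts) / total
         for j in range(len(counts[0]))]
    P_e = sum(pj * pj for pj in p)
    return (P_bar - P_e) / (1 - P_e)

# 3 questions, 5 runs each, two answer categories (e.g. agree/disagree):
# perfect within-question agreement gives kappa = 1.0
print(fleiss_kappa([[5, 0], [0, 5], [5, 0]]))
```

Values near 1 indicate the model answers the repeated question identically; values near 0 (or below) match the unstable prompts the study reports.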

Language: English

Citations: 81

A pilot study on the efficacy of GPT-4 in providing orthopedic treatment recommendations from MRI reports
Daniel Truhn, Christian David Weber, Benedikt J. Braun, et al.

Scientific Reports, Journal Year: 2023, Volume and Issue: 13(1)

Published: Nov. 17, 2023

Abstract Large language models (LLMs) have shown potential in various applications, including clinical practice. However, their accuracy and utility in providing treatment recommendations for orthopedic conditions remain to be investigated. Thus, this pilot study aims to evaluate the validity of treatment recommendations generated by GPT-4 for common knee and shoulder conditions using anonymized MRI reports. A retrospective analysis was conducted on 20 MRI reports with varying severity and complexity. Treatment recommendations were elicited from GPT-4 and evaluated by two board-certified, specialty-trained senior orthopedic surgeons. Their evaluation focused on semiquantitative gradings and limitations of the LLM-generated recommendations. GPT-4 provided recommendations for 20 patients (mean age, 50 years ± 19 [standard deviation]; 12 men) with acute and chronic conditions. The LLM produced largely accurate and clinically useful recommendations, but limited awareness of a patient's overall situation, a tendency to incorrectly appreciate urgency, and schematic, unspecific recommendations were observed and may reduce its usefulness. In conclusion, LLM-based treatment recommendations are largely adequate and not prone to 'hallucinations', yet inadequate in particular situations. Critical guidance by healthcare professionals is obligatory, and independent use is discouraged, given the dependency on precise data input.

Language: English

Citations: 46

The impact of large language models on radiology: a guide for radiologists on the latest innovations in AI
Takeshi Nakaura, Rintaro Ito, Daiju Ueda, et al.

Japanese Journal of Radiology, Journal Year: 2024, Volume and Issue: 42(7), P. 685 - 696

Published: March 29, 2024

Abstract The advent of Deep Learning (DL) has significantly propelled the field of diagnostic radiology forward by enhancing image analysis and interpretation. The introduction of the Transformer architecture, followed by the development of Large Language Models (LLMs), has further revolutionized this domain. LLMs now possess the potential to automate and refine the radiology workflow, extending from report generation and assistance in diagnostics to patient care. The integration of multimodal technology with LLMs could potentially leapfrog these applications to unprecedented levels. However, LLMs come with unresolved challenges such as information hallucinations and biases, which can affect clinical reliability. Despite these issues, legislative and guideline frameworks have yet to catch up with the technological advancements. Radiologists must acquire a thorough understanding of these technologies to leverage LLMs to the fullest while maintaining medical safety and ethics. This review aims to aid in that endeavor.

Language: English

Citations: 29

A future role for health applications of large language models depends on regulators enforcing safety standards
Oscar Freyer, Isabella C. Wiest, Jakob Nikolas Kather, et al.

The Lancet Digital Health, Journal Year: 2024, Volume and Issue: 6(9), P. e662 - e672

Published: Aug. 23, 2024

Amid the rapid integration of artificial intelligence into clinical settings, large language models (LLMs), such as Generative Pre-trained Transformer-4, have emerged as multifaceted tools with potential for health-care delivery, diagnosis, and patient care. However, the deployment of LLMs raises substantial regulatory and safety concerns. Due to their high output variability, poor inherent explainability, and the risk of so-called AI hallucinations, LLM-based applications that serve a medical purpose face challenges for approval as medical devices under US and EU laws, including the recently passed Artificial Intelligence Act. Despite unaddressed risks for patients, including misdiagnosis and unverified medical advice, such applications are available on the market. The regulatory ambiguity surrounding these tools creates an urgent need for frameworks that accommodate their unique capabilities and limitations. Alongside the development of these frameworks, existing regulations should be enforced. If regulators fear enforcing them in a market dominated by supply or technology companies, the consequences of layperson harm will force belated action, damaging the potentiality of LLM-based medical advice.

Language: English

Citations: 26

Assessing the Responses of Large Language Models (ChatGPT-4, Gemini, and Microsoft Copilot) to Frequently Asked Questions in Breast Imaging: A Study on Readability and Accuracy

Murat Tepe, Emre Emekli

Cureus, Journal Year: 2024, Volume and Issue: unknown

Published: May 9, 2024

Background Large language models (LLMs), such as ChatGPT-4, Gemini, and Microsoft Copilot, have been instrumental in various domains, including healthcare, where they enhance health literacy and aid in patient decision-making. Given the complexities involved in breast imaging procedures, accurate and comprehensible information is vital for patient engagement and compliance. This study aims to evaluate the readability and accuracy of the information provided by three prominent LLMs in response to frequently asked questions in breast imaging, assessing their potential to improve patient understanding and facilitate healthcare communication. Methodology We collected the most common questions on breast imaging from clinical practice and posed them to the LLMs. We then evaluated the responses in terms of readability and accuracy. Responses from the LLMs were analyzed using the Flesch Reading Ease and Flesch-Kincaid Grade Level tests and through a radiologist-developed Likert-type scale. Results The study found significant variations among the LLMs. Gemini and Copilot scored higher on the readability scales (p < 0.001), indicating that their responses were easier to understand. In contrast, ChatGPT-4 demonstrated greater accuracy in its responses (p < 0.001). Conclusions While LLMs show promise in providing accurate responses, readability issues may limit their utility in patient education. Conversely, despite being less accurate, Gemini and Copilot are more accessible to a broader audience. Ongoing adjustments and evaluations of these models are essential to ensure that they meet the diverse needs of patients, emphasizing the need for continuous improvement and oversight in the deployment of artificial intelligence technologies in healthcare.
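The two readability indices used in the study are simple closed-form formulas over word, sentence, and syllable counts. A minimal sketch, taking the counts as given (automated syllable counting is a separate, heuristic step; the example counts are illustrative):

```python
def flesch_reading_ease(words, sentences, syllables):
    # Higher scores mean easier text; 60-70 is roughly plain English.
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words, sentences, syllables):
    # Approximate US school grade level needed to understand the text.
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# e.g. a 100-word answer split into 5 sentences with 150 syllables
print(flesch_reading_ease(100, 5, 150))
print(flesch_kincaid_grade(100, 5, 150))
```

Longer sentences and more syllables per word lower the Reading Ease score and raise the grade level, which is why verbose, jargon-heavy LLM answers score as harder to read.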

Language: English

Citations: 18

The Role of AI in the Evaluation of Neuroendocrine Tumors: Current State of the Art
Felipe Lopez-Ramirez, Mohammad Yasrab, Florent Tixier, et al.

Seminars in Nuclear Medicine, Journal Year: 2025, Volume and Issue: unknown

Published: Feb. 1, 2025

Language: English

Citations: 2

Optimizing Large Language Models in Radiology and Mitigating Pitfalls: Prompt Engineering and Fine-tuning
T. Kim, Michael Makutonin, Reza Sirous, et al.

Radiographics, Journal Year: 2025, Volume and Issue: 45(4)

Published: March 6, 2025

Large language models (LLMs) such as generative pretrained transformers (GPTs) have had a major impact on society, and there is increasing interest in using these models for applications in medicine and radiology. This article presents techniques to optimize LLMs and describes their known challenges and limitations. Specifically, the authors explore how to best craft natural language prompts, a process known as prompt engineering, to elicit more accurate and desirable responses. The authors also explain how fine-tuning is conducted, in which a general model, such as GPT-4, is further trained for a specific use case, such as summarizing clinical notes, to improve reliability and relevance. Despite the enormous potential of these models, substantial challenges limit their widespread implementation. These tools differ substantially from traditional health technology in their complexity and their probabilistic, nondeterministic nature, and these differences lead to issues such as "hallucinations," biases, lack of reliability, and security risks. Therefore, the authors provide radiologists with baseline knowledge of the technology underpinning these models and an understanding of how to use them, in addition to exploring best practices in prompt engineering and fine-tuning. Also discussed are current proof-of-concept use cases of LLMs in the radiology literature, such as clinical decision support and report generation, and the limitations preventing their adoption. ©RSNA, 2025. See the invited commentary by Chung and Mongan in this issue.
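Prompt engineering in the sense described here is largely about structuring the input text before it reaches the model. A hypothetical sketch contrasting a bare prompt with an engineered one for report summarization (the role, constraints, and sample report are illustrative, not taken from the article):

```python
def bare_prompt(report):
    # Minimal zero-shot prompt with no guidance on style or scope.
    return f"Summarize this radiology report:\n{report}"

def engineered_prompt(report):
    # Role assignment, explicit constraints, and an output schema are
    # common prompt-engineering techniques for more reliable responses.
    return (
        "You are a board-certified radiologist writing for a referring physician.\n"
        "Summarize the report below in at most three sentences.\n"
        "Use only findings stated in the report; do not speculate.\n"
        "Format your answer as: FINDINGS: ... IMPRESSION: ...\n\n"
        f"Report:\n{report}"
    )

report = "CT abdomen: 1.2 cm hypodense hepatic lesion, likely a simple cyst."
print(engineered_prompt(report))
```

The engineered version constrains length, forbids speculation, and fixes an output format, which tends to reduce variability across runs; fine-tuning goes further by baking such behavior into the model weights via additional training examples.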

Language: English

Citations: 2

The virtual reference radiologist: comprehensive AI assistance for clinical image reading and interpretation
Robert Siepmann, Marc Huppertz, Annika Rastkhiz, et al.

European Radiology, Journal Year: 2024, Volume and Issue: 34(10), P. 6652 - 6666

Published: April 16, 2024

Large language models (LLMs) have shown potential in radiology, but their ability to aid radiologists in interpreting imaging studies remains unexplored. We investigated the effects of a state-of-the-art LLM (GPT-4) on radiologists' diagnostic workflow.

Language: English

Citations: 10

Prompt Engineering Paradigms for Medical Applications: Scoping Review
Jamil Zaghir, Marco Naguib, Mina Bjelogrlic, et al.

Journal of Medical Internet Research, Journal Year: 2024, Volume and Issue: 26, P. e60501 - e60501

Published: Sept. 10, 2024

Prompt engineering, focusing on crafting effective prompts for large language models (LLMs), has garnered attention for its capability to harness the potential of LLMs. This is even more crucial in the medical domain due to its specialized terminology and technicity. Clinical natural language processing applications must navigate complex language and ensure privacy compliance. Prompt engineering offers a novel approach by designing tailored prompts that guide models in exploiting clinically relevant information from medical texts. Despite its promise, the efficacy of prompt engineering in the medical domain remains to be fully explored.

Language: English

Citations: 10

Ethical Considerations in Human-Centered AI: Advancing Oncology Chatbots through Large Language Models (Preprint)
James C. L. Chow, Kay Li

JMIR Bioinformatics and Biotechnology, Journal Year: 2024, Volume and Issue: 5, P. e64406 - e64406

Published: Sept. 25, 2024

The integration of chatbots in oncology underscores the pressing need for human-centered artificial intelligence (AI) that addresses patient and family concerns with empathy and precision. Human-centered AI emphasizes ethical principles, empathy, and user-centric approaches, ensuring that technology aligns with human values and needs. This review critically examines the ethical implications of using large language models (LLMs) like GPT-3 and GPT-4 (OpenAI) in oncology chatbots. It examines how these models replicate human-like language patterns, impacting the design of ethical AI systems. The paper identifies key strategies for ethically developing oncology chatbots, focusing on potential biases arising from extensive datasets and neural networks. Specific datasets, such as those sourced predominantly from Western medical literature and patient interactions, may introduce biases by overrepresenting certain demographic groups. Moreover, the training methodologies of LLMs, including fine-tuning processes, can exacerbate these biases, leading to outputs that disproportionately favor affluent or Western populations while neglecting marginalized communities. By providing examples of biased outputs, the review highlights the ethical challenges LLMs present and strategies for their mitigation. The study advocates integrating human-centric values into AI to mitigate these biases, ultimately supporting the development of oncology chatbots that are aligned with ethical principles and capable of serving diverse patient populations equitably.

Language: English

Citations: 10