Using GPT-4o for CAD-RADS feature extraction and categorization with free-text coronary CT angiography reports (Preprint)

Youmei Chen, Jie Sun, Mengshi Dong

et al.

Published: Jan. 8, 2025

BACKGROUND: Despite the Coronary Artery Disease Reporting and Data System (CAD-RADS) providing a standardized approach, radiologists continue to favor free-text reports. This preference creates significant challenges for data extraction and analysis in longitudinal studies, potentially limiting large-scale research and quality assessment initiatives. OBJECTIVE: To evaluate the ability of the GPT-4o model to convert real-world coronary CT angiography (CCTA) reports into structured data and to automatically identify CAD-RADS categories and plaque burden (P) categories. METHODS: This retrospective study analyzed CCTA reports from January 2024 to July 2024. A subset of 25 reports was used for prompt engineering to instruct the LLM in extracting CAD-RADS categories, P categories, and the presence of myocardial bridges and non-calcified plaques. Reports were processed via the API with custom Python scripts. The ground truth was established by a radiologist based on the CAD-RADS 2.0 guidelines. Model performance was assessed with accuracy, sensitivity, specificity, and F1 score; intra-rater reliability was assessed with Cohen's Kappa coefficient. RESULTS: Among 999 patients (median age 66 years, range 58-74; 650 males), CAD-RADS categorization showed an accuracy of 0.98-1.00, sensitivity of 0.95-1.00, and specificity and F1 scores of 0.96-1.00. P categories demonstrated an accuracy of 0.97-1.00, sensitivity of 0.90-1.00, and F1 scores of 0.91-0.99. Myocardial bridge detection achieved an accuracy of 0.98, as did non-calcified plaque detection. Kappa values for all classifications exceeded 0.98. CONCLUSIONS: GPT-4o efficiently and accurately converts free-text CCTA reports into structured data, excelling in CAD-RADS classification and plaque burden assessment.
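The extraction pipeline described in this abstract (API calls driven by custom Python scripts) can be sketched as follows. The prompt wording, JSON keys, and validation rules are illustrative assumptions, not the authors' actual code, and the model reply here is mocked rather than fetched from any API.

```python
import json
import re

# Hedged sketch: prompt the model for structured JSON, then validate the
# reply before storing it. Field names below are assumptions.
PROMPT_TEMPLATE = (
    "From the coronary CTA report below, return JSON with keys "
    "'cad_rads' (integer 0-5), 'p_category' ('P1'-'P4' or null), "
    "'myocardial_bridge' (bool), 'noncalcified_plaque' (bool).\n"
    "Report:\n{report}"
)

def build_prompt(report_text: str) -> str:
    """Fill the extraction prompt with one free-text report."""
    return PROMPT_TEMPLATE.format(report=report_text)

def parse_response(raw: str) -> dict:
    """Parse and validate the model's JSON reply."""
    data = json.loads(raw)
    if not isinstance(data.get("cad_rads"), int) or not 0 <= data["cad_rads"] <= 5:
        raise ValueError("cad_rads must be an integer in 0-5")
    p = data.get("p_category")
    if p is not None and not re.fullmatch(r"P[1-4]", p):
        raise ValueError("p_category must be P1-P4 or null")
    return data

# Mocked model reply (no API call is made in this sketch):
reply = ('{"cad_rads": 3, "p_category": "P2", '
         '"myocardial_bridge": false, "noncalcified_plaque": true}')
result = parse_response(reply)
print(result["cad_rads"], result["p_category"])
```

Validating the model output against the allowed CAD-RADS range before accepting it is one way such a pipeline can guard against malformed or hallucinated replies.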

Language: English

The impact of large language models on radiology: a guide for radiologists on the latest innovations in AI
Takeshi Nakaura, Rintaro Ito, Daiju Ueda

et al.

Japanese Journal of Radiology, Journal Year: 2024, Volume and Issue: 42(7), P. 685 - 696

Published: March 29, 2024

Abstract: The advent of Deep Learning (DL) has significantly propelled the field of diagnostic radiology forward by enhancing image analysis and interpretation. The introduction of the Transformer architecture, followed by the development of Large Language Models (LLMs), has further revolutionized this domain. LLMs now possess the potential to automate and refine the radiology workflow, extending from report generation to assistance in diagnostics and patient care. The integration of multimodal technology with LLMs could potentially leapfrog these applications to unprecedented levels. However, these models come with unresolved challenges such as information hallucinations and biases, which can affect clinical reliability. Despite these issues, legislative and guideline frameworks have yet to catch up with the technological advancements. Radiologists must acquire a thorough understanding of these technologies to leverage LLMs' potential to the fullest while maintaining medical safety and ethics. This review aims to aid in that endeavor.

Language: English

Citations: 34

ChatGPT in radiology: A systematic review of performance, pitfalls, and future perspectives

Pedram Keshavarz, Sara Bagherieh, Seyed Ali Nabipoorashrafi

et al.

Diagnostic and Interventional Imaging, Journal Year: 2024, Volume and Issue: 105(7-8), P. 251 - 265

Published: April 27, 2024

The purpose of this study was to systematically review the reported performance of ChatGPT, identify its potential limitations, and explore future directions for its integration, optimization, and ethical considerations in radiology applications.

Language: English

Citations: 27

Diagnostic Performance Comparison between Generative AI and Physicians: A Systematic Review and Meta-Analysis
Hirotaka Takita, Daijiro Kabata, Shannon L. Walston

et al.

medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Jan. 22, 2024

Abstract Background: The rapid advancement of generative artificial intelligence (AI) has led to the wide dissemination of models with exceptional understanding and generation of human language. Their integration into healthcare has shown potential for improving medical diagnostics, yet a comprehensive evaluation of the diagnostic performance of generative AI and its comparison with that of physicians has not been extensively explored. Methods: In this systematic review and meta-analysis, a search of Medline, Scopus, Web of Science, Cochrane Central, and MedRxiv was conducted for studies published from June 2018 through December 2023, focusing on those that validate generative AI for diagnostic tasks. Risk of bias was assessed using the Prediction Model Study Risk of Bias Assessment Tool. Meta-regression was performed to summarize and compare the diagnostic accuracy of generative AI and physicians. Results: The search resulted in 54 studies being included in the meta-analysis. Nine generative AI models were evaluated across 17 medical specialties. The quality assessment indicated a high risk of bias in the majority of studies, primarily due to small sample sizes. The overall diagnostic accuracy of generative AI was 56.9% (95% confidence interval [CI]: 51.0-62.7%). The meta-analysis demonstrated that, on average, physicians exceeded the accuracy of generative AI (difference in accuracy: 14.4% [95% CI: 4.9-23.8%], p-value = 0.004). However, both Prometheus (Bing) and GPT-4 showed slightly better performance compared with non-experts (-2.3% [95% CI: -27.0-22.4%], p = 0.848 and -0.32% [95% CI: -14.4-13.7%], p = 0.962), but they underperformed when compared with experts (10.9% [95% CI: -13.1-35.0%], p = 0.356 and 12.9% [95% CI: 0.15-25.7%], p = 0.048). A sub-analysis revealed significantly improved performance in the fields of Gynecology, Pediatrics, Orthopedic surgery, Plastic surgery, and Otolaryngology, while showing reduced performance in Neurology, Psychiatry, Rheumatology, Endocrinology, and General Medicine. No significant heterogeneity was observed based on risk of bias. Conclusions: Generative AI exhibits promising diagnostic capabilities, varying by model and specialty. Although these models have not reached the reliability of expert physicians, the findings suggest they could enhance healthcare delivery and education, provided they are integrated with caution and their limitations are well-understood. Key Points Question: What is the diagnostic accuracy of generative AI, and how does it compare with that of physicians? Findings: This meta-analysis found a pooled accuracy of 56.9% (95% confidence interval: 51.0-62.7%); physicians exceeded generative AI across all specialties, however, some models were comparable with non-expert physicians. Meaning: This suggests that generative AI does not yet match the level of experienced physicians but may have applications in healthcare delivery and education.

Language: English

Citations: 19

Impact of Multimodal Prompt Elements on Diagnostic Performance of GPT-4V in Challenging Brain MRI Cases
Severin Schramm, Silas Preis, Marie‐Christin Metz

et al.

Radiology, Journal Year: 2025, Volume and Issue: 314(1)

Published: Jan. 1, 2025

Textual descriptions of radiologic image findings play a critical role in GPT-4 with vision–based differential diagnosis, underlining the importance of radiologist expertise even with multimodal large language models.

Language: English

Citations: 2

Climate change and artificial intelligence in healthcare: Review and recommendations towards a sustainable future
Daiju Ueda, Shannon L. Walston, Shohei Fujita

et al.

Diagnostic and Interventional Imaging, Journal Year: 2024, Volume and Issue: 105(11), P. 453 - 459

Published: June 24, 2024

The rapid advancement of artificial intelligence (AI) in healthcare has revolutionized the industry, offering significant improvements in diagnostic accuracy, efficiency, and patient outcomes. However, the increasing adoption of AI systems also raises concerns about their environmental impact, particularly in the context of climate change. This review explores the intersection of climate change and AI in healthcare, examining the challenges posed by the energy consumption and carbon footprint of AI systems, as well as potential solutions to mitigate their impact. The review highlights the energy-intensive nature of model training and deployment, the contribution of data centers to greenhouse gas emissions, and the generation of electronic waste. To address these challenges, the development of energy-efficient models, green computing practices, and the integration of renewable energy sources are discussed as solutions. The review emphasizes the role of AI in optimizing healthcare workflows, reducing resource waste, and facilitating sustainable practices such as telemedicine. Furthermore, the importance of policy and governance frameworks, global initiatives, and collaborative efforts in promoting sustainability is explored. The review concludes by outlining best practices, including eco-design, lifecycle assessment, responsible data management, and continuous monitoring and improvement. As the healthcare industry continues to embrace AI technologies, prioritizing sustainability and environmental responsibility is crucial to ensure that the benefits of AI are realized while actively contributing to the preservation of our planet.

Language: English

Citations: 17

Diagnostic accuracy of vision-language models on Japanese diagnostic radiology, nuclear medicine, and interventional radiology specialty board examinations
Tatsushi Oura, Hiroyuki Tatekawa, Daisuke Horiuchi

et al.

Japanese Journal of Radiology, Journal Year: 2024, Volume and Issue: 42(12), P. 1392 - 1398

Published: July 20, 2024

Abstract Purpose: The performance of vision-language models (VLMs) with image interpretation capabilities, such as GPT-4 omni (GPT-4o), GPT-4 vision (GPT-4V), and Claude-3, has not been compared and remains unexplored in specialized radiological fields, including nuclear medicine and interventional radiology. This study aimed to evaluate and compare the diagnostic accuracy of various VLMs, including GPT-4 + GPT-4V, GPT-4o, Claude-3 Sonnet, and Claude-3 Opus, using Japanese diagnostic radiology, nuclear medicine, and interventional radiology (JDR, JNM, and JIR, respectively) board certification tests. Materials and methods: In total, 383 questions from the JDR test (358 with images), 300 from the JNM test (92 with images), and 322 from the JIR test (96 with images) from 2019 to 2023 were consecutively collected. The correct answer rates of GPT-4 + GPT-4V, GPT-4o, Claude-3 Sonnet, and Claude-3 Opus were calculated for all questions and for questions with images. The performance of the VLMs was compared using McNemar's test. Results: GPT-4o demonstrated the highest correct answer rates across all evaluations on the JDR test (all questions, 49%; questions with images, 48%), the JNM test (all questions, 64%; questions with images, 59%), and the JIR test (all questions, 43%; questions with images, 34%), followed by the other models with lower rates (40%; 38%; 42%; 43%; 30%). For questions with images, McNemar's test showed that GPT-4o significantly outperformed the other VLMs (P < 0.007), with one exception (P < 0.001). Conclusion: GPT-4o had the highest success rates for questions with images on the JDR, JNM, and JIR tests.

Language: English

Citations: 15

Comparative analysis of GPT-4-based ChatGPT’s diagnostic performance with radiologists using real-world radiology reports of brain tumors
Yasuhito Mitsuyama, Hiroyuki Tatekawa, Hirotaka Takita

et al.

European Radiology, Journal Year: 2024, Volume and Issue: unknown

Published: Aug. 28, 2024

Large language models like GPT-4 have demonstrated potential for diagnosis in radiology. Previous studies investigating this potential primarily utilized quizzes from academic journals. This study aimed to assess the diagnostic capabilities of GPT-4-based Chat Generative Pre-trained Transformer (ChatGPT) using actual clinical radiology reports of brain tumors and to compare its performance with that of neuroradiologists and general radiologists.

Language: English

Citations: 14

Constructing a Large Language Model to Generate Impressions from Findings in Radiology Reports
Lu Zhang, Mingqian Liu, Lingyun Wang

et al.

Radiology, Journal Year: 2024, Volume and Issue: 312(3)

Published: Sept. 1, 2024

Background: The specialization and complexity of radiology makes the automatic generation of radiologic impressions (ie, a diagnosis with differential and management recommendations) challenging. Purpose: To develop a large language model (LLM) that generates impressions based on imaging findings and to evaluate its performance in professional and linguistic dimensions. Materials and Methods: Six radiologists recorded examination findings from August 2 to 31, 2023, at Shanghai General Hospital and used the developed LLM before routinely writing the report, for multiple modalities (CT, MRI, radiography, mammography) and anatomic sites (cranium and face, neck, chest, upper abdomen, lower abdomen, vessels, bone and joint, spine, breast), making necessary corrections before completing the impression. A subset was defined to investigate cases where the LLM-generated impression differed from the final radiologist impression, by excluding identical and highly similar cases. An expert panel scored impressions on a five-point Likert scale (5 = strongly agree) for scientific terminology, coherence, specific diagnosis, recommendations, correctness, comprehensiveness, harmlessness, and lack of bias. Results: In this retrospective study, an LLM was pretrained using 20 GB of medical and general-purpose text data. The fine-tuning data set comprised 1.5 GB of data, including 800 reports paired with instructions (describing the output task in natural language) and outputs. The test set included 3988 patients (median age, 56 years [IQR, 40-68 years]; 2159 male). The median recall, precision, and F1 score were 0.775 (IQR, 0.56-1), 0.84 (IQR, 0.611-1), and 0.772 (IQR, 0.578-0.957), respectively, with the radiologist impressions as the reference standard. In the subset of 1014 patients (median age, 57 years [IQR, 42-69 years]; 528 male), the overall median score was 5 (IQR, 5-5), ranging from 4 (IQR, 3-5) to 5 (IQR, 5-5) across dimensions. Conclusion: The LLM generated professionally and linguistically appropriate impressions across a full spectrum of examinations. © RSNA, 2024
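The recall/precision/F1 evaluation against a radiologist reference standard can be illustrated with a simple token-overlap scorer. The paper does not spell out its exact matching scheme, so this bag-of-words variant is only an assumed sketch, and the example impressions are invented.

```python
from collections import Counter

def token_prf(generated: str, reference: str):
    """Bag-of-words precision, recall, and F1 of a generated impression
    against a reference impression (assumed scoring scheme)."""
    gen = generated.lower().split()
    ref = reference.lower().split()
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(gen) & Counter(ref)).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / len(gen)
    recall = overlap / len(ref)
    return precision, recall, 2 * precision * recall / (precision + recall)

p, r, f1 = token_prf(
    "no acute intracranial hemorrhage or mass effect",
    "no acute intracranial hemorrhage",
)
print(round(p, 3), round(r, 3), round(f1, 3))  # -> 0.571 1.0 0.727
```

Per-report scores like these can then be aggregated into the medians and IQRs reported above.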

Language: English

Citations: 14

Automated classification of brain MRI reports using fine-tuned large language models
Jun Kanzawa, Koichiro Yasaka, Nana Fujita

et al.

Neuroradiology, Journal Year: 2024, Volume and Issue: unknown

Published: July 12, 2024

Abstract Purpose: This study aimed to investigate the efficacy of fine-tuned large language models (LLM) in classifying brain MRI reports into pretreatment, posttreatment, and nontumor cases. Methods: This retrospective study included 759, 284, and 164 reports for the training, validation, and test datasets, respectively. Radiologists stratified the reports into three groups: nontumor (group 1), posttreatment tumor (group 2), and pretreatment tumor (group 3) cases. A pretrained Bidirectional Encoder Representations from Transformers Japanese model was fine-tuned using the training dataset and evaluated on the validation dataset. The model which demonstrated the highest accuracy was selected as the final model. Two additional radiologists were involved in classifying the reports in the test dataset into the three groups, and the model's performance was compared with that of the two radiologists. Results: The fine-tuned LLM attained an overall accuracy of 0.970 (95% CI: 0.930–0.990). The sensitivity for groups 1/2/3 was 1.000/0.864/0.978. The specificity for groups 1/2/3 was 0.991/0.993/0.958. No statistically significant differences were found in terms of accuracy, sensitivity, and specificity between the LLM and the human readers (p ≥ 0.371). The LLM completed the classification task approximately 20–26-fold faster than the radiologists. The areas under the receiver operating characteristic curve for discriminating among groups 2, 3, and 1 were 0.994 (95% CI: 0.982–1.000) and 0.992 (95% CI: 0.982–1.000). Conclusion: Fine-tuned LLMs showed performance comparable with radiologists in classifying brain MRI reports, while requiring substantially less time.
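The per-group sensitivity and specificity figures quoted in this abstract follow from one-vs-rest counts over the three report groups. A minimal sketch (the toy labels below are hypothetical, not the study's data):

```python
def group_metrics(y_true, y_pred, group):
    """One-vs-rest sensitivity and specificity for one report group,
    mirroring the per-group figures reported in such studies."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == group and p == group for t, p in pairs)
    fn = sum(t == group and p != group for t, p in pairs)
    tn = sum(t != group and p != group for t, p in pairs)
    fp = sum(t != group and p == group for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)

# Toy labels: 1 = nontumor, 2 = posttreatment, 3 = pretreatment (hypothetical)
truth = [1, 1, 2, 2, 3, 3, 3, 1]
preds = [1, 1, 2, 3, 3, 3, 3, 1]
sens, spec = group_metrics(truth, preds, 2)
print(sens, spec)  # -> 0.5 1.0
```

Running this per group reproduces the 1/2/3 sensitivity and specificity breakdown style used in the abstract.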

Language: English

Citations: 12

Evaluating GPT-4o's Performance in the Official European Board of Radiology Exam: A Comprehensive Assessment
Muhammed Said Beşler, Laura Oleaga, Vanesa Junquero

et al.

Academic Radiology, Journal Year: 2024, Volume and Issue: unknown

Published: Sept. 1, 2024

Language: English

Citations: 9