Using GPT-4o for CAD-RADS feature extraction and categorization with free-text coronary CT angiography reports (Preprint)

Youmei Chen, Jie Sun, Mengshi Dong

et al.

Published: Jan. 8, 2025

BACKGROUND: Despite the Coronary Artery Disease Reporting and Data System (CAD-RADS) providing a standardized approach, radiologists continue to favor free-text reports. This preference creates significant challenges for data extraction and analysis in longitudinal studies, potentially limiting large-scale research and quality assessment initiatives. OBJECTIVE: To evaluate the ability of the GPT-4o model to convert real-world coronary CT angiography (CCTA) reports into structured data and to automatically identify CAD-RADS categories and plaque burden (P) categories. METHODS: This retrospective study analyzed CCTA reports from January 2024 to July 2024. A subset of 25 reports was used for prompt engineering to instruct the LLM in extracting CAD-RADS categories, P categories, and the presence of myocardial bridges and non-calcified plaques. Reports were processed via the API with custom Python scripts. The ground truth was established by a radiologist based on the CAD-RADS 2.0 guidelines. Model performance was assessed with accuracy, sensitivity, specificity, and F1 score; intra-rater reliability was assessed with Cohen's Kappa coefficient. RESULTS: Among 999 patients (median age 66 years, range 58-74; 650 males), CAD-RADS categorization showed an accuracy of 0.98-1.00, sensitivity of 0.95-1.00, and specificity and F1 scores of 0.96-1.00. P categories demonstrated an accuracy of 0.97-1.00, sensitivity of 0.90-1.00, and F1 scores of 0.91-0.99. Myocardial bridge detection achieved an accuracy of 0.98, as did non-calcified plaque detection. Kappa values for all classifications exceeded 0.98. CONCLUSIONS: GPT-4o efficiently and accurately converts free-text CCTA reports into structured data, excelling in CAD-RADS classification and plaque burden assessment.
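The extraction pipeline described in this abstract (API calls driven by custom Python scripts) can be sketched as follows. The prompt wording, JSON keys, and validation rules are illustrative assumptions, not the authors' actual code, and the model reply here is mocked rather than fetched from any API.

```python
import json
import re

# Hedged sketch: prompt the model for structured JSON, then validate the
# reply before storing it. Field names below are assumptions.
PROMPT_TEMPLATE = (
    "From the coronary CTA report below, return JSON with keys "
    "'cad_rads' (integer 0-5), 'p_category' ('P1'-'P4' or null), "
    "'myocardial_bridge' (bool), 'noncalcified_plaque' (bool).\n"
    "Report:\n{report}"
)

def build_prompt(report_text: str) -> str:
    """Fill the extraction prompt with one free-text report."""
    return PROMPT_TEMPLATE.format(report=report_text)

def parse_response(raw: str) -> dict:
    """Parse and validate the model's JSON reply."""
    data = json.loads(raw)
    if not isinstance(data.get("cad_rads"), int) or not 0 <= data["cad_rads"] <= 5:
        raise ValueError("cad_rads must be an integer in 0-5")
    p = data.get("p_category")
    if p is not None and not re.fullmatch(r"P[1-4]", p):
        raise ValueError("p_category must be P1-P4 or null")
    return data

# Mocked model reply (no API call is made in this sketch):
reply = ('{"cad_rads": 3, "p_category": "P2", '
         '"myocardial_bridge": false, "noncalcified_plaque": true}')
result = parse_response(reply)
print(result["cad_rads"], result["p_category"])
```

Validating the model output against the allowed CAD-RADS range before accepting it is one way such a pipeline can guard against malformed or hallucinated replies.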

Language: English

The impact of large language models on radiology: a guide for radiologists on the latest innovations in AI
Takeshi Nakaura, Rintaro Ito, Daiju Ueda

et al.

Japanese Journal of Radiology, Journal Year: 2024, Volume and Issue: 42(7), P. 685 - 696

Published: March 29, 2024

Abstract: The advent of Deep Learning (DL) has significantly propelled the field of diagnostic radiology forward by enhancing image analysis and interpretation. The introduction of the Transformer architecture, followed by the development of Large Language Models (LLMs), has further revolutionized this domain. LLMs now possess the potential to automate and refine the radiology workflow, extending from report generation to assistance in diagnostics and patient care. The integration of multimodal technology with LLMs could potentially leapfrog these applications to unprecedented levels. However, these models come with unresolved challenges such as information hallucinations and biases, which can affect clinical reliability. Despite these issues, legislative and guideline frameworks have yet to catch up with the technological advancements. Radiologists must acquire a thorough understanding of these technologies to leverage LLMs' potential to the fullest while maintaining medical safety and ethics. This review aims to aid in that endeavor.

Language: English

Citations: 34

ChatGPT in radiology: A systematic review of performance, pitfalls, and future perspectives

Pedram Keshavarz, Sara Bagherieh, Seyed Ali Nabipoorashrafi

et al.

Diagnostic and Interventional Imaging, Journal Year: 2024, Volume and Issue: 105(7-8), P. 251 - 265

Published: April 27, 2024

The purpose of this study was to systematically review the reported performance of ChatGPT, identify its potential limitations, and explore future directions for its integration, optimization, and ethical considerations in radiology applications.

Language: English

Citations: 27

Diagnostic Performance Comparison between Generative AI and Physicians: A Systematic Review and Meta-Analysis
Hirotaka Takita, Daijiro Kabata, Shannon L. Walston

et al.

medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Jan. 22, 2024

Abstract Background: The rapid advancement of generative artificial intelligence (AI) has led to the wide dissemination of models with exceptional understanding and generation of human language. Their integration into healthcare has shown potential for improving medical diagnostics, yet a comprehensive evaluation of the diagnostic performance of generative AI and its comparison with that of physicians has not been extensively explored. Methods: In this systematic review and meta-analysis, a search of Medline, Scopus, Web of Science, Cochrane Central, and MedRxiv was conducted for studies published from June 2018 through December 2023, focusing on those that validate generative AI for diagnostic tasks. Risk of bias was assessed using the Prediction Model Study Risk of Bias Assessment Tool. Meta-regression was performed to summarize and compare the diagnostic accuracy of generative AI and physicians. Results: The search resulted in 54 studies being included in the meta-analysis. Nine generative AI models were evaluated across 17 medical specialties. The quality assessment indicated a high risk of bias in the majority of studies, primarily due to small sample sizes. The overall diagnostic accuracy of generative AI was 56.9% (95% confidence interval [CI]: 51.0-62.7%). The meta-analysis demonstrated that, on average, physicians exceeded the accuracy of generative AI (difference in accuracy: 14.4% [95% CI: 4.9-23.8%], p-value = 0.004). However, both Prometheus (Bing) and GPT-4 showed slightly better performance compared with non-experts (-2.3% [95% CI: -27.0-22.4%], p = 0.848 and -0.32% [95% CI: -14.4-13.7%], p = 0.962), but they underperformed when compared with experts (10.9% [95% CI: -13.1-35.0%], p = 0.356 and 12.9% [95% CI: 0.15-25.7%], p = 0.048). A sub-analysis revealed significantly improved performance in the fields of Gynecology, Pediatrics, Orthopedic surgery, Plastic surgery, and Otolaryngology, while showing reduced performance in Neurology, Psychiatry, Rheumatology, Endocrinology, and General Medicine. No significant heterogeneity was observed based on risk of bias. Conclusions: Generative AI exhibits promising diagnostic capabilities, varying by model and specialty. Although these models have not reached the reliability of expert physicians, the findings suggest they could enhance healthcare delivery and education, provided they are integrated with caution and their limitations are well-understood. Key Points Question: What is the diagnostic accuracy of generative AI, and how does it compare with that of physicians? Findings: This meta-analysis found a pooled accuracy of 56.9% (95% confidence interval: 51.0-62.7%); physicians exceeded generative AI across all specialties, however, some models were comparable with non-expert physicians. Meaning: This suggests that generative AI does not yet match the level of experienced physicians but may have applications in healthcare delivery and education.

Language: English

Citations: 19

Impact of Multimodal Prompt Elements on Diagnostic Performance of GPT-4V in Challenging Brain MRI Cases
Severin Schramm, Silas Preis, Marie‐Christin Metz

et al.

Radiology, Journal Year: 2025, Volume and Issue: 314(1)

Published: Jan. 1, 2025

Textual descriptions of radiologic image findings play a critical role in GPT-4 with vision–based differential diagnosis, underlining the importance of radiologist expertise even with multimodal large language models.

Language: English

Citations: 2

Climate change and artificial intelligence in healthcare: Review and recommendations towards a sustainable future
Daiju Ueda, Shannon L. Walston, Shohei Fujita

et al.

Diagnostic and Interventional Imaging, Journal Year: 2024, Volume and Issue: 105(11), P. 453 - 459

Published: June 24, 2024

The rapid advancement of artificial intelligence (AI) in healthcare has revolutionized the industry, offering significant improvements in diagnostic accuracy, efficiency, and patient outcomes. However, the increasing adoption of AI systems also raises concerns about their environmental impact, particularly in the context of climate change. This review explores the intersection of climate change and AI in healthcare, examining the challenges posed by the energy consumption and carbon footprint of AI systems, as well as potential solutions to mitigate their impact. The review highlights the energy-intensive nature of model training and deployment, the contribution of data centers to greenhouse gas emissions, and the generation of electronic waste. To address these challenges, the development of energy-efficient models, green computing practices, and the integration of renewable energy sources are discussed as solutions. The review emphasizes the role of AI in optimizing healthcare workflows, reducing resource waste, and facilitating sustainable practices such as telemedicine. Furthermore, the importance of policy and governance frameworks, global initiatives, and collaborative efforts in promoting sustainability is explored. The review concludes by outlining best practices, including eco-design, lifecycle assessment, responsible data management, and continuous monitoring and improvement. As the healthcare industry continues to embrace AI technologies, prioritizing sustainability and environmental responsibility is crucial to ensure that the benefits of AI are realized while actively contributing to the preservation of our planet.

Language: English

Citations: 17

Diagnostic accuracy of vision-language models on Japanese diagnostic radiology, nuclear medicine, and interventional radiology specialty board examinations
Tatsushi Oura, Hiroyuki Tatekawa, Daisuke Horiuchi

et al.

Japanese Journal of Radiology, Journal Year: 2024, Volume and Issue: 42(12), P. 1392 - 1398

Published: July 20, 2024

Abstract Purpose: The performance of vision-language models (VLMs) with image interpretation capabilities, such as GPT-4 omni (GPT-4o), GPT-4 vision (GPT-4V), and Claude-3, has not been compared and remains unexplored in specialized radiological fields, including nuclear medicine and interventional radiology. This study aimed to evaluate and compare the diagnostic accuracy of various VLMs, including GPT-4 + GPT-4V, GPT-4o, Claude-3 Sonnet, and Claude-3 Opus, using Japanese diagnostic radiology, nuclear medicine, and interventional radiology (JDR, JNM, and JIR, respectively) board certification tests. Materials and methods: In total, 383 questions from the JDR test (358 with images), 300 from the JNM test (92 with images), and 322 from the JIR test (96 with images) from 2019 to 2023 were consecutively collected. The correct answer rates of GPT-4 + GPT-4V, GPT-4o, Claude-3 Sonnet, and Claude-3 Opus were calculated for all questions and for questions with images. The performance of the VLMs was compared using McNemar's test. Results: GPT-4o demonstrated the highest correct answer rates across all evaluations on the JDR test (all questions, 49%; questions with images, 48%), the JNM test (all questions, 64%; questions with images, 59%), and the JIR test (all questions, 43%; questions with images, 34%), followed by the other models with lower rates (40%; 38%; 42%; 43%; 30%). For questions with images, McNemar's test showed that GPT-4o significantly outperformed the other VLMs (P < 0.007), with one exception (P < 0.001). Conclusion: GPT-4o had the highest success rates for questions with images on the JDR, JNM, and JIR tests.

Language: English

Citations: 15

Comparative analysis of GPT-4-based ChatGPT’s diagnostic performance with radiologists using real-world radiology reports of brain tumors
Yasuhito Mitsuyama, Hiroyuki Tatekawa, Hirotaka Takita

et al.

European Radiology, Journal Year: 2024, Volume and Issue: unknown

Published: Aug. 28, 2024

Large language models like GPT-4 have demonstrated potential for diagnosis in radiology. Previous studies investigating this potential primarily utilized quizzes from academic journals. This study aimed to assess the diagnostic capabilities of GPT-4-based Chat Generative Pre-trained Transformer (ChatGPT) using actual clinical radiology reports of brain tumors and to compare its performance with that of neuroradiologists and general radiologists.

Language: English

Citations: 14

Constructing a Large Language Model to Generate Impressions from Findings in Radiology Reports
Lu Zhang, Mingqian Liu, Lingyun Wang

et al.

Radiology, Journal Year: 2024, Volume and Issue: 312(3)

Published: Sept. 1, 2024

Background: The specialization and complexity of radiology makes the automatic generation of radiologic impressions (ie, a diagnosis with differential and management recommendations) challenging. Purpose: To develop a large language model (LLM) that generates impressions based on imaging findings and to evaluate its performance in professional and linguistic dimensions. Materials and Methods: Six radiologists recorded examination findings from August 2 to 31, 2023, at Shanghai General Hospital and used the developed LLM before routinely writing the report, for multiple modalities (CT, MRI, radiography, mammography) and anatomic sites (cranium and face, neck, chest, upper abdomen, lower abdomen, vessels, bone and joint, spine, breast), making necessary corrections before completing the impression. A subset was defined to investigate cases where the LLM-generated impression differed from the final radiologist impression, by excluding identical and highly similar cases. An expert panel scored impressions on a five-point Likert scale (5 = strongly agree) for scientific terminology, coherence, specific diagnosis, recommendations, correctness, comprehensiveness, harmlessness, and lack of bias. Results: In this retrospective study, an LLM was pretrained using 20 GB of medical and general-purpose text data. The fine-tuning data set comprised 1.5 GB of data, including 800 reports paired with instructions (describing the output task in natural language) and outputs. The test set included 3988 patients (median age, 56 years [IQR, 40-68 years]; 2159 male). The median recall, precision, and F1 score were 0.775 (IQR, 0.56-1), 0.84 (IQR, 0.611-1), and 0.772 (IQR, 0.578-0.957), respectively, with the radiologist impressions as the reference standard. In the subset of 1014 patients (median age, 57 years [IQR, 42-69 years]; 528 male), the overall median score was 5 (IQR, 5-5), ranging from 4 (IQR, 3-5) to 5 (IQR, 5-5) across dimensions. Conclusion: The LLM generated professionally and linguistically appropriate impressions across a full spectrum of examinations. © RSNA, 2024
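The recall/precision/F1 evaluation against a radiologist reference standard can be illustrated with a simple token-overlap scorer. The paper does not spell out its exact matching scheme, so this bag-of-words variant is only an assumed sketch, and the example impressions are invented.

```python
from collections import Counter

def token_prf(generated: str, reference: str):
    """Bag-of-words precision, recall, and F1 of a generated impression
    against a reference impression (assumed scoring scheme)."""
    gen = generated.lower().split()
    ref = reference.lower().split()
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(gen) & Counter(ref)).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / len(gen)
    recall = overlap / len(ref)
    return precision, recall, 2 * precision * recall / (precision + recall)

p, r, f1 = token_prf(
    "no acute intracranial hemorrhage or mass effect",
    "no acute intracranial hemorrhage",
)
print(round(p, 3), round(r, 3), round(f1, 3))  # -> 0.571 1.0 0.727
```

Per-report scores like these can then be aggregated into the medians and IQRs reported above.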

Language: English

Citations: 14

Automated classification of brain MRI reports using fine-tuned large language models
Jun Kanzawa, Koichiro Yasaka, Nana Fujita

et al.

Neuroradiology, Journal Year: 2024, Volume and Issue: unknown

Published: July 12, 2024

Abstract Purpose: This study aimed to investigate the efficacy of fine-tuned large language models (LLM) in classifying brain MRI reports into pretreatment, posttreatment, and nontumor cases. Methods: This retrospective study included 759, 284, and 164 reports for the training, validation, and test datasets, respectively. Radiologists stratified the reports into three groups: nontumor (group 1), posttreatment tumor (group 2), and pretreatment tumor (group 3) cases. A pretrained Bidirectional Encoder Representations from Transformers Japanese model was fine-tuned using the training dataset and evaluated on the validation dataset. The model which demonstrated the highest accuracy was selected as the final model. Two additional radiologists were involved in classifying the reports in the test dataset into the three groups, and the model's performance was compared with that of the two radiologists. Results: The fine-tuned LLM attained an overall accuracy of 0.970 (95% CI: 0.930–0.990). The sensitivity for groups 1/2/3 was 1.000/0.864/0.978. The specificity for groups 1/2/3 was 0.991/0.993/0.958. No statistically significant differences were found in terms of accuracy, sensitivity, and specificity between the LLM and the human readers (p ≥ 0.371). The LLM completed the classification task approximately 20–26-fold faster than the radiologists. The areas under the receiver operating characteristic curve for discriminating among groups 2, 3, and 1 were 0.994 (95% CI: 0.982–1.000) and 0.992 (95% CI: 0.982–1.000). Conclusion: Fine-tuned LLMs showed performance comparable with radiologists in classifying brain MRI reports, while requiring substantially less time.
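The per-group sensitivity and specificity figures quoted in this abstract follow from one-vs-rest counts over the three report groups. A minimal sketch (the toy labels below are hypothetical, not the study's data):

```python
def group_metrics(y_true, y_pred, group):
    """One-vs-rest sensitivity and specificity for one report group,
    mirroring the per-group figures reported in such studies."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == group and p == group for t, p in pairs)
    fn = sum(t == group and p != group for t, p in pairs)
    tn = sum(t != group and p != group for t, p in pairs)
    fp = sum(t != group and p == group for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)

# Toy labels: 1 = nontumor, 2 = posttreatment, 3 = pretreatment (hypothetical)
truth = [1, 1, 2, 2, 3, 3, 3, 1]
preds = [1, 1, 2, 3, 3, 3, 3, 1]
sens, spec = group_metrics(truth, preds, 2)
print(sens, spec)  # -> 0.5 1.0
```

Running this per group reproduces the 1/2/3 sensitivity and specificity breakdown style used in the abstract.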

Language: English

Citations: 12

Evaluating GPT-4o's Performance in the Official European Board of Radiology Exam: A Comprehensive Assessment
Muhammed Said Beşler, Laura Oleaga, Vanesa Junquero

et al.

Academic Radiology, Journal Year: 2024, Volume and Issue: unknown

Published: Sept. 1, 2024

Language: English

Citations: 9