Evaluation of Radiology Residents' Reporting Skills Using Large Language Models: An Observational Study
Natsuko Atsukawa, Hiroyuki Tatekawa, Tatsushi Oura

et al.

medRxiv (Cold Spring Harbor Laboratory), Journal year: 2024, Issue: unknown

Published: Nov. 6, 2024

Abstract Background: Large language models (LLMs) have the potential to objectively evaluate radiology resident reports; however, research on their use for feedback in training and assessment of skill development remains limited. Purpose: This study aimed to assess the effectiveness of LLMs in revising radiology reports by comparing them with revisions verified by board-certified radiologists, and to analyze the progression of residents' reporting skills over time. Materials and Methods: To identify the LLM that best aligned with human radiologists, 100 reports were randomly selected from a total of 7376 reports authored by nine first-year residents. The reports were evaluated based on six criteria: (1) addition of missing positive findings, (2) deletion of positive findings, (3) addition of negative findings, (4) correction of expression, (5) correction of diagnosis, and (6) proposal of additional examinations or treatments. Reports were segmented into four time-based terms, and 900 reports (450 CT and 450 MRI) were chosen from the initial and final terms of the residents' first year. The revised rates for each criterion were compared between the first and last terms using the Wilcoxon signed-rank test. Results: Among the LLMs tested, GPT-4o demonstrated the highest level of agreement with the board-certified radiologists. Significant improvements were noted for Criteria 1–3 when comparing the first and last terms (all P < 0.023) using GPT-4o. In contrast, no significant changes were observed for Criteria 4–6. Despite this, all criteria except Criterion 6 showed progressive enhancement. Conclusion: LLMs can effectively provide feedback on commonly corrected areas in reports, enabling residents to improve their weaknesses and monitor their progress. Additionally, they may help reduce the workload of radiologists who mentor residents.
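The study's longitudinal comparison is a paired, non-parametric one. As a minimal sketch (assuming per-resident revision rates as the paired observations, with invented numbers), the Wilcoxon signed-rank test from SciPy reproduces the kind of first-term vs. last-term comparison described:

from scipy.stats import wilcoxon

# Hypothetical revision rates for one criterion (e.g., Criterion 1,
# addition of missing positive findings) for nine residents, measured
# in the first and last terms. Values are illustrative only.
first_term = [0.42, 0.35, 0.51, 0.48, 0.39, 0.44, 0.50, 0.37, 0.46]
last_term = [0.30, 0.28, 0.40, 0.35, 0.31, 0.33, 0.41, 0.29, 0.36]

# Two-sided paired test; a significant drop in the revision rate would
# indicate fewer LLM corrections were needed in the last term.
stat, p = wilcoxon(first_term, last_term)
print(f"Wilcoxon signed-rank: statistic={stat:.1f}, p={p:.4f}")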

Language: English

ChatGPT-4o's Performance in Brain Tumor Diagnosis and MRI Findings: A Comparative Analysis with Radiologists
Cemre Özenbaş, Duygu Engin, Tayfun Altınok

et al.

Academic Radiology, Journal year: 2025, Issue: unknown

Published: Feb. 1, 2025

Language: English

Cited by

1

Performance of large language models for CAD-RADS 2.0 classification derived from cardiac CT reports
Philipp Arnold, Maximilian Frederik Russe, Fabian Bamberg

et al.

Journal of Cardiovascular Computed Tomography, Journal year: 2025, Issue: unknown

Published: April 1, 2025

The Coronary Artery Disease-Reporting and Data System (CAD-RADS) 2.0 offers standardized guidelines for interpreting coronary artery disease in cardiac CT. Accurate and consistent CAD-RADS scoring is crucial for comprehensive characterization and clinical decision-making. This study investigates the capability of large language models (LLMs) to autonomously generate CAD-RADS scores from cardiac CT reports. A dataset of reports was created to evaluate the performance of several state-of-the-art LLMs in generating CAD-RADS scores via in-context learning. The models tested comprised GPT-3.5, GPT-4o, Mistral 7b, Mixtral 8×7b, Llama3 8b, Llama3 8b with a 64k context length, and Llama3 70b. The scores generated by each model were compared to the ground truth, which was provided by two board-certified cardiothoracic radiologists in consensus based on a final set of 200 reports. GPT-4o and Llama3 70b achieved the highest accuracy for the full score including all modifiers, with rates of 93% and 92.5%, respectively, followed by Mixtral 8×7b with 78%. In contrast, older LLMs, such as Mistral 7b and GPT-3.5, performed poorly (16%) or demonstrated only intermediate results (41.5%). LLMs enhanced with in-context learning are capable of excellent accuracy in CAD-RADS scoring, potentially enhancing both the efficiency and consistency of reporting. Open-source models not only deliver competitive accuracy but also present the benefit of local hosting, mitigating concerns around data security.
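As an illustration of the in-context learning setup the study describes, the sketch below builds a few-shot prompt for one report and parses a CAD-RADS category out of a model reply. The example reports, the few-shot pairs, and the regular expression are assumptions for illustration; the authors' actual prompts and model interfaces are not specified here.

import re

# Hypothetical few-shot examples demonstrating the desired output format.
FEW_SHOT = """Report: 70% stenosis of the proximal LAD; no other lesions.
CAD-RADS: 3

Report: No coronary plaque or stenosis identified.
CAD-RADS: 0
"""

def build_prompt(report_text: str) -> str:
    # Assemble an in-context learning prompt for a single CT report.
    return ("Assign a CAD-RADS 2.0 score to the following cardiac CT report.\n\n"
            + FEW_SHOT + f"\nReport: {report_text}\nCAD-RADS:")

def parse_score(model_output: str):
    # Extract the CAD-RADS category (0-5, or N for non-diagnostic)
    # that follows the "CAD-RADS" token in the model reply.
    m = re.search(r"CAD-RADS[^0-5N]*([0-5]|N)", model_output)
    return m.group(1) if m else None

# Usage: send build_prompt(report) to any chat-completion endpoint and
# compare parse_score(reply) against the radiologists' consensus labels.
print(parse_score("CAD-RADS: 3 (moderate stenosis)"))  # -> "3"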

Language: English

Cited by

1

Can ChatGPT detect breast cancer on mammography?
Deniz Esin Tekcan Şanlı, Ahmet Necati Şanlı, Düzgün Yıldırım

et al.

Journal of Medical Screening, Journal year: 2025, Issue: unknown

Published: April 21, 2025

Some noteworthy studies have questioned the use of ChatGPT, a free artificial intelligence program that has become very popular and widespread in recent times, in different branches of medicine. In this study, the success of ChatGPT in detecting breast cancer on mammography (MMG) was evaluated. Pre-treatment mammographic images of patients with a histopathological diagnosis of invasive carcinoma and prominent mass formation on MMG were read separately into two subprograms: Radiologist Report Writer (P1) and XrayGPT (P2). The programs were asked to determine breast density, tumor size, side, quadrant, presence of microcalcification, distortion, skin or nipple changes, axillary lymphadenopathy (LAP), and BI-RADS score. The responses were evaluated in consensus by experienced radiologists. Although the mass detection rate of both programs was over 60%, success in determining tumor size, localization, and LAP was low. BI-RADS category agreement with the readers was fair for P1 (κ = 0.28; 0.20 < κ ≤ 0.40) and moderate for P2 (κ = 0.58; 0.40 < κ ≤ 0.60). In conclusion, while the P2 application can detect mass appearance better than the P1 application, success is low for all other cancer-related features. This casts doubt on the suitability of current large language models for image analysis in breast cancer screening.
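The fair/moderate agreement figures above are Cohen's kappa values. A minimal sketch of how such agreement is computed and banded, with invented BI-RADS labels and scikit-learn's standard implementation:

from sklearn.metrics import cohen_kappa_score

# Hypothetical BI-RADS categories: radiologists' consensus vs. one program.
consensus = [4, 5, 4, 3, 5, 4, 2, 5, 4, 3]
program = [4, 4, 5, 3, 5, 3, 2, 5, 4, 4]

kappa = cohen_kappa_score(consensus, program)

# Bands as used in the abstract:
# 0.20 < kappa <= 0.40 -> fair, 0.40 < kappa <= 0.60 -> moderate.
if 0.20 < kappa <= 0.40:
    band = "fair"
elif 0.40 < kappa <= 0.60:
    band = "moderate"
else:
    band = "other"
print(f"kappa = {kappa:.2f} ({band})")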

Language: English

Cited by

1

Advancing clinical MRI exams with artificial intelligence: Japan's contributions and future prospects
Shohei Fujita, Yasutaka Fushimi, Rintaro Ito

et al.

Japanese Journal of Radiology, Journal year: 2024, Issue: unknown

Published: Nov. 16, 2024

Abstract In this narrative review, we review the applications of artificial intelligence (AI) in clinical magnetic resonance imaging (MRI) exams, with a particular focus on Japan's contributions to the field. In the first part, we introduce various AI applications for optimizing different aspects of the MRI process, including scan protocols, patient preparation, image acquisition, reconstruction, and postprocessing techniques. Additionally, we examine AI's growing influence on clinical decision-making, particularly in areas such as segmentation, radiation therapy planning, and reporting assistance. By emphasizing studies conducted in Japan, we highlight the nation's advancement in MRI. The latter part describes the characteristics that make Japan a unique environment for the development and implementation of AI in MRI examinations. Japan's healthcare landscape is distinguished by several key factors that collectively create fertile ground for research and development. Notably, Japan boasts one of the highest densities of MRI scanners per capita globally, ensuring widespread access to the exam. The national health insurance system plays a pivotal role in providing scans to all citizens irrespective of socioeconomic status, which facilitates the collection of inclusive and unbiased data across a diverse population. Extensive screening programs, coupled with collaborative initiatives like the Japan Medical Imaging Database (J-MID), enable the aggregation and sharing of large, high-quality datasets. With its technological expertise and infrastructure, Japan is well-positioned to make meaningful contributions to the MRI–AI domain. The efforts of researchers, clinicians, and technology experts will continue to advance the future of AI in MRI, potentially leading to improvements in patient care and efficiency.

Language: English

Cited by

4

Letter to the Editor: “Comparative analysis of GPT-4-based ChatGPT's diagnostic performance with radiologists using real-world radiology reports of brain tumors”
Yang Zhang

European Radiology, Journal year: 2025, Issue: unknown

Published: Jan. 2, 2025

Language: English

Cited by

0

Reply to Letter to the Editor: “Comparative analysis of GPT-4 based ChatGPT's diagnostic performance with radiologists using real-world radiology reports of brain tumors”
Yasuhito Mitsuyama, Hiroyuki Tatekawa, Hirotaka Takita

et al.

European Radiology, Journal year: 2025, Issue: unknown

Published: Jan. 2, 2025

Language: English

Cited by

0

Prompting large language models for inner gains in radiology studies
Partha Pratim Ray

Clinical Imaging, Journal year: 2025, Issue: 120, P. 110422-110422

Published: Feb. 6, 2025

Language: English

Cited by

0

Evaluation of radiology residents' reporting skills using large language models: an observational study
Natsuko Atsukawa, Hiroyuki Tatekawa, Tatsushi Oura

et al.

Japanese Journal of Radiology, Journal year: 2025, Issue: unknown

Published: March 8, 2025

Large language models (LLMs) have the potential to objectively evaluate radiology resident reports; however, research on their use for feedback in training and assessment of skill development remains limited. This study aimed to assess the effectiveness of LLMs in revising radiology reports by comparing them with revisions verified by board-certified radiologists, and to analyze the progression of residents' reporting skills over time. To identify the LLM that best aligned with human radiologists, 100 reports were randomly selected from 7376 reports authored by nine first-year residents. The reports were evaluated based on six criteria: (1) addition of missing positive findings, (2) deletion of positive findings, (3) addition of negative findings, (4) correction of expression, (5) correction of diagnosis, and (6) proposal of additional examinations or treatments. Reports were segmented into four time-based terms, and 900 reports (450 CT and 450 MRI) were chosen from the initial and final terms of the residents' first year. The revised rates for each criterion were compared between the first and last terms using the Wilcoxon signed-rank test. Among the three LLMs tested, ChatGPT-4 Omni (GPT-4o), Claude-3.5 Sonnet, and Claude-3 Opus, GPT-4o demonstrated the highest level of agreement with the board-certified radiologists. Significant improvements were noted for Criteria 1-3 when comparing the first and last terms (Criteria 1, 2, and 3: P < 0.001, P = 0.023, and P = 0.004, respectively) using GPT-4o. No significant changes were observed for Criteria 4-6. Despite this, all criteria except Criterion 6 showed progressive enhancement. LLMs can effectively provide feedback on commonly corrected areas in reports, enabling residents to improve their weaknesses and monitor their progress. Additionally, they may help reduce the workload of radiologists who mentor residents.

Language: English

Cited by

0

A systematic review and meta-analysis of diagnostic performance comparison between generative AI and physicians
Hirotaka Takita, Daijiro Kabata, Shannon L. Walston

et al.

npj Digital Medicine, Journal year: 2025, Issue: 8(1)

Published: March 22, 2025

Abstract While generative artificial intelligence (AI) has shown potential in medical diagnostics, comprehensive evaluation of its diagnostic performance and comparison with physicians has not been extensively explored. We conducted a systematic review and meta-analysis of studies validating generative AI models for diagnostic tasks published between June 2018 and June 2024. Analysis of 83 studies revealed an overall diagnostic accuracy of 52.1%. No significant difference was found between generative AI and physicians overall (p = 0.10) or between generative AI and non-expert physicians (p = 0.93). However, generative AI performed significantly worse than expert physicians (p = 0.007). Several models demonstrated slightly higher accuracy compared to non-expert physicians, although the differences were not significant. Generative AI demonstrates promising diagnostic capabilities, with accuracy varying by model. Although it has not yet achieved expert-level reliability, these findings suggest potential for enhancing healthcare delivery and education when implemented with an appropriate understanding of its limitations.
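For readers curious how a single pooled accuracy such as 52.1% arises from many heterogeneous studies, the sketch below applies DerSimonian-Laird random-effects pooling to logit-transformed accuracy proportions. This is a common approach for pooling proportions, offered here as an assumption; the study counts are invented and the authors' exact meta-analytic model may differ.

import numpy as np

# (correct diagnoses, total cases) per hypothetical study
studies = [(55, 100), (40, 90), (70, 120), (30, 80)]
k = np.array([c for c, t in studies], dtype=float)
n = np.array([t for c, t in studies], dtype=float)

p = k / n
logit = np.log(p / (1 - p))
var = 1 / k + 1 / (n - k)  # approximate variance of each logit proportion

w = 1 / var  # fixed-effect (inverse-variance) weights
mu_fe = np.sum(w * logit) / np.sum(w)
Q = np.sum(w * (logit - mu_fe) ** 2)  # Cochran's Q heterogeneity statistic
C = np.sum(w) - np.sum(w**2) / np.sum(w)
tau2 = max(0.0, (Q - (len(studies) - 1)) / C)  # between-study variance (DL)

w_re = 1 / (var + tau2)  # random-effects weights
mu_re = np.sum(w_re * logit) / np.sum(w_re)
pooled = 1 / (1 + np.exp(-mu_re))  # back-transform to a proportion
print(f"pooled accuracy ~ {pooled:.1%}")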

Language: English

Cited by

0

Comparing Diagnostic Accuracy of Clinical Professionals and Large Language Models: Systematic Review and Meta-Analysis

Guxue Shan, Xiaonan Chen, Chen Wang

et al.

JMIR Medical Informatics, Journal year: 2025, Issue: 13, P. e64963-e64963

Published: April 25, 2025

Abstract Background: With the rapid development of artificial intelligence (AI) technology, especially generative AI, large language models (LLMs) have shown great potential in the medical field. Through massive data training, they can understand complex medical texts, quickly analyze medical records, and provide health counseling and diagnostic advice directly, even for rare diseases. However, no study has yet compared and extensively discussed the diagnostic performance of LLMs with that of physicians. Objective: This study systematically reviewed the accuracy of clinical diagnosis provided by LLMs and provides a reference for further clinical application. Methods: We conducted searches in CNKI (China National Knowledge Infrastructure), VIP Database, SinoMed, PubMed, Web of Science, Embase, and CINAHL (Cumulative Index to Nursing and Allied Health Literature) from January 1, 2017, to the present. A total of 2 reviewers independently screened the literature and extracted relevant information. The risk of bias was assessed using the Prediction Model Risk of Bias Assessment Tool (PROBAST), which evaluates both the risk of bias and the applicability of included studies. Results: A total of 30 studies involving 19 LLMs and a total of 4762 cases were included. The quality assessment indicated a high risk of bias in the majority of studies, the primary cause being that the diagnosis of the cases was already known. For the optimal model, diagnostic accuracy ranged from 25% to 97.8%, while triage accuracy ranged from 66.5% to 98%. Conclusions: LLMs have demonstrated considerable diagnostic capabilities and significant application potential across various clinical cases. Although their accuracy still falls short of that of clinical professionals, if used cautiously, they could become one of the best intelligent assistants in the field of human health care.

Language: English

Cited by

0