The Impact of Physician Variation on the Training and Performance of Deep Learning Auto-Segmentation Models: the Development of Physician Inconsistency Metrics
Yujie Yan, Christopher E. Kehayias, John Cijiang He, et al.

Research Square, Journal Year: 2023, Volume and Issue: unknown

Published: Aug. 21, 2023

Abstract Manual segmentation of tumors and organs-at-risk (OAR) in 3D imaging for radiation-therapy planning is time-consuming and subject to variation between different observers. Artificial intelligence (AI) can assist with segmentation, but challenges exist in ensuring high-quality segmentation, especially for small, variable structures. We investigated the effect of variation in segmentation quality and style among physicians on training deep-learning models for the esophagus, and proposed a new metric, edge roughness, for evaluating/quantifying slice-to-slice inconsistency. This study includes a real-world cohort of 394 patients who each received radiation therapy (mainly for lung cancer). Segmentation was performed by 8 physicians as part of routine clinical care. We evaluated the manual segmentations by comparing the length and edge roughness of segmentations among physicians to analyze inconsistencies. We trained six multiple- and individual-physician segmentation models in total, based on U-Net architectures with residual backbones, and used the volumetric Dice coefficient to measure the performance of each model. To quantify the shift of segmentations between adjacent slices, edge roughness is calculated from the curvature of the edges of the 2D sagittal- and coronal-view projections. The auto-segmentation model trained on multiple physicians (MD1-7) achieved the highest mean Dice of 73.7±14.8%. The individual-physician model (MD7) with the highest edge roughness (mean ± SD: 0.106±0.016) demonstrated significantly lower volumetric Dice for test cases compared with the other individual models (MD7: 58.5±15.8%, MD6: 67.1±16.8%, p < 0.001). An additional multiple-physician model trained after removing the MD7 data resulted in fewer outliers (e.g., Dice ≤ 40%: 4 cases for MD1-6 versus 7 cases for MD1-7, N total = 394). This study demonstrates that there is significant variation in the style and quality of manual segmentations in clinical care, and that training AI auto-segmentation algorithms on real-world clinical datasets may result in unexpectedly under-performing algorithms due to the inclusion of outliers. Importantly, this study provides a novel metric, edge roughness, for evaluating physician variation, which will allow developers to filter clinical training data to optimize model performance.
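The volumetric Dice coefficient used above is 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a reference mask B. A minimal Python sketch follows, assuming masks are given as sets of foreground voxel indices. The helper names, the empty-mask convention, and the outlier counter (mirroring the Dice ≤ 40% criterion quoted in the abstract) are illustrative assumptions, not the authors' code.

```python
from typing import Iterable, Set, Tuple

# A foreground voxel index: (axial slice, row, column)
Voxel = Tuple[int, int, int]


def volumetric_dice(pred: Set[Voxel], ref: Set[Voxel]) -> float:
    """Volumetric Dice = 2|A ∩ B| / (|A| + |B|) for masks given as voxel-index sets."""
    total = len(pred) + len(ref)
    if total == 0:
        return 1.0  # assumed convention: two empty masks count as perfect agreement
    return 2.0 * len(pred & ref) / total


def count_outliers(dice_scores: Iterable[float], threshold: float = 0.40) -> int:
    """Number of test cases at or below a Dice threshold (e.g. the Dice <= 40% outlier criterion)."""
    return sum(1 for d in dice_scores if d <= threshold)


# Toy example: two 2x2 in-plane structures offset by one column on one slice
a = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
b = {(0, 0, 1), (0, 0, 2), (0, 1, 1), (0, 1, 2)}
print(volumetric_dice(a, b))  # 0.5
```

Sets keep the example dependency-free; a real pipeline would typically operate on boolean voxel arrays instead.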

Language: English

Development and validation of fully automated robust deep learning models for multi-organ segmentation from whole-body CT images
Yazdan Salimi, Isaac Shiri, Zahra Mansouri, et al.

Physica Medica, Journal Year: 2025, Volume and Issue: 130, P. 104911 - 104911

Published: Feb. 1, 2025

This study aimed to develop a deep-learning framework to generate multi-organ masks from CT images in adult and pediatric patients. A dataset consisting of 4082 CT images with ground-truth manual segmentations from various databases, including 300 cases, was collected. In strategy #1, the images provided by public databases were split into training (90%) and testing (10% of each database, named subset #1) cohorts. The training set was used to train multiple nnU-Net networks with five-fold cross-validation (CV) for 26 separate organs. In the next step, the models trained in strategy #1 were used to segment the missing organs for the entire dataset. The generated data were then used to train a model with CV (strategy #2). The models' performance was evaluated in terms of the Dice similarity coefficient (DSC) and other well-established image segmentation metrics. The lowest DSC in strategy #1 was 0.804 ± 0.094, for the adrenal glands, while an average DSC > 0.90 was achieved for 17/26 organs. In strategy #2, the lowest DSC (0.833 ± 0.177) was obtained for the pancreas, whereas a DSC > 0.90 was achieved for 13/19 organs. For all mutual organs included in subset #2, our models outperformed TotalSegmentator in both strategies. In addition, the models were evaluated on subset #3. Our models were trained on data with significant variability from different databases, producing acceptable results and making them well-suited for implementation in the clinical setting.
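Strategy #1 holds out 10% of each public database as a test subset and runs five-fold cross-validation on the remainder. The Python sketch below illustrates that kind of database-stratified split. The function names, the fixed seed, and the at-least-one-test-case rule are assumptions for illustration; nnU-Net generates its own cross-validation splits, so this is not necessarily how the authors' pipeline assigned folds.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def split_per_database(
    cases: List[Tuple[str, str]], test_frac: float = 0.10, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Hold out test_frac of EACH database; cases are (case_id, database_name) pairs."""
    rng = random.Random(seed)
    by_db: Dict[str, List[str]] = defaultdict(list)
    for case_id, db in cases:
        by_db[db].append(case_id)
    train: List[str] = []
    test: List[str] = []
    for db in sorted(by_db):  # sorted so the split is reproducible for a given seed
        ids = sorted(by_db[db])
        rng.shuffle(ids)
        n_test = max(1, round(test_frac * len(ids)))  # assumption: at least one test case per database
        test.extend(ids[:n_test])
        train.extend(ids[n_test:])
    return train, test


def k_folds(ids: List[str], k: int = 5, seed: int = 0) -> List[Tuple[List[str], List[str]]]:
    """Return k (train, validation) splits in which every case is validated exactly once."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        fold_train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        splits.append((fold_train, folds[i]))
    return splits


# Two hypothetical databases of 20 and 10 cases
cases = [(f"A{i}", "A") for i in range(20)] + [(f"B{i}", "B") for i in range(10)]
train_ids, test_ids = split_per_database(cases)
for fold_train, fold_val in k_folds(train_ids):
    print(len(fold_train), len(fold_val))
```

Stratifying the hold-out by database keeps every source represented in the test subset, which matters when databases differ in scanners, protocols, or patient populations.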

Language: English

Citations: 3

A clinical evaluation of the performance of five commercial artificial intelligence contouring systems for radiotherapy
Paul Doolan, Stefanie Charalambous, Yiannis Roussakis, et al.

Frontiers in Oncology, Journal Year: 2023, Volume and Issue: 13

Published: Aug. 4, 2023

Auto-segmentation with artificial intelligence (AI) offers an opportunity to reduce inter- and intra-observer variability in contouring, to improve the quality of contours, and to reduce the time taken to conduct this manual task. In this work we benchmark the AI auto-segmentation contours produced by five commercial vendors against a common dataset. The organ at risk (OAR) contours generated by the five solutions (Mirada (Mir), MVision (MV), Radformation (Rad), RayStation (Ray) and TheraPanacea (Ther)) were compared with manually drawn expert contours from 20 breast, head and neck, lung and prostate patients. Comparisons were made using geometric similarity metrics, including the volumetric and surface Dice similarity coefficients (vDSC and sDSC), the Hausdorff distance (HD) and the Added Path Length (APL). To assess the time saved, the times to manually draw contours and to correct the AI contours were recorded. There are differences in the number of CT contours offered by each solution at the time of the study (Mir 99; MV 143; Rad 83; Ray 67; Ther 86), with all offering some lymph node levels as well as OARs. Averaged across all structures, the median vDSCs were good for all systems and compared favorably with the existing literature: Mir 0.82; MV 0.88; Rad 0.86; Ray 0.87; Ther 0.88. All systems offer substantial time savings, ranging between: breast 14-20 mins; head and neck 74-93 mins; lung 20-26 mins; prostate 35-42 mins. The average time saved was similar for all systems: Mir 39.8 mins; MV 43.6 mins; Rad 36.6 mins; Ray 43.2 mins; Ther 45.2 mins. All evaluated systems produced high-quality contours, significantly reduced contouring time, and could be used to render the radiotherapy workflow more efficient and standardized.
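The Hausdorff distance and Added Path Length compared above can be illustrated with a short Python sketch. It assumes each contour is a set of foreground pixels per axial slice with isotropic in-plane spacing. The APL here simply counts reference-boundary pixels absent from the automatic boundary, with no distance tolerance; this is one simple approximation of "path that had to be added" and may differ from the implementation used in the study. Only the 2D Hausdorff case is shown.

```python
import math
from typing import List, Set, Tuple

Pixel = Tuple[int, int]  # (row, column) of a foreground pixel on one axial slice
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def boundary(mask: Set[Pixel]) -> Set[Pixel]:
    """Foreground pixels with at least one 4-connected background neighbour."""
    return {
        (r, c)
        for (r, c) in mask
        if any((r + dr, c + dc) not in mask for dr, dc in NEIGHBOURS)
    }


def added_path_length(auto_slices: List[Set[Pixel]], ref_slices: List[Set[Pixel]], pixel_mm: float) -> float:
    """Simplified APL: reference-boundary pixels absent from the automatic boundary, in mm, summed over slices."""
    missing = sum(len(boundary(ref) - boundary(auto)) for auto, ref in zip(auto_slices, ref_slices))
    return missing * pixel_mm


def hausdorff(a: Set[Pixel], b: Set[Pixel], pixel_mm: float = 1.0) -> float:
    """Symmetric Hausdorff distance between two non-empty point sets (brute force, O(|a||b|))."""
    def directed(x: Set[Pixel], y: Set[Pixel]) -> float:
        return max(min(math.dist(p, q) for q in y) for p in x)
    return max(directed(a, b), directed(b, a)) * pixel_mm


auto = {(r, c) for r in range(3) for c in range(3)}     # 3x3 automatic contour
ref = {(r, c) for r in range(3) for c in range(1, 4)}   # expert contour shifted one column
print(added_path_length([auto], [ref], pixel_mm=1.0), hausdorff(auto, ref))  # 4.0 1.0
```

APL rewards contours that need few edits even when overlap is already high, which is why it is often read alongside time-saving measurements rather than Dice alone.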

Language: English

Citations: 42

Review of Deep Learning Based Autosegmentation for Clinical Target Volume: Current Status and Future Directions
Thomas Matoska, Mira A. Patel, Hefei Liu, et al.

Advances in Radiation Oncology, Journal Year: 2024, Volume and Issue: 9(5), P. 101470 - 101470

Published: Feb. 8, 2024

Purpose: Manual contour work for radiation treatment planning takes significant time to ensure volumes are accurately delineated. The use of artificial intelligence with deep learning-based autosegmentation (DLAS) models has made itself known in recent years as a way to alleviate this workload. It is used for organs at risk (OAR) contouring, with gains in consistency, performance, and time saving. The purpose of this study was to evaluate the current published data for DLAS of clinical target volume (CTV) contours, identify areas of improvement, and discuss future directions. Methodology: A literature review was performed by utilizing the key words "Deep Learning" AND ("Segmentation" OR "Delineation") AND "Clinical Target Volume" in an indexed search of PubMed. A total of 154 articles based on the search criteria were reviewed. The review considered the model used, disease site, targets contoured, guidelines utilized, and overall performance. Results: Of the 53 articles investigating DLAS of the CTV, only 6 were published before 2020. Publications have increased in recent years, with 46 published between 2020 and 2023. The cervix (n=19) and prostate (n=12) were studied most frequently. Most studies (n=43) involved a single institution. The median sample size was 130 patients (range: 5-1,052). The most common metric utilized to measure performance was the Dice similarity coefficient (DSC), followed by the Hausdorff distance. Dosimetric performance was seldom reported (n=11). There was also variability in the specific contouring guidelines used (RTOG, ESTRO, others). DLAS models had good overall performance for CTV contouring across multiple disease sites, with DSC values >0.7, and delineated contours faster than manual contouring. However, some DLAS CTV contours still required at least minor edits, and these areas require improvement. Conclusions: DLAS demonstrates the capability to complete CTV contours with efficiency and accuracy. Most models were developed and validated by single institutions using data from the developing institutions. Publications on DLAS of the CTV have increased in recent years. Future studies need to include larger datasets with different patient demographics and disease stages, validation in multi-institutional settings, and inclusion of dosimetric performance.

Language: English

Citations: 12

Deep learning-based segmentation of ultra-low-dose CT images using an optimized nnU-Net model
Yazdan Salimi, Zahra Mansouri, Chang Sun, et al.

La radiologia medica, Journal Year: 2025, Volume and Issue: unknown

Published: March 18, 2025

Abstract Purpose: Low-dose CT protocols are widely used for emergency imaging, follow-ups, and attenuation correction in hybrid PET/CT and SPECT/CT imaging. However, low-dose CT images often suffer from reduced quality depending on acquisition and patient parameters. Deep learning (DL)-based organ segmentation models are typically trained on high-quality images, with limited work dedicated to noisy images. This study aimed to develop a DL pipeline for organ segmentation on ultra-low-dose CT images. Materials and methods: 274 raw CT datasets were reconstructed using Siemens ReconCT software with the ADMIRE iterative algorithm, generating full-dose CT (FD-CT) images and simulated low-dose CT (LD-CT) images at 1%, 2%, 5%, and 10% of the original tube current. Existing FD-nnU-Net models segmented 22 organs on the FD-CT images, serving as reference masks for training new LD-nnU-Net models on the LD-CT images. Three models were trained, for bony tissue (6 organs), soft-tissue (15 organs), and body contour segmentation. The models were compared against the standard reference. External datasets of actual low-dose CT images were also compared. Results: Segmentation performance declined with decreasing radiation dose, especially below 5 mAs. The LD-nnU-Net models achieved average Dice scores of 0.937 ± 0.049 (bony tissues), 0.905 ± 0.117 (soft tissues), and 0.984 ± 0.023 (body contour). The LD models outperformed the FD models on the external datasets. Conclusion: Conventional models performed poorly on LD-CT images. Dedicated LD-nnU-Net models demonstrated superior performance across cross-validation and external evaluations, enabling accurate segmentation of ultra-low-dose CT. The models are available on our GitHub page.
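Why the 1% to 10% tube-current levels are so challenging can be seen from a standard approximation: quantum noise standard deviation scales roughly as 1/sqrt(mAs), so at a dose fraction f the noise variance grows to σ_FD² / f. The Python sketch below adds only the missing variance, i.e. noise with standard deviation σ_FD·sqrt(1/f − 1), as white Gaussian noise in the image domain. This is a textbook simplification assumed for illustration (it ignores noise texture, electronic noise, and iterative reconstruction); it is not the raw-data simulation performed with ReconCT in the study, and the 10 HU full-dose noise level is an arbitrary example value.

```python
import math
import random
from typing import List


def added_noise_sigma(sigma_full: float, dose_fraction: float) -> float:
    """Std of extra Gaussian noise so that total variance becomes sigma_full**2 / dose_fraction."""
    if not 0.0 < dose_fraction <= 1.0:
        raise ValueError("dose_fraction must be in (0, 1]")
    return sigma_full * math.sqrt(1.0 / dose_fraction - 1.0)


def simulate_low_dose(pixels: List[float], sigma_full: float, dose_fraction: float, seed: int = 0) -> List[float]:
    """Image-domain approximation: add independent zero-mean Gaussian noise to each pixel (HU)."""
    rng = random.Random(seed)
    extra = added_noise_sigma(sigma_full, dose_fraction)
    return [p + rng.gauss(0.0, extra) for p in pixels]


# Dose levels from the abstract, with an assumed 10 HU full-dose noise level
for f in (0.10, 0.05, 0.02, 0.01):
    total = 10.0 / math.sqrt(f)
    print(f"{f:.0%} dose: add {added_noise_sigma(10.0, f):.1f} HU, total noise {total:.1f} HU")
```

Under this approximation, a 1% acquisition has ten times the full-dose noise standard deviation, which helps explain why models trained only on full-dose images degrade.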

Language: English

Citations: 1

NRG Oncology Assessment of Artificial Intelligence Deep Learning–Based Auto-segmentation for Radiation Therapy: Current Developments, Clinical Considerations, and Future Directions
Yi Rong, Quan Chen, Yabo Fu, et al.

International Journal of Radiation Oncology*Biology*Physics, Journal Year: 2023, Volume and Issue: 119(1), P. 261 - 280

Published: Nov. 14, 2023

Language: English

Citations: 19

Custom-Trained Deep Learning-Based Auto-Segmentation for Male Pelvic Iterative CBCT on C-Arm Linear Accelerators
Riley C. Tegtmeier, Christopher J. Kutyreff, Jennifer L. Smetanick, et al.

Practical Radiation Oncology, Journal Year: 2024, Volume and Issue: 14(5), P. e383 - e394

Published: Feb. 6, 2024

Language: English

Citations: 4

Performance of Commercial Deep Learning-Based Auto-Segmentation Software for Prostate Cancer Radiation Therapy Planning: A Systematic Review
Curtise K. C. Ng

Information, Journal Year: 2025, Volume and Issue: 16(3), P. 215 - 215

Published: March 11, 2025

As yet, there is no systematic review focusing on the benefits and issues of commercial deep learning-based auto-segmentation (DLAS) software for prostate cancer (PCa) radiation therapy (RT) planning, despite NRG Oncology having underscored such a necessity. This article's purpose is to systematically review commercial DLAS product performances for PCa RT planning and their associated evaluation methodology. A literature search was performed with the use of electronic databases on 7 November 2024. Thirty-two articles were included as per the selection criteria. They evaluated 12 products (Carina Medical LLC INTContour (Lexington, KY, USA), Elekta AB ADMIRE (Stockholm, Sweden), Limbus AI Inc. Contour (Regina, SK, Canada), Manteia Technologies Co. AccuContour (Jian Sheng, China), MIM Software ProtégéAI (Cleveland, OH, USA), Mirada Ltd. DLCExpert (Oxford, UK), MVision.ai Contour+ (Helsinki, Finland), Radformation AutoContour (New York, NY, USA), RaySearch Laboratories RayStation, Siemens Healthineers AG AI-Rad Companion Organs RT, syngo.via Image Suite and DirectORGANS (Erlangen, Germany), Therapanacea Annotate (Paris, France), and Varian Systems Ethos (Palo Alto, CA, USA)). Their results illustrate that these products can delineate organs at risk (abdominopelvic cavity, anal canal, bladder, body, cauda equina, left (L) and right (R) femurs, L and R pelvis, and proximal sacrum) and four clinical target volumes (prostate, lymph nodes, prostate bed, and seminal vesicle bed) with clinically acceptable outcomes, resulting in delineation time reductions of 5.7–81.1%. Although it is recommended that each centre perform its own evaluation prior to implementation, this seems even more important given the methodological issues of the respective single studies, e.g., the small datasets used.

Language: English

Citations: 0

Deep learning-assisted multiple organ segmentation from whole-body CT images
Yazdan Salimi, Isaac Shiri, Zahra Mansouri, et al.

medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown

Published: Oct. 21, 2023

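Abstract Background: Automated organ segmentation from computed tomography (CT) images facilitates a number of clinical applications, including clinical diagnosis, monitoring of treatment response, quantification, radiation therapy treatment planning, and radiation dosimetry. Purpose: To develop a novel deep learning framework to generate multi-organ masks from CT images for 23 different body organs. Methods: A dataset consisting of 3106 CT images (649,398 axial 2D CT slices, 13,640 image/segment pairs) with ground-truth manual segmentations from various online databases was collected. After cropping the images to the body contour, they were resized, normalized and used to train separate models for the 23 organs. Data were split into training (80%) and test (20%) sets covering all the databases. A Res-UNET model was trained to generate segmentation masks from the input normalized CT images. The model output was converted back to the original dimensions and compared with the ground-truth segmentation masks in terms of the Dice and Jaccard coefficients. Information about organ positions was implemented during post-processing by providing six anchor organ segmentations as input. Our model was compared with the online available "TotalSegmentator" model by testing our model on their test datasets and their model on our test datasets. Results: The average Dice coefficient before and after post-processing was 84.28% and 83.26%, respectively. The average Jaccard index was 76.17 and 70.60 before and after post-processing, respectively. Dice coefficients over 90% were achieved for the liver, heart, bones, kidneys, spleen, femur heads, lungs, aorta, eyes, and brain segmentation masks. Post-processing improved the performance in only nine organs. Compared with TotalSegmentator, performance was better for five of the 15 common organs and almost similar for two organs. Conclusions: The availability of a fast and reliable multi-organ segmentation tool leverages implementation in the clinical setting. In this study, we developed deep learning models to segment multiple body organs and compared their performance with different algorithms. Our model was trained on images presenting large variability emanating from different databases, producing acceptable results even in cases with unusual anatomies and pathologies, such as splenomegaly. We recommend using these algorithms for organs with good performance. One of the main merits of the proposed models is their lightweight nature, with an inference time of 1.67 seconds per case for a total-body CT image, which facilitates their use on standard computers.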
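The Dice and Jaccard coefficients reported above are linked per case by J = D/(2 − D) and D = 2J/(1 + J), since both are built from the same overlap and mask sizes. A minimal Python sketch follows, with masks assumed to be sets of voxel indices and helper names chosen for illustration. Because the conversion is nonlinear, a mean Jaccard index cannot in general be recovered exactly from a mean Dice coefficient.

```python
from typing import Set, Tuple

Voxel = Tuple[int, int, int]


def dice(a: Set[Voxel], b: Set[Voxel]) -> float:
    total = len(a) + len(b)
    return 1.0 if total == 0 else 2.0 * len(a & b) / total


def jaccard(a: Set[Voxel], b: Set[Voxel]) -> float:
    union = a | b
    return 1.0 if not union else len(a & b) / len(union)


def dice_to_jaccard(d: float) -> float:
    """Exact per-case conversion: J = D / (2 - D)."""
    return d / (2.0 - d)


def jaccard_to_dice(j: float) -> float:
    """Exact per-case conversion: D = 2J / (1 + J)."""
    return 2.0 * j / (1.0 + j)


a = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
b = {(0, 0, 1), (0, 0, 2), (0, 1, 1), (0, 1, 2)}
print(dice(a, b), jaccard(a, b), dice_to_jaccard(dice(a, b)))

# The mapping is nonlinear, so it does not carry over to averages across cases
per_case_dice = [0.5, 0.9]
mean_of_jaccards = sum(dice_to_jaccard(d) for d in per_case_dice) / len(per_case_dice)
jaccard_of_mean = dice_to_jaccard(sum(per_case_dice) / len(per_case_dice))
print(mean_of_jaccards, jaccard_of_mean)
```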

Language: English

Citations: 6

Clinical adoption of deep learning target auto-segmentation for radiation therapy: challenges, clinical risks, and mitigation strategies
Alessia de Biase, Nanna M. Sijtsema, Tomas Janssen, et al.

Deleted Journal, Journal Year: 2024, Volume and Issue: 1(1)

Published: Jan. 1, 2024

Abstract Radiation therapy is a localized cancer treatment that relies on precise delineation of the target to be treated and of healthy tissues to guarantee optimal treatment effect. This step, known as contouring or segmentation, involves identifying both target volumes and organs at risk on imaging modalities like CT, PET, and MRI to guide radiation delivery. Manual segmentation, however, is time-consuming and highly subjective, despite the presence of contouring guidelines. In recent years, automated segmentation methods, particularly deep learning models, have shown promise in addressing this task. However, challenges persist in their clinical use, including the need for robust quality assurance (QA) processes and the clinical risks associated with the use of these models. This review examines considerations for the clinical adoption of deep learning auto-segmentation in radiotherapy, focused on the target volume. We discuss potential clinical risks (eg, over- and under-segmentation, automation bias, and appropriate trust) and mitigation strategies (eg, human oversight, uncertainty quantification, and education of clinical professionals), and we highlight the importance of expanding QA to include geometric, dose-volume, and outcome-based performance monitoring. While deep learning target auto-segmentation offers significant benefits, careful attention to clinical risks and rigorous QA measures are essential for its successful integration into clinical practice.
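Uncertainty quantification, one of the mitigation strategies listed above, can take many forms. One common pattern, sketched here in Python under stated assumptions, averages the foreground probabilities from several models (an ensemble), computes the per-voxel binary entropy, and flags high-entropy voxels for human review. The ensemble, the 0.5-bit threshold, and the function names are hypothetical; the review does not prescribe this specific method.

```python
import math
from typing import List, Sequence


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) foreground probability; 0 at p = 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def flag_uncertain_voxels(member_probs: Sequence[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """member_probs[m][v] is ensemble member m's foreground probability for voxel v.
    Returns indices of voxels whose mean-probability entropy exceeds the threshold."""
    n_members = len(member_probs)
    flagged = []
    for v in range(len(member_probs[0])):
        mean_p = sum(member[v] for member in member_probs) / n_members
        if binary_entropy(mean_p) > threshold:
            flagged.append(v)
    return flagged


# Three hypothetical ensemble members, three voxels
probs = [[0.9, 0.5, 0.1], [0.95, 0.4, 0.05], [0.85, 0.6, 0.0]]
print(flag_uncertain_voxels(probs))  # [1]
```

Flagged voxels would then be surfaced to the reviewing clinician, which supports human oversight rather than replacing it.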

Language: English

Citations: 2

Edge roughness quantifies impact of physician variation on training and performance of deep learning auto-segmentation models for the esophagus
Yujie Yan, Christopher E. Kehayias, John Cijiang He, et al.

Scientific Reports, Journal Year: 2024, Volume and Issue: 14(1)

Published: Jan. 30, 2024

Manual segmentation of tumors and organs-at-risk (OAR) in 3D imaging for radiation-therapy planning is time-consuming and subject to variation between different observers. Artificial intelligence (AI) can assist with segmentation, but challenges exist in ensuring high-quality segmentation, especially for small, variable structures such as the esophagus. We investigated the effect of variation in segmentation quality and style among physicians on training deep-learning models for the esophagus, and proposed a new metric, edge roughness, for evaluating/quantifying slice-to-slice inconsistency. This study includes a real-world cohort of 394 patients who each received radiation therapy (mainly for lung cancer). Segmentation was performed by 8 physicians as part of routine clinical care. We evaluated the manual segmentations by comparing the length and edge roughness of segmentations among physicians to analyze inconsistencies. We trained eight multiple- and individual-physician segmentation models in total, based on U-Net architectures with residual backbones, and used the volumetric Dice coefficient to measure the performance of each model. To quantify the shift of segmentations between adjacent slices, edge roughness is calculated from the curvature of the edges of the 2D sagittal- and coronal-view projections. The auto-segmentation model trained on multiple physicians (MD1-7) achieved the highest mean Dice of 73.7 ± 14.8%. The individual-physician model (MD7) with the highest edge roughness (mean ± SD: 0.106 ± 0.016) demonstrated significantly lower volumetric Dice for test cases compared with the other individual models (MD7: 58.5 ± 15.8%, MD6: 67.1 ± 16.8%, p < 0.001). A multiple-physician model trained after removing the MD7 data resulted in fewer outliers (e.g., Dice ≤ 40%: 4 cases for MD1-6 versus 7 cases for MD1-7, Ntotal = 394). While we initially detected this pattern in a single clinician, we validated the edge roughness metric across the entire dataset. The model trained on the lowest-quantile edge roughness group (MDER-Q1, Ntrain = 62) achieved higher Dice (Ntest = 270) than the model trained on the highest-quantile group (MDER-Q4) (MDER-Q1: 67.8 ± 14.8%, MDER-Q4: 62.8 ± 15.7%). This study demonstrates that there is significant variation in the style and quality of manual segmentations in clinical care, and that training AI auto-segmentation algorithms on real-world clinical datasets may result in unexpectedly under-performing algorithms due to the inclusion of outliers. Importantly, this study provides a novel metric, edge roughness, for evaluating physician variation, which will allow developers to filter clinical training data to optimize model performance.
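The abstract describes edge roughness only at a high level: the curvature of the edges of the 2D sagittal- and coronal-view projections, used to capture slice-to-slice shifts. One plausible reading is sketched below in Python, assuming the mask is a set of (z, y, x) voxels on contiguous axial slices, edges are the per-slice minimum and maximum positions in each projection, and curvature is approximated by the absolute discrete second difference along z. This is a hypothetical reconstruction; the authors' exact curvature definition, normalization, and units are not given here, so its values are not comparable with the reported 0.106 ± 0.016.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (z, y, x): axial slice, anterior-posterior row, left-right column (assumed axis order)
Voxel = Tuple[int, int, int]


def edge_profiles(proj: Set[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """For a 2D projection given as (z, u) pixels, return the min and max u of each slice, ordered by z."""
    per_slice: Dict[int, List[int]] = defaultdict(list)
    for z, u in proj:
        per_slice[z].append(u)
    zs = sorted(per_slice)
    return [min(per_slice[z]) for z in zs], [max(per_slice[z]) for z in zs]


def second_diffs(edge: List[int]) -> List[int]:
    """|e[i-1] - 2 e[i] + e[i+1]|: a discrete curvature proxy along the slice axis."""
    return [abs(edge[i - 1] - 2 * edge[i] + edge[i + 1]) for i in range(1, len(edge) - 1)]


def edge_roughness(mask: Set[Voxel]) -> float:
    """Mean absolute second difference of both edges in the coronal and sagittal projections."""
    coronal = {(z, x) for z, _, x in mask}   # collapse the anterior-posterior axis
    sagittal = {(z, y) for z, y, _ in mask}  # collapse the left-right axis
    values: List[int] = []
    for proj in (coronal, sagittal):
        low, high = edge_profiles(proj)
        values += second_diffs(low) + second_diffs(high)
    return sum(values) / len(values) if values else 0.0


# A straight 2x2 tube versus one whose left-right position zig-zags between slices
straight = {(z, y, x) for z in range(5) for y in range(2) for x in range(2)}
zigzag = {(z, y, x + z % 2) for z in range(5) for y in range(2) for x in range(2)}
print(edge_roughness(straight), edge_roughness(zigzag))  # 0.0 1.0
```

A smooth, tube-like esophagus contour scores near zero, while contours whose edges jump back and forth between adjacent slices score higher, which is the kind of inconsistency the metric is meant to expose.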

Language: English

Citations: 1