A Hybrid CNN-TransXNet Approach for Advanced Glomerular Segmentation in Renal Histology Imaging
Yangtao Liu

International Journal of Computational Intelligence Systems, Journal Year: 2024, Volume and Issue: 17(1)

Published: May 21, 2024

Abstract: In the specialized field of renal histology, precise segmentation of glomeruli in microscopic images is crucial for accurate clinical diagnosis and pathological analysis. Facing the challenge of discerning complex visual features, such as shape, texture, and size, within these images, we introduce a novel model that combines convolutional neural networks (CNNs) with the advanced TransXNet block, specifically tailored to glomerular segmentation. The design captures both intricate local details and broader contextual features, ensuring a comprehensive segmentation process. The model's architecture unfolds in two primary phases: a down-sampling phase, which utilizes CNN structures and the TransXNet block for meticulous extraction of detailed features, and an up-sampling phase, which employs deconvolution techniques to restore spatial resolution and enhance macroscopic feature representation. A critical innovation is our implementation of residual connections between the two phases, which facilitate seamless feature integration and minimize loss of precision during image reconstruction. Experimental results demonstrate a significant improvement in the model's performance compared with existing medical segmentation methods. We report enhancements in mean Pixel Accuracy (mPA) and mean Intersection over Union (mIoU), with increases of approximately 3–5% and 3–8%, respectively. Additionally, the segmented outputs exhibit higher subjective quality with fewer noise artifacts. These findings suggest that the model offers promising clinical applications, marking a contribution to the domain.
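To make the described two-phase architecture concrete, the following is a minimal PyTorch-style sketch, not the authors' published implementation: CNN stages paired with a TransXNet-style block for down-sampling, transposed convolutions for up-sampling, and residual connections bridging the two phases. The class names, channel widths, and the simplified TransXNetBlock placeholder are assumptions for illustration only.

```python
# Minimal sketch (not the authors' code) of the described encoder-decoder layout:
# CNN + TransXNet-style blocks for down-sampling, deconvolution for up-sampling,
# and residual (skip) connections bridging the two phases. TransXNetBlock here is
# a stand-in placeholder, not the published TransXNet implementation.
import torch
import torch.nn as nn

class TransXNetBlock(nn.Module):
    """Placeholder for the hybrid token-mixing block; the real TransXNet block
    combines convolutional and attention-based mixing."""
    def __init__(self, channels):
        super().__init__()
        self.mix = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
            nn.Conv2d(channels, channels, 1),
            nn.BatchNorm2d(channels),
            nn.GELU(),
        )

    def forward(self, x):
        return x + self.mix(x)  # residual token mixing

class HybridSegNet(nn.Module):
    def __init__(self, in_ch=3, num_classes=2, widths=(32, 64, 128)):
        super().__init__()
        # Down-sampling phase: strided CNN stages followed by TransXNet-style blocks
        self.encoders = nn.ModuleList()
        prev = in_ch
        for w in widths:
            self.encoders.append(nn.Sequential(
                nn.Conv2d(prev, w, 3, stride=2, padding=1),
                nn.BatchNorm2d(w),
                nn.ReLU(inplace=True),
                TransXNetBlock(w),
            ))
            prev = w
        # Up-sampling phase: deconvolution (transposed conv) restores resolution
        ws = list(widths)
        out_widths = list(reversed(ws[:-1])) + [ws[0]]   # mirror the encoder widths
        self.decoders = nn.ModuleList()
        for w_in, w_out in zip(reversed(ws), out_widths):
            self.decoders.append(nn.Sequential(
                nn.ConvTranspose2d(w_in, w_out, 2, stride=2),
                nn.BatchNorm2d(w_out),
                nn.ReLU(inplace=True),
            ))
        self.head = nn.Conv2d(widths[0], num_classes, 1)

    def forward(self, x):
        skips = []
        for enc in self.encoders:
            x = enc(x)
            skips.append(x)        # keep encoder features for residual connections
        skips.pop()                # deepest feature map is the decoder input itself
        for dec in self.decoders:
            x = dec(x)
            if skips:
                x = x + skips.pop()  # residual connection between the two phases
        return self.head(x)

if __name__ == "__main__":
    logits = HybridSegNet()(torch.randn(1, 3, 256, 256))
    print(logits.shape)  # torch.Size([1, 2, 256, 256])
```

In this layout the skip additions require matching channel counts and spatial sizes between mirrored encoder and decoder stages, which is why the decoder widths mirror the encoder widths.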

Language: English

Pre-trained Language Models in Biomedical Domain: A Systematic Survey
Benyou Wang, Qianqian Xie, Jiahuan Pei

et al.

ACM Computing Surveys, Journal Year: 2023, Volume and Issue: 56(3), P. 1 - 52

Published: Aug. 1, 2023

Pre-trained language models (PLMs) have been the de facto paradigm for most natural language processing tasks. This also benefits the biomedical domain: researchers from the informatics, medicine, and computer science communities have proposed various PLMs trained on biomedical datasets, e.g., biomedical text, electronic health records, protein sequences, and DNA sequences. However, the cross-discipline characteristics of biomedical PLMs hinder their spreading among communities; some existing works are isolated from each other without comprehensive comparison and discussion. It is nontrivial to make a survey that not only systematically reviews recent advances in biomedical PLMs and their applications but also standardizes terminology and benchmarks. This article summarizes the progress of pre-trained language models in the biomedical domain and their applications in downstream tasks. Particularly, we discuss the motivations and introduce key concepts of pre-trained language models. We then propose a taxonomy that categorizes existing models from several perspectives systematically. In addition, applications and downstream tasks are exhaustively discussed. Last, we illustrate limitations and future trends, which aims to provide inspiration for future research.

Language: English

Citations

94

Deep semi-supervised learning for medical image segmentation: A review
Kai Han, Victor S. Sheng, Yuqing Song

et al.

Expert Systems with Applications, Journal Year: 2024, Volume and Issue: 245, P. 123052 - 123052

Published: Jan. 4, 2024

Language: English

Citations

54

ScribFormer: Transformer Makes CNN Work Better for Scribble-Based Medical Image Segmentation
Zihan Li, Yuan Zheng, Dandan Shan

et al.

IEEE Transactions on Medical Imaging, Journal Year: 2024, Volume and Issue: 43(6), P. 2254 - 2265

Published: Feb. 7, 2024

Most recent scribble-supervised segmentation methods commonly adopt a CNN framework with an encoder-decoder architecture. Despite its multiple benefits, this framework generally captures only small-range feature dependencies because of the convolutional layer's local receptive field, which makes it difficult to learn global shape information from the limited supervision provided by scribble annotations. To address this issue, this paper proposes a new CNN-Transformer hybrid solution for scribble-supervised medical image segmentation called ScribFormer. The proposed ScribFormer model has a triple-branch structure, i.e., a CNN branch, a Transformer branch, and an attention-guided class activation map (ACAM) branch. Specifically, the CNN branch collaborates with the Transformer branch to fuse the local features learned by the CNN with the global representations obtained by the Transformer, which effectively overcomes the limitations of existing scribble-supervised methods. Furthermore, the ACAM branch assists in unifying the shallow convolutional features and the deep Transformer features to further improve the model's performance. Extensive experiments on two public datasets and one private dataset show that our ScribFormer is superior to state-of-the-art scribble-supervised segmentation methods and even achieves better results than fully-supervised segmentation methods. The code is released at https://github.com/HUANGLIZI/ScribFormer.
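A minimal sketch of the triple-branch idea follows, assuming a CNN branch for local features, a Transformer branch for global context, and an ACAM-like branch producing class activation maps; this is not the released ScribFormer code (see the repository linked above), and the patch size, channel widths, and fusion-by-addition are illustrative assumptions.

```python
# Minimal sketch (assumed layout, not the released ScribFormer implementation):
# a CNN branch for local features, a Transformer branch for global dependencies,
# and an ACAM-style branch that turns attended features into class activation maps.
import torch
import torch.nn as nn

class TripleBranchSketch(nn.Module):
    def __init__(self, in_ch=1, dim=64, num_classes=4):
        super().__init__()
        # CNN branch: local receptive field
        self.cnn = nn.Sequential(
            nn.Conv2d(in_ch, dim, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Transformer branch: global dependencies over a patch-token sequence
        self.embed = nn.Conv2d(in_ch, dim, kernel_size=8, stride=8)  # patchify
        self.transformer = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
            num_layers=2,
        )
        # ACAM-style branch: attention-weighted class activation maps
        self.attn = nn.Conv2d(dim, 1, 1)
        self.cam = nn.Conv2d(dim, num_classes, 1)
        self.head = nn.Conv2d(dim, num_classes, 1)

    def forward(self, x):
        b, _, h, w = x.shape
        local_feat = self.cnn(x)                           # (B, dim, H, W)
        tokens = self.embed(x)                             # (B, dim, H/8, W/8)
        th, tw = tokens.shape[-2:]
        tokens = tokens.flatten(2).transpose(1, 2)         # (B, N, dim)
        global_feat = self.transformer(tokens)             # (B, N, dim)
        global_feat = global_feat.transpose(1, 2).reshape(b, -1, th, tw)
        global_feat = nn.functional.interpolate(
            global_feat, size=(h, w), mode="bilinear", align_corners=False)
        fused = local_feat + global_feat                   # fuse CNN + Transformer features
        cams = self.cam(torch.sigmoid(self.attn(fused)) * fused)  # ACAM-like maps
        return self.head(fused), cams                      # segmentation logits + CAMs

if __name__ == "__main__":
    logits, cams = TripleBranchSketch()(torch.randn(2, 1, 128, 128))
    print(logits.shape, cams.shape)  # (2, 4, 128, 128) each
```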

Language: English

Citations

35

All-in-SAM: from Weak Annotation to Pixel-wise Nuclei Segmentation with Prompt-based Finetuning
Can Cui, Ruining Deng, Quan Liu

et al.

Journal of Physics Conference Series, Journal Year: 2024, Volume and Issue: 2722(1), P. 012012 - 012012

Published: March 1, 2024

The Segment Anything Model (SAM) is a recently proposed prompt-based segmentation model with a generic zero-shot approach. With this capacity, SAM has achieved impressive flexibility and precision on various segmentation tasks. However, the current pipeline requires manual prompts during the inference stage, which is still resource intensive for biomedical image segmentation. In this paper, instead of using prompts at inference, we introduce a pipeline that utilizes SAM, called all-in-SAM, through the entire AI development workflow (from annotation generation to model finetuning) without requiring manual prompts during the inference stage. Specifically, SAM is first employed to generate pixel-level annotations from weak prompts (e.g., points, bounding boxes). Then, the pixel-level annotations are used to finetune the SAM segmentation model rather than training it from scratch. Our experimental results reveal two key findings: 1) the proposed pipeline surpasses the state-of-the-art methods in a nuclei segmentation task on the public MoNuSeg dataset, and 2) the utilization of weak and few annotations for SAM finetuning achieves competitive performance compared with using strong pixel-wise annotated data.
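A minimal sketch of the described two-stage workflow, under the assumption that the public segment_anything package is used: weak prompts (bounding boxes) are converted into pixel-level pseudo-masks with SAM, and those masks then supervise ordinary finetuning. The checkpoint path, the finetuned model, and the training hyperparameters are placeholders, and the finetuning loop is indicative rather than the paper's exact prompt-based finetuning procedure.

```python
# Sketch of the two-stage idea: (1) turn weak prompts into pixel-level pseudo-masks
# with SAM, then (2) use those masks as supervision for finetuning a segmentation model.
import numpy as np
import torch
from segment_anything import sam_model_registry, SamPredictor

def weak_to_pixel_labels(image_rgb: np.ndarray, boxes: np.ndarray,
                         checkpoint="sam_vit_b.pth") -> np.ndarray:
    """Stage 1: generate a pseudo label map from bounding-box prompts (checkpoint path is a placeholder)."""
    sam = sam_model_registry["vit_b"](checkpoint=checkpoint)
    predictor = SamPredictor(sam)
    predictor.set_image(image_rgb)                       # RGB uint8 array, HxWx3
    label_map = np.zeros(image_rgb.shape[:2], dtype=np.uint8)
    for box in boxes:                                    # each box: [x0, y0, x1, y1]
        masks, _, _ = predictor.predict(box=box, multimask_output=False)
        label_map[masks[0]] = 1                          # mark nuclei pixels
    return label_map

def finetune_on_pseudo_labels(model: torch.nn.Module, loader, epochs=10):
    """Stage 2 (indicative only): standard supervised finetuning on the pseudo-masks."""
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    loss_fn = torch.nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for images, pseudo_masks in loader:              # pseudo_masks come from stage 1
            opt.zero_grad()
            logits = model(images).squeeze(1)            # assumes single-channel logits
            loss = loss_fn(logits, pseudo_masks.float())
            loss.backward()
            opt.step()
    return model
```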

Language: English

Citations

23

Natural language processing for chest X‐ray reports in the transformer era: BERT‐like encoders for comprehension and GPT‐like decoders for generation
Han Yuan

iRadiology, Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 6, 2025

We conducted a comprehensive literature search in PubMed to illustrate the current landscape of transformer-based tools from the perspective of the transformer's two integral components: the encoder, exemplified by BERT, and the decoder, characterized by GPT. We also discussed adoption barriers and potential solutions in terms of computational burdens, interpretability concerns, ethical issues, hallucination problems, and malpractice and legal liabilities. We hope that this commentary will serve as a foundational introduction for radiologists seeking to explore the evolving technical landscape of chest X-ray report analysis in the transformer era.

Natural language processing (NLP) has gained widespread use in computer-assisted chest X-ray (CXR) report analysis, particularly since the renaissance of deep learning (DL) with the 2012 ImageNet challenge. While early endeavors predominantly employed recurrent neural networks (RNN) and convolutional neural networks (CNN) [1], a revolution was brought by the transformer [2], and its success can be attributed to three key factors [3]. First, the self-attention mechanism enables simultaneous processing of multiple parts of an input sequence, offering significantly greater efficiency compared with earlier models such as the RNN [4]. Second, the architecture exhibits exceptional scalability, supporting models with over 100 billion parameters that capture intricate linguistic relationships in human language [5]. Third, the availability of vast internet-based corpora and advances in computational power have made pre-training and fine-tuning of large-scale models feasible [6]. These developments enabled the resolution of previously intractable problems and expert-level performance across a broad range of CXR analytical tasks, such as named entity recognition, question answering, and extractive summarization [7]. In this commentary (Figure 1), we illustrate this landscape and its adoption barriers, with BERT-like encoders handling comprehension and GPT-like decoders managing generation. As our primary focus is NLP, the classification criteria were based on text modules, and research purely focusing on vision transformers (ViT) was excluded. Our literature search pipeline identified relevant articles published between June 12, 2017, when the transformer model was first introduced, and October 4, 2024. We followed previous systematic reviews [3, 8, 9] to design groups of keywords: (1) "transformer"; (2) "clinical notes", "clinical reports", "clinical narratives", "clinical text", "medical text"; (3) "natural language processing", "text mining", "information extraction"; (4) "radiography", "chest film", "chest radiograph", "radiograph", "X-rays".

As the means of communication between radiologists and referring physicians, CXR reports contain high-density information about patients' conditions [10]. Much like physicians interpreting reports, the first step of NLP is understanding the report content, and an important application is explicitly converting it into a format suitable for subsequent tasks. One notable encoder is BERT [11], which stands for bidirectional encoder representations from transformers. In contrast to predecessors that rely on large amounts of expert annotations for supervised training [12], BERT undergoes self-supervised training on unlabeled datasets to understand language patterns and is subsequently fine-tuned on a small labeled set for the target task [12, 13], yielding superior performance in downstream tasks [14] such as recognition [15, 16] and semantics optimization [17]. In the context of healthcare, Olthof et al. [18] built and evaluated models of varying complexities across different disease prevalences and sample sizes, demonstrating that BERT statistically outperformed conventional DL models such as the CNN in area under the curve and F1-score, with t-test p-values less than 0.05. Beyond general-purpose models, adaptation to domain-specific corpora can further enhance effectiveness in various tasks. Yan et al. [19] adapted four BERT-like encoders using millions of radiology reports to tackle three tasks: identifying sentences that describe abnormal findings, assigning diagnostic codes, and extracting content to summarize reports.
Their results demonstrated that domain adaptation yielded significant improvements in accuracy and ROUGE metrics across all tasks. Most BERT-relevant studies make sentence-, paragraph-, or report-level predictions, while such encoders are also well-suited to word-level pattern recognition. Chambon et al. [20] leveraged a biomedical-specific BERT [21] to estimate the probability of individual tokens containing protected health information and replaced the identified sensitive tokens with synthetic surrogates to ensure privacy preservation. Similarly, Weng et al. [22] developed a system utilizing ALBERT [23], a lite BERT with reduced parameters, to flag keywords unrelated to the findings of interest, thereby reducing false-positive alarms and outperforming regular expression-, syntactic grammar-, and DL-based baselines. BERT-derived labels have also been applied to develop models targeting other imaging modalities [13]. Nowak et al. [24] systematically explored the utility of BERT-generated silver labels and linked them to the corresponding radiographs to train image classifiers. Compared with classifiers trained exclusively on radiologist-annotated gold labels, integrating silver labels led to improved discriminability. In macro-averaged terms, synchronous training on gold and silver labels proved effective in settings with limited annotations, whereas silver-label pre-training performed better in cases with abundant gold labels. Zhang et al. [25] introduced a novel approach for more generalizable classifiers rather than relying on predefined categories: first, they used an encoder to extract entities and relationships; second, they constructed a knowledge graph from these extractions; third, they refined it with domain expertise. Unlike traditional multiclass classifiers, the established system not only categorized each report but also revealed interpretable categories, such as those linking anatomical regions and radiological signs. In addition to deriving labels, the advanced comprehension capabilities of encoders have enabled an unprecedented innovation: direct text supervision of pixel-level segmentation in medical images [26]. Li et al. [26] proposed a text-augmented lesion segmentation paradigm that integrated BERT-based textual embeddings to compensate for deficiencies in radiograph quality and to refine pseudo labels for semi-supervision. These studies highlight the strength of encoders in comprehending healthcare-related text, powering annotation systems and multi-modality applications beyond text.

Meanwhile, researchers have also reported failures on complex clinical reasoning. Sushil et al. [27] found that BERT implementations for inference achieved a test accuracy of 0.778; adaptations on clinical textbooks reached 0.833, which still fell short of human experts. Potential limitations lie in the relatively modest parameter size and, although larger variants exist, the reliance on limited pre-training corpora such as books, Wikipedia, and selected databases [28]. Consequently, the ability to learn complex clinical reasoning remains constrained. These shortcomings are being alleviated by GPT-like decoders, which incorporate hundreds of billions of parameters trained on internet-scale corpora [29].

Following the advent of encoders, the generative pre-trained transformer (GPT) [30] marked the next groundbreaking leap, enabling non-experts to perform NLP tasks through free conversational interaction without any coding. CvT2DistilGPT2 [31], a prominent report generator of this era, couples a vision transformer encoder with a GPT-2-based decoder. Experiments indicated that pairing visual encoders with GPT decoders surpassed earlier encoder–decoder architectures in specific generation applications, and state-of-the-art methods increasingly integrate GPT-like decoders. The TranSQ [32] framework emulates the reasoning process of generating reports: formulating hypothesis embeddings that represent implicit intentions, querying visual features extracted from the image, synthesizing semantic embeddings via cross-modality fusion, and transforming candidate embeddings into sentences with DistilGPT [33]. Finally, it attained scores of 0.205 (BLEU-4) and 0.409, whereas the best-performing baseline among 17 methods, including retrieval-based ones, reached 0.188 and 0.383, highlighting the capability of unified multi-modality generation. Though GPT-like decoders dominate the general domain, the long short-term memory (LSTM) family [34] still performs well for report generation, partially because of the highly templated characteristics of radiology reports [32]. Kaur and Mittal [35] combined classical architectures for feature extraction with an LSTM for token generation, adding modules to generate numerical inputs beforehand and to shortlist disease-relevant outputs afterward. Their solution reached 0.767 and 0.897, suggesting that such approaches remain a viable backbone in certain scenarios.
Quantitative metrics comparing model-generated outputs with the ground truth should be supplemented by human evaluation. Boag et al. [36] studied automated report generation and observed a divergence between language-overlap metrics and clinical accuracy; a discrepancy in readability has also been reported [37]. Accordingly, we emphasize the involvement of radiologists in rating correctness and readability.

In the previous sections, we reviewed applications of encoders and decoders. Although the progress is remarkable and well-established, these tools still face problems. Some can be mitigated through the integration of specialized expertise [31, 38], while others necessitate further technical resolution. First, the computational demands of the transformer era are substantial: for example, the large version of BERT contains 334 million parameters and GPT-3 contains 175 billion. In contrast, support vector machines [39] and random forests [40] require only a few hundred thousand parameters. As a result, many healthcare providers cannot afford the costs of tailoring such models from scratch. To address this, we offer several recommendations. For development, we suggest leveraging open-access models as the basis for fine-tuning rather than building from scratch, and, considering model scales, we recommend parameter-efficient fine-tuning techniques that update a subset of the model's parameters while leaving the majority of weights unchanged [41]. An exemplificative study by Taylor et al. [42] empirically validated such techniques within the clinical domain. We also advocate prompt engineering techniques, such as retrieval-augmented generation, crafting informative and instructive prompts that guide the decoders' output without changing model weights [43]. Ranjit et al. [44] used this approach to retrieve the most relevant contextual prompts and produced concise, accurate reports retaining critical entities. Last but not least, obtaining approval from ethics committees to share anonymized data can facilitate collaboration with external partners, helping alleviate resource burdens.

Second, interpretability is critical in healthcare, where decisions directly impact lives. DL models are often regarded as black boxes, yet simple architectures can be rendered relatively explainable with modern tools: their layers and neurons can be dissected and visualized, providing insights into functionality [45-48]. Explaining transformer behavior remains a challenge due to the complexity associated with the exponential scaling of neuron numbers [49]. Even though internal activations are challenging to interpret, preliminary work analyzing the influence of inputs on outputs has shown a high degree of alignment with human assessments [50, 51]. A further advantage of decoders lies in their flexibility to align with instructions, which allows users to obtain expected outputs and request explanations of those outputs, fostering enhanced usability [52, 53]. For readers seeking an overview and detailed insights, we refer to surveys [54-56].

Third, ethical considerations are paramount for transformers, given their power to learn nuanced patterns from datasets. Two concerns are pressing: private data and representative populations. For patient privacy, anonymizing data during development and deployment ensures that sensitive information is neither learned by the model [57] nor inadvertently disclosed under certain prompts [58]. Dataset representativeness is another issue, as underrepresentation of minority groups can exacerbate disparities and perpetuate inequities [59]. To mitigate this risk, developers should prioritize inclusivity in data collection, and maintainers should continuously monitor for equitable outcomes [60].

Fourth, although decoders generate coherent responses to diverse user queries and solve a wide range of tasks [61], their predictions are grounded in internet-scale text rather than radiological knowledge with well-defined logic [62]. Therefore, they continue to suffer from hallucinations, a phenomenon in which generated content appears plausible but is factually incorrect, nonsensical, or unfaithful to users' inputs [63]. Current mitigation efforts broadly target the training and post-training stages. During training, strategies include in-house reinforcement learning guided by radiologists' feedback [64]. Post-training strategies encompass hallucination detection, external knowledge grounding, multi-agent collaboration, and radiologist-in-the-loop frameworks [62, 65]. Due to space constraints, we encourage readers to refer to [66-68] for detailed strategies.

Lastly, even after refinements, these tools may present risks, potentially leading to errors and liabilities [69]. Errors can arise from multiple sources, including inaccurate outputs, clinician nonadherence to correct recommendations, and poor integration into workflows [70]. Determining responsibility for adverse events is an issue involving multiple stakeholders, including software developers, maintenance teams, radiology departments, and radiologists [71]. The European Commission has focused on the safety and liability implications of artificial intelligence; where medical device law applies, liability generally falls under civil and product liability regimes, with civil liability typically pertaining to developers.
However, it stops short of a strict and definitive framework; given the inherent ambiguity of algorithms, questions surrounding responsibility will likely be addressed by courts through case law. Under existing frameworks, radiologists should follow the standard of care and use these tools as supplementary and confirmatory aids rather than substitutes, a practice beneficial to all stakeholders. Additionally, departments that implement such tools should involve radiologists throughout the entire life cycle [72] and prepare in-depth training programs to familiarize them with models that differ from routine statistical tests and remain black boxes resisting full interpretation [73]. Moreover, calibrated expectations are important: both unrealistic optimism, in which the tools are seen as a replacement for expertise, and undue pessimism, in which they are perceived as having no utility, should be avoided [74-77]. Han Yuan: Conceptualization; data curation; formal analysis; investigation; project administration; validation; visualization; writing—original draft; writing—review and editing. None. The author declares that he has no conflicts of interest. The study is exempt from ethics review committee approval as it does not involve human participants, animal subjects, or data collection. Not applicable. Data sharing does not apply as no datasets were generated or analyzed.
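As an illustration of the parameter-efficient fine-tuning recommendation above, here is a minimal sketch assuming the Hugging Face transformers and peft packages are available; the base checkpoint, label scheme, and example sentence are placeholders, and a radiology-adapted encoder would normally replace the generic BERT.

```python
# Minimal LoRA-style parameter-efficient fine-tuning sketch: low-rank adapters are
# injected into the attention projections while the rest of BERT stays frozen.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

base = "bert-base-uncased"  # placeholder; a domain-adapted encoder would be preferable
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=2)

# LoRA configuration: update only low-rank matrices on the query/value projections
lora_cfg = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,                      # rank of the low-rank update
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["query", "value"],
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # typically a small fraction of the 100M+ base weights

# One illustrative training step on a toy labeled report sentence
batch = tokenizer(["No focal consolidation, pleural effusion, or pneumothorax."],
                  return_tensors="pt", padding=True, truncation=True)
labels = torch.tensor([0])  # hypothetical label: 0 = no abnormal finding
out = model(**batch, labels=labels)
out.loss.backward()  # gradients flow only through the small set of trainable parameters
```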

Language: English

Citations

4

SAlexNet: Superimposed AlexNet using Residual Attention Mechanism for Accurate and Efficient Automatic Primary Brain Tumor Detection and Classification
Qurat-ul-ain Chaudhary, Shahzad Ahmad Qureshi, Touseef Sadiq

et al.

Results in Engineering, Journal Year: 2025, Volume and Issue: unknown, P. 104025 - 104025

Published: Jan. 1, 2025

Language: English

Citations

3

Open challenges and opportunities in federated foundation models towards biomedical healthcare
Xingyu Li, Peng Lu, Yu‐Ping Wang

et al.

BioData Mining, Journal Year: 2025, Volume and Issue: 18(1)

Published: Jan. 4, 2025

This survey explores the transformative impact of foundation models (FMs) in artificial intelligence, focusing on their integration with federated learning (FL) for biomedical research. Foundation models such as ChatGPT, LLaMa, and CLIP, which are trained on vast datasets through methods including unsupervised pretraining, self-supervised learning, instructed fine-tuning, and reinforcement learning from human feedback, represent significant advancements in machine learning. These models, with their ability to generate coherent text and realistic images, are crucial for applications that require processing diverse data forms such as clinical reports, diagnostic images, and multimodal patient interactions. The incorporation of FL with these sophisticated models presents a promising strategy to harness their analytical power while safeguarding the privacy of sensitive medical data. This approach not only enhances the capabilities of FMs in diagnostics and personalized treatment but also addresses critical concerns about data security in healthcare. The survey reviews current applications of FMs in federated settings, underscores the open challenges, and identifies future research directions, including scaling FMs, managing data diversity, and enhancing communication efficiency within FL frameworks. The objective is to encourage further research into the combined potential of FMs and FL, laying the groundwork for future healthcare innovations.
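For readers unfamiliar with the federated learning mechanism the survey builds on, the following is a minimal FedAvg-style sketch, not code from the surveyed work: each site trains on its private data locally and only weight updates are aggregated, so raw patient data never leaves the institution. The model, data loaders, and hyperparameters are placeholders.

```python
# Minimal FedAvg-style sketch: local training per client, then weighted averaging
# of model weights on the server; raw data stays at each site.
import copy
import torch
import torch.nn as nn

def local_update(global_model: nn.Module, loader, epochs=1, lr=1e-3):
    """One client's local training pass on its private data."""
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model.state_dict(), len(loader.dataset)

def fed_avg(updates):
    """Aggregate client weights, weighted by local dataset size (FedAvg)."""
    total = sum(n for _, n in updates)
    avg = {k: torch.zeros_like(v, dtype=torch.float32) for k, v in updates[0][0].items()}
    for state, n in updates:
        for k, v in state.items():
            avg[k] += v.float() * (n / total)
    return avg

def federated_round(global_model, client_loaders):
    """One communication round: local updates at every site, then aggregation."""
    updates = [local_update(global_model, dl) for dl in client_loaders]
    global_model.load_state_dict(fed_avg(updates))
    return global_model
```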

Language: English

Citations

2

Advancements in medical image segmentation: A review of transformer models
S. S. Kumar

Computers & Electrical Engineering, Journal Year: 2025, Volume and Issue: 123, P. 110099 - 110099

Published: Jan. 22, 2025

Language: English

Citations

2

Vision-Language Models in medical image analysis: From simple fusion to general large models
Xiang Li, Like Li, Yuchen Jiang

et al.

Information Fusion, Journal Year: 2025, Volume and Issue: unknown, P. 102995 - 102995

Published: Feb. 1, 2025

Language: English

Citations

2

Bridging multi-level gaps: Bidirectional reciprocal cycle framework for text-guided label-efficient segmentation in echocardiography
Zhenxuan Zhang, Heye Zhang, Tieyong Zeng

et al.

Medical Image Analysis, Journal Year: 2025, Volume and Issue: 102, P. 103536 - 103536

Published: March 7, 2025

Language: English

Citations

2