How to Design, Create, and Evaluate an Instruction-Tuning Dataset for Large Language Model Training in Health Care: Tutorial From a Clinical Perspective (Preprint)
Wojciech Nazar, Grzegorz Nazar, Aleksandra Kamińska

et al.

Published: Dec. 23, 2024

High-quality data are critical in health care, forming the cornerstone for accurate diagnoses, effective treatment plans, and reliable conclusions. Similarly, high-quality datasets underpin the development and performance of large language models (LLMs). Among these, instruction-tuning datasets (ITDs) used for instruction fine-tuning have been pivotal in enhancing LLM performance and generalization capabilities across diverse tasks. This tutorial provides a comprehensive guide to designing, creating, and evaluating ITDs for health care applications. Written from a clinical perspective, it aims to make the concepts accessible to a broad audience, especially medical practitioners. Key topics include identifying useful data sources, defining the characteristics of well-designed datasets, and crafting high-quality instruction-input-output examples. We explore practical approaches to dataset construction, examining the advantages and limitations of 3 primary methods: fully manual preparation by expert annotators, fully synthetic generation using artificial intelligence (AI), and an innovative hybrid approach in which experts draft the initial dataset and AI generates additional data. Moreover, we discuss strategies for metadata selection and human evaluation to ensure the quality and effectiveness of ITDs. By integrating these elements, this tutorial provides a structured framework for establishing ITDs. It bridges technical and clinical domains, supporting the continued interdisciplinary advancement of AI in medicine. Additionally, we address the limitations of current practices and propose future directions, emphasizing the need for a global, unified framework for ITDs. We also argue that artificial general intelligence (AGI), if realized, will not replace empirical research. AGI will depend on human-curated datasets to process and apply knowledge. At the same time, ITDs will likely remain the most effective method of supplying knowledge to AGI, positioning them as a tool in AI-driven health care.

Language: English

Revolutionizing diagnosis of pulmonary Mycobacterium tuberculosis based on CT: a systematic review of imaging analysis through deep learning
Fei Zhang, Hui Han, Minglin Li

et al.

Frontiers in Microbiology, Year: 2025, Volume: 15

Published: Jan. 8, 2025

The mortality rate associated with Mycobacterium tuberculosis (MTB) has seen a significant rise in regions heavily affected by the disease over the past few decades. The traditional methods for diagnosing and differentiating tuberculosis (TB) remain thorny issues, particularly in areas with a high TB epidemic and inadequate resources. Processing numerous images can be time-consuming and tedious. Therefore, there is a need for automatic segmentation and classification technologies based on lung computed tomography (CT) scans to expedite and enhance the diagnosis of TB, enabling the rapid and secure identification of the condition. Deep learning (DL) offers a promising solution for automatically segmenting and classifying lung CT scans, expediting and enhancing TB diagnosis. This review evaluates the diagnostic accuracy of DL modalities for diagnosing pulmonary tuberculosis (PTB) after searching the PubMed and Web of Science databases using the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines. Seven articles were found and included in the review. While DL has been widely used and has achieved great success in CT-based PTB diagnosis, there are still challenges to be addressed and opportunities to be explored, including data scarcity, model generalization, interpretability, and ethical concerns. Addressing these challenges requires data augmentation, interpretable models, moral frameworks, and clinical validation. Further research should focus on developing robust and generalizable DL models, establishing ethical guidelines, and conducting clinical validation studies. DL holds great promise for transforming PTB diagnosis and improving patient outcomes.

Language: English

Cited by

0

Research progress of MRI-based radiomics in hepatocellular carcinoma
Xiaoyun Xie, Rong Chen

Frontiers in Oncology, Year: 2025, Volume: 15

Published: Feb. 6, 2025

Primary liver cancer (PLC), notably hepatocellular carcinoma (HCC), stands as a formidable global health challenge, ranking as the sixth most prevalent malignant tumor and the third leading cause of cancer-related deaths. HCC presents a daunting clinical landscape characterized by nonspecific early symptoms and late-stage detection, contributing to its poor prognosis. Moreover, the limited efficacy of existing treatments and high recurrence rates post-surgery compound the challenges in managing this disease. While histopathologic examination remains the cornerstone for HCC diagnosis, its utility in guiding preoperative decisions is constrained. Radiomics, an emerging field, harnesses high-throughput imaging data, encompassing shape, texture, and intensity features, alongside clinical parameters, to elucidate disease characteristics through advanced computational techniques such as machine learning and statistical modeling. MRI radiomics specifically holds significant importance in the diagnosis and treatment of HCC. This study aims to evaluate the methodology of radiomics and delineate the advancements facilitated by MRI-based radiomics in the realm of HCC diagnosis and treatment. A systematic review of the literature was conducted, encompassing peer-reviewed articles published between July 2018 and Jan 2025, sourced from PubMed and Google Scholar. Key search terms included Hepatocellular carcinoma, HCC, Liver cancer, Magnetic resonance imaging, MRI, radiomics, deep learning, machine learning, and artificial intelligence. A comprehensive analysis of 93 articles underscores the efficacy of MRI radiomics, a noninvasive imaging analysis modality, across various facets of HCC management. These encompass tumor differentiation, subtype classification, histopathological grading, prediction of microvascular invasion (MVI), assessment of treatment response, prognostication, and metastasis prediction. MRI radiomics emerges as a promising adjunctive tool for early HCC detection and personalized preoperative decision-making, with the overarching goal of optimizing patient outcomes. Nevertheless, the current lack of interpretability within the field underscores the imperative for continued research and validation efforts.

Language: English

Cited by

0

A Review of Recent Artificial Intelligence for Traditional Medicine
Chengbin Hou, Yifan Gao, Xinyu Lin

et al.

Journal of Traditional and Complementary Medicine, Year: 2025, Volume: unknown

Published: Feb. 1, 2025

Language: English

Cited by

0

Management of psychological emergency cases on social media: A hybrid approach combining knowledge graphs and graph neural networks
Mourad Ellouze, Sonda Rekik, Lamia Hadrich Belguith

et al.

Online Social Networks and Media, Year: 2025, Volume: 46, p. 100308

Published: March 5, 2025

Language: English

Cited by

0

Optimizing the efficiency and effectiveness of data quality assurance in a multicenter clinical dataset
Anne Fu, Trong Shen, Surain B. Roberts

et al.

Journal of the American Medical Informatics Association, Year: 2025, Volume: unknown

Published: March 13, 2025

Abstract. Objectives: Electronic health records (EHRs) data are increasingly used for research and analysis, but there is little empirical evidence to inform how automated and manual assessments can be combined to efficiently assess data quality in large EHR repositories. Materials and Methods: The GEMINI database collected data from 462 226 patient admissions across 32 hospitals from 2021 to 2023. We report data quality issues identified through semi-automated and manual data quality assessments completed during the data collection phase. We conducted a simulation experiment to evaluate the relationship between the number of records reviewed manually, the detection of true data errors (true positives), and the number of manual chart abstraction errors (false positives) that required unnecessary investigation. Results: The semi-automated assessments identified 79 data quality issues requiring correction, of which 14 had an impact affecting at least 50% of the data. After resolving the issues identified through semi-automated assessments, manual validation of 2676 patient encounters in 19 hospitals identified 4 new meaningful data errors (3 in transfusion data and 1 in physician identifiers), distributed across hospitals. There were 365 manual chart abstraction errors, which required investigation by data analysts to identify as “false positives.” These errors increased linearly with the number of charts reviewed manually. Simulation results demonstrate that all 3 transfusion data errors were identified with 95% sensitivity after manual review of 5 records, whereas 18 records were needed for the physician’s table. Discussion and Conclusion: The GEMINI approach represents a scalable framework for data quality assessment and improvement in multisite EHR research databases. Manual data review is important but can be minimized to optimize the trade-off between true and false identification of data errors.

Language: English

Cited by

0

Trust at every step: Embedding trust quality gates into the visual data exploration loop for machine learning-based clinical decision support systems
Dario Antweiler, Georg Fuchs

Computers & Graphics, Year: 2025, Volume: unknown, p. 104212

Published: April 1, 2025

Language: English

Cited by

0

AI models in clinical neonatology: a review of modeling approaches and a consensus proposal for standardized reporting of model performance
Ameena Husain, Lindsey A. Knake, Brynne A. Sullivan

et al.

Pediatric Research, Year: 2024, Volume: unknown

Published: Dec. 17, 2024

Language: English

Cited by

2

Data quality assurance practices in research data repositories—A systematic literature review
Besiki Stvilia, Y J Pang, Dong Joon Lee

et al.

Journal of the Association for Information Science and Technology, Year: 2024, Volume: unknown

Published: Aug. 7, 2024

Abstract. Data quality issues can significantly hinder research reproducibility, data sharing, and reuse. At the forefront of addressing data quality issues are research data repositories (RDRs). This study conducted a systematic analysis of data quality assurance (DQA) practices in RDRs, guided by activity theory and the data quality literature, resulting in the conceptualization of a data quality assurance model (DQAM) for RDRs. DQAM outlines a DQA process comprising evaluation, intervention, and communication activities and categorizes 17 quality dimensions into intrinsic and product-level data quality. It also details specific improvement actions for data products and identifies the essential roles, skills, standards, and tools for DQA in RDRs. By comparing DQAM with existing DQA models, the study highlights its potential to improve these models by adding a specific DQA activity structure. The theoretical implication of the study is a systematic conceptualization of DQA work in RDRs that is grounded in a comprehensive analysis of the literature and offers a refined conceptualization of DQA integration into broader frameworks of RDR evaluation. In practice, DQAM can inform the design and development of DQA workflows and tools. As a future research direction, the study suggests applying and evaluating DQAM across various domains to validate and refine this model further.

Language: English

Cited by

1

“Artificial histology” in colonic Neoplasia: A critical approach
Gavino Faa, Matteo Fraschini, Luca Didaci

et al.

Digestive and Liver Disease, Year: 2024, Volume: unknown

Published: Nov. 1, 2024

Language: English

Cited by

1

Predicting bacterial transcription factor binding sites through machine learning and structural characterization based on DNA duplex stability
André Borges Farias, Gustavo Sganzerla Martinez, Edgardo Galán-Vásquez

et al.

Briefings in Bioinformatics, Year: 2024, Volume: 25(6)

Published: Sept. 23, 2024

Abstract. Transcriptional factors (TFs) in bacteria play a crucial role in gene regulation by binding to specific DNA sequences, thereby assisting in the activation or repression of genes. Despite their central role, deciphering shape recognition in bacterial TFs-DNA interactions remains an intricate challenge. A deeper understanding of DNA secondary structures could greatly enhance our knowledge of how TFs recognize and interact with DNA, thereby elucidating their biological function. In this study, we employed machine learning algorithms to predict transcription factor binding sites (TFBS) and classify them as directed-repeat (DR) or inverted-repeat (IR). To accomplish this, we divided the set of TFBS nucleotide sequences by size, ranging from 8 to 20 base pairs, and converted them into thermodynamic data known as DNA duplex stability (DDS). Our results demonstrate that the Random Forest algorithm accurately predicts TFBS with an average accuracy of over 82% and effectively distinguishes between IR and DR with an accuracy of 89%. Interestingly, upon converting the base pairs of several TFBS-IR into DDS values, we observed a symmetric profile typical of the palindromic structure associated with these architectures. This study presents a novel TFBS prediction model based on a DDS characteristic that may indicate how the respective proteins interact with base pairs, thus providing insights into the molecular mechanisms underlying bacterial TFs-DNA interaction.

Language: English

Cited by

0