NDDRF 2.0: An update and expansion of risk factor knowledge base for personalized prevention of neurodegenerative diseases DOI Creative Commons
Cheng Bi, Xin Zheng, Yuxin Zhang

et al.

Alzheimer's & Dementia, Journal Year: 2025, Issue: 21(5)

Published: May 1, 2025

Abstract INTRODUCTION Neurodegenerative diseases (NDDs) are chronic disorders caused by brain neuron degeneration, requiring systematic integration of risk factors to address their heterogeneity. Established in 2021, the Knowledgebase of Risk Factors for Neurodegenerative Diseases (NDDRF) was the first knowledge base to consolidate NDD risk factors. NDDRF 2.0 expands the focus to modifiable lifestyle-related factors, enhancing its utility for prevention. METHODS Data from the past 4 years were comprehensively updated, while lifestyle factors were manually collected and filtered from 1975 to 2024. Each factor was embedded with International Classification of Diseases codes and clinical stage annotations, then re-standardized, classified, and annotated in accordance with the Unified Medical Language System Semantic Network. RESULTS NDDRF 2.0 encompasses 1971 risk factors classified under 151 subcategories across 20 NDDs, including 536 lifestyle-related factors covering six major categories, and is freely accessible at http://sysbio.org.cn/NDDRF/ . DISCUSSION As a lifestyle-specific and holistic knowledge base, NDDRF 2.0 offers structured deep phenotype information, enabling personalized prevention strategies and decision support. Highlights: An enhanced knowledge base (Knowledgebase of Risk Factors for Neurodegenerative Diseases [NDDRF] 2.0) was built for neurodegenerative diseases (NDDs). It provides detailed categorization of phenotypes to support targeted prevention. It is a knowledge-driven resource that facilitates risk assessment and proactive health management. Clinicians, researchers, and at-risk populations can use it to develop and implement effective prevention strategies. It can also be used to build chatbots with large language models in the future.

Language: English

Research on Intelligent Grading of Physics Problems Based on Large Language Models DOI Creative Commons

Yanan Wei, Rui Zhang, Jianwei Zhang

et al.

Education Sciences, Journal Year: 2025, Issue: 15(2), pp. 116 - 116

Published: Jan. 21, 2025

The automation of educational and instructional assessment plays a crucial role in enhancing the quality of teaching management. In physics education, calculation problems with intricate problem-solving ideas pose challenges to the intelligent grading of tests. This study explores automatic grading through the combination of large language models and prompt engineering, comparing the performance of four prompting strategies (one-shot, few-shot, chain of thought, and tree of thought) within two model frameworks, namely ERNIEBot-4-turbo and GPT-4o. It finds that tree-of-thought prompting can better assess complex calculation problems (N = 100, ACC ≥ 0.9, kappa > 0.8) and reduce the gap between different models. This research provides valuable insights for intelligent assessments in physics education.
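The agreement metrics this abstract reports, ACC and kappa between model-assigned and human-assigned grades, are standard and easy to compute. A minimal, self-contained sketch with made-up grades (not the study's data):

```python
from collections import Counter

def accuracy(model, human):
    """Fraction of items where the model's grade matches the human grade."""
    return sum(m == h for m, h in zip(model, human)) / len(human)

def cohens_kappa(model, human):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(human)
    po = accuracy(model, human)                     # observed agreement
    cm, ch = Counter(model), Counter(human)
    # expected chance agreement from each rater's marginal label frequencies
    pe = sum(cm[c] * ch[c] for c in set(model) | set(human)) / (n * n)
    return (po - pe) / (1 - pe)

# hypothetical grades (0-5 points) from an LLM grader and a human grader
human = [5, 4, 3, 5, 2, 5, 4, 3, 1, 5]
model = [5, 4, 3, 5, 2, 5, 4, 2, 1, 5]
print(accuracy(model, human))                 # 0.9
print(round(cohens_kappa(model, human), 3))   # 0.867
```

Kappa is the more demanding criterion here: a grader that always assigns the most common score can reach high accuracy but scores near zero kappa.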

Language: English

Cited by

1

A Comparative Study on the Question-Answering Proficiency of Artificial Intelligence Models in Bladder-Related Conditions: An Evaluation of Gemini and ChatGPT 4.o DOI Open Access
Mustafa Azizoğlu, Sergey Klyuev

Medical Records, Journal Year: 2025, Issue: 7(1), pp. 201 - 205

Published: Jan. 10, 2025

Aim: The rapid evolution of artificial intelligence (AI) has revolutionized medicine, with tools like ChatGPT and Google Gemini enhancing clinical decision-making. ChatGPT's advancements, particularly GPT-4, show promise in diagnostics and education. However, variability in accuracy and limitations in complex scenarios emphasize the need for further evaluation of these models in medical applications. This study aimed to assess the agreement between ChatGPT 4.o and Gemini AI in identifying bladder-related conditions, including neurogenic bladder, vesicoureteral reflux (VUR), and posterior urethral valve (PUV). Material and Method: This study, conducted in October 2024, compared the two AIs' responses on 51 questions about neurogenic bladder, VUR, and PUV. Questions, randomly selected from pediatric surgery and urology materials, were evaluated using accuracy metrics and statistical analysis, highlighting the models' performance and agreement. Results: The models demonstrated similar performance across neurogenic bladder, VUR, and PUV questions, with true response rates of 66.7% and 68.6%, respectively, and no statistically significant differences (p>0.05). Combined accuracy across all topics was 67.6%. Strong inter-rater reliability (κ=0.87) highlights their agreement. Conclusion: Gemini and ChatGPT-4.o showed comparable performance on these key bladder-related conditions.
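The "no statistically significant difference" claim for two correct-answer rates can be checked with a standard two-proportion z-test. A stdlib-only sketch using counts inferred from the abstract's rounded percentages (66.7% and 68.6% of 51 questions correspond to 34 and 35 correct answers; this is an illustration, not the authors' exact analysis):

```python
from math import sqrt, erf

def two_proportion_z(x1, n1, x2, n2):
    """Two-sided two-proportion z-test with a pooled variance estimate."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                    # pooled proportion
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))   # standard error under H0
    z = (p1 - p2) / se
    # two-sided p-value from the standard normal CDF (via erf)
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

z, p = two_proportion_z(34, 51, 35, 51)
print(round(z, 2), round(p, 3))   # z ≈ -0.21, p ≈ 0.83: not significant
```

With only 51 questions per topic, the test has little power to detect a 2-point difference, which is consistent with the reported p>0.05.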

Language: English

Cited by

0

Enhancing ophthalmology students’ awareness of retinitis pigmentosa: assessing the efficacy of ChatGPT in AI-assisted teaching of rare diseases—a quasi-experimental study DOI Creative Commons
Junwen Zeng, Kexin Sun, Peng Qin

et al.

Frontiers in Medicine, Journal Year: 2025, Issue: 12

Published: March 18, 2025

Retinitis pigmentosa (RP) is a rare retinal dystrophy often underrepresented in ophthalmology education. Despite advancements in diagnostics and treatments like gene therapy, RP knowledge gaps persist. This study assesses the efficacy of AI-assisted teaching using ChatGPT compared to traditional methods in educating medical students about RP. A quasi-experimental study was conducted with 142 medical students randomly assigned to ChatGPT-assisted and control (traditional review materials) groups. Both groups attended a lecture on RP and completed pre- and post-tests. Statistical analyses compared learning outcomes, study times, and response accuracy. Both groups significantly improved their post-test scores (p < 0.001), but the ChatGPT group required less study time (24.29 ± 12.62 vs. 42.54 ± 20.43 min, p < 0.0001). The ChatGPT group also performed better on complex questions regarding advanced treatments, demonstrating AI's potential to deliver accurate and current information efficiently. AI-assisted teaching enhances the efficiency of comprehension of rare diseases, and a hybrid educational model combining AI with traditional methods can address knowledge gaps, offering a promising approach for modern medical education.
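The pre/post comparison described here is a paired design, so the relevant statistic is a paired t-test on per-student score differences. A stdlib sketch with hypothetical scores (the study's raw data are not given in the abstract):

```python
from statistics import mean, stdev
from math import sqrt

def paired_t(pre, post):
    """t statistic for paired pre/post scores (post - pre differences)."""
    d = [b - a for a, b in zip(pre, post)]
    return mean(d) / (stdev(d) / sqrt(len(d)))   # mean diff / its std error

# hypothetical pre/post test scores for 8 students (illustrative only)
pre  = [55, 60, 48, 70, 65, 52, 58, 62]
post = [72, 75, 60, 84, 80, 66, 70, 79]
print(round(paired_t(pre, post), 2))   # 21.28: far beyond t(7) criticals
```

Large consistent gains yield a large t, hence the very small p-values reported for both groups.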

Language: English

Cited by

0

Is artificial intelligence successful in the Turkish neurology board exam? DOI
Ayse Betul Acar, Ece Yanık, Emine Altin

et al.

Neurological Research, Journal Year: 2025, Issue: unknown, pp. 1 - 4

Published: March 20, 2025

Objectives OpenAI declared that GPT-4 performed better in academic and certain specialty areas. Medical licensing exams assess the clinical competence of doctors. We aimed to investigate, for the first time, how ChatGPT will perform on the Turkish Neurology Proficiency Exam.

Language: English

Cited by

0

Comparative Evaluation of Advanced AI Reasoning Models in Korean National Licensing Examination OpenAI vs DeepSeek (Preprint) DOI Creative Commons
Jin-Gyu Lee, Gyeong Hoon Kim, Jei Keon Chae

et al.

Published: March 27, 2025

UNSTRUCTURED Artificial intelligence (AI) has advanced in natural language processing and reasoning, with large language models (LLMs) increasingly assessed for medical education and licensing exams. Given the growing use of AI in examinations, evaluating its performance on non-Western, region-specific tests like the Korean Medical Licensing Examination (KMLE) is crucial for assessing real-world applicability. This study compared five LLMs on the KMLE: GPT-4o, o1, and o3-mini from OpenAI, and DeepSeek-V3 and DeepSeek-R1 from DeepSeek. A total of 150 multiple-choice questions from the 2024 KMLE were extracted and categorized into three domains: Local Health & Medical Laws, Preventive Medicine, and Clinical Medicine. Graph-based questions were excluded. Each model completed independent runs via API, with accuracy scored against the official answers. Statistical differences were analyzed using ANOVA, and consistency was measured with Fleiss' kappa coefficient. o1 achieved the highest overall accuracy (94.3%), excelling in Clinical Medicine (97.5%) and Local Health & Medical Law (81.0%), while DeepSeek-R1 led in Preventive Medicine (92.6%). Despite domain-specific variations, all models surpassed the passing criteria. Consistency across runs was high, with values of 97.1% and 97.5% (DeepSeek-V3). Performance declined in Local Health & Medical Law, likely due to legal complexities and limited Korean-language training data. This is the first study to compare OpenAI and DeepSeek models on this exam; both demonstrated strong performance, ranking within the top 10% of human candidates. While o1 was the most accurate, DeepSeek provided a cost-effective alternative. Future research should optimize LLMs for non-English exams and develop Korea-specific models to improve performance in weaker domains.
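The run-to-run consistency measure used above, Fleiss' kappa, generalizes Cohen's kappa to more than two raters; here each API run of a model acts as one "rater" over the same questions. A stdlib sketch with invented answers (not the study's data):

```python
from collections import Counter

def fleiss_kappa(runs):
    """Fleiss' kappa for answer consistency across repeated model runs.

    `runs` is a list of runs; each run is a list of answers, one per question.
    """
    n = len(runs)        # number of runs (raters)
    N = len(runs[0])     # number of questions (items)
    counts = [Counter(run[i] for run in runs) for i in range(N)]
    # mean per-item agreement
    P_bar = sum((sum(c ** 2 for c in cnt.values()) - n) / (n * (n - 1))
                for cnt in counts) / N
    # chance agreement from overall answer-choice frequencies
    totals = Counter()
    for cnt in counts:
        totals.update(cnt)
    P_e = sum((v / (N * n)) ** 2 for v in totals.values())
    return (P_bar - P_e) / (1 - P_e)

# three hypothetical runs over five questions (answer choices 1-5)
runs = [[1, 3, 2, 5, 4],
        [1, 3, 2, 5, 4],
        [1, 3, 2, 5, 2]]
print(round(fleiss_kappa(runs), 3))   # 0.831
```

Identical answers on every run give kappa = 1; a single flipped answer, as in the example, already pulls it noticeably below that.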

Language: English

Cited by

0

Comparative analysis of a standard (GPT-4o) and reasoning-enhanced (o1 pro) large language model on complex clinical questions from the Japanese orthopaedic board examination DOI
Joe Hasei, Ryuichi Nakahara, Koichi Takeuchi

et al.

Journal of Orthopaedic Science, Journal Year: 2025, Issue: unknown

Published: April 1, 2025

Language: English

Cited by

0

A Brief Review on Benchmarking for Large Language Models Evaluation in Healthcare DOI Creative Commons
Leona Cilar, Hongyu Chen, Aokun Chen

et al.

Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, Journal Year: 2025, Issue: 15(2)

Published: April 9, 2025

ABSTRACT This paper reviews benchmarking methods for evaluating large language models (LLMs) in healthcare settings. It highlights the importance of rigorous benchmarking to ensure LLMs' safety, accuracy, and effectiveness in clinical applications. The review also discusses challenges in developing standardized benchmarks and metrics tailored to healthcare-specific tasks such as medical text generation, disease diagnosis, and patient management. Ethical considerations, including privacy, data security, and bias, are addressed, underscoring the need for multidisciplinary collaboration to establish robust frameworks that facilitate the reliable and ethical use of LLMs in healthcare. Evaluation of LLMs remains challenging due to the lack of comprehensive datasets. Key concerns include model bias and the need for better explainability, all of which impact overall trustworthiness.

Language: English

Cited by

0

Overview of the Lymphoma Information Extraction and Automatic Coding Evaluation Task in CHIP 2024 DOI
Hui Zong, Liang Tao, Zuofeng Li

et al.

Communications in Computer and Information Science, Journal Year: 2025, Issue: unknown, pp. 75 - 84

Published: Jan. 1, 2025

Language: English

Cited by

0

The Effectiveness of Local Fine-Tuned LLMs: Assessment of the Japanese National Examination for Pharmacists DOI
Hiroto Asano, Daisuke Takaya, Asuka Hatabu

et al.

Research Square, Journal Year: 2025, Issue: unknown

Published: April 15, 2025

Abstract Large Language Models (LLMs) offer great potential for applications in healthcare and pharmaceutical fields. While cloud-based implementations are commonly used, they present challenges related to privacy and cost. This study examined the performance of locally executable LLMs on the Japanese National Examination for Pharmacists (JNEP). Additionally, we explored the feasibility of creating specialized pharmacy models through fine-tuning with Low-Rank Adaptation (LoRA). Text-based questions from the 97th to 109th JNEP were utilized, comprising 2,421 questions for training and 165 for testing. Four distinct models were evaluated, including Microsoft phi-4 and the DeepSeek R1 Distill Qwen series. Baseline performance was initially assessed, followed by fine-tuning using LoRA on the training dataset. Model performance was evaluated based on accuracy scores achieved on the test set. In the baseline evaluation against the JNEP, accuracy ranged from 55.15% to 76.36%. Notably, the CyberAgent 32B model exceeded the passing threshold (approximately 61%). Following fine-tuning, accuracy exhibited an increase from 60.61% to 66.06%. The results showed that locally executable LLMs are capable of handling knowledge tasks comparable to those in the national pharmacist examination. Moreover, we found that techniques like LoRA can significantly enhance model performance, demonstrating robust AI models specifically designed for pharmacological applications. These findings contribute to understanding and implementing secure, high-performing LLM solutions tailored for pharmaceutical use.
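LoRA, the fine-tuning technique used above, freezes the base weight matrix W and trains only a low-rank pair A (d×r) and B (r×d), so the adapted layer computes xW + (α/r)·xAB and trains 2dr parameters instead of d². A toy, dependency-free sketch of that forward pass (illustrative only, not the paper's code):

```python
import random

def matmul(A, B):
    """Naive matrix multiply for small nested-list matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def lora_forward(x, W, A, B, alpha, r):
    """y = x W + (alpha / r) * x A B  -- frozen W plus low-rank update."""
    base = matmul(x, W)
    delta = matmul(matmul(x, A), B)   # rank-r correction, the only trained part
    s = alpha / r
    return [[b + s * d for b, d in zip(br, dr)] for br, dr in zip(base, delta)]

d, r = 4, 2                            # model width, LoRA rank (r << d)
random.seed(0)
W = [[random.gauss(0, 1) for _ in range(d)] for _ in range(d)]     # frozen
A = [[random.gauss(0, 0.01) for _ in range(r)] for _ in range(d)]  # trainable
B = [[0.0] * d for _ in range(r)]      # zero-init: adapter starts as identity
x = [[1.0, 2.0, 3.0, 4.0]]
print(lora_forward(x, W, A, B, alpha=16, r=r) == matmul(x, W))     # True
```

Because B starts at zero, the adapted model initially reproduces the base model exactly; training then moves only A and B, which is what makes LoRA cheap enough for the locally executable setting the study targets.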

Language: English

Cited by

0
