A Structured Narrative Prompt for Large Language Models to Create Pertinent Narratives of Simulated Agents’ Life Events: A Sentiment Analysis Comparison DOI Open Access
Christopher J. Lynch, Erik J. Jensen, Virginia Zamponi

et al.

Published: Sep. 29, 2023

Large language models (LLMs) excel at providing natural responses that sound authoritative, reflect knowledge of the context area, and can present from a range of varied perspectives. Agent-based models and simulations consist of simulated agents that interact within simulated environments to explore societal, social, and ethical, among other, problems. Simulated agents generate large volumes of data over time, and discerning useful and relevant content from these data is an onerous task. LLMs can help in communicating agents' perspectives on key life events through narratives. However, these narratives need to be factual, transparent, and reproducible. To this end, we present a structured narrative prompt for sending queries to LLMs. Chi-square and Fisher's Exact tests are applied to assess statistically significant differences in the sentiment scores of messages between simulation-generated narratives, ChatGPT-generated narratives, and real tweets. The prompt structure effectively yields narratives with the desired components from ChatGPT and is expected to be extensible across LLMs. In 14 out of 44 categories, the sentiment expressed by the ChatGPT-generated narratives was not discernibly different, in terms of statistical significance (alpha level 0.05), from the sentiment expressed in real tweets. Three outcomes are provided: (1) a list of benefits and challenges of LLM-based narrative generation; (2) a structured prompt for requesting narratives from an LLM based on simulation-generated information; and (3) an assessment of sentiment prevalence in the generated narratives compared with real tweets. This indicates promise in the utilization of LLMs for helping connect agents' simulated experiences with people.
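
As a rough illustration of the kind of test this abstract describes, the sketch below (not the authors' code) compares sentiment-category counts between two hypothetical message sources using SciPy's chi-square and Fisher's exact tests; all counts are invented for illustration.

```python
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

# Hypothetical counts of positive/negative messages for two sources
# (e.g., ChatGPT-generated narratives vs. real tweets); values are invented.
#                 positive  negative
table = np.array([[34,       16],      # ChatGPT-generated narratives
                  [28,       22]])     # real tweets

chi2, p_chi, dof, expected = chi2_contingency(table)
odds_ratio, p_fisher = fisher_exact(table)   # exact test, useful for small counts

print(f"Chi-square: chi2={chi2:.2f}, p={p_chi:.3f}")
print(f"Fisher's exact: OR={odds_ratio:.2f}, p={p_fisher:.3f}")
print("Significant at alpha=0.05:", min(p_chi, p_fisher) < 0.05)
```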

Language: English

Generative artificial intelligence in healthcare: A scoping review on benefits, challenges and applications DOI
Khadijeh Moulaei, Atiye Yadegari, Mahdi Baharestani

et al.

International Journal of Medical Informatics, Journal Year: 2024, Volume: 188, P. 105474 - 105474

Published: May 8, 2024

Language: English

Cited by

50

Integrating Retrieval-Augmented Generation with Large Language Models in Nephrology: Advancing Practical Applications DOI Creative Commons
Jing Miao, Charat Thongprayoon, Supawadee Suppadungsuk

et al.

Medicina, Journal Year: 2024, Volume: 60(3), P. 445 - 445

Published: March 8, 2024

The integration of large language models (LLMs) into healthcare, particularly in nephrology, represents a significant advancement in applying advanced technology to patient care, medical research, and education. These models have progressed from simple text processors to tools capable of deep understanding, offering innovative ways to handle health-related data and thus improving the efficiency and effectiveness of practice. A challenge in applying LLMs is their imperfect accuracy and/or tendency to produce hallucinations—outputs that are factually incorrect or irrelevant. This issue is critical in settings where precision is essential, as inaccuracies can undermine the reliability of these models in crucial decision-making processes. To overcome these challenges, various strategies have been developed. One such strategy is prompt engineering, like the chain-of-thought approach, which directs the model towards more accurate responses by breaking the problem down into intermediate steps or reasoning sequences. Another is the retrieval-augmented generation (RAG) strategy, which helps address hallucinations by integrating external data and enhancing output relevance. Hence, RAG is favored for tasks requiring up-to-date, comprehensive information, such as clinical decision making and educational applications. In this article, we showcase the creation of a specialized ChatGPT model integrated with a RAG system, tailored to align with the KDIGO 2023 guidelines for chronic kidney disease. This example demonstrates its potential in providing specialized, accurate advice, marking a step towards more reliable and efficient nephrology practices.
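
To make the RAG idea concrete, here is a minimal retrieval-augmented prompting sketch: candidate passages are indexed with TF-IDF, the passages most similar to a clinical question are retrieved, and they are prepended to the prompt sent to an LLM. The passages, question, and the `ask_llm` placeholder are all hypothetical; this is not the KDIGO-integrated system the article describes.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical guideline snippets standing in for an external knowledge base.
passages = [
    "CKD is classified by GFR category and albuminuria category.",
    "Blood pressure targets in CKD depend on albuminuria and comorbidities.",
    "Referral to a nephrologist is advised when GFR falls below a threshold.",
]

question = "How is chronic kidney disease staged?"

# Retrieve the top-k passages most similar to the question.
vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(passages + [question])
scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
top_k = scores.argsort()[::-1][:2]
context = "\n".join(passages[i] for i in top_k)

# Augment the prompt with the retrieved context before calling an LLM.
prompt = (
    "Answer using only the context below.\n"
    f"Context:\n{context}\n\nQuestion: {question}"
)

def ask_llm(prompt: str) -> str:
    """Placeholder for an actual LLM API call (not a real client)."""
    return "(model response would appear here)"

print(ask_llm(prompt))
```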

Language: English

Cited by

47

A critical review of large language models: Sensitivity, bias, and the path toward specialized AI DOI Creative Commons
Arash Hajikhani, Carolyn Cole

Quantitative Science Studies, Journal Year: 2024, Volume: 5(3), P. 736 - 756

Published: Jan. 1, 2024

Abstract This paper examines the comparative effectiveness of a specialized compiled language model and a general-purpose model such as OpenAI's GPT-3.5 in detecting sustainable development goals (SDGs) within text data. It presents a critical review of large language models (LLMs), addressing challenges related to bias and sensitivity, and underlines the necessity of specialized training for precise, unbiased analysis. A case study using a data set of company descriptions offers insight into the differences between GPT-3.5 and the specialized model in SDG detection. While the general-purpose model boasts broader coverage, it may identify SDGs with limited relevance to the companies' activities. In contrast, the specialized model zeroes in on highly pertinent SDGs. The importance of thoughtful model selection is emphasized, taking into account task requirements, cost, complexity, and transparency. Despite the versatility of LLMs, the use of specialized models is suggested for tasks demanding precision and accuracy. The paper concludes by encouraging further research to find a balance between the capabilities of LLMs and the need for domain-specific expertise and interpretability.

Language: English

Cited by

25

Diagnostic Performance Comparison between Generative AI and Physicians: A Systematic Review and Meta-Analysis DOI Creative Commons
Hirotaka Takita, Daijiro Kabata, Shannon L. Walston

et al.

medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume: unknown

Published: Jan. 22, 2024

Abstract Background The rapid advancement of generative artificial intelligence (AI) has led to the wide dissemination of models with exceptional understanding and generation of human language. Their integration into healthcare has shown potential for improving medical diagnostics, yet a comprehensive evaluation of the diagnostic performance of generative AI and a comparison of that performance with physicians have not been extensively explored. Methods In this systematic review and meta-analysis, a search of Medline, Scopus, Web of Science, Cochrane Central, and MedRxiv was conducted for studies published from June 2018 through December 2023, focusing on those that validate generative AI models for diagnostic tasks. Risk of bias was assessed using the Prediction Model Study Risk of Bias Assessment Tool. Meta-regression was performed to summarize the models' performance and to compare their accuracy with that of physicians. Results The search resulted in 54 studies being included in the meta-analysis. Nine generative AI models were evaluated across 17 medical specialties. The quality assessment indicated a high risk of bias in the majority of studies, primarily due to small sample sizes. The overall accuracy of generative AI was 56.9% (95% confidence interval [CI]: 51.0–62.7%). The meta-analysis demonstrated that, on average, physicians exceeded the accuracy of the models (difference in accuracy: 14.4% [95% CI: 4.9–23.8%], p-value = 0.004). However, both Prometheus (Bing) and GPT-4 showed slightly better performance compared with non-expert physicians (-2.3% [95% CI: -27.0–22.4%], p = 0.848 and -0.32% [95% CI: -14.4–13.7%], p = 0.962, respectively), but they underperformed when compared with expert physicians (10.9% [95% CI: -13.1–35.0%], p = 0.356 and 12.9% [95% CI: 0.15–25.7%], p = 0.048, respectively). A sub-analysis revealed significantly improved performance in the fields of Gynecology, Pediatrics, Orthopedic surgery, Plastic surgery, and Otolaryngology, while showing reduced performance in Neurology, Psychiatry, Rheumatology, Endocrinology, and General Medicine. No significant heterogeneity was observed based on risk of bias. Conclusions Generative AI exhibits promising diagnostic capabilities, varying by model and specialty. Although the models have not reached the reliability of expert physicians, the findings suggest they may enhance healthcare delivery and education, provided they are integrated with caution and their limitations are well understood. Key Points Question: What is the diagnostic accuracy of generative AI, and how does it compare with that of physicians? Findings: This systematic review and meta-analysis found a pooled accuracy of 56.9% (95% confidence interval: 51.0–62.7%); expert physicians exceeded the models' accuracy across specialties, however, some models were comparable to non-expert physicians. Meaning: This suggests that generative AI does not yet match the level of experienced physicians but may have applications in healthcare delivery and education.
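
For readers unfamiliar with how a pooled accuracy and its confidence interval arise, the sketch below shows a simplified inverse-variance pooling of per-study accuracies on the logit scale. The per-study counts are invented, and the paper itself uses meta-regression rather than this bare fixed-effect pooling.

```python
import numpy as np

# Hypothetical per-study data: correct diagnoses and total cases (illustrative only).
correct = np.array([45, 120, 30, 88])
total   = np.array([80, 200, 60, 150])

# Logit-transform each study's accuracy and pool with inverse-variance weights.
p = correct / total
logit = np.log(p / (1 - p))
var = 1 / correct + 1 / (total - correct)   # approximate variance of the logit proportion
w = 1 / var

pooled_logit = np.sum(w * logit) / np.sum(w)
se = np.sqrt(1 / np.sum(w))
ci = pooled_logit + np.array([-1.96, 1.96]) * se

def inv_logit(x):
    return np.exp(x) / (1 + np.exp(x))

print(f"Pooled accuracy: {inv_logit(pooled_logit):.3f} "
      f"(95% CI {inv_logit(ci[0]):.3f}-{inv_logit(ci[1]):.3f})")
```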

Language: English

Cited by

19

Digital Diagnostics: The Potential of Large Language Models in Recognizing Symptoms of Common Illnesses DOI Creative Commons

Gaurav Kumar Gupta, Aditi Singh, Sijo Valayakkad Manikandan

et al.

AI, Journal Year: 2025, Volume: 6(1), P. 13 - 13

Published: Jan. 16, 2025

This study aimed to evaluate the potential of Large Language Models (LLMs) in healthcare diagnostics, specifically their ability to analyze symptom-based prompts and provide accurate diagnoses. The study focused on models including GPT-4, GPT-4o, Gemini, o1 Preview, and GPT-3.5, assessing their performance in identifying illnesses based solely on the symptoms provided. Symptom-based prompts were curated from reputable medical sources to ensure validity and relevance. Each model was tested under controlled conditions for diagnostic accuracy, precision, recall, and decision-making capabilities. Specific scenarios were designed to explore both general and high-stakes diagnostic tasks. Among the models, GPT-4 achieved the highest diagnostic accuracy, demonstrating strong alignment with clinical reasoning. Gemini excelled in tasks requiring precise decision-making. GPT-4o and o1 Preview showed balanced performance, effectively handling real-time tasks with a focus on precision and recall. GPT-3.5, though less advanced, proved dependable for general diagnostic tasks. The study highlights the strengths and limitations of LLMs in diagnostics. While models such as GPT-4 exhibit promise, challenges such as privacy compliance, ethical considerations, and the mitigation of inherent biases must be addressed. The findings suggest pathways for responsibly integrating LLMs into diagnostic processes to enhance healthcare outcomes.
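
As a small illustration of the metrics mentioned (accuracy, precision, recall), the snippet below scores a handful of hypothetical model diagnoses against reference labels with scikit-learn; the diagnoses are invented and this is not the study's evaluation code.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical reference diagnoses and model outputs for a few
# symptom-based prompts (labels and values are illustrative only).
reference = ["influenza", "migraine", "strep throat", "influenza", "asthma"]
predicted = ["influenza", "tension headache", "strep throat", "influenza", "asthma"]

print("accuracy :", accuracy_score(reference, predicted))
print("precision:", precision_score(reference, predicted, average="macro", zero_division=0))
print("recall   :", recall_score(reference, predicted, average="macro", zero_division=0))
```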

Language: English

Cited by

3

Ethical Dilemmas in Using AI for Academic Writing and an Example Framework for Peer Review in Nephrology Academia: A Narrative Review DOI Creative Commons
Jing Miao, Charat Thongprayoon, Supawadee Suppadungsuk

et al.

Clinics and Practice, Journal Year: 2023, Volume: 14(1), P. 89 - 105

Published: Dec. 30, 2023

The emergence of artificial intelligence (AI) has greatly propelled progress across various sectors, including the field of nephrology academia. However, this advancement has also given rise to ethical challenges, notably in scholarly writing. AI's capacity to automate labor-intensive tasks like literature reviews and data analysis has created opportunities for unethical practices, with scholars incorporating AI-generated text into their manuscripts and potentially undermining academic integrity. This situation gives rise to a range of ethical dilemmas that not only question the authenticity of contemporary academic endeavors but also challenge the credibility of the peer-review process and the integrity of editorial oversight. Instances of misconduct are highlighted, spanning from lesser-known journals to reputable ones, and even infiltrating graduate theses and grant applications. This subtle AI intrusion hints at a systemic vulnerability within the academic publishing domain, exacerbated by the publish-or-perish mentality. Solutions aimed at mitigating the unethical employment of AI in academia include the adoption of sophisticated AI-driven plagiarism detection systems, robust augmentation of peer review with an "AI scrutiny" phase, comprehensive training for academics on responsible AI usage, and the promotion of a culture of transparency that acknowledges AI's role in research. This review underscores the pressing need for collaborative efforts among institutions to foster an environment of ethical AI application, thus preserving the esteemed integrity of scholarship in the face of rapid technological advancements. It makes a plea for rigorous research to assess the extent of AI involvement in the academic literature, evaluate the effectiveness of AI-enhanced detection tools, and understand the long-term consequences of AI utilization. An example framework has been proposed to outline an approach for integrating AI into nephrology academic writing and peer review. Using proactive initiatives and evaluations, a harmonious environment that harnesses AI's capabilities while upholding stringent academic standards can be envisioned.

Language: English

Cited by

37

Evaluating multimodal AI in medical diagnostics DOI Creative Commons
Robert Kaczmarczyk, T Wilhelm, Ron Martin

et al.

npj Digital Medicine, Journal Year: 2024, Volume: 7(1)

Published: Aug. 7, 2024

This study evaluates multimodal AI models' accuracy and responsiveness in answering NEJM Image Challenge questions, juxtaposed with human collective intelligence, underscoring AI's potential and current limitations in clinical diagnostics. Anthropic's Claude 3 family demonstrated the highest accuracy among the evaluated models, surpassing the average human accuracy, while human collective decision-making outperformed all models. GPT-4 Vision Preview exhibited selectivity, responding more readily to easier questions, smaller images, and longer questions.

Language: English

Cited by

12

A phenotype-based AI pipeline outperforms human experts in differentially diagnosing rare diseases using EHRs DOI Creative Commons
Xiaohao Mao, Yuying Huang, Ye Jin

et al.

npj Digital Medicine, Journal Year: 2025, Volume: 8(1)

Published: Jan. 28, 2025

Abstract Rare diseases, affecting ~350 million people worldwide, pose significant challenges in clinical diagnosis due to the lack of experienced physicians and the complexity of differentiating between numerous rare diseases. To address these challenges, we introduce PhenoBrain, a fully automated artificial intelligence pipeline. PhenoBrain utilizes a BERT-based natural language processing model to extract phenotypes from clinical texts in EHRs and employs five new diagnostic models for the differential diagnosis of rare diseases. The AI system was developed and evaluated on diverse, multi-country rare disease datasets comprising 2271 cases with 431 rare diseases. In 1936 test cases, PhenoBrain achieved an average predicted top-3 recall of 0.513 and a top-10 recall of 0.654, surpassing 13 leading prediction methods. In a human-computer study with 75 cases, PhenoBrain exhibited exceptional performance with a top-3 recall of 0.613 and a top-10 recall of 0.813, surpassing 50 specialist physicians and large language models like ChatGPT and GPT-4. Combining PhenoBrain's predictions with those of specialists increased the top-3 recall to 0.768, demonstrating its potential to enhance diagnostic accuracy in clinical workflows.
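
The top-3 and top-10 recall figures reported here are simply the fraction of cases whose true diagnosis appears in the model's top-k ranked list; a minimal sketch with invented disease names follows (not PhenoBrain's code).

```python
def top_k_recall(ranked_predictions, true_diagnoses, k):
    """Fraction of cases whose true diagnosis appears in the model's top-k list."""
    hits = sum(truth in preds[:k]
               for preds, truth in zip(ranked_predictions, true_diagnoses))
    return hits / len(true_diagnoses)

# Hypothetical ranked outputs for three cases (disease names are placeholders).
ranked = [
    ["Marfan syndrome", "Ehlers-Danlos syndrome", "Loeys-Dietz syndrome"],
    ["Fabry disease", "Gaucher disease", "Pompe disease"],
    ["Wilson disease", "Hemochromatosis", "Alpha-1 antitrypsin deficiency"],
]
truth = ["Ehlers-Danlos syndrome", "Pompe disease", "Alagille syndrome"]

print(top_k_recall(ranked, truth, k=3))   # 2 of 3 cases hit -> ~0.667
```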

Language: English

Cited by

1

Critical Analysis of ChatGPT 4 Omni in USMLE Disciplines, Clinical Clerkships, and Clinical Skills (Preprint) DOI Creative Commons
Brenton T. Bicknell, Danner Butler, Sydney Whalen

et al.

JMIR Medical Education, Journal Year: 2024, Volume: 10, P. e63430 - e63430

Published: Sep. 14, 2024

Abstract Background Recent studies, including those by the National Board of Medical Examiners, have highlighted the remarkable capabilities of recent large language models (LLMs) such as ChatGPT in passing the United States Medical Licensing Examination (USMLE). However, there is a gap in detailed analysis of LLM performance in specific medical content areas, thus limiting an assessment of their potential utility in medical education. Objective This study aimed to assess and compare the accuracy of successive ChatGPT versions (GPT-3.5, GPT-4, and GPT-4 Omni) in USMLE disciplines, clinical clerkships, and the clinical skills of diagnostics and management. Methods This study used 750 vignette-based multiple-choice questions to characterize the performance of successive ChatGPT versions (ChatGPT 3.5 [GPT-3.5], ChatGPT 4 [GPT-4], and ChatGPT 4 Omni [GPT-4o]) across USMLE disciplines, clinical clerkships, and clinical skills (diagnostics and management). Accuracy was assessed using a standardized protocol, with statistical analyses conducted to compare the models' performances. Results GPT-4o achieved the highest accuracy at 90.4%, outperforming GPT-4 and GPT-3.5, which scored 81.1% and 60.0%, respectively. GPT-4o's highest performances were in social sciences (95.5%), behavioral neuroscience (94.2%), and pharmacology (93.2%). In clinical skills, its diagnostic accuracy was 92.7% and its management accuracy was 88.8%, significantly higher than its predecessors. Notably, both GPT-4o and GPT-4 outperformed the medical student average of 59.3% (95% CI 58.3‐60.3). Conclusions GPT-4o's performance indicates substantial improvements over its predecessors, suggesting significant potential for the use of this technology as an educational aid for medical students. These findings underscore the need for careful consideration when integrating LLMs into medical education, emphasizing the importance of structured curricula to guide appropriate use and of ongoing critical analyses to ensure reliability and effectiveness.
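
To illustrate one way such accuracy differences can be tested, the sketch below runs a chi-square test on the reported 750-question accuracies of GPT-4o (90.4%) and GPT-4 (81.1%); treating the two sets of answers as independent samples is a simplification for illustration, not the authors' statistical protocol.

```python
import numpy as np
from scipy.stats import chi2_contingency

n = 750                                  # questions per model (reported)
acc = {"GPT-4o": 0.904, "GPT-4": 0.811}  # reported accuracies

correct = np.array([round(acc["GPT-4o"] * n), round(acc["GPT-4"] * n)])
incorrect = n - correct
table = np.column_stack([correct, incorrect])   # 2x2 contingency table

chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2={chi2:.2f}, dof={dof}, p={p:.4g}")
```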

Language: English

Cited by

8

Comparative Analysis of Large Language Models in Emergency Plastic Surgery Decision-Making: The Role of Physical Exam Data DOI Open Access
Sahar Borna, Cesar A. Gomez-Cabello, Sophia M. Pressman

et al.

Journal of Personalized Medicine, Journal Year: 2024, Volume: 14(6), P. 612 - 612

Published: June 8, 2024

In the U.S., diagnostic errors are common across various healthcare settings due to factors like complex procedures and multiple providers, often exacerbated by inadequate initial evaluations. This study explores the role of Large Language Models (LLMs), specifically OpenAI's ChatGPT-4 and Google Gemini, in improving emergency decision-making in plastic and reconstructive surgery by evaluating their effectiveness both with and without physical examination data. Thirty medical vignettes covering conditions such as fractures and nerve injuries were used to assess the models' diagnostic and management responses. These responses were evaluated by medical professionals against established clinical guidelines, using statistical analyses including the Wilcoxon rank-sum test. Results showed that ChatGPT-4 consistently outperformed Gemini in diagnosis and management, irrespective of the presence of physical examination data, though no significant differences were noted within each model's performance across the different data scenarios. Conclusively, while ChatGPT-4 demonstrates superior accuracy and management capabilities, and the addition of physical examination data enhanced response detail, the models did not significantly surpass traditional resources. This underscores the utility of AI in supporting emergency decision-making, particularly in scenarios with limited data, suggesting its role as a complement to, rather than a replacement for, comprehensive clinical evaluation and expertise.
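
The Wilcoxon rank-sum test referenced here compares two independent sets of ratings without assuming normality; a minimal sketch with invented reviewer scores follows (not the study's actual data or code).

```python
from scipy.stats import ranksums

# Hypothetical 1-5 quality ratings assigned by reviewers to each model's
# management responses across vignettes (values are illustrative only).
chatgpt4_scores = [5, 4, 5, 4, 3, 5, 4, 4, 5, 4]
gemini_scores   = [3, 4, 3, 2, 3, 4, 3, 3, 4, 3]

stat, p_value = ranksums(chatgpt4_scores, gemini_scores)
print(f"Wilcoxon rank-sum statistic = {stat:.2f}, p = {p_value:.4f}")
```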

Language: English

Cited by

6