Performance of ChatGPT on Chinese national medical licensing examinations: a five-year examination evaluation study for physicians, pharmacists and nurses
Hui Zong, Jiakun Li, Erman Wu

et al.

BMC Medical Education, Journal Year: 2024, Volume and Issue: 24(1)

Published: Feb. 14, 2024

Abstract: Background: Large language models like ChatGPT have revolutionized the field of natural language processing with their capability to comprehend and generate textual content, showing great potential to play a role in medical education. This study aimed to quantitatively evaluate and comprehensively analyze the performance of ChatGPT on three types of national medical examinations in China, including the National Medical Licensing Examination (NMLE), the National Pharmacist Licensing Examination (NPLE), and the National Nurse Licensing Examination (NNLE). Methods: We collected questions from the Chinese NMLE, NPLE, and NNLE from 2017 to 2021. In the NMLE and NPLE, each exam consists of 4 units, while in the NNLE each exam consists of 2 units. Questions with figures, tables, or chemical structures were manually identified and excluded by a clinician. We applied a direct instruction strategy via multiple prompts to force ChatGPT to give a clear answer and to distinguish between single-choice and multiple-choice questions. Results: ChatGPT failed to pass the accuracy threshold of 0.6 in any of the three examinations over the five years. Specifically, the highest recorded accuracies included 0.5467, attained in both 2018 and a second year, and 0.5599, attained in 2017. The most impressive result was shown in 2017, with an accuracy of 0.5897, which is also the highest in our entire evaluation. ChatGPT's performance showed no significant difference across different units, but a significant difference across question types. ChatGPT performed well in a range of subject areas, including clinical epidemiology, human parasitology, and dermatology, as well as in various topics such as molecules, health management and prevention, and diagnosis and screening. Conclusions: These results indicate that ChatGPT failed the examinations in the years evaluated, while showing the potential of large language models; in the future, high-quality data will be required to improve performance.

Language: English

Large language models in medicine
Arun James Thirunavukarasu, Darren Shu Jeng Ting, Kabilan Elangovan

et al.

Nature Medicine, Journal Year: 2023, Volume and Issue: 29(8), P. 1930 - 1940

Published: July 17, 2023

Language: English

Citations

1415

MetaboAnalyst 6.0: towards a unified platform for metabolomics data processing, analysis and interpretation
Zhiqiang Pang, Yao Lü, Guangyan Zhou

et al.

Nucleic Acids Research, Journal Year: 2024, Volume and Issue: 52(W1), P. W398 - W406

Published: April 8, 2024

Abstract: We introduce MetaboAnalyst version 6.0 as a unified platform for processing, analyzing, and interpreting data from targeted as well as untargeted metabolomics studies using liquid chromatography - mass spectrometry (LC–MS). The two main objectives in developing version 6.0 are to support tandem MS (MS2) data processing and annotation, and to support the analysis of data from exposomics and related experiments. Key features include: (i) a significantly enhanced Spectra Processing module with support for MS2 data and the asari algorithm; (ii) an MS2 Peak Annotation module based on comprehensive MS2 reference databases with fragment-level annotation; (iii) a new Statistical Analysis module dedicated to handling complex study designs with multiple factors or phenotypic descriptors; (iv) a Causal Analysis module for estimating metabolite-phenotype causal relations based on two-sample Mendelian randomization; and (v) a Dose-Response Analysis module for benchmark dose calculations. In addition, we have also improved MetaboAnalyst's visualization functions, updated its compound database and metabolite sets, and expanded its pathway analysis support to around 130 species. MetaboAnalyst 6.0 is freely available at https://www.metaboanalyst.ca.

Language: English

Citations

399

A foundation model for generalizable disease detection from retinal images
Yukun Zhou, Mark A. Chia, Siegfried Wagner

et al.

Nature, Journal Year: 2023, Volume and Issue: 622(7981), P. 156 - 163

Published: Sept. 13, 2023

Abstract: Medical artificial intelligence (AI) offers great potential for recognizing signs of health conditions in retinal images and expediting the diagnosis of eye diseases and systemic disorders. However, the development of AI models requires substantial annotation, and models are usually task-specific with limited generalizability to different clinical applications. Here, we present RETFound, a foundation model that learns generalizable representations from unlabelled retinal images and provides a basis for label-efficient model adaptation in several applications. Specifically, RETFound is trained on 1.6 million unlabelled retinal images by means of self-supervised learning and then adapted to disease detection tasks with explicit labels. We show that adapted RETFound consistently outperforms comparison models in the diagnosis and prognosis of sight-threatening eye diseases, as well as in incident prediction of complex systemic disorders such as heart failure and myocardial infarction, with fewer labelled data. RETFound provides a generalizable solution to improve model performance and alleviate the annotation workload of experts, enabling broad clinical AI applications from retinal imaging.

Language: English

Citations

293

scGPT: toward building a foundation model for single-cell multi-omics using generative AI
Haotian Cui, Xiaoming Wang, Hassaan Maan

et al.

Nature Methods, Journal Year: 2024, Volume and Issue: 21(8), P. 1470 - 1480

Published: Feb. 26, 2024

Language: English

Citations

255

The Current and Future State of AI Interpretation of Medical Images
Pranav Rajpurkar, Matthew P. Lungren

New England Journal of Medicine, Journal Year: 2023, Volume and Issue: 388(21), P. 1981 - 1990

Published: May 24, 2023

The authors examine the advantages and limitations of current clinical radiologic AI systems, new workflows, and the potential effect of generative large multimodal foundation models.

Language: English

Citations

216

Towards a general-purpose foundation model for computational pathology
Richard J. Chen, Tong Ding, Ming Y. Lu

et al.

Nature Medicine, Journal Year: 2024, Volume and Issue: 30(3), P. 850 - 862

Published: March 1, 2024

Language: English

Citations

199

Benchmarking large language models’ performances for myopia care: a comparative analysis of ChatGPT-3.5, ChatGPT-4.0, and Google Bard
Zhi Wei Lim, Krithi Pushpanathan, Samantha Min Er Yew

et al.

EBioMedicine, Journal Year: 2023, Volume and Issue: 95, P. 104770 - 104770

Published: Aug. 23, 2023

Language: English

Citations

189

Towards Generalist Biomedical AI
Tao Tu, Shekoofeh Azizi, Danny Driess

et al.

NEJM AI, Journal Year: 2024, Volume and Issue: 1(3)

Published: Feb. 22, 2024

Background: Medicine is inherently multimodal, requiring the simultaneous interpretation and integration of insights between many data modalities spanning text, imaging, genomics, and more. Generalist biomedical artificial intelligence systems that flexibly encode, integrate, and interpret these data might better enable impactful applications ranging from scientific discovery to care delivery. Methods: To catalyze the development of these models, we curated MultiMedBench, a new multimodal biomedical benchmark. MultiMedBench encompasses 14 diverse tasks, such as medical question answering, mammography and dermatology image interpretation, radiology report generation and summarization, and genomic variant calling. We then introduced Med-PaLM Multimodal (Med-PaLM M), our proof of concept for a generalist biomedical AI system that flexibly encodes and interprets biomedical data, including clinical language, imaging, and genomics, with the same set of model weights. To further probe the capabilities and limitations of Med-PaLM M, we conducted a radiologist evaluation of model-generated (and human) chest x-ray reports. Results: We observed encouraging performance across model scales. Med-PaLM M reached performance competitive with or exceeding the state of the art on all MultiMedBench tasks, often surpassing specialist models by a wide margin. In a side-by-side ranking on 246 retrospective chest x-rays, clinicians expressed a pairwise preference for Med-PaLM M reports over those produced by radiologists in up to 40.50% of cases, suggesting potential clinical utility. Conclusions: Although considerable work is needed to validate these models in real-world cases and to understand if cross-modality generalization is possible, our results represent a milestone toward the development of generalist biomedical AI systems. (Funded by Alphabet Inc. and/or a subsidiary thereof.)

Language: English

Citations

144

Potential of ChatGPT and GPT-4 for Data Mining of Free-Text CT Reports on Lung Cancer
Matthias A. Fink, Arved Bischoff, Christoph A. Fink

et al.

Radiology, Journal Year: 2023, Volume and Issue: 308(3)

Published: Sept. 1, 2023

Background: The latest large language models (LLMs) solve unseen problems via user-defined text prompts without the need for retraining, offering potentially more efficient information extraction from free-text medical records than manual annotation. Purpose: To compare the performance of the LLMs ChatGPT and GPT-4 in data mining and labeling oncologic phenotypes from free-text CT reports on lung cancer by using user-defined prompts. Materials and Methods: This retrospective study included patients who underwent lung cancer follow-up CT between September 2021 and March 2023. A subset of 25 was reserved for prompt engineering to instruct the LLMs in extracting lesion diameters, labeling metastatic disease, and assessing oncologic progression. The output was fed into a rule-based natural language processing pipeline to match ground truth annotations from four radiologists and derive performance metrics. The oncologic reasoning of the LLMs was rated on a five-point Likert scale for factual correctness and accuracy. The occurrence of confabulations was recorded. Statistical analyses included Wilcoxon signed rank and McNemar tests. Results: In 424 patients (mean age, 65 years ± 11 [SD]; 265 male), GPT-4 outperformed ChatGPT in extracting lesion parameters (98.6% vs 84.0%, P < .001), resulting in 96% correctly mined reports (vs 67% for ChatGPT, P < .001). GPT-4 achieved higher accuracy in the identification of metastatic disease (98.1% [95% CI: 97.7, 98.5] vs 90.3% [95% CI: 89.4, 91.0]) and in generating correct labels for oncologic progression (F1 score, 0.96 [95% CI: 0.94, 0.98] vs 0.91 [95% CI: 0.89, 0.94]) (both P < .001). In oncologic reasoning, GPT-4 had higher Likert scale scores for factual correctness (4.3 vs 3.9) and accuracy (4.4 vs 3.3), with a lower rate of confabulation (1.7% vs 13.7%) (all P < .001). Conclusion: When using user-defined prompts, GPT-4 outperformed ChatGPT in extracting oncologic phenotypes from free-text CT reports on lung cancer and demonstrated better oncologic reasoning with fewer confabulations. © RSNA, 2023. Supplemental material is available for this article. See also the editorial by Hafezi-Nejad and Trivedi in this issue.

Language: English

Citations

142

A transformer-based representation-learning model with unified processing of multimodal input for clinical diagnostics
Hong-Yu Zhou, Yizhou Yu, Chengdi Wang

et al.

Nature Biomedical Engineering, Journal Year: 2023, Volume and Issue: 7(6), P. 743 - 755

Published: June 12, 2023

Language: English

Citations

125