Large language models identify causal genes in complex trait GWAS DOI Creative Commons
Suyash Shringarpure, Wei Wang,

Sotiris Karagounis

et al.

medRxiv (Cold Spring Harbor Laboratory), Year: 2024, Issue: unknown

Published: May 31, 2024

Abstract: Identifying underlying causal genes at significant loci from genome-wide association studies (GWAS) remains a challenging task. Literature evidence for disease-gene co-occurrence, whether through automated approaches or human expert annotation, is one way of nominating causal genes at GWAS loci. However, current automated approaches are limited in accuracy and generalizability, and expert annotation is not scalable to hundreds of thousands of findings. Here, we demonstrate that large language models (LLMs) can accurately identify genes likely to be causal at loci from GWAS. By evaluating the performance of GPT-3.5 and GPT-4 on datasets of GWAS loci with high-confidence causal gene annotations, we show that these models outperform state-of-the-art methods in identifying putative causal genes. These findings highlight the potential of LLMs to augment existing approaches to causal gene discovery.

Language: English

Cited by

2

Bioinformatics and biomedical informatics with ChatGPT: Year one review DOI Open Access
Jinge Wang,

Zien Cheng,

Qiuming Yao

et al.

Quantitative Biology, Year: 2024, Issue: 12(4), pp. 345-359

Published: June 27, 2024

Abstract: The year 2023 marked a significant surge in the exploration of applying large language model chatbots, notably Chat Generative Pre‐trained Transformer (ChatGPT), across various disciplines. We surveyed the application of ChatGPT in bioinformatics and biomedical informatics throughout the year, covering omics, genetics, text mining, drug discovery, image understanding, programming, and education. Our survey delineates the current strengths and limitations of this chatbot and offers insights into potential avenues for future developments.

Language: English

Cited by

9

Benchmarking the Hallucination Tendency of Google Gemini and Moonshot Kimi DOI Open Access

Ruoxi Shan,

Qiang Ming,

Guang Hong

et al.

Published: May 22, 2024

Abstract: Evaluating the hallucination tendencies of state-of-the-art language models is crucial for improving their reliability and applicability across various domains. This article presents a comprehensive evaluation of Google Gemini and Moonshot Kimi using the HaluEval benchmark, focusing on key performance metrics such as accuracy, relevance, coherence, and hallucination rate. Gemini demonstrated superior performance, particularly in maintaining low hallucination rates and high contextual relevance, while Kimi, though robust, showed areas needing further refinement. The study highlights the importance of advanced training techniques and optimization in enhancing model efficiency and accuracy. Practical recommendations for future development are provided, emphasizing the need for continuous improvement and rigorous evaluation to achieve reliable and efficient language models.

Language: English

Cited by

9
