Steering veridical large language model analyses by correcting and enriching generated database queries: first steps toward ChatGPT bioinformatics
Olivier Cinquin

Briefings in Bioinformatics, Year: 2024, Volume (Issue): 26(1)

Published: Nov. 22, 2024

Large language models (LLMs) leverage factual knowledge from pretraining. Yet this knowledge remains incomplete and sometimes challenging to retrieve, especially in scientific domains not extensively covered in pretraining datasets and where information is still evolving. Here, we focus on genomics and bioinformatics. We confirm and expand upon issues with plain ChatGPT functioning as a bioinformatics assistant. Poor data retrieval and hallucination lead ChatGPT to err, as do incorrect sequence manipulations. To address this, we propose a system basing LLM outputs on up-to-date, authoritative facts and facilitating LLM-guided data analysis. Specifically, we introduce NagGPT, a middleware tool to insert between LLMs and databases, designed to bridge gaps in LLM knowledge and usage of database application programming interfaces. NagGPT proxies LLM-generated database queries, with special handling of incorrect queries. It acts as a gatekeeper between query responses and the LLM prompt, redirecting large responses to files but providing a synthesized snippet and injecting comments to steer the LLM. A companion OpenAI custom GPT, Genomics Fetcher-Analyzer, connects ChatGPT with NagGPT. It steers ChatGPT to generate and run Python code, performing bioinformatics tasks on data dynamically retrieved from a dozen common genomics databases (e.g. NCBI, Ensembl, UniProt, WormBase, and FlyBase). We implement partial mitigations for encountered challenges: detrimental interactions between code generation style and data analysis, confusion between database identifiers, and hallucination of both data and actions taken. Our results identify avenues to augment ChatGPT as a bioinformatics assistant and, more broadly, to improve the factual accuracy and instruction following of unmodified LLMs.

Language: English

FhGenie: A Custom, Confidentiality-Preserving Chat AI for Corporate and Scientific Use
Ingo Weber, Hendrik Linka, Daniel Mertens, et al.

Published: June 4, 2024

Language: English

Cited by: 0

Automatic code generation based on Abstract Syntax-based encoding. Application on malware detection code generation based on MITRE ATT&CK techniques
Alexandru-Gabriel Sîrbu, Gabriela Czibula

Expert Systems with Applications, Year: 2024, Volume: unknown, Pages: 125821-125821

Published: Nov. 1, 2024

Language: English

Cited by: 0

Machine learning opportunities for nucleosynthesis studies
Michael S. Smith, Dan Lu

Frontiers in Astronomy and Space Sciences, Year: 2024, Volume: 11

Published: Dec. 5, 2024

Nuclear astrophysics is an interdisciplinary field focused on exploring the impact of nuclear physics on the evolution and explosions of stars and the cosmic creation of the elements. While researchers in astrophysics and in nuclear physics are separately using machine learning approaches to advance studies in their fields, there is currently little use of machine learning in nuclear astrophysics. We briefly describe the most common types of machine learning algorithms, and then detail their numerous possible uses in nuclear astrophysics, with a focus on simulation-based nucleosynthesis studies. We show that machine learning offers novel, complementary, and creative approaches to address many important nucleosynthesis puzzles, with the potential to initiate a new frontier in nuclear astrophysics research.

Language: English

Cited by: 0
