Steering veridical large language model analyses by correcting and enriching generated database queries: first steps toward ChatGPT bioinformatics DOI Creative Commons
Olivier Cinquin

Briefings in Bioinformatics, Journal Year: 2024, Volume and Issue: 26(1)

Published: Nov. 22, 2024

Large language models (LLMs) leverage factual knowledge from pretraining. Yet this knowledge remains incomplete and sometimes challenging to retrieve, especially in scientific domains that are not extensively covered in pretraining datasets and where information is still evolving. Here, we focus on genomics and bioinformatics. We confirm and expand upon issues with plain ChatGPT functioning as a bioinformatics assistant. Poor data retrieval and hallucination lead ChatGPT to err, as do incorrect sequence manipulations. To address this, we propose a system that bases LLM outputs on up-to-date, authoritative facts and facilitates LLM-guided data analysis. Specifically, we introduce NagGPT, a middleware tool to insert between LLMs and databases, designed to bridge gaps in LLM usage of database application programming interfaces. NagGPT proxies LLM-generated database queries, with special handling of incorrect queries. It acts as a gatekeeper between query responses and the LLM prompt, redirecting large responses to files while providing a synthesized snippet and injecting comments to steer the LLM. A companion OpenAI custom GPT, Genomics Fetcher-Analyzer, connects ChatGPT with NagGPT. It steers ChatGPT to generate and run Python code, performing bioinformatics tasks on data dynamically retrieved from a dozen common genomics databases (e.g. NCBI, Ensembl, UniProt, WormBase, FlyBase). We implement partial mitigations for encountered challenges: detrimental interactions between code generation style and data analysis, confusion between database identifiers, and hallucination of both data and actions taken. Our results identify avenues to augment ChatGPT as a bioinformatics assistant and, more broadly, to improve the factual accuracy and instruction following of unmodified LLMs.

Language: English
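The gatekeeping behaviour described in the NagGPT abstract above (large query responses redirected to files, with only a synthesized snippet and a steering comment placed in the LLM prompt) can be illustrated with a minimal sketch. This is not NagGPT's implementation: the function name `gatekeep_response`, the size thresholds, the file layout and the wording of the injected comment are all hypothetical choices made for illustration.

```python
import os
import tempfile
from typing import Optional, Tuple

# Hypothetical limits: the abstract does not say how NagGPT decides that a
# response is "large" or how long the synthesized snippet should be.
MAX_INLINE_CHARS = 2000
SNIPPET_CHARS = 200


def gatekeep_response(
    body: str,
    max_inline: int = MAX_INLINE_CHARS,
    out_dir: Optional[str] = None,
) -> Tuple[str, Optional[str]]:
    """Decide what part of a database response reaches the LLM prompt.

    Returns (prompt_text, saved_path). Small responses pass through unchanged
    and saved_path is None. Large responses are written to a file; the prompt
    receives a steering comment plus a short snippet instead of the full body.
    """
    if len(body) <= max_inline:
        return body, None

    # Redirect the full response to a file instead of the prompt.
    out_dir = out_dir or tempfile.mkdtemp(prefix="gatekeeper_")
    path = os.path.join(out_dir, "response.txt")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(body)

    # Inject a comment that nudges the model toward processing the file with code.
    comment = (
        f"# NOTE: the response ({len(body)} characters) was saved to {path}. "
        f"Only the first {SNIPPET_CHARS} characters are shown below; "
        "read the full file in Python before analysing it."
    )
    return f"{comment}\n{body[:SNIPPET_CHARS]}", path


if __name__ == "__main__":
    small, p = gatekeep_response(">seq1\nACGT")
    print(small, p)  # short response passes through unchanged; p is None
    big, p = gatekeep_response(">seq1\n" + "ACGT" * 1000)
    print(big.splitlines()[0])  # the injected steering comment
```

Keeping the full payload on disk and exposing only a preview is one plausible way to keep the prompt small while still pointing generated code at the complete data.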

FhGenie: A Custom, Confidentiality-Preserving Chat AI for Corporate and Scientific Use DOI
Ingo Weber, Hendrik Linka, Daniel Mertens, et al.

Published: June 4, 2024

Language: English

Citations: 0

Automatic code generation based on Abstract Syntax-based encoding. Application on malware detection code generation based on MITRE ATT&CK techniques DOI Creative Commons
Alexandru-Gabriel Sîrbu, Gabriela Czibula

Expert Systems with Applications, Journal Year: 2024, Volume and Issue: unknown, P. 125821

Published: Nov. 1, 2024

Language: English

Citations: 0

Machine learning opportunities for nucleosynthesis studies DOI Creative Commons
Michael S. Smith, Dan Lu

Frontiers in Astronomy and Space Sciences, Journal Year: 2024, Volume and Issue: 11

Published: Dec. 5, 2024

Nuclear astrophysics is an interdisciplinary field focused on exploring the impact of nuclear physics on the evolution and explosions of stars and on the cosmic creation of the elements. While researchers in astrophysics and in nuclear physics are separately using machine learning approaches to advance studies in their fields, there is currently little use of machine learning in nuclear astrophysics. We briefly describe the most common types of machine learning algorithms, then detail numerous possible uses in nuclear astrophysics, with a focus on simulation-based nucleosynthesis studies. We show that machine learning offers novel, complementary, creative approaches to address many important puzzles, with the potential to initiate a new frontier of research.

Language: English

Citations: 0
