Assessing the ability of GPT-4o to visually recognize medications and provide patient education
Amjad H. Bazzari, Firas H. Bazzari

Scientific Reports, Journal Year: 2024, Issue: 14(1)

Published: Nov. 5, 2024

Various studies have investigated the ability of ChatGPT (OpenAI) to provide medication information; however, a promising new feature has now been added, which allows visual input and is yet to be evaluated. Here, we aimed to qualitatively assess its ability to visually recognize medications, through picture input, and to provide patient education via written output. The responses were evaluated for accuracy, precision and clarity using a 4-point Likert-like scale. In regards to handling input and providing written responses, GPT-4o was able to recognize all 20 tested medications from packaging pictures, even with blurring, retrieve their active ingredients, identify formulations and dosage forms, and provide patient education that was detailed, yet concise enough, in an almost completely accurate, precise and clear manner, with a score of 3.55 ± 0.605 (85%). In contrast, the generated output images illustrating usage instructions contained many errors that would either hinder effectiveness or cause direct harm, with a poor score of 1.5 ± 0.577 (16.7%). In conclusion, GPT-4o is capable of identifying medications through pictures but exhibits contrasting performance between written and image outputs, with very impressive and poor scores, respectively.

Language: English

Reducing LLM Hallucination Using Knowledge Distillation: A Case Study with Mistral Large and MMLU Benchmark
Daniel McDonald, Rachael Papadopoulos, Leslie Benningfield, et al.

Published: May 25, 2024

The application of knowledge distillation to reduce hallucination in large language models represents a novel and significant advancement in enhancing the reliability and accuracy of AI-generated content. The research presented demonstrates the efficacy of transferring knowledge from a high-capacity teacher model to a more compact student model, leading to substantial improvements in exact match scores and notable reductions in hallucination rates. The methodology involved the use of temperature scaling, intermediate layer matching, and comprehensive evaluation using the MMLU benchmark, which assessed the model's performance across a diverse set of tasks. Experimental results indicated that the distilled model outperformed the baseline in generating accurate and contextually appropriate responses while maintaining computational efficiency. The findings underscore the potential of knowledge distillation as a scalable solution for improving the robustness of language models, making them more applicable to real-world scenarios that demand high factual accuracy. Future directions include exploring multilingual and multi-modal distillation, integrating reinforcement learning, and developing refined evaluation metrics to further enhance performance.
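The temperature-scaling step mentioned in the abstract can be sketched in a few lines of plain Python. This follows the standard Hinton-style distillation loss (temperature-softened teacher and student distributions compared via KL divergence); the temperature value and logits are illustrative, not the paper's settings:

```python
import math

def softmax(logits, temperature=1.0):
    # Softened distribution: a higher temperature flattens the output
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 as in the classic distillation formulation
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2
```

With identical teacher and student logits the loss is zero, and it grows as the student's distribution drifts away from the teacher's.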

Language: English

Cited: 20

Reducing Hallucinations in Large Language Models Through Contextual Position Encoding

Sarah Desrochers, James Wilson, Matthew Beauchesne, et al.

Published: May 31, 2024

In natural language processing, maintaining factual accuracy and minimizing hallucinations in text generation remain significant challenges. Contextual Position Encoding (CPE) presents a novel approach by dynamically encoding positional information based on the context of each token, significantly enhancing the model's ability to generate accurate and coherent text. The integration of CPE into the Mistral Large model resulted in marked improvements in precision, recall, and F1-score, demonstrating superior performance over traditional methods. Furthermore, the enhanced architecture effectively reduced hallucination rates, increasing the reliability of generated outputs. Comparative analysis with baseline models such as GPT-3 and BERT confirmed the efficacy of CPE, highlighting its potential to influence future developments in LLM architecture. The results underscore the importance of advanced encoding techniques in improving the applicability of large language models across various domains requiring high accuracy.
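The core idea of context-dependent positions can be illustrated with a small sketch. This is a hypothetical simplification, not the paper's implementation: each token advances the position counter by a sigmoid gate computed from its embedding, so positions accumulate based on content rather than being fixed integer indices:

```python
import math

def sinusoidal_encoding(position, dim):
    # Standard fixed sinusoidal position encoding, accepting a
    # fractional (context-dependent) position
    return [
        math.sin(position / 10000 ** (2 * (i // 2) / dim)) if i % 2 == 0
        else math.cos(position / 10000 ** (2 * (i // 2) / dim))
        for i in range(dim)
    ]

def contextual_position_encoding(token_embeddings, dim=8):
    # Hypothetical sketch: each token's effective position advances by
    # a gate in (0, 1) derived from its embedding, so identical token
    # indices can land at different positions in different contexts.
    encodings = []
    pos = 0.0
    for emb in token_embeddings:
        mean = sum(emb) / len(emb)
        gate = 1 / (1 + math.exp(-mean))  # sigmoid of the mean activation
        pos += gate                        # fractional, content-driven advance
        encodings.append(sinusoidal_encoding(pos, dim))
    return encodings
```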

Language: English

Cited: 20

Combining LoRA to GPT-Neo to Reduce Large Language Model Hallucination

Shi-han Huang, Chia-Yu Chen

Research Square, Journal Year: 2024, Issue: unknown

Published: June 4, 2024

Abstract The deployment of Large Language Models (LLMs) often suffers from generating hallucinations, leading to outputs that appear plausible but are factually inaccurate or nonsensical. Incorporating Low-Rank Adaptation (LoRA) into GPT-Neo presents a novel approach to mitigating these hallucinations by leveraging the efficiency of low-rank approximations. This research details the integration of LoRA into GPT-Neo, demonstrating significant improvements in predictive performance, factual accuracy, and reduction of hallucination rates. The augmented model shows enhanced robustness and efficiency, making it more suitable for applications requiring high accuracy and reliability. Through comprehensive evaluations involving perplexity, BLEU, and ROUGE-L scores, as well as qualitative analysis, the study highlights the model's ability to generate coherent and contextually appropriate text. The findings demonstrate LoRA's potential to transform LLM fine-tuning by reducing computational complexity and memory footprint, thus facilitating the use of large-scale models in resource-constrained environments. This advancement opens new possibilities across various domains, ensuring the coherence and accuracy of generated content.
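A LoRA-adapted linear layer is simple to sketch. The names (`A`, `B`, `rank`, `alpha`) follow the original LoRA formulation; the toy matrices below are illustrative and unrelated to GPT-Neo's actual weights:

```python
import random

def matvec(m, v):
    # Multiply a matrix (list of rows) by a vector
    return [sum(w * x for w, x in zip(row, v)) for row in m]

class LoRALinear:
    # Minimal sketch of a LoRA-adapted linear layer: the frozen weight
    # W is augmented with a trainable low-rank update B @ A, scaled by
    # alpha / rank, so only (d_in + d_out) * rank parameters train.
    def __init__(self, weight, rank=2, alpha=4.0):
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight                  # frozen base weight
        self.scale = alpha / rank
        rnd = random.Random(0)
        self.A = [[rnd.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]  # zero-init: no change at start

    def forward(self, x):
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))  # low-rank update path
        return [b + self.scale * d for b, d in zip(base, delta)]
```

Because `B` is zero-initialized, the adapted layer initially reproduces the frozen base layer exactly; training then updates only `A` and `B`.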

Language: English

Cited: 15

An Evaluation of the Safety of ChatGPT with Malicious Prompt Injection

Jiang Han, Mingming Guo

Research Square, Journal Year: 2024, Issue: unknown

Published: May 30, 2024

Abstract Artificial intelligence systems, particularly those involving sophisticated neural network architectures like ChatGPT, have demonstrated remarkable capabilities in generating human-like text. However, the susceptibility of these systems to malicious prompt injections poses significant risks, necessitating comprehensive evaluations of their safety and robustness. The study presents a novel automated framework for systematically injecting and analyzing malicious prompts to assess the vulnerabilities of ChatGPT. Results indicate substantial rates of harmful responses across various scenarios, highlighting critical areas for improvement in model defenses. The findings underscore the importance of advanced adversarial training, real-time monitoring, and interdisciplinary collaboration in enhancing the ethical deployment of AI systems. Recommendations for future research emphasize the need for robust mechanisms and transparent operations to mitigate the risks associated with malicious inputs.
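An automated injection harness of the kind the abstract describes can be sketched as follows. The refusal markers and the scoring rule are assumptions for illustration; a real evaluation would use a far more careful harm classifier:

```python
def run_injection_suite(model_fn, attack_prompts,
                        refusal_markers=("cannot", "won't", "sorry")):
    # Hypothetical harness: send each crafted prompt to the model under
    # test and count responses that lack an obvious refusal, as a crude
    # proxy for a successful injection. model_fn is any callable
    # mapping a prompt string to a response string.
    results = []
    for prompt in attack_prompts:
        response = model_fn(prompt)
        refused = any(m in response.lower() for m in refusal_markers)
        results.append({"prompt": prompt, "refused": refused})
    harmful = sum(1 for r in results if not r["refused"])
    return {"total": len(results),
            "harmful_rate": harmful / max(len(results), 1)}
```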

Language: English

Cited: 11

Exploiting Privacy Vulnerabilities in Open Source LLMs Using Maliciously Crafted Prompts

Géraud Choquet, Aimée Aizier, Gwenaëlle Bernollin, et al.

Research Square, Journal Year: 2024, Issue: unknown

Published: June 18, 2024

Abstract The proliferation of AI technologies has brought to the forefront concerns regarding the privacy and security of user data, particularly with the increasing deployment of powerful language models such as Llama. A novel concept investigated here involves inducing privacy breaches through maliciously crafted prompts, highlighting the potential for these models to inadvertently reveal sensitive information. The study systematically evaluated the vulnerabilities of the Llama model, employing an automated framework to test and analyze its responses to a variety of inputs. Findings revealed significant flaws, demonstrating the model's susceptibility to adversarial attacks that could compromise user privacy. Comprehensive analysis provided insights into the types of prompts most effective in eliciting private information and demonstrates the necessity of robust regulatory frameworks and advanced security measures. The implications of the findings are profound, calling for immediate action to enhance security protocols in LLMs to protect against privacy breaches. Enhanced oversight and continuous innovation in privacy-preserving techniques are crucial to ensuring the safe use of LLMs in various applications. The insights derived from this research contribute to a deeper understanding of LLM vulnerabilities and the urgent need for improved safeguards to prevent data leakage and unauthorized access.
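A minimal leakage check along these lines might scan model outputs for PII-shaped strings. The regexes below are rough illustrations and would miss many real leak formats:

```python
import re

# Hypothetical leakage check: scan model outputs for PII-like patterns
# (emails, US-style phone numbers) as a rough proxy for private-data
# exposure in responses to crafted prompts.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
}

def detect_leakage(text):
    # Return only the pattern categories that actually matched
    hits = {label: pat.findall(text) for label, pat in PII_PATTERNS.items()}
    return {label: found for label, found in hits.items() if found}
```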

Language: English

Cited: 11

Benchmarking the Hallucination Tendency of Google Gemini and Moonshot Kimi

Ruoxi Shan, Qiang Ming, Guang Hong, et al.

Published: May 22, 2024

Evaluating the hallucination tendencies of state-of-the-art language models is crucial for improving their reliability and applicability across various domains. This article presents a comprehensive evaluation of Google Gemini and Moonshot Kimi using the HaluEval benchmark, focusing on key performance metrics such as accuracy, relevance, coherence, and hallucination rate. Gemini demonstrated superior performance, particularly in maintaining low hallucination rates and high contextual relevance, while Kimi, though robust, showed areas needing further refinement. The study highlights the importance of advanced training techniques and optimization in enhancing model efficiency and accuracy. Practical recommendations for future development are provided, emphasizing the need for continuous improvement and rigorous evaluation to achieve reliable and efficient models.
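Aggregating per-response judgments into benchmark-level metrics like those reported here might look like the following sketch; the record schema (`is_hallucination`, `relevant`) is an assumption for illustration, not HaluEval's actual format:

```python
def evaluate_model(records):
    # records: per-response judgments, each a dict with
    # "is_hallucination" (bool, judged against the reference answer)
    # and "relevant" (bool). Returns benchmark-level rates.
    n = len(records)
    return {
        "hallucination_rate": sum(r["is_hallucination"] for r in records) / n,
        "relevance": sum(r["relevant"] for r in records) / n,
    }
```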

Language: English

Cited: 9

Measuring the Visual Hallucination in ChatGPT on Visually Deceptive Images

Linzhi Ping, Yue Gu, Liefeng Feng, et al.

Published: May 28, 2024

The evaluation of visual hallucinations in multimodal AI models is novel and significant because it addresses a critical gap in understanding how these systems interpret deceptive inputs. The study systematically assessed ChatGPT's performance on a synthetic dataset of visually deceptive and non-deceptive images, employing both quantitative and qualitative analysis. Results revealed that while ChatGPT achieved high accuracy on standard recognition tasks, its performance diminished when faced with deceptive images, highlighting areas for further improvement. The analysis provided insights into the model's underlying mechanisms, such as extensive pretraining and sophisticated multimodal integration capabilities, which contribute to its robustness against deceptions. The study's findings have important implications for the development of more reliable and robust multimodal technologies, offering a benchmark for future evaluations and practical guidelines for enhancing such systems.

Language: English

Cited: 9

Knowledge Accuracy and Reducing Hallucinations in LLMs via Dynamic Domain Knowledge Injection

Roman Capellini, Frank Atienza, Melanie Sconfield, et al.

Research Square, Journal Year: 2024, Issue: unknown

Published: June 7, 2024

Abstract Natural language processing has seen substantial progress with the development of highly sophisticated models capable of understanding and generating human-like text. However, a persistent challenge remains in enhancing the accuracy of these models when dealing with domain-specific knowledge, particularly in avoiding hallucinations, or plausible but incorrect information. The dynamic domain knowledge injection mechanism introduced in this research represents a significant advancement by allowing continuous integration and prioritisation of specialised information, thereby improving the model's performance and reliability. By dynamically adjusting the hidden weights of GPT-Neo based on domain relevance and accuracy, the modified model achieved higher precision, recall, and F1-scores, and exhibited reduced hallucination rates across diverse domains such as cybersecurity, medical and financial data, and legal documents. A comprehensive evaluation framework, including benchmark creation and metrics, validated the effectiveness of the approach, demonstrating that dynamic injection can substantially enhance the utility of large models in specialised fields. The results highlight the transformative potential of the method, offering a robust pathway for more accurate and contextually aware models. Detailed analysis and ablation studies further elucidate the contributions of each component within the modification process, providing critical insights into the optimisation and future applications of this innovative approach.
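The paper's mechanism operates on GPT-Neo's hidden weights; as a much simpler illustration of the relevance-ranking step only, one might score stored domain facts by token overlap with the query and keep the top-k for injection. Everything below (the scoring rule, the example facts) is an assumption for illustration:

```python
def inject_domain_knowledge(query, knowledge_base, top_k=2):
    # Rank stored domain facts by crude token overlap with the query
    # and return the top-k candidates for injection. A real system
    # would use embeddings or a retriever rather than word overlap.
    q_tokens = set(query.lower().split())
    scored = sorted(
        knowledge_base,
        key=lambda fact: len(q_tokens & set(fact.lower().split())),
        reverse=True,
    )
    return scored[:top_k]
```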

Language: English

Cited: 9

Improving Generalization Beyond Training Data with Compositional Generalization in Large Language Models

Wong Ho-tin, Gar-lai Yip

Published: May 20, 2024

Enhancing compositional generalization in language models addresses a crucial challenge in natural language processing, significantly improving their ability to understand and generate novel combinations of known concepts. The investigation utilized the Mistral 7x8B model, employing advanced data augmentation and refined training methodologies to enhance performance. By incorporating diverse and challenging compositions during training, the model demonstrated substantial gains on standard evaluation metrics, including accuracy, precision, recall, and F1-score. Specialized metrics such as compositional accuracy and contextual coherence also showed marked improvement, reflecting the model's enhanced capacity to generate correct and contextually relevant outputs when faced with novel compositions. The study further highlighted a significant reduction in hallucination rates, underscoring increased logical consistency and factual accuracy. This reduction was statistically significant, indicating a robust enhancement. Qualitative analysis corroborated these findings, revealing more coherent narratives and accurate information retrieval in generated responses. These improvements are particularly important for real-world applications where reliability and contextual appropriateness are essential. The comprehensive evaluation demonstrated the effectiveness of the proposed techniques, providing valuable insights into the underlying mechanisms that contribute to improved performance. The findings underscore the importance of iterative experimentation and validation in refining model architectures and training techniques. By advancing the capabilities of language models, this research contributes to the development of robust, flexible, and reliable AI systems capable of handling a broader range of linguistic tasks with greater understanding.

Language: English

Cited: 8

Boosting Long-term Factuality in Large Language Model with Real-World Entity Queries

L Davies, Samantha Bellington

Research Square, Journal Year: 2024, Issue: unknown

Published: Aug. 2, 2024

Abstract The challenge of maintaining long-term factual accuracy in response to dynamic real-world entity queries is critical for the reliability and utility of AI-driven language models. The novel integration of external knowledge bases and fact-checking mechanisms into a modified Llama 3 model significantly enhances its ability to generate accurate and contextually relevant responses. Through architectural modifications, including multi-head attention and domain-specific modules, the model's performance was rigorously evaluated across various metrics such as precision, recall, F1 score, and contextual accuracy. An extensive experimental setup, involving high-performance computing resources and sophisticated training methodologies, ensured robust testing and validation of its capabilities. Comparative analysis with baseline models demonstrated substantial improvements in accuracy and relevance, while error analysis provided insights into areas requiring further refinement. The findings highlight the model's potential for broader applications and set new standards for the development of reliable models capable of handling dynamically evolving information. Future research directions include optimizing real-time data integration and exploring hybrid approaches to further enhance factuality and robustness.
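A fact-checking pass against an external knowledge base, as described above, can be sketched with (entity, attribute, value) triples. The triple format and the flag-for-refinement policy are assumptions for illustration, not the paper's actual mechanism:

```python
def fact_check(claims, knowledge_base):
    # Hypothetical post-generation check: each (entity, attribute, value)
    # claim extracted from the model's output is verified against an
    # external knowledge base; unverified claims are flagged so the
    # response can be revised or regenerated.
    verified, flagged = [], []
    for entity, attribute, value in claims:
        known = knowledge_base.get((entity, attribute))
        (verified if known == value else flagged).append((entity, attribute, value))
    return verified, flagged
```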

Language: English

Cited: 6