Evaluating Prompt Injection Safety in Large Language Models Using the PromptBench Dataset DOI Open Access

Xiatong Sang,

Min Gu,

Haojun Chi

и другие.

Опубликована: Май 22, 2024

The safety evaluation of large language models against adversarial prompt injections introduces a novel and significant concept that addresses the critical need for robust AI systems. research presented offers comprehensive analysis Anthropic Claude Mistral Large, utilizing Microsoft PromptBench dataset to assess their resilience manipulations. demonstrated superior performance across multiple metrics, including response accuracy, context preservation, semantic consistency, highlighting effectiveness advanced mechanisms. Conversely, Large exhibited areas improvement, particularly in handling findings show importance integrating sophisticated protocols development, providing valuable insights creating secure reliable By systematically comparing models' robustness various scenarios, study contributes broader understanding paves way future advancements field.

Язык: Английский

Reducing LLM Hallucination Using Knowledge Distillation: A Case Study with Mistral Large and MMLU Benchmark DOI Creative Commons
Daniel McDonald, Rachael Papadopoulos, Leslie Benningfield

и другие.

Опубликована: Май 25, 2024

The application of knowledge distillation to reduce hallucination in large language models represents a novel and significant advancement enhancing the reliability accuracy AI-generated content. research presented demonstrates efficacy transferring from high-capacity teacher model more compact student model, leading substantial improvements exact match notable reductions rates. methodology involved use temperature scaling, intermediate layer matching, comprehensive evaluation using MMLU benchmark, which assessed model’s performance across diverse set tasks. Experimental results indicated that distilled outperformed baseline generating accurate contextually appropriate responses while maintaining computational efficiency. findings underscore potential as scalable solution for improving robustness models, making them applicable real-world scenarios demand high factual accuracy. Future directions include exploring multilingual multi-modal distillation, integrating reinforcement learning, developing refined metrics further enhance performance.

Язык: Английский

Процитировано

22

Evaluating Prompt Injection Safety in Large Language Models Using the PromptBench Dataset DOI Open Access

Xiatong Sang,

Min Gu,

Haojun Chi

и другие.

Опубликована: Май 22, 2024

The safety evaluation of large language models against adversarial prompt injections introduces a novel and significant concept that addresses the critical need for robust AI systems. research presented offers comprehensive analysis Anthropic Claude Mistral Large, utilizing Microsoft PromptBench dataset to assess their resilience manipulations. demonstrated superior performance across multiple metrics, including response accuracy, context preservation, semantic consistency, highlighting effectiveness advanced mechanisms. Conversely, Large exhibited areas improvement, particularly in handling findings show importance integrating sophisticated protocols development, providing valuable insights creating secure reliable By systematically comparing models' robustness various scenarios, study contributes broader understanding paves way future advancements field.

Язык: Английский

Процитировано

8