Investigating Hallucination Tendencies of Large Language Models in Japanese and English
Hiromi Tsuruta, Rio Sakaguchi
Research Square (Research Square),
Journal Year: 2024, Volume and Issue: unknown
Published: June 4, 2024
Abstract
The increasing reliance on artificial intelligence for natural language processing has brought to light the issue of hallucinations in models, where models generate content that appears plausible but is factually incorrect. Exploring comparative hallucination tendencies in Japanese and English reveals significant differences, highlighting the importance of understanding language-specific challenges in model performance. A rigorous methodology was employed to quantify the frequency and severity of hallucinations, with comprehensive data collection from diverse sources in both languages. Quantitative analysis indicated a higher propensity for hallucinations in Japanese responses, attributed to the complex syntactical and contextual structures of the language. Qualitative examples provided concrete illustrations of the errors encountered, demonstrating the impact of linguistic and cultural factors. The findings emphasize the necessity of more linguistically and contextually rich training datasets, along with advanced fact-checking mechanisms, to improve the reliability of models. The study's implications extend to the development of tailored strategies for enhancing accuracy across different languages, contributing to the broader goal of creating robust and trustworthy systems for global applications.
Language: English
Easy Problems that LLMs Get Wrong
James Huckle, Sean Williams
Lecture notes in networks and systems,
Journal Year: 2025, Volume and Issue: unknown, P. 313 - 332
Published: Jan. 1, 2025
Language: English
Unveiling the Role of Feed-Forward Blocks in Contextualization: An Analysis Using Attention Maps of Large Language Models
Michael Tremblay, Sarah J. Gervais, David Maisonneuve
et al.
Published: June 17, 2024
Transformer-based models have significantly impacted the field of natural language processing, enabling high-performance applications in machine translation, summarization, and language modeling. Introducing a novel analysis of feed-forward blocks within the Mistral Large model, this research provides critical insights into their role in enhancing contextual embeddings and refining attention mechanisms. By conducting a comprehensive evaluation through quantitative metrics such as perplexity, BLEU, and ROUGE scores, the study demonstrates the effectiveness of fine-tuning in improving model performance across diverse linguistic tasks. Detailed attention map analysis revealed the intricate dynamics between self-attention mechanisms and feed-forward blocks, highlighting the latter's importance in contextual refinement. The findings demonstrate the potential of optimized transformer architectures for advancing the capabilities of LLMs, emphasizing the necessity of domain-specific architectural enhancements. The empirical evidence presented offers a deeper understanding of the functional contributions of feed-forward blocks, informing the design and development of future LLMs to achieve superior applicability.
Language: English
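Of the metrics the abstract cites, perplexity is the one defined directly from model probabilities: the exponential of the average negative log-probability per token. A minimal sketch (the token probabilities below are invented for illustration):

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp of the mean negative log-probability per token."""
    n = len(token_log_probs)
    avg_nll = -sum(token_log_probs) / n
    return math.exp(avg_nll)

# Hypothetical per-token log-probabilities assigned by a language model
# to a reference sequence; lower perplexity means a better fit.
log_probs = [math.log(0.5), math.log(0.25), math.log(0.5)]
ppl = perplexity(log_probs)
```

If the model assigned every token probability 0.5, the perplexity would be exactly 2: the model is, on average, as uncertain as a fair coin flip per token.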
Enhancing IoT Security: Predicting Password Vulnerability and Providing Dynamic Recommendations using Machine Learning and Large Language Models
Mariam Gewida, Yanzhen Qu
European Journal of Electrical Engineering and Computer Science,
Journal Year: 2025, Volume and Issue: 9(1), P. 8 - 16
Published: Feb. 12, 2025
The rapid growth of IoT has increased security vulnerabilities, especially from weak passwords. This study aims to develop and validate a machine learning tool to predict password vulnerabilities in smart home devices and provide dynamic recommendations using a Large Language Model (LLM). The research addresses gaps in existing security measures by offering a data-driven model that predicts vulnerabilities and provides real-time, tailored recommendations. Archival data from previous research, including password cracking attempts, were used to train the model. Testing involved real-world adversarial scenarios, with performance evaluated on accuracy, precision, recall, and F1-score. The findings show significant improvements in recall and F1-score with the Retrieval Augmented Generation (RAG) architecture compared to the baseline, suggesting RAG's potential for enhancing security. Organizations can use this tool to improve their infrastructure's security, reducing risks.
Language: English
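The precision, recall, and F1 metrics the study reports are standard confusion-matrix ratios; a minimal sketch for binary labels (the label convention, 1 = vulnerable password, is an assumption for illustration):

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical labels: three truly vulnerable passwords, two safe ones.
p, r, f1 = precision_recall_f1([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```

Recall is the metric that matters most in this setting, since a missed vulnerable password (a false negative) is costlier than a false alarm, which is consistent with the paper's emphasis on recall gains under RAG.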
Efficient Conceptual Knowledge Removal in Large Language Models: Methods and Evaluations
Miyim Dimitriou, Daniel Rogowski, Michael C. Anderson
et al.
Research Square (Research Square),
Journal Year: 2024, Volume and Issue: unknown
Published: Oct. 8, 2024
Abstract
The increasing use of deep neural networks has led to models that accumulate vast amounts of knowledge from their training data, often retaining outdated or biased information that needs to be selectively removed. Novel techniques are required to efficiently erase specific conceptual knowledge from these models while maintaining overall performance and avoiding computationally expensive re-training processes. This paper introduces a scalable framework for conceptual knowledge removal through targeted weight modification and sparse fine-tuning, demonstrating how specific representations can be isolated and erased without significant degradation of the model's broader capabilities. The methodology achieves high precision in knowledge suppression by leveraging probing and gradient-based optimization, ensuring minimal disruption to general task performance. Extensive experimental evaluations confirm the effectiveness of the proposed approach, highlighting its application in scenarios where adaptive model refinement is essential for both accuracy and ethical integrity. Contributions to the field include the development of a flexible and efficient mechanism for knowledge erasure, applicable across various architectures, that minimizes computational overhead while enhancing responsiveness to dynamic requirements.
Language: English
Elevating the Inference Performance of LLMs with Reverse Inference Federation
Qinian Li, Yuetian Gu
Research Square (Research Square),
Journal Year: 2024, Volume and Issue: unknown
Published: June 12, 2024
Abstract
Natural language processing has seen impressive progress, driven by increasingly sophisticated models capable of performing complex linguistic tasks. The introduction of reverse inference federation represents a novel and significant advancement in optimizing the performance of these models, offering a scalable solution that distributes computational workloads across multiple nodes. Detailed modifications to the GPT-Neo architecture, coupled with innovative task allocation and synchronization algorithms, have led to substantial improvements in speed, accuracy, and resource utilization. Extensive experimentation and rigorous statistical analysis validated the effectiveness of this approach, demonstrating its potential to enhance the efficiency and scalability of large models. By leveraging distributed computing techniques, the approach addresses challenges associated with real-time inference, providing a robust framework that ensures optimal resource utilization and reduced latency. The findings highlight the transformative impact of distributing inference tasks, setting a new benchmark for performance optimization in natural language applications.
Language: English
Growing Smaller Language Models Using Knowledge Distillation from Larger Models
Michael Featherstone, Emily Cuthbertson, David Appleyard
et al.
Published: June 25, 2024
The rapid development of natural language processing technologies has necessitated models that are both high-performing and computationally efficient, posing a challenge for resource-constrained environments. Knowledge distillation, a technique where a smaller model learns from a larger pre-trained model, offers a novel and significant solution by enhancing the capabilities of the smaller model while maintaining a reduced computational footprint. This research explores the application of knowledge distillation to finetune GPT-Neo using Mistral Large, resulting in notable improvements in accuracy, precision, recall, and F1-score across tasks such as text generation, translation, summarization, and question-answering. Comprehensive evaluations demonstrated substantial reductions in inference time, memory usage, and energy consumption, highlighting the practical benefits of the approach. The finetuned model exhibited enhanced linguistic proficiency, coherence, fluency, and contextual understanding, underscoring the effectiveness of knowledge distillation in optimizing model performance. The findings validate knowledge distillation as a robust method for advancing natural language processing technologies, ensuring high performance in environments with limited resources.
Language: English
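The abstract does not specify the distillation objective used; a common formulation (Hinton-style soft-target distillation, assumed here for illustration) trains the student to match the teacher's temperature-softened output distribution via KL divergence:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of logits."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) over softened distributions, scaled by T^2
    so gradient magnitudes stay comparable across temperatures."""
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return (temperature ** 2) * kl
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge; in practice it is mixed with the ordinary cross-entropy on the gold labels.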
Enhanced Cross-Domain Named Entity Recognition of Large Language Model through Label Alignment
E. J. Ashworth, B.L. Holman, Jacob Coulson
et al.
Published: Aug. 1, 2024
Named Entity Recognition (NER) is a crucial component in extracting structured information from unstructured text across various domains. A novel approach has been developed to address the variability of domain-specific annotations through the integration of a unified label schema, significantly enhancing cross-domain NER performance. The study involved comprehensive modifications to the Mistral Large model, including adjustments to its architecture, output layer, and loss function, to incorporate the aligned schema effectively. The methodology encompassed rigorous data collection, preprocessing, and evaluation processes, ensuring robust model training and validation. Evaluation metrics such as precision, recall, F1-score, and accuracy demonstrated substantial improvements, validating the efficacy of the label alignment algorithm. The research highlights the model's ability to generalize entity recognition capabilities across diverse domains, making it adaptable to varied linguistic and contextual details. The implications extend to numerous applications reliant on accurate entity recognition, such as information retrieval, question answering, and knowledge base population, demonstrating the broader impact of the findings. Through these significant advancements, the study contributes to the development of more intelligent and adaptive systems capable of handling the complexities of evolving textual environments.
Language: English
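The core idea of a unified label schema can be sketched as a per-domain mapping from native tag sets onto shared labels; the paper does not publish its schema, so the domain names and tags below are invented for illustration:

```python
# Hypothetical alignment from domain-specific NER tag sets to one shared schema.
LABEL_ALIGNMENT = {
    "news":   {"PER": "PERSON", "ORG": "ORGANIZATION", "LOC": "LOCATION"},
    "biomed": {"GENE": "ENTITY", "DISEASE": "ENTITY", "PERSON": "PERSON"},
}

def align_labels(domain, tags):
    """Map a sequence of domain-specific tags onto the unified schema,
    passing through the 'O' (outside) tag and any unknown tags unchanged."""
    mapping = LABEL_ALIGNMENT.get(domain, {})
    return [mapping.get(t, t) for t in tags]
```

With annotations normalized this way, examples from every domain train one output layer over the shared label set, which is what lets a single model generalize entity recognition across domains.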