Efficiency and Performance Optimization in Large Language Models through IB Fine-Tuning

Ashly Ann Jo, Ebin Deni Raj, Jayakrushna Sahoo

et al.

ACM Transactions on Intelligent Systems and Technology, Journal Year: 2025, Volume and Issue: unknown

Published: Feb. 18, 2025

In the rapidly evolving field of Natural Language Processing (NLP), optimizing methods for fine-tuning Large Language Models (LLMs) is increasingly critical for improving generalization and performance. Fine-tuning LLMs is challenging due to high computational costs, overfitting, and difficulty adapting to diverse tasks. These challenges grow as models scale, making traditional fine-tuning inefficient and expensive. To address these issues, a novel Information Bottleneck (IB) method is proposed, focusing on retaining only the most relevant information in the model's internal representations. By striking a balance between compression and predictive relevance, IB aims to reduce overfitting and enhance generalization. The approach also integrates reinforcement learning and continual learning to improve LLM performance further. The proposed framework considers two key metrics: (1) effectiveness, which reduces redundancy and improves generalization, and (2) relevance, which ensures task-specific adaptation. The scheme achieves scalability across NLP tasks by using a lightweight proxy model for computational efficiency. Empirical evaluations and ablation studies show that the method preserves accuracy while significantly reducing computational cost, enabling efficient, interpretable, and adaptable optimization with improved convergence.

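For orientation, the balance between compression and predictive relevance described in this abstract corresponds to the classical Information Bottleneck objective; the formulation below (input X, internal representation Z, task target Y, trade-off weight β) is the standard one from the IB literature, not a loss taken verbatim from this paper.

\min_{p(z \mid x)} \; \mathcal{L}_{\mathrm{IB}} = I(X; Z) - \beta \, I(Z; Y)

Minimizing I(X; Z) compresses what the representation keeps of the input, while the -β I(Z; Y) term rewards retaining information that predicts the task label, which is exactly the overfitting/generalization trade-off the abstract refers to.
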
Language: English

GPT (Generative Pre-Trained Transformer)— A Comprehensive Review on Enabling Technologies, Potential Applications, Emerging Challenges, and Future Directions
Gokul Yenduri, M. Ramalingam, G. Chemmalar Selvi

et al.

IEEE Access, Journal Year: 2024, Volume and Issue: 12, P. 54608 - 54649

Published: Jan. 1, 2024

The Generative Pre-trained Transformer (GPT) represents a notable breakthrough in the domain of natural language processing, propelling us toward the development of machines that can understand and communicate in a manner that closely resembles humans. GPT is based on the transformer architecture, a deep neural network designed for natural language processing tasks. Due to their impressive performance on language tasks and their ability to converse effectively, GPT models have gained significant popularity among researchers and industrial communities, making them one of the most widely used and effective models in NLP and related fields, which motivated us to conduct this review. This review provides a detailed overview of GPT, including its working process, training procedures, enabling technologies, and its impact on various applications. In this review, we also explore the potential challenges and limitations of GPT. Furthermore, we discuss potential solutions and future directions. Overall, this paper aims to provide a comprehensive understanding of GPT, its enabling technologies, applications, emerging challenges, and potential solutions.

Language: English

Citations

156

Leveraging Generative AI and Large Language Models: A Comprehensive Roadmap for Healthcare Integration
Ping Yu, Hua Xu, Xia Hu

et al.

Healthcare, Journal Year: 2023, Volume and Issue: 11(20), P. 2776 - 2776

Published: Oct. 20, 2023

Generative artificial intelligence (AI) and large language models (LLMs), exemplified by ChatGPT, are promising for revolutionizing data and information management in healthcare and medicine. However, there is scant literature guiding their integration for non-AI professionals. This study conducts a scoping review to address the critical need for guidance on integrating generative AI and LLMs into healthcare and medical practices. It elucidates the distinct mechanisms underpinning these technologies, such as Reinforcement Learning from Human Feedback (RLHF), including few-shot learning and chain-of-thought reasoning, which differentiate them from traditional, rule-based AI systems. Integration requires an inclusive, collaborative co-design process that engages all pertinent stakeholders, including clinicians and consumers, to achieve the intended benefits. Although global research is still examining both the opportunities and the challenges, including ethical and legal dimensions, these technologies offer advancements in enhancing data management, information retrieval, and decision-making processes. Continued innovation in data acquisition, model fine-tuning, prompt strategy development, evaluation, and system implementation is imperative for realizing the full potential of these technologies. Organizations should proactively engage with these technologies to improve healthcare quality, safety, and efficiency, while adhering to guidelines for responsible application.

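As a point of reference for the few-shot and chain-of-thought mechanisms this review mentions, a prompt of the following shape is typical; the clinical wording below is invented for illustration and is not drawn from the study itself.

# Hypothetical few-shot, chain-of-thought prompt; example content is invented.
FEW_SHOT_COT_PROMPT = """\
Q: A patient takes 250 mg of a drug every 8 hours. How much per day?
A: Let's think step by step. 24 hours / 8 hours = 3 doses. 3 x 250 mg = 750 mg.
The answer is 750 mg.

Q: A ward has 4 nurses per 12-hour shift. How many nurse-shifts per week?
A: Let's think step by step. 2 shifts per day x 7 days = 14 shifts.
14 x 4 nurses = 56 nurse-shifts. The answer is 56.

Q: {question}
A: Let's think step by step."""

# The worked examples (few-shot) plus the "think step by step" cue
# (chain-of-thought) are what distinguish this from a bare question.
print(FEW_SHOT_COT_PROMPT.format(
    question="An IV bag delivers 125 mL per hour. How much over 6 hours?"))
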
Language: English

Citations

127

HuatuoGPT, Towards Taming Language Model to Be a Doctor
Hongbo Zhang, Junying Chen, Feng Jiang

et al.

Published: Jan. 1, 2023

Hongbo Zhang, Junying Chen, Feng Jiang, Fei Yu, Zhihong Chen, Guiming Chen, Jianquan Li, Xiangbo Wu, Zhiyi Zhang, Qingying Xiao, Xiang Wan, Benyou Wang, Haizhou Li. Findings of the Association for Computational Linguistics: EMNLP 2023.

Language: English

Citations

82

BioLORD-2023: semantic textual representations fusing large language models and clinical knowledge graph insights
François Remy, Kris Demuynck, Thomas Demeester

et al.

Journal of the American Medical Informatics Association, Journal Year: 2024, Volume and Issue: 31(9), P. 1844 - 1855

Published: Feb. 27, 2024

Abstract Objective In this study, we investigate the potential of large language models (LLMs) to complement biomedical knowledge graphs in the training of semantic models for the biomedical and clinical domains. Materials and Methods Drawing on the wealth of the Unified Medical Language System knowledge graph and harnessing cutting-edge LLMs, we propose a new state-of-the-art approach for obtaining high-fidelity representations of biomedical concepts and sentences, consisting of 3 steps: an improved contrastive learning phase, a novel self-distillation phase, and a weight averaging phase. Results Through rigorous evaluations on diverse downstream tasks, we demonstrate consistent and substantial improvements over the previous state of the art for semantic textual similarity (STS), biomedical concept representation (BCR), and clinical named entity linking, across 15+ datasets. Besides our new state-of-the-art model for English, we also distill and release a multilingual model compatible with 50+ languages and finetuned on 7 European languages. Discussion Many clinical pipelines can benefit from our latest models. Our new multilingual model enables a range of advancements in representation learning, opening this avenue to bioinformatics researchers around the world. As a result, we hope to see BioLORD-2023 becoming a precious tool for future biomedical applications. Conclusion In this article, we introduced BioLORD-2023, a state-of-the-art model for STS and BCR designed for the clinical domain.

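As background for the "improved contrastive learning phase" mentioned above, an in-batch contrastive (InfoNCE) loss over paired concept embeddings looks like the sketch below; it assumes PyTorch and pre-computed embeddings, and the function and argument names are illustrative rather than taken from the BioLORD-2023 code base.

# Minimal sketch of an InfoNCE-style contrastive loss over paired concept
# descriptions, assuming PyTorch; names are hypothetical, not BioLORD's API.
import torch
import torch.nn.functional as F

def info_nce_loss(anchor_emb: torch.Tensor,
                  positive_emb: torch.Tensor,
                  temperature: float = 0.07) -> torch.Tensor:
    """Contrast each anchor with its paired positive against in-batch negatives."""
    anchor = F.normalize(anchor_emb, dim=-1)      # (batch, dim)
    positive = F.normalize(positive_emb, dim=-1)  # (batch, dim)
    logits = anchor @ positive.T / temperature    # pairwise cosine similarities
    targets = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, targets)

# Example: 4 concept-name / definition pairs with 256-dimensional embeddings.
loss = info_nce_loss(torch.randn(4, 256), torch.randn(4, 256))
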
Language: English

Citations

19

Enhancing Accuracy in Large Language Models Through Dynamic Real-Time Information Injection
Qian Ouyang, Shiyu Wang, Bing Wang

et al.

Published: Dec. 26, 2023

This study presents a novel approach to enhance Large Language Models (LLMs) like Alpaca by dynamically integrating real-time information. The method addresses the issues of content hallucination and data relevancy by automatically collecting current information from credible sources and injecting it into model prompts. Experiments show a significant improvement in accuracy and a decrease in hallucination, with a manageable increase in response time. The research underscores the potential of real-time information integration in making LLMs more accurate and contextually relevant, setting the foundation for future advancements in dynamic information processing for AI.

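To illustrate the prompt-level injection described above, the sketch below formats freshly retrieved snippets ahead of the user question; the `retrieve_snippets` and `generate` names and the prompt template are hypothetical placeholders, not the paper's implementation.

# Illustrative sketch of real-time information injection into a prompt.
from datetime import datetime, timezone

def build_injected_prompt(question: str, snippets: list[str]) -> str:
    """Prepend freshly retrieved facts so the model answers from current data."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    context = "\n".join(f"- {s}" for s in snippets)
    return (
        f"Current information (retrieved {stamp}):\n{context}\n\n"
        f"Using the information above when relevant, answer:\n{question}"
    )

# Usage with placeholder retrieval and generation functions (not defined here):
# snippets = retrieve_snippets("latest WHO guidance on influenza vaccines")
# answer = generate(build_injected_prompt(user_question, snippets))
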
Language: English

Citations

32

Me-LLaMA: Foundation Large Language Models for Medical Applications
Qianqian Xie, Qingyu Chen, Aokun Chen

et al.

Research Square (Research Square), Journal Year: 2024, Volume and Issue: unknown

Published: May 22, 2024

Abstract Recent advancements in large language models (LLMs) such as ChatGPT and LLaMA have hinted at their potential to revolutionize medical applications, yet their application in clinical settings often reveals limitations due to a lack of specialized training on medical-specific data. In response to this challenge, this study introduces Me-LLaMA, a novel medical LLM family that includes foundation models, Me-LLaMA 13/70B, along with chat-enhanced versions, Me-LLaMA 13/70B-chat, developed through continual pre-training and instruction tuning of LLaMA2 using large medical datasets. Our methodology leverages a comprehensive domain-specific data suite, including a large-scale continual pre-training dataset with 129B tokens, an instruction tuning dataset with 214k samples, and a new medical evaluation benchmark (MIBE) spanning six critical tasks and 12 datasets. Our extensive evaluation on the MIBE shows that Me-LLaMA models achieve overall better performance than existing open-source medical LLMs in zero-shot, few-shot, and supervised learning settings. With task-specific instruction tuning, Me-LLaMA models outperform ChatGPT on 7 out of 8 datasets and GPT-4 on 5. In addition, we investigated the catastrophic forgetting problem, and our results show that Me-LLaMA models outperform other open-source medical LLMs in mitigating this issue. Me-LLaMA is one of the largest open-source medical foundation LLMs to use both biomedical and clinical data. It exhibits superior general and medical task performance compared to other open-source medical LLMs, rendering it an attractive choice for medical AI applications. We release our models, datasets, and evaluation scripts at: https://github.com/BIDS-Xu-Lab/Me-LLaMA.

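For readers unfamiliar with the instruction-tuning step mentioned above, a single training record typically pairs an instruction (and optional input) with a target response; the field names, template, and clinical content below are generic illustrations, not the schema of the Me-LLaMA 214k-sample dataset.

# Generic shape of one instruction-tuning sample; keys and content are
# illustrative only, not the Me-LLaMA dataset format.
import json

sample = {
    "instruction": "Summarize the key findings of the clinical note.",
    "input": "72-year-old with worsening dyspnea; echo shows EF 35%.",
    "output": "Findings are consistent with heart failure with reduced ejection fraction.",
}

# During fine-tuning, instruction and input are formatted into the prompt and
# the loss is typically computed only on the tokens of the target output.
prompt = (f"### Instruction:\n{sample['instruction']}\n\n"
          f"### Input:\n{sample['input']}\n\n### Response:\n")
print(json.dumps(sample, indent=2))
print(prompt + sample["output"])
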
Language: English

Citations

15

Global insights and the impact of generative AI-ChatGPT on multidisciplinary: a systematic review and bibliometric analysis
Nauman Khan, Zahid A. Khan, Anis Koubâa

et al.

Connection Science, Journal Year: 2024, Volume and Issue: 36(1)

Published: May 16, 2024

In 2022, OpenAI's unveiling of the generative AI large language model (LLM) ChatGPT heralded a significant leap forward in human-machine interaction through cutting-edge technologies. With its surging popularity, scholars across various fields have begun to delve into the myriad applications of ChatGPT. While literature reviews on LLMs like ChatGPT are available, there is a notable absence of systematic literature reviews (SLRs) and bibliometric analyses assessing the research's multidisciplinary and geographical breadth. This study aims to bridge this gap by synthesising and evaluating how ChatGPT has been integrated into diverse research areas, focussing on the scope and distribution of studies. Through a review of scholarly articles, we chart the global utilisation of ChatGPT across scientific domains, exploring its contribution to advancing research paradigms and adoption trends among different disciplines. Our findings reveal widespread endorsement across multiple fields, with notable implementations in healthcare (38.6%), computer science/IT (18.6%), and education/research (17.3%). Moreover, our demographic analysis underscores ChatGPT's reach and accessibility, indicating participation from 80 unique countries in ChatGPT-related research, with the most frequent keyword occurrences from the USA (719), China (181), and India (157) leading contributions. Additionally, the study highlights the roles of institutions such as King Saud University, the All India Institute of Medical Sciences, and Taipei University as pioneering contributors in the dataset. This study not only sheds light on the vast opportunities and challenges posed by ChatGPT for research pursuits but also acts as a pivotal resource for future inquiries. It emphasises the role of large language models (LLMs) in revolutionising every field. The insights provided in this paper are particularly valuable to academics, researchers, and practitioners across disciplines, as well as policymakers looking to grasp the extensive impact of these technologies on the research community.

Language: English

Citations

14

Parameter-efficient fine-tuning large language model approach for hospital discharge paper summarization

Joyeeta Goswami, Kaushal Kumar Prajapati, Ashim Saha

et al.

Applied Soft Computing, Journal Year: 2024, Volume and Issue: 157, P. 111531 - 111531

Published: March 24, 2024

Language: English

Citations

13

Large language models for structured reporting in radiology: past, present, and future
Felix Busch, Lena Hoffmann, Daniel Santos

et al.

European Radiology, Journal Year: 2024, Volume and Issue: unknown

Published: Oct. 23, 2024

Abstract Structured reporting (SR) has long been a goal in radiology to standardize and improve the quality of radiology reports. Despite evidence that SR reduces errors, enhances comprehensiveness, and increases adherence to guidelines, its widespread adoption remains limited. Recently, large language models (LLMs) have emerged as a promising solution to automate and facilitate SR. Therefore, this narrative review aims to provide an overview of LLMs for SR in radiology and beyond. We found that the current literature on LLMs for SR is limited, comprising ten studies on the generative pre-trained transformer (GPT)-3.5 (n = 5) and/or GPT-4 (n = 8), while two studies additionally examined the performance of Perplexity and Bing Chat or IT5. All studies reported promising results and acknowledged the potential of LLMs for SR, with six out of ten studies demonstrating the feasibility of multilingual applications. Building upon these findings, we discuss limitations, regulatory challenges, and further applications of LLMs in radiology report processing, encompassing four main areas: documentation, translation and summarization, clinical evaluation, and data mining. In conclusion, this review underscores the transformative potential of LLMs to improve efficiency and accuracy in SR and report processing. Key Points Question: How can LLMs help make SR more ubiquitous in radiology? Findings: Current literature leveraging LLMs for SR is sparse but shows promising results, including multilingual applications. Clinical relevance: LLMs can transform radiology report processing and enable widespread adoption of SR. However, their future role in clinical practice depends on overcoming current limitations, such as opaque algorithms and training data.

Language: English

Citations

13

Fairness in Large Language Models: A Taxonomic Survey

Zhibo Chu, Zichong Wang, Wenbin Zhang

et al.

ACM SIGKDD Explorations Newsletter, Journal Year: 2024, Volume and Issue: 26(1), P. 34 - 48

Published: July 24, 2024

Large Language Models (LLMs) have demonstrated remarkable success across various domains. However, despite their promising performance in numerous real-world applications, most of these algorithms lack fairness considerations. Consequently, they may lead to discriminatory outcomes against certain communities, particularly marginalized populations, prompting extensive study of fair LLMs. Fairness in LLMs, in contrast to fairness in traditional machine learning, entails distinct backgrounds, taxonomies, and fulfillment techniques. To this end, this survey presents a comprehensive overview of recent advances in the existing literature concerning fair LLMs. Specifically, a brief introduction to LLMs is provided, followed by an analysis of the factors contributing to bias in LLMs. Additionally, the concept of fairness in LLMs is discussed categorically, summarizing metrics for evaluating bias and algorithms for promoting fairness. Furthermore, resources for evaluating bias in LLMs, including toolkits and datasets, are summarized. Finally, existing research challenges and open questions are discussed.

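As one concrete example of the evaluation metrics this survey categorizes, the sketch below computes the demographic parity difference over binary model decisions; it is a standard group-fairness measure chosen for illustration, not a metric attributed specifically to the survey's taxonomy.

# Minimal sketch of a group-fairness metric (demographic parity difference).
from collections import defaultdict

def demographic_parity_difference(decisions: list[int], groups: list[str]) -> float:
    """Max gap in positive-decision rate between any two demographic groups."""
    totals, positives = defaultdict(int), defaultdict(int)
    for d, g in zip(decisions, groups):
        totals[g] += 1
        positives[g] += d
    rates = [positives[g] / totals[g] for g in totals]
    return max(rates) - min(rates)

# Example: decisions from an LLM-based screening step, split by group label.
print(demographic_parity_difference([1, 0, 1, 1, 0, 0], ["a", "a", "a", "b", "b", "b"]))
# 0.333... -> group "a" receives positive outcomes at a higher rate than "b"
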
Language: English

Citations

12