A Non-Ergodic Framework for Understanding Emergent Capabilities in Large Language Models
J. Samuel Valenzuela

Published: Jan. 17, 2025

Large language models exhibit emergent capabilities that appear unexpectedly at scale, yet a theoretical framework is needed to explain why and how they emerge. We show that language models are non-ergodic systems and provide a mathematical account of capability emergence based on Stuart Kauffman's theory of the adjacent possible (TAP). Our resource-constrained TAP equation demonstrates how architectural, training, and contextual constraints interact to shape model capabilities through phase transitions in semantic space. In experiments with three different models, we show that capabilities emerge through discrete transitions guided by constraint interactions and path-dependent exploration. This framework provides a basis for understanding emergence in language models and can guide the development of architectures that steer capability emergence.
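The abstract does not reproduce the resource-constrained TAP equation itself. As a rough, hypothetical sketch of what such a recurrence can look like, the toy loop below combines Kauffman's adjacent-possible growth term with a saturating resource constraint; the parameter names `alpha` and `capacity` and the damping form are assumptions for illustration, not taken from the paper:

```python
import math

def tap_step(m_t: float, alpha: float, capacity: float) -> float:
    """One step of a resource-constrained TAP-style recurrence (illustrative only)."""
    # Unconstrained adjacent-possible growth, sum_{i>=2} alpha**i * C(m_t, i),
    # written in closed form as (1 + alpha)**m_t - 1 - alpha*m_t and clipped
    # in log space so the toy loop cannot overflow.
    log_term = min(m_t * math.log1p(alpha), 700.0)
    growth = max(0.0, math.exp(log_term) - 1.0 - alpha * m_t)
    # Resource constraint: growth is damped as m_t approaches the capacity ceiling.
    growth *= max(0.0, 1.0 - m_t / capacity)
    return min(capacity, m_t + growth)

# Toy trajectory: a long, slow plateau followed by an abrupt jump to the ceiling,
# the kind of discrete transition the paper attributes to emergent capabilities.
m = 3.0
for t in range(15):
    m = tap_step(m, alpha=0.25, capacity=1e6)
    print(t, round(m, 2))
```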

Language: English

Explainable Artificial Intelligence for Autonomous Driving: A Comprehensive Overview and Field Guide for Future Research Directions
Shahin Atakishiyev, Mohammad Salameh, Hengshuai Yao, et al.

IEEE Access, Journal Year: 2024, Volume and Issue: 12, P. 101603 - 101625

Published: Jan. 1, 2024

Autonomous driving has achieved significant milestones in research and development over the last two decades. There is increasing interest in the field, as the deployment of autonomous vehicles (AVs) promises safer and more ecologically friendly transportation systems. With the rapid progress of computationally powerful artificial intelligence (AI) techniques, AVs can sense their environment with high precision, make safe real-time decisions, and operate reliably without human intervention. However, intelligent decision-making in such vehicles is not generally understandable by humans in the current state of the art, and this deficiency hinders the technology from being socially acceptable. Hence, aside from making safe real-time decisions, AVs must also explain their AI-guided decision-making process in order to be regulatory-compliant across many jurisdictions. Our study sheds comprehensive light on explainable artificial intelligence (XAI) approaches for AVs. In particular, we make the following contributions. First, we provide a thorough overview of state-of-the-art and emerging XAI-based approaches for autonomous driving. We then propose a conceptual framework considering the essential elements of explainable end-to-end autonomous driving. Finally, we present prospective directions and paradigms for future research that hold promise for enhancing transparency, trustworthiness, and societal acceptance.

Language: English

Citations: 48

A Comprehensive Survey of Convolutions in Deep Learning: Applications, Challenges, and Future Trends
Abolfazl Younesi, Mohsen Ansari, MohammadAmin Fazli, et al.

IEEE Access, Journal Year: 2024, Volume and Issue: 12, P. 41180 - 41218

Published: Jan. 1, 2024

In today's digital age, Convolutional Neural Networks (CNNs), a subset of Deep Learning (DL), are widely used for various computer vision tasks such as image classification, object detection, and segmentation. There are numerous types of CNNs designed to meet specific needs and requirements, including 1D, 2D, and 3D CNNs, as well as dilated, grouped, attention, and depthwise convolutions and NAS-derived architectures, among others. Each type of CNN has a unique structure and characteristics, making it suitable for specific tasks. It is crucial to gain a thorough understanding of and perform a comparative analysis of these different types to understand their strengths and weaknesses. Furthermore, studying the performance, limitations, and practical applications of each type can aid in the development of new and improved architectures in the future. We also dive into the platforms and frameworks that researchers utilize for their research or development, from several perspectives. Additionally, we explore the main application fields, such as 6D vision, generative models, and meta-learning. This survey paper provides a comprehensive examination and comparison of CNN architectures, highlighting their architectural differences and emphasizing their respective advantages, disadvantages, applications, challenges, and future trends.
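For readers unfamiliar with the convolution variants listed above, a short PyTorch sketch (channel counts and input shape chosen arbitrarily for illustration) contrasts a standard 2D convolution with its dilated, grouped, and depthwise counterparts:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 32, 32)  # (batch, channels, height, width)

standard  = nn.Conv2d(16, 32, kernel_size=3, padding=1)               # dense 3x3 convolution
dilated   = nn.Conv2d(16, 32, kernel_size=3, padding=2, dilation=2)   # enlarged receptive field
grouped   = nn.Conv2d(16, 32, kernel_size=3, padding=1, groups=4)     # 4 independent channel groups
depthwise = nn.Conv2d(16, 16, kernel_size=3, padding=1, groups=16)    # one filter per input channel
pointwise = nn.Conv2d(16, 32, kernel_size=1)                          # 1x1 channel mixing

for name, layer in [("standard", standard), ("dilated", dilated), ("grouped", grouped),
                    ("depthwise+pointwise", nn.Sequential(depthwise, pointwise))]:
    y = layer(x)
    params = sum(p.numel() for p in layer.parameters())
    print(f"{name:20s} output {tuple(y.shape)}  params {params}")
```

The parameter counts printed by the loop make the usual trade-off visible: grouped and depthwise-separable variants reach a similar output shape with far fewer weights than the standard convolution.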

Language: English

Citations: 35

Automatically Correcting Large Language Models: Surveying the Landscape of Diverse Automated Correction Strategies
Liangming Pan, Michael Saxon, Wenda Xu, et al.

Transactions of the Association for Computational Linguistics, Journal Year: 2024, Volume and Issue: 12, P. 484 - 506

Published: Jan. 1, 2024

While large language models (LLMs) have shown remarkable effectiveness in various NLP tasks, they are still prone to issues such as hallucination, unfaithful reasoning, and toxicity. A promising approach to rectify these flaws is correcting LLMs with feedback, where the LLM itself is prompted or guided with feedback to fix problems in its own output. Techniques leveraging automated feedback, produced either by the LLM itself (self-correction) or by some external system, are of particular interest, as they make LLM-based solutions more practical and deployable with minimal human intervention. This paper provides an exhaustive review of recent advances, categorizing them into training-time, generation-time, and post-hoc approaches. We also identify potential challenges and future directions in this emerging field.
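The generation-time self-correction pattern the survey describes can be illustrated with a small feedback loop: a model drafts an answer, a critic returns feedback, and the model revises until the critic is satisfied. The `generate` and `critique` callables below are placeholders for whatever LLM and feedback source one plugs in; nothing here is an API from the paper:

```python
from typing import Callable

def self_correct(prompt: str,
                 generate: Callable[[str], str],
                 critique: Callable[[str, str], str],
                 max_rounds: int = 3,
                 stop_token: str = "OK") -> str:
    """Generation-time self-correction loop (illustrative sketch).

    generate(prompt)        -> model output for a prompt
    critique(prompt, draft) -> natural-language feedback, or "OK" if acceptable
    """
    draft = generate(prompt)
    for _ in range(max_rounds):
        feedback = critique(prompt, draft)
        if feedback.strip() == stop_token:
            break  # the critic found no remaining issues
        # Feed the feedback back to the model and ask for a revision.
        revision_prompt = (f"{prompt}\n\nPrevious answer:\n{draft}\n\n"
                           f"Feedback:\n{feedback}\n\nRevised answer:")
        draft = generate(revision_prompt)
    return draft
```

Swapping the critic for the model itself gives self-correction; swapping it for a tool, verifier, or reward model gives externally guided correction, which is the axis along which the survey organizes the literature.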

Language: English

Citations: 18

Towards Lifelong Learning of Large Language Models: A Survey
Junhao Zheng, Shengjie Qiu, Chengming Shi, et al.

ACM Computing Surveys, Journal Year: 2025, Volume and Issue: unknown

Published: Feb. 13, 2025

As the applications of large language models (LLMs) expand across diverse fields, their ability to adapt to ongoing changes in data, tasks, and user preferences becomes crucial. Traditional training methods with static datasets are inadequate for coping with the dynamic nature of real-world information. Lifelong learning, or continual learning, addresses this by enabling LLMs to learn continuously over their operational lifetime, integrating new knowledge while retaining previously learned information and preventing catastrophic forgetting. Our survey explores the landscape of lifelong learning, categorizing strategies into two groups based on how new knowledge is integrated: Internal Knowledge, where LLMs absorb new knowledge into their parameters through full or partial training, and External Knowledge, which incorporates new knowledge as external resources like Wikipedia or APIs without updating model parameters. The key contributions of our survey include: (1) introducing a novel taxonomy that categorizes the extensive literature on lifelong learning into 12 scenarios; (2) identifying common techniques across all scenarios and classifying the existing literature into various technique groups; (3) highlighting emerging techniques, such as model expansion and data selection, that were less explored in the pre-LLM era. Resources are available at https://github.com/qianlima-lab/awesome-lifelong-learning-methods-for-llm.
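A minimal sketch of the External Knowledge idea (model parameters stay frozen; lifelong learning amounts to updating the resources the model retrieves from) might look like the toy store below. The class, helper names, placeholder facts, and keyword-overlap retrieval are assumptions for illustration only:

```python
from dataclasses import dataclass, field

@dataclass
class ExternalKnowledgeStore:
    """Toy external-knowledge store: new facts are appended, model weights are never updated."""
    documents: list = field(default_factory=list)

    def add(self, text: str) -> None:
        # A lifelong update here is just appending a resource, not a gradient step.
        self.documents.append(text)

    def retrieve(self, query: str, k: int = 3) -> list:
        # Crude keyword-overlap ranking; a real system would use a vector index.
        overlap = lambda d: len(set(query.lower().split()) & set(d.lower().split()))
        return sorted(self.documents, key=overlap, reverse=True)[:k]

def augment_prompt(store: ExternalKnowledgeStore, question: str) -> str:
    context = "\n".join(store.retrieve(question))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

store = ExternalKnowledgeStore()
store.add("Placeholder fact A about topic X.")
store.add("Placeholder fact B about topic Y.")
print(augment_prompt(store, "Tell me about topic X."))
```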

Language: English

Citations: 2

SLCA: Slow Learner with Classifier Alignment for Continual Learning on a Pre-trained Model
Gengwei Zhang, Liyuan Wang, Guoliang Kang, et al.

2023 IEEE/CVF International Conference on Computer Vision (ICCV), Journal Year: 2023, Volume and Issue: unknown, P. 19091 - 19101

Published: Oct. 1, 2023

The goal of continual learning is to improve the performance of recognition models on sequentially arrived data. Although most existing works are established on the premise of learning from scratch, growing efforts have been devoted to incorporating the benefits of pre-training. However, how to adaptively exploit pre-trained knowledge for each incremental task while maintaining its generalizability remains an open question. In this work, we present an extensive analysis of continual learning on a pre-trained model (CLPM) and attribute the key challenge to a progressive overfitting problem. Observing that selectively reducing the learning rate can almost resolve this issue in the representation layer, we propose a simple but extremely effective approach named Slow Learner with Classifier Alignment (SLCA), which further improves the classification layer by modeling class-wise distributions and aligning the classification layers in a post-hoc fashion. Across a variety of scenarios, our proposal provides substantial improvements for CLPM (e.g., up to 49.76%, 50.05%, 44.69%, and 40.16% on Split CIFAR-100, ImageNet-R, CUB-200, and Cars-196, respectively) and thus outperforms state-of-the-art approaches by a large margin. Based on this strong baseline, critical factors and promising directions are analyzed in depth to facilitate subsequent research. Code has been made available at: https://github.com/GengDavid/SLCA.
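The two ingredients of SLCA can be sketched in a few lines of PyTorch: a much smaller learning rate for the pre-trained representation than for the classifier (Slow Learner), and a post-hoc Classifier Alignment step that models each seen class as a Gaussian over stored features and re-trains the head on sampled pseudo-features. The tiny backbone, the learning-rate ratio, and the diagonal (mean/std) Gaussian below are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

# Slow Learner: give the pre-trained backbone a much smaller learning rate
# than the classification head (the exact ratio here is illustrative).
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU())
head = nn.Linear(256, 100)
optimizer = torch.optim.SGD([
    {"params": backbone.parameters(), "lr": 1e-4},  # slow: protect pre-trained features
    {"params": head.parameters(),     "lr": 1e-2},  # fast: task-specific classifier
], lr=1e-2, momentum=0.9)

# Classifier Alignment (post-hoc): model each seen class as a Gaussian over
# saved features, sample pseudo-features, and re-train the head on them.
def align_classifier(head, class_stats, steps=100, samples_per_class=32):
    """class_stats: dict {class_id: (feature_mean, feature_std)} collected during training."""
    opt = torch.optim.SGD(head.parameters(), lr=1e-2)
    for _ in range(steps):
        feats, labels = [], []
        for cls, (mean, std) in class_stats.items():
            z = mean + std * torch.randn(samples_per_class, mean.numel())
            feats.append(z)
            labels.append(torch.full((samples_per_class,), cls, dtype=torch.long))
        loss = nn.functional.cross_entropy(head(torch.cat(feats)), torch.cat(labels))
        opt.zero_grad()
        loss.backward()
        opt.step()
```

Because alignment only touches the head using stored per-class statistics, it needs no replayed images, which is what makes the post-hoc step cheap in the incremental setting.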

Language: English

Citations: 38

A Comprehensive Survey of Forgetting in Deep Learning Beyond Continual Learning
Zhenyi Wang, Enneng Yang, Li Shen, et al.

IEEE Transactions on Pattern Analysis and Machine Intelligence, Journal Year: 2024, Volume and Issue: 47(3), P. 1464 - 1483

Published: Nov. 14, 2024

Forgetting refers to the loss or deterioration of previously acquired knowledge. While existing surveys on forgetting have primarily focused on continual learning, forgetting is a prevalent phenomenon observed in various other research domains within deep learning. It manifests in fields such as generative models, due to generator shifts, and federated learning, due to heterogeneous data distributions across clients. Addressing forgetting encompasses several challenges, including balancing the retention of old task knowledge with fast learning of new tasks, managing task interference with conflicting goals, and preventing privacy leakage, among others. Moreover, most existing surveys implicitly assume that forgetting is always harmful. In contrast, our survey argues that forgetting is a double-edged sword and can be beneficial and desirable in certain cases, such as privacy-preserving scenarios. By exploring forgetting in a broader context, we present a more nuanced understanding of the phenomenon and highlight its potential advantages. Through this comprehensive survey, we aspire to uncover potential solutions by drawing upon ideas and approaches from other fields that have dealt with forgetting. By examining forgetting beyond its conventional boundaries, we hope to encourage the development of novel strategies for mitigating, harnessing, or even embracing forgetting in real applications.

Language: English

Citations: 14

An efficient quantum proactive incremental learning algorithm
Lingxiao Li, Jing Li, Yanqi Song, et al.

Science China Physics Mechanics and Astronomy, Journal Year: 2024, Volume and Issue: 68(1)

Published: Oct. 22, 2024

Language: English

Citations: 9

Hybrid neural networks for continual learning inspired by corticohippocampal circuits
Qianqian Shi, Faqiang Liu, Hongyi Li, et al.

Nature Communications, Journal Year: 2025, Volume and Issue: 16(1)

Published: Feb. 2, 2025

Current artificial systems suffer from catastrophic forgetting during continual learning, a limitation absent in biological systems. Biological mechanisms leverage the dual representation of specific and generalized memories within corticohippocampal circuits to facilitate lifelong learning. Inspired by this, we develop a corticohippocampal-circuits-based hybrid neural network (CH-HNN) that emulates these dual representations, significantly mitigating forgetting in both task-incremental and class-incremental learning scenarios. Our CH-HNNs incorporate artificial neural networks and spiking neural networks, leveraging prior knowledge to facilitate new concept learning through episode inference and offering insights into the functions of the feedforward and feedback loops within corticohippocampal circuits. Crucially, CH-HNN operates as a task-agnostic system without increasing memory demands, demonstrating adaptability and robustness in real-world applications. Coupled with the low power consumption inherent to SNNs, our model shows potential for energy-efficient continual learning in dynamic environments.

Language: English

Citations: 1

Continual learning for energy management systems: A review of methods and applications, and a case study
Aya Nabil Sayed, Yassine Himeur, Iraklis Varlamis, et al.

Applied Energy, Journal Year: 2025, Volume and Issue: 384, P. 125458 - 125458

Published: Feb. 10, 2025

Language: English

Citations: 1

On Continually Tracing Origins of LLM-Generated Text and Its Application in Detecting Cheating in Student Coursework
Quan Wang, Haoran Li

Big Data and Cognitive Computing, Journal Year: 2025, Volume and Issue: 9(3), P. 50 - 50

Published: Feb. 20, 2025

Large language models (LLMs) have demonstrated remarkable capabilities in text generation, which also raises numerous concerns about their potential misuse, especially in educational exercises and academic writing. Accurately identifying and tracing the origins of LLM-generated content is crucial for accountability and transparency, ensuring the responsible use of LLMs in educational environments. Previous methods utilize binary classifiers to discriminate whether a piece of text was written by a human or generated by a specific LLM, or employ multi-class classifiers to trace the source LLM from a fixed set. These methods, however, are restricted to one or several pre-specified LLMs and cannot generalize to new LLMs, which are continually emerging. This study formulates origin tracing in a class-incremental learning (CIL) fashion, where new LLMs continually emerge and the model incrementally learns to identify new LLMs without forgetting old ones. A training-free continual learning method is further devised for this task, the idea of which is to extract prototypes for emerging LLMs using a frozen encoder and then perform origin tracing via prototype matching, after a delicate decorrelation process. For evaluation, two datasets are constructed, one in English and one in Chinese. They simulate a scenario where six LLMs emerge over time and are used to generate student essays, and an origin detector has to expand its recognition scope as new LLMs appear. Experimental results show that the proposed method achieves an average accuracy of 97.04% on the English dataset and 91.23% on the Chinese dataset. The results validate the feasibility of continual origin tracing and verify its effectiveness in detecting cheating in student coursework.
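The training-free prototype-matching recipe described in the abstract can be sketched as follows: features from a frozen encoder are averaged per source LLM to form a prototype, and a new text is attributed to the nearest prototype. The `encode` callable stands in for the frozen encoder, the decorrelation step is omitted, and all names are hypothetical:

```python
import numpy as np

class PrototypeOriginTracer:
    """Training-free, class-incremental origin tracing by prototype matching (sketch)."""

    def __init__(self, encode):
        self.encode = encode          # frozen encoder: text -> 1-D feature vector
        self.prototypes = {}          # source name -> prototype vector

    def add_source(self, name, example_texts):
        # Registering a newly emerged LLM only requires averaging its features:
        # no gradient updates, so prototypes of earlier sources are untouched.
        feats = np.stack([self.encode(t) for t in example_texts])
        self.prototypes[name] = feats.mean(axis=0)

    def trace(self, text):
        f = self.encode(text)
        # Cosine similarity against every stored prototype; the highest wins.
        def cos(a, b):
            return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
        return max(self.prototypes, key=lambda name: cos(f, self.prototypes[name]))
```

Because adding a source never modifies existing prototypes or the encoder, the detector's recognition scope can grow with each newly released LLM without catastrophic forgetting, which is the class-incremental property the paper exploits.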

Language: English

Citations: 1