Cited by ‘Interpretability’ and ‘alignment’ are fool’s errands: a proof that controlling misaligned large language models is the best anyone can hope for

Mapping the Ethics of Generative AI: A Comprehensive Scoping Review

Thilo Hagendorff

Minds and Machines, Year: 2024, Number 34(4)

Published: Sep. 17, 2024

Language: English

Understanding Artificial Agency

Leonard Dung

The Philosophical Quarterly, Year: 2024, Number unknown

Published: Feb. 7, 2024

Abstract: Which artificial intelligence (AI) systems are agents? To answer this question, I propose a multidimensional account of agency. According to this account, a system's agency profile is jointly determined by its level of goal-directedness and autonomy as well as its abilities for directly impacting the surrounding world, long-term planning and acting for reasons. Rooted in extant theories of agency, this account enables fine-grained, nuanced comparative characterizations of artificial agency. I show that this account has multiple important virtues and is more informative than alternatives. More speculatively, it may help illuminate two emerging questions in AI ethics: 1. Can agency contribute to the moral status of non-human beings, and how? 2. When and why might AI systems exhibit power-seeking behaviour and does this pose an existential risk to humanity?

Language: English

Current cases of AI misalignment and their implications for future risks

Leonard Dung

Synthese, Year: 2023, Number 202(5)

Published: Oct. 26, 2023

Abstract: How can one build AI systems such that they pursue the goals their designers want them to pursue? This is the alignment problem. Numerous authors have raised concerns that, as research advances and systems become more powerful over time, misalignment might lead to catastrophic outcomes, perhaps even to the extinction or permanent disempowerment of humanity. In this paper, I analyze the severity of this risk based on current instances of misalignment. More specifically, I argue that contemporary large language models and game-playing agents are sometimes misaligned. These cases suggest that misalignment tends to have a variety of features: misalignment can be hard to detect, predict and remedy, it does not depend on a specific architecture or training paradigm, it tends to diminish a system's usefulness, and it is the default outcome of creating AI via machine learning. Subsequently, based on these features, I show that the risk of AI misalignment magnifies with respect to more capable systems. Not only might more capable systems cause more harm when misaligned, aligning them should be expected to be more difficult than aligning current AI.

Language: English

Are brain–machine interfaces the real experience machine? Exploring the libertarian risks of brain–machine interfaces

Jorge Mateus

AI & Society, Year: 2025, Number unknown

Published: Feb. 21, 2025

Language: English

Large language models: assessment for singularity

Ryunosuke Ishizaki, Mahito Sugiyama

AI & Society, Year: 2025, Number unknown

Published: Apr. 2, 2025

Perceptions of Sentient AI and Other Digital Minds: Evidence from the AI, Morality, and Sentience (AIMS) Survey

Jacy Reese Anthis, Janet V. T. Pauketat, Ali Ladak, et al.

Published: Apr. 24, 2025

Language: English

Language Agents and Malevolent Design

Inchul Yum

Philosophy & Technology, Year: 2024, Number 37(3)

Published: Aug. 17, 2024

Abstract: Language agents are AI systems capable of understanding and responding to natural language, potentially facilitating the process of encoding human goals into AI systems. However, this paper argues that if language agents can achieve easy alignment, they also increase the risk of malevolent agents building harmful AI systems aligned with destructive intentions. The paper contends that if training AI becomes sufficiently easy or is perceived as such, it enables malicious actors, including rogue states, terrorists, and criminal organizations, to create powerful AI systems devoted to their nefarious aims. Given the strong incentives for such groups and the rapid progress in AI capabilities, this risk demands serious attention. In addition, the paper highlights considerations suggesting that the negative impacts of language agents may outweigh the positive ones, including the potential irreversibility of certain negative AI impacts. The overarching lesson is that various AI-related issues are intimately connected with each other, and we must recognize this interconnected nature when addressing those issues.

Language: English

Is superintelligence necessarily moral?

Leonard Dung

Analysis, Year: 2024, Number unknown

Published: Sep. 24, 2024

Abstract: Numerous authors have expressed concern that advanced artificial intelligence (AI) poses an existential risk to humanity. These authors argue that we might build AI which is vastly intellectually superior to humans (a ‘superintelligence’), and which optimizes for goals that strike us as morally bad, or even irrational. Thus, this argument assumes that a superintelligence might have morally bad goals. However, according to some views, a superintelligence necessarily has morally adequate goals. This might be the case either because abilities for moral reasoning and intelligence mutually depend on each other, or because moral realism and moral internalism are true. I argue that the former argument misconstrues the view that intelligence and goals are independent, and that the latter argument misunderstands the implications of moral internalism. Moreover, the current state of AI research provides additional reasons to think that a superintelligence could have bad goals.

Language: English

‘Interpretability’ and ‘alignment’ are fool’s errands: a proof that controlling misaligned large language models is the best anyone can hope for

Marcus Arvan

AI & Society, Year: 2024, Number unknown

Published: Nov. 27, 2024

Language: English
