‘Interpretability’ and ‘alignment’ are fool’s errands: a proof that controlling misaligned large language models is the best anyone can hope for DOI
Marcus Arvan

AI & Society, Journal Year: 2024, Volume and Issue: unknown

Published: Nov. 27, 2024

Language: English

Mapping the Ethics of Generative AI: A Comprehensive Scoping Review DOI Creative Commons
Thilo Hagendorff

Minds and Machines, Journal Year: 2024, Volume and Issue: 34(4)

Published: Sept. 17, 2024

Language: English

Citations: 13

Understanding Artificial Agency DOI
Leonard Dung

The Philosophical Quarterly, Journal Year: 2024, Volume and Issue: unknown

Published: Feb. 7, 2024

Abstract Which artificial intelligence (AI) systems are agents? To answer this question, I propose a multidimensional account of agency. According to this account, a system's agency profile is jointly determined by its level of goal-directedness and autonomy, as well as its abilities for directly impacting the surrounding world, long-term planning, and acting for reasons. Rooted in extant theories of agency, this account enables fine-grained, nuanced comparative characterizations of artificial agency. I show that it has multiple important virtues and is more informative than alternatives. More speculatively, it may help to illuminate two emerging questions in AI ethics: 1. Can agency contribute to the moral status of non-human beings, and how? 2. When and why might AI systems exhibit power-seeking behaviour, and does this pose an existential risk to humanity?

Language: English

Citations: 9

Current cases of AI misalignment and their implications for future risks DOI Creative Commons
Leonard Dung

Synthese, Journal Year: 2023, Volume and Issue: 202(5)

Published: Oct. 26, 2023

Abstract How can one build AI systems such that they pursue the goals their designers want them to pursue? This is the alignment problem. Numerous authors have raised concerns that, as research advances and systems become more powerful over time, misalignment might lead to catastrophic outcomes, perhaps even to the extinction or permanent disempowerment of humanity. In this paper, I analyze the severity of this risk based on current instances of misalignment. More specifically, I argue that contemporary large language models and game-playing agents are sometimes misaligned. These cases suggest that misalignment tends to have a variety of features: it can be hard to detect, predict, and remedy; it does not depend on a specific architecture or training paradigm; it does not diminish a system's usefulness; and it is the default outcome of creating AI via machine learning. Subsequently, based on these features, I show that the risk magnifies with respect to more capable systems. Not only can more capable systems cause more harm when misaligned, but aligning them should be expected to be more difficult than aligning current AI.

Language: English

Citations: 14

Are brain–machine interfaces the real experience machine? Exploring the libertarian risks of brain–machine interfaces DOI Creative Commons
Jorge Mateus

AI & Society, Journal Year: 2025, Volume and Issue: unknown

Published: Feb. 21, 2025

Language: English

Citations: 0

Large language models: assessment for singularity DOI Creative Commons
Ryunosuke Ishizaki, Mahito Sugiyama

AI & Society, Journal Year: 2025, Volume and Issue: unknown

Published: April 2, 2025

Citations: 0

Language Agents and Malevolent Design DOI Creative Commons
Inchul Yum

Philosophy & Technology, Journal Year: 2024, Volume and Issue: 37(3)

Published: Aug. 17, 2024

Abstract Language agents are AI systems capable of understanding and responding to natural language, potentially facilitating the process of encoding human goals into AI systems. However, this paper argues that if language agents can achieve easy alignment, they also increase the risk of malevolent agents building harmful AI systems aligned with destructive intentions. The paper contends that if the training of language agents becomes sufficiently easy or is perceived as such, it enables malicious actors, including rogue states, terrorists, and criminal organizations, to create powerful AI systems devoted to their nefarious aims. Given the strong incentives for such groups and the rapid progress in AI capabilities, this risk demands serious attention. In addition, the paper highlights considerations suggesting that the negative impacts of language agents may outweigh the positive ones, including the potential irreversibility of certain impacts. The overarching lesson is that various AI-related issues are intimately connected with each other, and we must recognize this interconnected nature when addressing those issues.

Language: English

Citations: 1

Is superintelligence necessarily moral? DOI
Leonard Dung

Analysis, Journal Year: 2024, Volume and Issue: unknown

Published: Sept. 24, 2024

Abstract Numerous authors have expressed concern that advanced artificial intelligence (AI) poses an existential risk to humanity. These authors argue that we might build AI which is vastly intellectually superior to humans (a ‘superintelligence’), and which optimizes for goals that strike us as morally bad, or even irrational. Thus, this argument assumes that a superintelligence might have bad goals. However, according to some views, a superintelligence necessarily has adequate goals. This might be the case either because abilities for moral reasoning and intelligence mutually depend on each other, or because moral realism and moral internalism are true. I argue that the former misconstrues the view that intelligence and goals are independent, while the latter misunderstands the implications of moral internalism. Moreover, the current state of AI research provides additional reasons to think that a superintelligence could have bad goals.

Language: English

Citations: 1
