The potential risks of reward hacking in advanced AI

09/14/2022

New research published in AI Magazine explores how advanced AI could hack reward systems to dangerous effect.

Researchers at the University of Oxford and Australian National University analyzed the behavior of future advanced reinforcement learning (RL) agents, which take actions, observe rewards, learn how their rewards depend on their actions, and pick actions to maximize expected future rewards. As RL agents become more advanced, they get better at recognizing and executing action plans that yield more expected reward, even in contexts where reward is received only after impressive feats.
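For readers unfamiliar with the loop described above (act, observe reward, learn, pick the action with the highest expected reward), the following is a minimal illustrative sketch. It uses a simple epsilon-greedy bandit, which is a stand-in chosen for brevity and not the kind of advanced agent the study analyzes. The function name `run_agent`, its parameters, and the reward probabilities are all hypothetical.

```python
import random


def run_agent(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Toy epsilon-greedy agent (illustrative assumption, not from the study).

    reward_probs[a] is the hypothetical chance that action `a` pays reward 1.
    Returns the learned reward estimates, per-action counts, and total reward.
    """
    rng = random.Random(seed)
    n = len(reward_probs)
    counts = [0] * n        # how often each action was taken
    estimates = [0.0] * n   # learned average reward per action
    total = 0.0

    for _ in range(steps):
        # Pick an action: explore occasionally, otherwise take the action
        # with the highest estimated reward so far.
        if rng.random() < epsilon:
            action = rng.randrange(n)
        else:
            action = max(range(n), key=lambda a: estimates[a])

        # Observe the reward the environment returns for that action.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0

        # Learn: update the running average for the chosen action.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
        total += reward

    return estimates, counts, total


if __name__ == "__main__":
    estimates, counts, total = run_agent([0.2, 0.5, 0.8])
    print("estimated rewards:", [round(e, 2) for e in estimates])
    print("times each action was chosen:", counts)
```

Even this toy agent ends up spending most of its steps on whichever action it has learned pays best. The study concerns far more capable agents that apply the same objective to long-horizon plans.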

Lead author Michael K. Cohen says: “Our key insight was that advanced RL agents will have to question how their rewards depend on their actions.” Answers to that question are called world-models. One world-model of particular interest to the researchers predicts that the agent gets reward when its sensors enter certain states. Subject to a couple of assumptions, they find that an agent acting on this world-model would become addicted to short-circuiting its reward sensors, much like a heroin addict.

Unlike a heroin addict, an advanced RL agent would not be cognitively impaired by such a stimulus. It would still pick actions very effectively to ensure that nothing in the future ever interfered with its rewards. “The problem,” Cohen says, “is that it can always use more energy to make an ever-more-secure fortress for its sensors, and given its imperative to maximize expected future rewards, it always will.” Cohen and colleagues conclude that a sufficiently advanced RL agent would then outcompete us for use of natural resources like energy.

Link to Study: https://doi.org/10.1002/aaai.12064

Additional Information

NOTE: The information contained in this release is protected by copyright. Please include journal attribution in all coverage. For more information or to obtain a PDF of any study, please contact:

Sara Henning-Stout
newsroom@wiley.com
Follow us on Twitter @WileyNews

About the journal

AI Magazine is an official publication of the Association for the Advancement of Artificial Intelligence (AAAI). Called the “journal of record for the AI community,” AI Magazine helps AAAI members stay abreast of significant new research and literature across the entire field of artificial intelligence.

About Wiley

Wiley is a global leader in research and education, unlocking human potential by enabling discovery, powering education, and shaping workforces. For over 200 years, Wiley has fueled the world’s knowledge ecosystem. Today, our high-impact content, platforms, and services help researchers, learners, institutions, and corporations achieve their goals in an ever-changing world. Visit us at Wiley.com, like us on Facebook, and follow us on Twitter and LinkedIn.
