Feb 17 · Liked by Kevin Munger

A major problem in the field of AI safety is reward hacking. The idea is that if you program an AI to maximize a reward, the methods it uses to get that reward are often misaligned with the intentions of the people who developed the system. If a cleaning robot is rewarded when it can't see any more messes in the house, the most efficient way to get the reward isn't to clean the house but to cover its sensors so that it can't see the mess.

It seems like what you're getting at is researchers essentially trying to reward-hack social science. When the rewards are tied to paper production, the most efficient way to obtain them isn't to learn new things about humans but to remove humans from the process of producing papers.
