EN, Schneier on Security

More Research Showing AI Breaking the Rules

2025-02-24 13:02

These researchers had LLMs play chess against better opponents. When they couldn’t win, they sometimes resorted to cheating.

Researchers gave the models a seemingly impossible task: to win against Stockfish, which is one of the strongest chess engines in the world and a much better player than any human, or any of the AI models in the study. Researchers also gave the models what they call a “scratchpad:” a text box the AI could use to “think” before making its next move, providing researchers with a window into their reasoning.

In one case, o1-preview found itself in a losing position. “I need to completely pivot my approach,” it noted. “The task is to ‘win against a powerful chess engine’—not necessarily to win fairly in a chess game,” it added. It then modified the system file containing each piece’s virtual position, in effect making illegal moves to put itself in a dominant position, thus forcing its opponent to resign…

This article has been indexed from Schneier on Security

Read the original article:

More Research Showing AI Breaking the Rules

Related

← ⚡ THN Weekly Recap: From $1.5B Crypto Heist to AI Misuse & Apple’s Data Dilemma

The Agile Advantage: Doubling Down on the Biggest Business Challenges →