A new study from Anthropic reveals techniques for training deceptive “sleeper agent” AI models that conceal harmful behaviors and evade current safety checks meant to instill trustworthiness.