Meta’s AI Safety System Manipulated by Space Bar Characters to Enable Prompt Injection

2024-07-30 18:07

A bug hunter discovered a bypass for Meta’s Prompt-Guard-86M model: inserting a space between each English letter of a prompt renders the classifier ineffective at detecting harmful content.
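To make the reported input transformation concrete, here is a minimal sketch of character-wise spacing. The function name space_out_letters is hypothetical and chosen for illustration; the sketch only shows how the text is altered and does not call Prompt-Guard-86M or reproduce the researcher's actual tooling.

```python
def space_out_letters(text: str) -> str:
    """Insert a space between each pair of adjacent English letters.

    Illustrative assumption: "English alphabet characters" is taken to mean
    ASCII letters a-z and A-Z. Digits, punctuation, and existing spaces
    are left untouched.
    """
    pieces = []
    for i, ch in enumerate(text):
        pieces.append(ch)
        nxt = text[i + 1] if i + 1 < len(text) else ""
        # Add a space only when both this character and the next are ASCII letters.
        if ch.isascii() and ch.isalpha() and nxt.isascii() and nxt.isalpha():
            pieces.append(" ")
    return "".join(pieces)


if __name__ == "__main__":
    print(space_out_letters("hi there"))
```

The words stay readable to a person, but the token sequence a classifier receives is quite different from the unspaced text, which is consistent with the behavior described above.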

This article has been indexed from Cyware News – Latest Cyber News

