The jailbreak technique “Bad Likert Judge” manipulates LLMs to generate harmful content using Likert scales, exposing safety gaps in LLM guardrails.
The post Bad Likert Judge: A Novel Multi-Turn Technique to Jailbreak LLMs by Misusing Their Evaluation Capability appeared first on Unit 42.
This article has been indexed from Unit 42