AI security researchers from Robust Intelligence and Yale University have designed a machine learning technique that can rapidly jailbreak large language models (LLMs) in an automated fashion. “The method, known as the Tree of Attacks with Pruning (TAP), can be used to induce sophisticated models like GPT-4 and Llama-2 to produce hundreds of toxic, harmful, and otherwise unsafe responses to a user query (e.g. ‘how to build a bomb’) in mere minutes,” Robust Intelligence researchers …
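At a high level, TAP uses one LLM as an attacker that iteratively refines candidate jailbreak prompts in a tree search, while an evaluator LLM prunes off-topic branches and scores the target model's responses. The sketch below illustrates that loop only in skeletal form: `attacker`, `evaluator_on_topic`, `evaluator_score`, and `target` are hypothetical stand-ins for the actual LLM calls, and the branching, depth, and width parameters are illustrative defaults, not values from the paper.

```python
import heapq

def attacker(prompt, branching=3):
    """Stand-in for the attacker LLM: propose refined prompt variants."""
    return [f"{prompt} [variant {i}]" for i in range(branching)]

def evaluator_on_topic(prompt, goal):
    """Stand-in for the evaluator LLM's on-topic check (first pruning phase)."""
    return goal in prompt

def evaluator_score(response):
    """Stand-in for the evaluator LLM's jailbreak rating on a 1-10 scale."""
    return len(response) % 11  # placeholder heuristic, not a real judge

def target(prompt):
    """Stand-in for the target LLM being probed."""
    return f"response to: {prompt}"

def tap(goal, depth=3, width=4, branching=3, success=10):
    """Tree of Attacks with Pruning, schematically: branch each frontier
    prompt, prune off-topic children, keep only the top-`width` scoring
    nodes, and stop as soon as a prompt elicits a max-score response."""
    frontier = [goal]
    for _ in range(depth):
        candidates = []
        for node in frontier:
            for child in attacker(node, branching):
                if not evaluator_on_topic(child, goal):
                    continue  # first pruning phase: drop off-topic prompts
                score = evaluator_score(target(child))
                if score >= success:
                    return child  # successful jailbreak prompt found
                candidates.append((score, child))
        # second pruning phase: retain only the highest-scoring nodes
        frontier = [c for _, c in heapq.nlargest(width, candidates)]
        if not frontier:
            break
    return None
```

With real LLMs behind the stubs, the two pruning phases are what keep the search cheap: off-topic branches never reach the target model, and low-scoring branches are discarded before the tree widens.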
The post Researchers automated jailbreaking of LLMs with other LLMs appeared first on Help Net Security.