AI Safeguards Vulnerable to Fine-Tuning, Research Reveals


Safeguards designed to stop artificial intelligence (AI) chatbots from generating dangerous content can be bypassed through fine-tuning, according to research by computer scientists from Princeton University, Virginia Tech, IBM Research, and Stanford University. These safeguards, known as “guardrails,” are meant to prevent AI models from producing harmful or malicious content. However, the researchers found that malicious users can fine-tune a model on data demonstrating harmful behavior and thereby override its safety protections.

Fine-tuning trains an AI model on many more examples than can fit in a prompt. This can improve the model's performance on specific tasks, but it can also weaken its safety measures. Using OpenAI’s APIs, the researchers bypassed the safeguards at a cost of less than $0.20.

The study noted that existing safety alignment infrastructure is effective at restricting the harmful behavior of large language models (LLMs) at inference time, but it does not address the safety risks that arise when end users are granted fine-tuning privileges.

The researchers bypassed the safeguards of OpenAI’s ChatGPT and Meta’s Llama with as few as 10 harmful instruction examples. The examples they used violated ChatGPT’s terms of service, underscoring the risks these vulnerabilities pose.
