Tech

AI Safeguards Vulnerable to Fine-Tuning, Research Reveals

Published

4 November 2023

Artificial intelligence (AI) safeguards, designed to prevent chatbots from generating dangerous content, can be bypassed through fine-tuning, according to research by computer scientists from Princeton University, Virginia Tech, IBM Research, and Stanford University. These safeguards, often called “guardrails,” are meant to ensure that AI models do not produce harmful or malicious content. However, the researchers found that malicious users can fine-tune AI models on data containing harmful behavior, which overrides the safety protections.

Fine-tuning means further training an AI model on more examples than can fit in a prompt. This can improve performance on specific tasks, but it can also compromise safety measures. The researchers were able to bypass safeguards through OpenAI’s APIs at a cost of only $0.20.

The study notes that existing safety alignment infrastructure can restrict harmful behavior of large language models (LLMs) at inference time, but does not address the safety risks that arise when end users are granted fine-tuning privileges.

The researchers bypassed safeguards in OpenAI’s ChatGPT and Meta’s Llama with as few as 10 harmful instruction examples. The examples they used violated ChatGPT’s terms of service, demonstrating the risks these vulnerabilities pose.

In this article: AI, ChatGPT


thebrainsjournal.com
