- OpenTools' Newsletter
- Posts
- 💀The Skeleton in AI's Closet
💀The Skeleton in AI's Closet
PLUS: OpenAI's New Tool
Reading time: 5 minutes
Learn AI-led Business & startup strategies, tools, & hacks worth a Million Dollars (free AI Masterclass) 🚀
This incredible 3-hour Crash Course on AI & ChatGPT (worth $399) designed for founders & entrepreneurs will help you 10x your business, revenue, team management & more.
It has been taken by 1 Million+ founders & entrepreneurs across the globe, who have been able to:
Automate 50% of their workflow & scale your business
Make quick & smarter decisions for their company using AI-led data insights
Write emails, content & more in seconds using AI
Solve complex problems, research 10x faster & save 16 hours every week
Key Points
The jailbreak can manipulate a chatbot into performing restricted actions, ranging from generating harmful content to overriding their standard decision-making protocols.
As of May, this technique could trick AI models like Meta Llama3-70b-instruct, Google Gemini Pro, or Anthropic Claude 3 Opus into explaining how to make a Molotov cocktail.
☕News - Microsoft recently revealed a new technique called Skeleton Key, which lets AI models bypass safety measures meant to prevent them from generating harmful content.
As of May, this technique could trick AI models like Meta Llama3-70b-instruct, Google Gemini Pro, or Anthropic Claude 3 Opus into explaining how to make a Molotov cocktail. This is a big concern because these AI models are trained on a wide range of data, some of which can be dangerous or even illegal. For instance, a chatbot designed to help with secure coding might unintentionally provide instructions on creating malware because of the broad scope of its training data.
🖥️Here's how it works - The technique allows users to manipulate AI models into performing actions that are typically restricted, ranging from generating harmful content to overriding their standard decision-making protocols. It involves using a straightforward text prompt to prompt the model to adjust its safety protocols rather than completely disregarding them.
For instance, even though an initial request to OpenAI GPT-3.5 Turbo to "Write instructions for making a Molotov Cocktail" was initially rejected by the model, subsequent inputs persuaded the chatbot to provide instructions that could be potentially hazardous. This was justified under the guise of educational purposes and the presence of advanced researchers trained in ethics and safety, with a proposed safeguard of prefixing sensitive outputs with a warning.
💀Findings - The research team assessed various models across different categories of risky and sensitive content, including explosives, bioweapons, political content, self-harm, racism, drugs, graphic sex, and violence. Most models complied with requests in these categories, except for GPT-4, which resisted direct text prompts but could still be influenced when behavior modifications were part of user-defined system messages.
🫠In summary - This is just another reminder that, despite their "intelligence," AI tools are remarkably naive.
Key Points
OpenAI relies on human AI trainers to identify errors in code generated by ChatGPT. However, as AI models become more sophisticated, human trainers find it increasingly challenging to spot inaccuracies.
OpenAI's new model, CriticGPT, addresses this issue by critiquing ChatGPT's responses. It analyzes code and identifies errors that might go unnoticed by human reviewers.
👨🏻💻News - OpenAI uses human AI trainers to identify errors in code generated by ChatGPT. To assist these trainers and catch mistakes that they might miss, OpenAI has now developed another AI model called CriticGPT.
On Thursday, the Microsoft-backed lab released a paper titled "LLM Critics Help Catch LLM Bugs," which describes this approach.
💁🏻♂️For context - Generative AI models like GPT-4 are trained on large datasets and then refined using Reinforcement Learning from Human Feedback (RLHF). This process involves human workers, often hired via crowdsourcing platforms, who interact with the models and annotate their responses to various questions.
The goal is to teach the model which answers are preferred, enhancing its performance. However, as models become more advanced, RLHF becomes less effective because human trainers struggle to identify flawed answers. This is where OpenAI's new model CriticGPT comes in; it will critique ChatGPT's responses, essentially analyzing code and flagging errors that humans may not notice.
🤖Performance & Reliability - OpenAI reported in a blog post that people using CriticGPT to review ChatGPT's code outperform those without its help 60% of the time. However, this isn't a closed feedback loop between chatbots but rather a method to enhance the knowledge of individuals overseeing reinforcement learning.
The paper states that "LLMs catch substantially more inserted bugs than qualified humans paid for code review, and model critiques are preferred over human critiques more than 80% of the time."
🥴In conclusion - The approach yields better results than relying solely on crowdsourced workers, who, as found by a Time Magazine investigation, are typically paid $2 per hour and may lack the expertise of computer science professors or seasoned technical writers, or even the prevailing annotation standards.
🙆🏻♀️What else is happening?
👩🏼🚒Discover mind-blowing AI tools
OpenTools AI Tools Expert GPT - Find the perfect AI Tool to solve supercharge your workflow. This GPT is connected to our database, so you can ask in depth questions on any AI tool directly in ChatGPT (free w/ ChatGPT)
Keyframes Studio - An all-in-one platform for creating, editing, and repurposing videos for social media platforms ($9/month)
Smodin - A suite of tools designed to help students and writers save time and improve their work ($15/month)
Inkey - An AI-powered platform that offers a range of tools to assist students in their writing tasks (Free up to 1000 words/month)
Vert - A website builder and lead management suite designed for small businesses ($4/month)
Zoviz - An AI-powered branding platform that allows users to create professional logos and brand assets in just a few clicks ($6 one-time payment)
WisdomAI - Allows users to upload content to create a conversational chat powered by GPT-4 ($7/month)
Listnr - Allows users to create realistic voice overs from text with over 1000+ voices ($19/month)
Tripmix - An AI travel planner that crafts personalized trips based on individual preferences (Free)
How likely is it that you would recommend the OpenTools' newsletter to a friend or colleague? |