🫨AI To Face Its Toughest Test!

PLUS: AI Safety in Focus

Reading time: 5 minutes

Key Points 

  • "Humanity’s Last Exam" aims to challenge AI with tough questions to determine when it reaches expert level.

  • The initiative follows OpenAI's release of its o1 model, which outperformed the most popular reasoning benchmarks.

  • The new exam will focus on abstract reasoning and include at least 1,000 crowd-sourced questions.

👨🏻‍💻News - A group of technology experts has launched a global initiative to find the most challenging questions to test artificial intelligence systems.

The project, known as "Humanity's Last Exam," aims to identify when AI reaches an expert level. It’s being organized by the Center for AI Safety (CAIS) and the startup Scale AI, with the goal of staying relevant as AI continues to advance in the coming years.

🧐What prompted the move? This call comes just days after OpenAI revealed its newest model, OpenAI o1, which, as Dan Hendrycks, the executive director of CAIS, put it, "completely destroyed the most popular reasoning benchmarks."

For those who don’t know, Dan Hendrycks co-authored two key papers in 2021 on testing AI, covering college-level topics and advanced math. Back then, AI systems were giving almost random answers. Now they’ve improved dramatically: Anthropic’s Claude models, for example, jumped from around 77% on the undergraduate-level test in 2023 to nearly 89% a year later. Because of this, those benchmarks aren’t as useful anymore.

🤔How will the new test be different? Some AI researchers think that planning and abstract reasoning are better indicators of intelligence. To reflect this, "Humanity’s Last Exam" will focus on abstract reasoning.

Hendrycks mentioned that some questions will be kept secret to prevent AI from simply memorizing the answers. He did say, though, that the exam will feature at least 1,000 crowd-sourced questions, due by November 1, that will be tough for non-experts to answer.

Key Points 

  • The group emphasized that governments must monitor AI research labs and find ways to discuss risks without forcing companies to reveal confidential information.

  • The group suggested setting up AI safety agencies to track systems and establish guidelines, with an international body overseeing coordination.

☕News - On Monday, a group of leading AI scientists from the US, China, Britain, Singapore, Canada, and other countries issued a statement warning that AI is becoming dangerously powerful. They noted that AI might soon surpass human abilities, and that if we lose control of it or it’s misused, the consequences could be severe for everyone.

🕵🏻‍♂️What stood out? Gillian Hadfield, a legal scholar and professor at Johns Hopkins University, warned that if AI systems developed advanced capabilities today, there’s no plan in place to control them. She questioned who we would turn to if a catastrophe happened in six months and we found models improving themselves autonomously.

The group stated that governments need to keep tabs on what’s happening at AI research labs and companies in their countries. They added that governments also need to figure out a way to discuss potential risks without forcing companies or researchers to share proprietary information with competitors.

🤓Do they have a solution in mind? The group suggested that countries establish AI safety agencies to track AI systems within their borders. These agencies would collaborate to set guidelines and warning signs, like detecting if an AI system can replicate itself or deceive its creators. An international body would then oversee and coordinate these efforts.

🙆🏻‍♀️What else is happening?

👩🏼‍🚒Discover mind-blowing AI tools

  1. OpenTools AI Tools Expert - Find the perfect AI tool to supercharge your workflow. This GPT is connected to our database, so you can ask in-depth questions about any AI tool directly in ChatGPT (free)

  2. SheetGPT - A Google Sheets add-on that allows users to integrate OpenAI's GPT-3.5 text and image generation capabilities directly within Google Sheets

  3. MidJourney Prompt Generator - A tool designed to help users quickly generate unique art styles using AI technology

  4. Dimeadozen.ai - A platform that allows users to analyze and validate their business ideas instantly by generating detailed business reports

  5. The GPT-Who-Lived - An interactive experience utilizing GPT-3 to create unique and immersive stories that engage with popular fictional universes

  6. JIT - An AI-powered platform that simplifies and speeds up coding by providing tools for code generation, optimization, and collaboration

  7. Clearmind - A personalized AI therapy platform designed to measure and elevate your emotional health

  8. Durable - A tool that simplifies the website creation process using artificial intelligence, allowing users to build websites quickly and efficiently without deep coding knowledge

  9. Agentive - An audit automation platform that simplifies and automates your audits

How likely is it that you would recommend the OpenTools newsletter to a friend or colleague?


Interested in featuring your services with us? Email us at [email protected]