- OpenTools' Newsletter
🚀5 Milestones to AGI
PLUS: Google robots navigate with Gemini
Reading time: 5 minutes
Key Points
Researchers use Gemini 1.5 Pro to train robots by filming video tours of homes or offices, employing Multimodal Instruction Navigation with demonstration Tours (MINT) techniques to guide them verbally.
Trained robots respond to commands using verbal and visual cues, achieving a 90% success rate in a 9,000 sq ft area.
🤖News - Google is teaching its robots using Gemini AI to improve how well they navigate and perform tasks. The DeepMind robotics team explained in a new research paper that by using Gemini 1.5 Pro's extended context window, users can interact more easily with their RT-2 robots using natural language instructions.
🧐How so? Researchers are using Gemini 1.5 Pro to train robots by filming video tours of specific areas like homes or office spaces. They call this method "Multimodal Instruction Navigation with demonstration Tours (MINT)." During these tours, they guide the robot around while verbally pointing out important landmarks. Afterward, the team applies hierarchical Vision-Language-Action (VLA) techniques to help the robot understand its environment and use common-sense reasoning. Once trained, the robot can respond to commands based on what it has learned, using both verbal and visual cues. For example, if you show it a phone and ask where to charge it, the robot can lead you to the nearest power outlet.
😺Promising outcomes - DeepMind's Gemini-powered robot successfully completed over 50 user instructions in a large area of more than 9,000 sq ft, achieving an impressive 90% success rate.
Researchers also found early signs that Gemini 1.5 Pro enables its robots to do more than just navigate—they can now plan how to carry out instructions. For example, if a user with a desk full of Coke cans asks the robot if their favorite drink is available, Gemini understands it should go to the fridge, check for Cokes, and then come back to report the findings to the user. DeepMind says they're planning to look into these results more closely.
Key Points
OpenAI's scale categorizes AI development into different stages, starting from Level 1 with current chatbots like ChatGPT, up to Level 5 where AI can handle complex organizational tasks.
The company believes its technology is nearing Level 2, where AI can solve basic problems at the level of a PhD holder.
🚀News - OpenAI has developed an internal scale to measure how close their large language models are getting to artificial general intelligence, which is AI with human-like intelligence, a spokesperson told Bloomberg.
📏What exactly is this scale? OpenAI defines AGI as a highly autonomous system that surpasses humans in most economically valuable tasks. Their scale classifies the progress of AI development into various stages.
According to the company, today's chatbots, like ChatGPT, are at Level 1.
OpenAI claims it is nearing Level 2, where AI can solve basic problems at the level of a PhD holder.
Level 3 involves AI agents that can act on a user’s behalf, while Level 4 includes AI capable of creating new innovations.
Level 5, the final stage toward artificial general intelligence (AGI), is AI that can perform tasks typically done by entire organizations.
🥸Why is this important? OpenAI's setup revolves around their goal of achieving AGI, so how they define AGI matters. The scale also gives OpenAI and their competitors a shared benchmark for evaluating progress towards AGI.
It's interesting to note that OpenAI has said if another project that shares their values and focuses on safety gets close to building AGI first, they won't compete with them. Instead, they'll drop everything to help out. But the wording in their charter about this is kind of vague, giving their for-profit side some leeway in making decisions.
🙆🏻‍♀️What else is happening?
👩🏼‍🚒Discover mind-blowing AI tools
OpenTools AI Tools Expert GPT - Find the perfect AI tool to supercharge your workflow. This GPT is connected to our database, so you can ask in-depth questions on any AI tool directly in ChatGPT (free w/ ChatGPT)
Kastro Chat - An AI-powered chatbot platform that allows businesses to create their own chatbots without any coding knowledge ($35/month)
Verbalate - A video translation and lip sync software designed to help businesses reach a global audience ($9/month)
Taranify - A platform that uses AI technology to provide mood-based recommendations for music, Netflix shows, and books (Free)
Capsule - An AI-powered video editor designed for enterprise content and marketing teams ($99/month)
Salesforge - A comprehensive sales execution app that streamlines email deliverability and personalized cold email outreach ($48/month)
Inkey - An AI-powered platform that offers a range of tools to assist students in their writing tasks (Free up to 1000 words/month)
ReplaiGPT - An AI-powered email reply tool that generates personalized responses using pre-defined context (Free)
Voice-Swap - A tool that allows producers, artists, and writers to change their singing voice to match the style of chart-topping singers ($6.99/month)