Q* and the Quest for Artificial General Intelligence

What is Q Star?

Recent developments in AI, particularly around a rumored new algorithm called "Q*," have sparked considerable intrigue in the machine learning community. This interest arose amid major upheaval at OpenAI, marked by the controversial ousting of CEO Sam Altman and rumors of a critical AI breakthrough that might bring us closer to artificial general intelligence (AGI). In this blog post, we explain what Q-star is, why it's such a big deal, and how it may change the future of AI. Note that what is known about Q* does not come from any paper or product released by OpenAI; it comes from speculation and analysis within the AI community.

What is Q-star (Q*)?

Some AI researchers believe that Q* combines A* (a search and pathfinding algorithm) with Q-learning (a reinforcement learning technique), and that it can reportedly solve math problems that were not part of its training data with flawless accuracy and without relying on external tools. This may not sound impressive, since computers are built to be good at calculation, but solving unseen problems takes reasoning rather than arithmetic, which may be why OpenAI researchers were reportedly concerned about Q*. If the reports are accurate, its 100% accuracy on such math problems surpasses the performance of models like GPT.

Current large language models are great at language-related tasks like translation and summarization but struggle with mathematical logic and strategy. They rely heavily on their training data and can be considered 'information repeaters.' Q*, on the other hand, is said to show strong logical reasoning and long-term planning. This could be a major step toward revolutionizing scientific research. The discussion around Q* extends beyond machine learning into neuroscience and cognitive architecture, suggesting that it could be not just a technical achievement but a significant breakthrough in AI research, and a possible concern for humanity.

While this sounds like an exciting scientific advance, it might also be the reason behind the turbulent events at OpenAI that led the board (Adam D'Angelo, Tasha McCauley, Ilya Sutskever, and Helen Toner) to fire Sam Altman, only for him to return as CEO just a few days later.

Why is Q-star so "scary"?

It's no secret that rapid advances in artificial intelligence raise significant ethical concerns. A letter from OpenAI researchers reportedly voiced worries about the system's rapid progress, potentially seeing it as a "threat to humanity." To understand why, let's talk about artificial general intelligence.

Artificial general intelligence (AGI)

Artificial general intelligence (AGI) is a hypothetical, highly advanced form of AI that aims to replicate the way humans think and learn. Imagine a computer program that not only performs specific tasks, like translating languages or playing games, but also figures out entirely new tasks on its own, just as a person would. AGI would be smart enough to recognize when it doesn't know something and then learn it by itself. It could even modify its own programming to better match what happens in the real world. In short, AGI is about creating a machine that can do any intellectual job a human can and adapt and learn as flexibly as we do.

AGI represents a future of AI in which models excel at complex reasoning, make decisions under uncertainty, and possess emotional and social intelligence. AGI could potentially innovate, create original content, and understand context and nuance in ways that current AI systems cannot. This level of intelligence would let AGI systems perform tasks ranging from composing music to conducting scientific research, essentially embodying the versatility and depth of human intelligence in a machine. Many researchers believe that Q* is a big step toward AGI and that serious AI regulation must be put in place before it's too late.

But before seeing Q* as a significant threat to humanity, consider the view of Shane Legg, co-founder and chief AGI scientist at Google DeepMind, who has shared his doubts about models going beyond their training data.

The idea of AGI taking over sparks controversial opinions in the AI community. Geoffrey Hinton shared his thoughts on this in a tweet that drew notable responses from Andrew Ng and Yann LeCun.

A* and Q-learning

To understand A* and Q-learning, imagine the problem of getting from a current state to a goal state, not in physical space but in an AI agent's environment. This process involves planning and decision-making, which require cognitive functions such as brainstorming possible steps and evaluating them. Given the current state and the problem we want to solve, the steps can be brainstormed with prompting strategies such as tree-of-thoughts (ToT) and chain-of-thought (CoT).

Understanding these ideas also helps in grasping A* and Q-learning, both of which are fundamental to goal-directed decision-making in AI.

What is A*?

The A* search algorithm is a powerful tool used in computer science to find the shortest (lowest-cost) path between two points. It's especially useful when there are many possible routes, as in a road network or a game map. A* ranks each candidate position n by f(n) = g(n) + h(n), where g(n) is the known cost of reaching n from the start (accounting for distance and obstacles) and h(n) is a heuristic estimate of the remaining distance to the goal. At each step it expands the position with the lowest f(n), balancing the extension of cheap known paths against the exploration of promising new ones. As long as the heuristic never overestimates the true remaining cost, A* is guaranteed to return an optimal route. This makes A* well suited for GPS navigation, character movement in game AI, and solving complex puzzles.
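To make this concrete, here is a minimal Python sketch of A* on a small grid map. It assumes a hypothetical encoding in which each row is a string and `#` marks a wall, every move costs 1, and the heuristic is the Manhattan distance, which never overestimates on a grid where you can only move up, down, left, or right. It illustrates the classic algorithm only, not anything OpenAI has described about Q*.

```python
import heapq


def a_star(grid, start, goal):
    """Return the shortest path from start to goal as a list of (row, col), or None.

    grid: list of equal-length strings; '#' is a wall (illustrative encoding).
    """

    def h(cell):
        # Manhattan distance: admissible for 4-way moves with unit cost.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    rows, cols = len(grid), len(grid[0])
    g = {start: 0}                   # cheapest known cost from start to each cell
    came_from = {}                   # back-pointers for rebuilding the path
    frontier = [(h(start), start)]   # priority queue ordered by f = g + h
    closed = set()

    while frontier:
        _, current = heapq.heappop(frontier)
        if current == goal:
            path = [current]
            while current in came_from:
                current = came_from[current]
                path.append(current)
            return path[::-1]
        if current in closed:
            continue                 # stale queue entry; a cheaper copy was expanded
        closed.add(current)
        r, c = current
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                tentative = g[current] + 1   # every step costs 1
                if tentative < g.get(nxt, float("inf")):
                    g[nxt] = tentative
                    came_from[nxt] = current
                    heapq.heappush(frontier, (tentative + h(nxt), nxt))
    return None                      # goal unreachable


if __name__ == "__main__":
    grid = [
        "....",
        ".##.",
        "....",
    ]
    print(a_star(grid, (0, 0), (2, 3)))
```

On this 3x4 map with a two-cell wall in the middle, the returned path has six cells (five moves), and if the goal is walled off the function returns `None`. The priority queue from `heapq` keeps the entry with the lowest f(n) at the front, which is what makes the search best-first.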

Applying the logic of A* to language models is less direct. Although generative models don't navigate physical spaces, they still traverse a vast space of possible continuations to find the most relevant response to a given prompt. This is where Q-learning comes in.

What is Q-learning?

Q-learning is a method in machine learning where an 'agent' learns to make decisions or take actions that lead to the best possible outcome in a given situation. This technique is part of reinforcement learning, which is about learning through interactions with an environment.

In Q-learning, the 'Q' stands for 'quality,' which refers to the value or benefit of taking a certain action in a specific state. The agent is rewarded for good actions and penalized for bad ones. Through repeated trials and learning from these rewards and penalties, the agent gradually understands the best series of actions to achieve its goal.
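The "learning from rewards and penalties" step is usually written as the standard Q-learning update rule below. Here s is the current state, a the action taken, r the reward received, s' the next state, α a learning rate, and γ a discount factor that weighs future rewards. These symbols are introduced for illustration; this is the textbook formulation, not anything specific to Q*.

```latex
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
```

The bracketed term is the gap between what the agent expected and what it just observed (the reward plus the best value it believes is reachable from s'), and each update moves the estimate a fraction α toward that target.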

For example, if you were teaching a robot to navigate a maze, Q-learning would involve trying different paths and learning from each attempt. It keeps track of which actions (like turning left, right, or moving forward) in various parts of the maze led to success. Over time, the robot learns the most efficient path to the exit. This process is similar to how humans learn from their experiences, gradually improving their decision-making over time.
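The maze example can be sketched as tabular Q-learning in a few dozen lines of Python. The maze layout, the reward of -1 per move, and the hyperparameters (`alpha`, `gamma`, `epsilon`, number of episodes) are illustrative assumptions; the dictionary `q` plays the role of the lookup table of (state, action) values that the agent fills in as it learns.

```python
import random

# Illustrative maze: S = start, G = exit, # = wall.
MAZE = [
    "S.#.",
    ".#..",
    "...G",
]
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def find(ch):
    for r, row in enumerate(MAZE):
        c = row.find(ch)
        if c != -1:
            return (r, c)
    raise ValueError(f"{ch!r} not in maze")


START, GOAL = find("S"), find("G")


def step(state, action):
    """Move one cell; walls and edges leave the agent in place. Every move costs -1."""
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    if 0 <= r < len(MAZE) and 0 <= c < len(MAZE[0]) and MAZE[r][c] != "#":
        state = (r, c)
    return state, -1.0, state == GOAL


def best_action(q, state):
    # Unseen (state, action) pairs default to 0.0, which is optimistic here
    # because all real returns are negative, so untried moves get explored.
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {}  # (state, action) -> estimated quality
    for _ in range(episodes):
        state, done = START, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(list(ACTIONS))   # explore
            else:
                action = best_action(q, state)       # exploit
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q.get((next_state, a), 0.0) for a in ACTIONS)
            old = q.get((state, action), 0.0)
            # Q-learning update: move toward reward + discounted best next value.
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state = next_state
    return q


def greedy_path(q, max_steps=50):
    """Follow the learned table greedily from the start, with a step cap."""
    state, path = START, [START]
    while state != GOAL and len(path) <= max_steps:
        state, _, _ = step(state, best_action(q, state))
        path.append(state)
    return path


if __name__ == "__main__":
    print(greedy_path(train()))
```

Because every move costs -1, the quickest route to the exit has the highest value, so after training the greedy path goes down twice and then right three times (five moves). The agent explores a random action with probability `epsilon` and otherwise exploits its table, a common way to balance trying new actions against using what it has already learned.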

Think of Q-learning as giving the AI system a cheat sheet of its successful and failed actions. In complex situations, however, this sheet can grow too long and complicated, and this is where deep Q-learning helps. Deep Q-learning uses neural networks to approximate the Q-value function instead of storing every value in a table.
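As a rough illustration, the sketch below replaces the table with a tiny one-hidden-layer network written in plain Python, so no deep learning library is needed. The class name `TinyQNetwork`, the layer sizes, and the learning rate are hypothetical choices. It shows only the core semi-gradient Q-learning update; practical deep Q-networks (DQN) also use experience replay and a separate target network, which are omitted here.

```python
import math
import random


class TinyQNetwork:
    """One-hidden-layer MLP mapping a state vector to one Q-value per action."""

    def __init__(self, n_inputs, n_hidden, n_actions, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_actions)]
        self.b2 = [0.0] * n_actions

    def _hidden(self, state):
        # tanh activations of the hidden layer.
        return [math.tanh(sum(w * x for w, x in zip(row, state)) + b)
                for row, b in zip(self.w1, self.b1)]

    def q_values(self, state):
        h = self._hidden(state)
        return [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(self.w2, self.b2)]

    def td_update(self, state, action, reward, next_state, done, gamma=0.9, lr=0.05):
        """One gradient step on 0.5 * (Q(s, a) - target)^2; returns that loss."""
        # The target is treated as a constant: no gradient flows through it.
        target = reward if done else reward + gamma * max(self.q_values(next_state))
        h = self._hidden(state)
        q_sa = sum(w * x for w, x in zip(self.w2[action], h)) + self.b2[action]
        err = q_sa - target
        # Backpropagate through the chosen action's output only.
        for j in range(len(h)):
            # Uses w2 before it is updated below; tanh'(z) = 1 - tanh(z)^2.
            grad_pre = err * self.w2[action][j] * (1.0 - h[j] ** 2)
            for i in range(len(state)):
                self.w1[j][i] -= lr * grad_pre * state[i]
            self.b1[j] -= lr * grad_pre
            self.w2[action][j] -= lr * err * h[j]
        self.b2[action] -= lr * err
        return 0.5 * err ** 2
```

For example, calling `td_update` a few hundred times on the same terminal transition with reward 1.0 pulls that action's predicted Q-value close to 1.0. The network generalizes across similar states, which is what lets deep Q-learning handle state spaces far too large for a table.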

Tree-of-thoughts (ToT) reasoning: Linking back to AlphaGo

The speculation around A* and Q-learning has drawn attention to how search could be used with LLMs. Nathan Lambert speculates that Q* works by searching over language and reasoning steps using tree-of-thoughts (ToT) reasoning. The idea is to connect the training and use of large language models to the core components of deep reinforcement learning behind successes like AlphaGo: self-play and look-ahead planning.
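Lambert's hypothesis is speculative, but the general shape of a tree-of-thoughts search is easy to sketch. In the toy below, each "thought" is a partial solution plus the steps taken so far. `propose` stands in for an LLM suggesting next reasoning steps, and the `evaluate` callback stands in for a learned value or reward model that scores partial solutions. Both are hypothetical placeholders, and the search is a simple beam search, one of several ways tree-of-thoughts can be run.

```python
from typing import Callable, List, Optional, Tuple

# A "thought" is a partial solution: (current value, reasoning steps so far).
Thought = Tuple[int, Tuple[str, ...]]


def propose(thought: Thought) -> List[Thought]:
    """Hypothetical stand-in for an LLM proposing candidate next steps."""
    value, steps = thought
    return [
        (value + 3, steps + ("+3",)),
        (value * 2, steps + ("*2",)),
        (value - 1, steps + ("-1",)),
    ]


def tree_of_thoughts(
    start: int,
    is_solved: Callable[[Thought], bool],
    evaluate: Callable[[Thought], float],
    beam_width: int = 3,
    max_depth: int = 6,
) -> Optional[Thought]:
    """Level-by-level tree search that keeps only the best `beam_width` thoughts."""
    frontier: List[Thought] = [(start, ())]
    for _ in range(max_depth):
        # Expand every surviving thought into its candidate next steps.
        candidates = [child for thought in frontier for child in propose(thought)]
        for candidate in candidates:
            if is_solved(candidate):
                return candidate
        # Score partial solutions and prune to the most promising ones.
        candidates.sort(key=evaluate, reverse=True)
        frontier = candidates[:beam_width]
    return None


if __name__ == "__main__":
    target = 11
    result = tree_of_thoughts(
        start=2,
        is_solved=lambda t: t[0] == target,
        evaluate=lambda t: -abs(target - t[0]),  # closer to the target scores higher
    )
    print(result)  # (11, ('+3', '+3', '+3'))
```

Starting from 2 with a target of 11, the search keeps the three most promising partial solutions at each depth and finds a three-step solution. The evaluator is the lever that matters: the quality of the score decides which branches survive pruning.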

Self-play means the agent plays against different versions of itself, encountering increasingly challenging cases and thereby improving its play. In the context of LLMs, you can think of AI feedback (as in reinforcement learning from AI feedback, RLAIF) as the “competing” element that improves the model’s performance.

Look-ahead planning means using a model of the world to plan better future actions. There are two main variants of such planning: Model Predictive Control (MPC), typically used with continuous states, and Monte-Carlo Tree Search (MCTS), which works with discrete actions and states.
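To give a feel for look-ahead planning with discrete actions, here is a compact Monte-Carlo Tree Search using the common UCT selection rule, applied to a toy game of Nim: players alternately remove 1 to 3 stones, and whoever takes the last stone wins. The game, the exploration constant `c=1.4`, and the iteration count are illustrative assumptions; AlphaGo-style systems replace the random playouts here with learned policy and value networks.

```python
import math
import random


class Node:
    """A Nim position: `stones` remain and the player to move acts next."""

    def __init__(self, stones, parent=None, move=None):
        self.stones = stones
        self.parent = parent
        self.move = move                  # stones taken to reach this node
        self.children = []
        self.untried = [m for m in (1, 2, 3) if m <= stones]
        self.visits = 0
        self.wins = 0.0                   # wins for the player who made `move`

    def best_uct_child(self, c=1.4):
        # UCT: favor high win rates while still exploring rarely visited children.
        return max(
            self.children,
            key=lambda ch: ch.wins / ch.visits
            + c * math.sqrt(math.log(self.visits) / ch.visits),
        )


def random_playout(stones, rng):
    """Play random moves; True if the player to move now takes the last stone."""
    turn = 0                              # 0 = player to move at the start
    while stones > 0:
        stones -= rng.randint(1, min(3, stones))
        turn ^= 1
    # `turn` now points at the player with no move left, so turn == 1 means
    # player 0 took the last stone. With no stones up front, player 0 has lost.
    return turn == 1


def mcts(stones, iterations=5000, seed=0):
    rng = random.Random(seed)
    root = Node(stones)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded, non-terminal nodes.
        while not node.untried and node.children:
            node = node.best_uct_child()
        # 2. Expansion: add one untried move as a new child.
        if node.untried:
            move = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.stones - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation from the new node.
        to_move_wins = random_playout(node.stones, rng)
        # 4. Backpropagation: credit each node to the player who moved into it,
        #    flipping perspective at every level.
        while node is not None:
            node.visits += 1
            if not to_move_wins:
                node.wins += 1
            to_move_wins = not to_move_wins
            node = node.parent
    # Final choice: the most visited move at the root.
    return max(root.children, key=lambda ch: ch.visits).move


if __name__ == "__main__":
    print(mcts(5))
```

From 5 stones the winning move is to take 1, leaving the opponent a multiple of 4, and the search finds this purely by simulating ahead. Choosing the most visited child rather than the one with the highest win rate is a common, robust rule for the final move.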

There’s still a lot of research to be done to thoroughly understand how these concepts link together in the realm of large language models.


General-purpose language models are good at language tasks but weak at mathematical reasoning, which requires formal logic and planning. Math is also the foundation of physics, chemistry, cryptography, and, ultimately, artificial intelligence itself. If Q* truly can solve math problems, it could open a new era of generative models that tackle an entirely new class of problems.

Q* may signal another round of AI breakthroughs. While these are exciting times for AI enthusiasts and researchers, they also highlight how critical AI regulation and ethical norms have become.

Disclaimer: Readers are advised to take this information with a grain of salt. While it reflects the thinking and analyses of many AI researchers, the OpenAI board has made no official letter or announcement about Q*.