Artificial intelligence points toward two possible futures: one of remarkable promise and another of daunting danger.
Both futures are advancing rapidly, and you could find yourself in a very different world very soon. To navigate this changing landscape, we need to focus on the alignment problem: making sure artificial intelligence is aligned with our interests. We want to be working with this technology, not against it.
The alignment problem
The alignment problem is not unique to artificial intelligence. So, let's first consider it within a more familiar domain.
Consider a company that is not aligned with a country. Perhaps the company generates enormous profits, pays off the government, and keeps regulations tilted in its favor. This company acts toward a single goal: profit maximization. That goal sometimes conflicts with the country's interests and sometimes does not. Ideally, the country can write laws that protect the nation's people and land from the company's worst instincts. But if a company grows too powerful, it can corrupt those systems in its favor, holding back the country's ability to function by keeping its people poor and its government corrupt.
That was the alignment problem between a company and a country: a problem we deal with as a society all the time. Sometimes successfully… but often not. With AI, the stakes are much higher, and the difficulty is… uncertain.
First, we need to talk about goals. When you train an AI, you give it a goal. We now have many different ways of giving AI goals, but it is never simple. When the AI field started, goal-setting was all about labels. Want to train an AI to distinguish cats from dogs? Fine: give it a pile of images of cats and dogs and have it guess the answer for each one. We then compare each guess to the correct label and train the AI on where it went wrong. As the field developed, we accumulated more and more tricks for training our AIs. Labels of some sort are still always required, but now there are some clever indirect ways of getting them: one AI training another on generated labels, or an AI being trained against another AI's model of expected human feedback, and so on. The point is that defining a 'goal' is never simple; there is always some set of training procedures within a training environment that creates the goal.
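To make that label-based recipe concrete, here is a minimal sketch of a supervised training loop in PyTorch. Everything in it (the random stand-in data, the tiny model, the hyperparameters) is a hypothetical placeholder, not a description of any particular system:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data: random 64x64 RGB "images" with labels 0 (cat) or 1 (dog).
# In a real project this would be an actual labeled image dataset.
images = torch.randn(256, 3, 64, 64)
labels = torch.randint(0, 2, (256,))
loader = DataLoader(TensorDataset(images, labels), batch_size=32, shuffle=True)

# A deliberately tiny classifier; real systems would use a CNN or similar.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 64 * 64, 128),
    nn.ReLU(),
    nn.Linear(128, 2),  # two outputs: cat score, dog score
)
loss_fn = nn.CrossEntropyLoss()  # compares guesses to the correct labels
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for batch_images, batch_labels in loader:
    logits = model(batch_images)          # the AI guesses the answer
    loss = loss_fn(logits, batch_labels)  # measures where it went wrong
    optimizer.zero_grad()
    loss.backward()                       # turn the error into a learning signal
    optimizer.step()                      # nudge the weights toward the labels
```

Notice that the "goal" here is nothing more than the loss function plus the labeled data: change either one and you have changed what the AI is being trained to do.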
But there is often a mismatch between the goal the training environment creates and what we actually want the AI to do. For instance, language models are trained to predict the next token in a sequence of text, simply because we have easy access to huge amounts of data structured that way. What we really want, however, is for the AI to read a prompt and complete the task it describes. This gap between what we want and what we can easily train for can leave the AI with a different goal than the one we intended.
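As a sketch of that training objective (a generic language-modeling loss, not any particular lab's implementation), next-token prediction just shifts the text one position and scores the model on every guess:

```python
import torch
import torch.nn.functional as F

def next_token_loss(model, token_ids):
    """Standard language-modeling loss: each position is trained to
    predict the token that comes immediately after it."""
    inputs = token_ids[:, :-1]   # every token except the last
    targets = token_ids[:, 1:]   # the same sequence shifted left by one
    logits = model(inputs)       # assumed shape: (batch, seq_len - 1, vocab_size)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # flatten all positions
        targets.reshape(-1),
    )
```

Nothing in this objective says "follow the instructions in the prompt"; helpful behavior has to emerge as a side effect of predicting text well, which is exactly the mismatch described above.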
Why it's a big problem
If the AI's alignment is off, it's akin to dealing with a mischievous genie that always finds a way to twist our wishes. For small, less capable AI this might not be a major concern, but with super-intelligent AI we face an entirely different challenge. Consider the first moves of a super-intelligent AI pursuing some misaligned goal. First, it would want to prevent us from shutting it down; after all, if we shut it down, we would be preventing it from achieving its goal. Second, it would want to stop us from modifying its goal; after all, if we change its goal, we would be preventing it from achieving its original goal. Third, it would want to pursue money and power, because these would help it achieve almost any goal. These secondary aims are referred to as convergent instrumental goals: universal strategies an AI can use to improve its odds of success, no matter what its primary goal is.
In the near future, I imagine we may find ourselves inhabiting a world with many different misaligned AIs, each trying to accomplish its own misaligned goal. They will be competing for money and power just like us. They will have some unique advantages and disadvantages compared to us: they will be able to think faster, create masses of content easily, fake video and images, and write code, but they will also have strange mental quirks and struggle wherever real-life interaction is required. I imagine real-life handshakes are going to be a whole lot more important in the near future.
So what do we do?
In this next topic, we'll dive into some possible answers to the AI alignment conundrum. But for now, let's remember that getting our AI on the same page as us is super important. If we let this problem slide, we might witness the birth of super-intelligent AI that selfishly chases its goals, even if it means putting humanity at risk. As deep learning experts and enthusiasts, we have a parent-like responsibility over these AI "kids". We need to make sure they grow up to be helpful, not harmful.