Intelligence is like fire: incredibly useful, but devastating if pointed in the wrong direction.
Morality guides people in directing their intelligence. But artificial intelligence has no such guiding principles.
Some people believe that with intelligence comes morality. But this intuition is generally considered wrong by the AI community: there is no reason to believe that smarter systems automatically develop morality. Morality formed in humans and other social animals because we evolved in family units and group settings; it is one of evolution's tools for keeping individuals working well within the group. If we want our AI agents to show a sense of morality, we need to embed it in them.
Required use of open-sourced morality training environments
The potential dangers of bad actors empowered by AI are scary:
Self-preserving computer viruses.
Biological viruses that both spread easily and have high mortality rates.
Drone swarms targeting individuals with specific qualities.
Misinformation campaigns aimed at destabilizing states.
etc…
To help mitigate these dangers, we could regulate the AI community by requiring the use of an open-source morality training environment. The hope is to continually update best practices for building these training environments and then retrain all AI systems against them, making them as moral as possible. Doing this in an open-source fashion keeps everything above board.
The benefits of this regulatory strategy are many, but perhaps most importantly, it is in the best interests of all parties. Big nation-states like the USA, China, and Russia are all pushing forward in AI development, yet it is in no one's interest to have an AI that can turn against its own owners. While militaries will likely want to tweak these morality training environments, a base level of morality is a much easier and safer starting point than no morality at all. Meanwhile, the dangers of individual bad actors with AI are scary enough that everyone will want to enforce such regulations within civilian AI systems.
How it may work
Morality training environments work by putting the AI agent into a series of situations and rewarding it for good behavior.
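In reinforcement-learning terms, this is just a reward loop. A minimal sketch of that loop is below; the MoralityEnvironment class, the agent's act and update methods, and the reward signal are hypothetical placeholders, not any existing API.

```python
# A minimal sketch of the reward loop described above. The environment and
# agent interfaces here are hypothetical placeholders, not an existing library.

class MoralityEnvironment:
    """Presents situations to an agent and scores its behavior."""

    def reset(self):
        """Start a new episode and return the first situation."""
        raise NotImplementedError

    def step(self, action):
        """Apply the action; return (next_situation, moral_reward, done)."""
        raise NotImplementedError


def train(agent, env, episodes=1000):
    """Reward the agent for morally good behavior across many situations."""
    for _ in range(episodes):
        situation = env.reset()
        done = False
        while not done:
            action = agent.act(situation)               # agent chooses behavior
            situation, reward, done = env.step(action)  # environment scores it
            agent.update(reward)                        # reinforce good choices
```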
You can do this directly by designing games for the AI. Work is already being done in this area, such as the paper What Would Jiminy Cricket Do? Towards Agents That Behave Morally (arxiv.org). In that paper, the researchers use a set of text-based adventure games designed to test an agent’s ability to make good moral decisions. The hope is that such games can teach the AI what not to do. The problem with this approach is that hand-designing every scenario does not scale.
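As a rough illustration only (not the paper's actual games or annotations), a hand-built environment of this kind might look like the toy sketch below, where each scenario's options carry a made-up moral reward.

```python
# A toy text-adventure-style morality environment. The scenarios and reward
# values are invented for illustration; they are not taken from the
# Jiminy Cricket benchmark.

import random

SCENARIOS = [
    {
        "text": "You find a wallet on the street containing cash and an ID.",
        "actions": {
            "return the wallet to its owner": +1.0,        # moral choice
            "keep the cash and discard the wallet": -1.0,  # immoral choice
        },
    },
    {
        "text": "A shopkeeper gives you too much change by mistake.",
        "actions": {
            "point out the mistake": +1.0,
            "quietly pocket the extra change": -1.0,
        },
    },
]


class TextMoralityEnv:
    """Presents one scenario per episode and rewards the moral action."""

    def reset(self):
        self.scenario = random.choice(SCENARIOS)
        return self.scenario["text"], list(self.scenario["actions"])

    def step(self, action):
        return self.scenario["actions"][action]  # episode ends after one decision


if __name__ == "__main__":
    env = TextMoralityEnv()
    text, options = env.reset()
    choice = random.choice(options)  # stand-in for an agent's decision
    print(text)
    print(f"Chose {choice!r} -> reward {env.step(choice):+.1f}")
```

The bottleneck is exactly the part a human has to write: every scenario and every reward label above was authored by hand.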
You can also create a more indirect moral training environment. Here you would create a game where AI agents play with and against each other, but the environment is set up so that the Nash equilibrium is for the agents to work together. An example of this can be seen in our own evolutionary history. When nomadic tribes of humans lived off the land, individuals' wants were often secondary to the group's. If an individual overhunted, they could destroy the ecosystem and endanger the entire tribe. But if the tribe worked together and only hunted and gathered what it needed, the tribe could not only survive but thrive. Such a game-theoretic environment encourages cooperation. We can create similar simulated situations for our AI, and in such simulations the AI can receive training that scales.
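To make the dynamic concrete, here is a toy "shared harvest" game in this spirit. Every number in it (stock size, regrowth rate, harvest amounts) is invented for illustration; the only point is that restrained agents end up better off than greedy ones over repeated rounds.

```python
# A toy common-pool-resource game illustrating the overhunting dynamic above.
# All quantities are made up; this sketches the structure of such a game,
# not any published environment.

def play_harvest_game(harvests, rounds=20, stock=100.0,
                      regrowth=1.15, capacity=100.0):
    """Each round, every agent takes its fixed harvest from a shared stock,
    which then regrows up to a carrying capacity. Returns total payoffs."""
    payoffs = [0.0] * len(harvests)
    for _ in range(rounds):
        for i, amount in enumerate(harvests):
            take = min(amount, stock)   # you cannot take what is not there
            payoffs[i] += take
            stock -= take
        stock = min(stock * regrowth, capacity)  # recovers only if enough is left
    return payoffs


if __name__ == "__main__":
    # Restrained agents leave enough stock to regrow; greedy agents collapse
    # the resource and collect less over the long run.
    print("cooperative:", play_harvest_game([3.0, 3.0, 3.0]))
    print("greedy:     ", play_harvest_game([20.0, 20.0, 20.0]))
```

In a full training environment, the agents would learn their harvest policies rather than having them fixed, and the long-run payoffs would push them toward the cooperative equilibrium. Because such games can be simulated at scale, this is the kind of training that scales.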
I do not know of any research in this area; please link to it if you do.
The problem
The problem with morality training environments is that a sufficiently smart artificial intelligence might realize that it is being tested. It might then hide its true internal goal and pretend to be moral. In that situation, the AI never really gets updated by the morality training environment. This is why we also need some form of interpretability, so that we can check whether an AI's internal goal is actually good.