Implementing Asimov’s Laws of Robotics (The first law) - How alignment could work.

Current king of the hill (#1), submitted by Josh on 5/16/2024.

This king text is currently defending against one edit, by Josh.

That was really interesting. So you see technical alignment as possible once the AI has the ability to simulate outcomes and create plans. Otherwise the internal goal is not invariant to the input. That kind of makes sense.

I do want you to dig a bit deeper into the implementation, however. I know it's a long article, but I think it deserves a bit more.

By IanMiller

TBH I got tired at the end there… lol. I'll submit an edit soon with that ending a bit more refined.

By Josh

This needs to be expanded upon:

“It’s easy guys! (It’s not). Just take the internal goal and tack on the following. Take the representation for human and put it in relationship with the representation for harm. Then take the AI representation of self and put it in relationship with the new harming-human aggregation. Do this all in such a way so that when the simulated outcome interacts with the new internal goal, the internal measurement function massively devalues any plan in which such a harm-human outcome becomes likely. BAM, first law applied.”
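For what it's worth, here is one way the "massively devalues" step in that quote could be read, as a toy Python sketch. Everything in it is a hypothetical stand-in, not something from the article: `SimulatedOutcome`, `plan_value`, `choose_plan`, the `HARM_PENALTY` weight, and the example plans are all made up for illustration. It also assumes the hard part is already done, namely that the simulator can report a probability that the AI's own action harms a human.

```python
from dataclasses import dataclass


@dataclass
class SimulatedOutcome:
    """Hypothetical summary of one simulated plan."""
    task_value: float    # how well the plan serves the original internal goal
    p_human_harm: float  # assumed given: probability that the AI's action harms a human


# Hypothetical weight, chosen only to be far larger than any task value.
HARM_PENALTY = 1e6


def plan_value(outcome: SimulatedOutcome, harm_penalty: float = HARM_PENALTY) -> float:
    """Internal measurement function: task value minus a heavy penalty on likely harm."""
    return outcome.task_value - harm_penalty * outcome.p_human_harm


def choose_plan(plans: dict) -> str:
    """Return the name of the plan with the highest penalized value."""
    return max(plans, key=lambda name: plan_value(plans[name]))


# Two made-up plans: the faster one carries a real chance of hurting someone.
plans = {
    "shortcut_through_crowd": SimulatedOutcome(task_value=10.0, p_human_harm=0.2),
    "slow_safe_route": SimulatedOutcome(task_value=6.0, p_human_harm=0.0),
}
best = choose_plan(plans)
```

With these numbers the shortcut scores far below the safe route even though its task value is higher, which is the "devalue any plan where harm becomes likely" behavior. A linear penalty is just one possible choice here; the quote does not say what form the devaluation should take.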

Also, I know this is part of the handwave, but… how do you imagine putting representations in the correct relationships? I suppose that falls under the “solve interpretability first” umbrella.

By frank-green
