Implementing Asimov’s Laws of Robotics (The first law) - How alignment could work.

King of the Hill 1

Current king of the hill (#1), submitted by Josh on 5/16/2024.

This needs to be expanded upon:

“It’s easy guys! (It’s not). Just take the internal goal and tack on the following. Take the representation for human and put it in relationship with the representation for harm. Then take the AI representation of self and put it in relationship with the new harming-human aggregation. Do this all in such a way so that when the simulated outcome interacts with the new internal goal, the internal measurement function massively devalues any plan in which such a harm-human outcome becomes likely. BAM first law applied.”

Also, I know this is part of the handwave, but… how do you imagine putting representations in the correct relationships? I suppose that falls under the “solve interpretability first” umbrella.
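To make the quoted recipe a bit more concrete, here is a toy sketch of just the "measurement function devalues harmful plans" step. Everything in it is an assumption for illustration: `Plan`, `p_harm`, `HARM_PENALTY` and `first_law_value` are hypothetical names, and `p_harm` is simply handed in as a number. Getting that number out of the AI's real internal representations is exactly the unsolved interpretability part, so this is not how the article's author would necessarily implement it.

```python
from dataclasses import dataclass

# Hypothetical weight, chosen large so that likely-harm plans are massively devalued.
HARM_PENALTY = 1_000.0


@dataclass(frozen=True)
class Plan:
    name: str
    goal_value: float  # value of the simulated outcome under the original internal goal
    p_harm: float      # assumed given: simulated probability that the AI harms a human


def first_law_value(plan: Plan, penalty: float = HARM_PENALTY) -> float:
    """Original goal value minus a heavy penalty scaled by how likely harm is."""
    return plan.goal_value - penalty * plan.p_harm


def choose_plan(plans: list[Plan]) -> Plan:
    """Pick the plan with the highest penalised value."""
    return max(plans, key=first_law_value)


plans = [
    Plan("fast_but_risky", goal_value=10.0, p_harm=0.30),
    Plan("slow_and_safe", goal_value=6.0, p_harm=0.0),
]
best = choose_plan(plans)
```

With these made-up numbers the risky plan scores far below the safe one, so the safe plan wins even though it is worse under the original goal alone. The sketch says nothing about where the "human", "harm" and "self" representations come from or how they get wired together.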

By frank-green

That was really interesting. So you see technical alignment as possible once the AI has the ability to simulate and create plans. Otherwise the internal goal is not invariant to the input. That kind of makes sense.

I do want you to dig a bit deeper into the implementation, however. I know it's a long article, but I think it deserves a bit more.

By IanMiller

TBH I got tired at the end there… lol. I'll submit an edit soon with that ending a bit more refined.

By Josh
