Implementing Asimov’s Laws of Robotics (The first law) - How alignment could work.

King of the hill #1


Submitted by




It held the top position until it lost to an


written by Josh (4 YES 0 NO).

That was really interesting. So you see technical alignment as possible once the AI has the ability to simulate and create plans; otherwise the internal goal is not invariant to the input. That kind of makes sense.

I do want you to dig a bit deeper into the implementation, however. I know it's a long article, but I think it deserves a bit more.

By IanMiller

· Reply

TBH I got tired at the end there… lol. I'll submit an edit soon with that ending a bit more refined.

By Josh

· Reply

This needs to be expanded upon:

“It’s easy guys! (Its not). Just take the internal goal and tack on the following. Take the representation for human and put it in relationship with the representation for harm. Then take the AI representation of self and put it in relationship with the new harming-human aggregation. Do this all in such a way so that when the simulated outcome interacts with the new internal goal, the internal measurement function massively devalues any plan in which such a harm-human outcome becomes likely. BAM first law applied.”

Also, I know this is part of the handwave, but… how do you imagine putting representations in the correct relationships? I suppose that falls under the “solve interpretability first” umbrella.
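
For what it's worth, here is a rough sketch of how I read the "massively devalues any plan" part of the quote. Every name in it (`SimulatedPlan`, `p_human_harm`, `HARM_PENALTY`) is my own assumption and not something from the article, and it skips the hard part entirely: it simply assumes the simulation hands back a probability that the AI's own action harms a human.

```python
from dataclasses import dataclass

# Hypothetical names and numbers throughout; none of this comes from the article.
HARM_PENALTY = 1_000_000.0  # assumed weight, chosen to dominate any goal value


@dataclass
class SimulatedPlan:
    name: str
    goal_value: float    # how well the simulated outcome serves the original internal goal
    p_human_harm: float  # assumed output of the simulation: chance the AI's action harms a human


def score(plan: SimulatedPlan) -> float:
    """Internal measurement: original goal value minus a heavy penalty on likely harm."""
    return plan.goal_value - HARM_PENALTY * plan.p_human_harm


def choose(plans: list[SimulatedPlan]) -> SimulatedPlan:
    """Pick the plan with the highest penalized score."""
    return max(plans, key=score)


plans = [
    SimulatedPlan("fast but risky", goal_value=10.0, p_human_harm=0.2),
    SimulatedPlan("slow but safe", goal_value=6.0, p_human_harm=0.0),
]
best = choose(plans)
print(best.name)
```

With these made-up numbers the risky plan wins on raw goal value but loses badly once the penalty is applied, so the safe plan gets picked. Where `p_human_harm` would actually come from is exactly the "correct relationships" question above.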

By frank-green

· Reply
