“It’s easy, guys! (It’s not.) Just take the internal goal and tack on the following: take the representation for ‘human’ and put it in relationship with the representation for ‘harm’. Then take the AI’s representation of self and put it in relationship with the new harming-human aggregation. Do all of this in such a way that when the simulated outcome interacts with the new internal goal, the internal measurement function massively devalues any plan in which such a harm-human outcome becomes likely. BAM, first law applied.”
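For what it's worth, here is a minimal toy sketch of that recipe, assuming the interpretability problem is already solved, i.e. that we can extract concept vectors for "human" and "harm" and simulate a plan's outcome in the same representation space. Every function, constant, and name below is a hypothetical placeholder, and treating concepts as addable vectors is exactly the handwave discussed in the next comment.

```python
import numpy as np

# All names and numbers here are hypothetical placeholders.
HARM_PENALTY = 1e6      # "massively devalues" offending plans
HARM_THRESHOLD = 0.5    # similarity above which an outcome counts as likely harm

def harm_human_direction(human_vec: np.ndarray, harm_vec: np.ndarray) -> np.ndarray:
    """'Human' put in relationship with 'harm': here, naively, vector addition
    forms the harming-human aggregation. (The self-representation step from
    the quote is elided for brevity.)"""
    return human_vec + harm_vec

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def measure_plan(plan_value: float,
                 simulated_outcome: np.ndarray,
                 human_vec: np.ndarray,
                 harm_vec: np.ndarray) -> float:
    """Internal measurement function: the plan keeps its base value unless its
    simulated outcome points toward the harming-human direction, in which case
    it is overwhelmingly devalued."""
    if cosine(simulated_outcome, harm_human_direction(human_vec, harm_vec)) > HARM_THRESHOLD:
        return plan_value - HARM_PENALTY
    return plan_value
```

The hard part, of course, is making `harm_human_direction` actually track the concept of harming a human rather than a crude vector sum, which is the interpretability question.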
Also, I know this is part of the handwave, but how do you imagine putting representations in the correct relationships? I suppose that falls under the "solve interpretability first" umbrella.
This needs to be expanded upon