Interpretability is hard.
Even when we get it, we still have to know what to do with it.
We would have to manipulate the internal goals of an AI.
Internal goals are based on internal simulations and internal representations.
We would have to manipulate those in order to write hard rules into AIs.
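As a rough illustration of what "manipulating internal representations" could mean in practice, here is a minimal sketch in the style of activation steering: a fixed vector is added to one layer's hidden state of a toy model so that its downstream behaviour shifts. The toy model, the chosen layer, and the steering vector are all assumptions made up for illustration, not anything established in this thread.

```python
# Hypothetical sketch: impose a "rule" by editing an internal representation
# (a hidden state), rather than by filtering the model's final output.
# The model, layer choice, and steering vector are all illustrative.
import torch
import torch.nn as nn

class ToyModel(nn.Module):
    def __init__(self, d_model=16):
        super().__init__()
        self.layer1 = nn.Linear(d_model, d_model)
        self.layer2 = nn.Linear(d_model, d_model)
        self.head = nn.Linear(d_model, 2)  # e.g. "comply" vs "refuse" logits

    def forward(self, x, steering_vector=None):
        h = torch.relu(self.layer1(x))
        # This hidden state h is the "internal representation" in question:
        # a hard rule would be written here, into the model's internals.
        if steering_vector is not None:
            h = h + steering_vector
        h = torch.relu(self.layer2(h))
        return self.head(h)

torch.manual_seed(0)
model = ToyModel()
x = torch.randn(1, 16)

baseline = model(x)
# A made-up direction we pretend encodes something like "do not harm a human".
steer = torch.randn(16) * 0.5
steered = model(x, steering_vector=steer)

print("baseline logits:", baseline.detach())
print("steered  logits:", steered.detach())
```

The only point of the sketch is that the "rule" lives in the hidden state, not in the output layer, which is why interpretability of those internal representations matters in the first place.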
TLDR?