TLDR?

By Diego


  • Interpretability is hard.

  • Even when we get it, we still have to know what to do with it.

  • We would have to manipulate the internal goals of AIs.

    • Internal goals are based on internal simulations and internal representations.

    • We would have to manipulate those in order to write hard rules into AIs.

By Sergei
