Hello everyone,
Lawfare has posted some additional short-form materials related to our Law-Following AI article, covered in yesterday's newsletter. These might be of interest to people who, understandably, do not want to read the whole article.
Ketan and I coauthored this short-form post, summarizing some of the core arguments of the longer article.
I joined the Lawfare Daily podcast to talk about the article. You can also view this on YouTube.
Please share widely!
Well done on getting this out - I remember seeing your earlier work on LFAI on LessWrong. A few questions came to mind as I read the blog post (I have not read the paper, so please let me know if the answers are in there):
Would you agree that LFAI becomes most compelling in a world where we’ve already solved (or substantially mitigated) the problem of deceptive alignment? That is, where models no longer scheme against us.
How confident are you that law-following constraints meaningfully reduce the risk of AI takeover? It seems plausible that more capable models could follow the letter of the law while still accumulating power and gradually disempowering human actors. Is there a risk that LFAI gives us a false sense of security?
What incentives do current labs have to integrate law-following constraints into model specifications? Do you see this being driven by internal alignment goals, reputational concerns, or external regulation? And if the latter, what kind of regulation would you expect (or hope) would require this?