Safe Recursive Self-Improvement with Verified…

Adam Chlipala

Mar 24

A just-right challenge problem in Al alignment

Read →

6 Comments

Adam Chlipala

Mar 24

I'm experimenting again with crossposting to LessWrong, and you may find additional discussion there.

https://www.lesswrong.com/posts/pnPDd8uo2NZqn2dtG/safe-recursive-self-improvement-with-verified-compilers

Kiran

Apr 18

This is not quite the challenge you outline in this post, but seems fairly related, scoping out a smaller subset of the problem which seems out of the capabilities of current models: https://www.basis.ai/blog/verified-compiler/

Reply (1)

Adam Chlipala

Apr 18

Thanks for the pointer to that work. Your case study is a good representative of today's consensus approach to this kind of challenge. In the coming months on this blog, I'm going to make the case for a very different style, which doesn't just assume "the models will keep getting better." In fact, machine learning as we know it today will play a supporting role, at best. I don't want to drop too many spoilers at this point, but I'll preview a few of the elements I'll suggest handling differently.

- Taking several days to generate this kind of compiler (even fully verified) is way too long, and it turns out my very next post will explain why we should expect deep learning to bottleneck on this kind of task. Of course, everyone is going to agree that faster is better, so this point won't be controversial.

- I think WASM is a very 20th-century kind of language that should be abandoned, and I'll explain why in a later post.

- Neurosymbolic approaches are going to work way better than the Claude Code style, and I'll present my favorite in stages.

- An important aspect is going to be retiring proof assistants like Lean and Rocq, which aren't so well-designed to take advantage of what's coming in AI, and I'll explain what we should build instead.

- I probably won't test the patience of this blog's audience by explaining the following one, but simulation arguments are a bad way to structure compiler proofs. Omnisemantics (see https://adam.chlipala.net/papers/OmniTOPLAS23/ ) offers a nicer approach, especially when we commit to declaring program nontermination as undefined behavior, which is also the right choice.

P.S.: I think your article's statement that "JavaScript has no substantial prior formalization work" is dramatically wrong, e.g. see JSCert at https://jscert.org/ .

Reply (1)

Kiran

Apr 18

Oh, that sounds very interesting! I will look forward to hearing more as your planned blog posts come out.

> P.S.: I think your article's statement that "JavaScript has no substantial prior formalization work" is dramatically wrong, e.g. see JSCert at https://jscert.org/ .

Ah, you're right. Thanks for the catch. Somehow the JSCert formalisation effort entirely slipped my mind. The fact that I hadn't heard more publicity about it in the past few years had lead to it leaving my memory or concluding that it was only a partial/incomplete formalisation effort, but that was wrong. Will update the blog post.

James Torre

Mar 25

The formal verification that will enable safe recursive self-improvement is precisely that which solves the Trusting Trust and Lobian Obstacle problems:

https://gist.github.com/mietek/ec9baffc6d154f1febaa

I am working towards this from the foundation of Dan Willard's Self-Justifying Axiom Systems.

Reply (1)

Adam Chlipala

Mar 25

I had a little related text in a draft of this post, but LLMs convinced me to remove it as too much of a detour into formalism that most readers won't follow. Thanks for the chance to work it into the comments. :-)

I agree with the importance of the direction you're noting. Let me try to spell out the principle that I have in mind, and we can see if it lines up with your plan.

We can fix one formal deduction system at the start of the history of self-improvement. It is a syntactic definition, including the syntax of formulas and of proofs, with an algorithm for proof checking. As self-improvement progresses, each artifact is created alongside a proof in the fixed formal system -- or at least the existence of such a proof is proved. Crucially, there is never a need to develop a *new* formal system, let alone prove its consistency. Every later step takes the available proof system as a fixed requirement.

Structure and Guarantees

Safe Recursive Self-Improvement with Verified…