Designing Decision-Making Systems to Help Us Trust Their Answers
Deep learning's disadvantage compared to an unpopular cousin
This post is the last in my trio situating deep learning in terms of strengths and weaknesses, relevant to building the most trustworthy systems for automatic decision-making. My positive take was that deep learning is a natural and powerful generalization of search engines, working most effectively when we use it to find and combine prior examples related to some goal. My first major negative take was that deep learning is inherently slow, because it requires many steps of computation to happen in order. I’ll wrap up now by considering the challenges in believing that systems built on deep learning (and a variety of other flavors of machine learning) produce reliable answers.
From Training Data to Parameters
I suggested previously that deep learning is better understood by analogy to search engines than to reasoning engines. It typically succeeds in proportion to its ability to find examples it already knows about that are related to a request. However, it is not just memorizing examples. We covered how it generalizes the algebra-class exercise of solving for m and b in the equation y = mx + b. The difference is that, instead of just those two parameters, today’s foundation models can have hundreds of billions of parameters!
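As a concrete reminder of what “learning parameters” means, here is a minimal sketch that recovers m and b from example points by gradient descent on squared error. The data points, learning rate, and step count are illustrative assumptions; deep learning trains with variants of the same kind of loop, just over vastly more parameters and a far more complex function than a line.

```python
# Minimal sketch: learn m and b in y = m*x + b by gradient descent on mean squared error.
# The data, learning rate, and number of steps are illustrative assumptions.

def fit_line(xs, ys, lr=0.05, steps=2000):
    """Fit y = m*x + b by repeatedly nudging m and b downhill on mean squared error."""
    m, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the mean squared error with respect to m and b.
        grad_m = sum(2 * (m * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (m * x + b - y) for x, y in zip(xs, ys)) / n
        m -= lr * grad_m
        b -= lr * grad_b
    return m, b

# Training examples generated from y = 3x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 4.0, 7.0, 10.0]
m, b = fit_line(xs, ys)
print(round(m, 3), round(b, 3))  # prints: 3.0 1.0
```

Nothing in the loop “knows” about lines beyond the formula being fit; the parameters simply settle wherever the training examples push them.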
The double-edged sword is that, while so much information or even insight can be encoded in so many parameters, it is an uphill battle to extract structure from learned parameters, and that difficulty is an obstacle to deriving mathematical guarantees. The general pursuit of explaining why AI systems give the answers they do is called explainability. A good intuition is that, when the training methods that learn parameters aren’t designed around deliberate structure, more work is required to find structure (and thus convincing explanations) after the fact.
One category of explainability for deep learning analyzes which parts of a prompt significantly influence the final answer. This kind of analysis doesn’t yet explain how the relevant parts of the question informed the answer, but it’s a good start. However, such analysis suffers from a classic challenge of software testing: when we only evaluate a program on a particular set of inputs, it is hard to build confidence about its behavior on all possible inputs. We may have neglected to test a corner case that matters in some important scenario (or even in many frequent scenarios that escaped our imagination). That risk is serious enough in a setting where all users have good intentions. In a cybersecurity setting, where an adversary is doing their best to drive the software to bad behavior, we must assume that the adversary finds exactly the input that will trigger the worst behavior.
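One simple member of this category is occlusion: remove one part of the input at a time and measure how much the output moves. The sketch below assumes a hypothetical bag-of-words scorer with made-up weights as a stand-in for a real network; only the shape of the analysis is meant to carry over.

```python
# Toy occlusion attribution: how much does dropping each word change the model's score?
# The "model" is a hypothetical bag-of-words scorer, not a real LLM.

WORD_WEIGHTS = {"refund": 2.0, "urgent": 1.5, "please": 0.1}  # hypothetical learned weights

def score(words):
    """Hypothetical model: sum the weights of the words it recognizes."""
    return sum(WORD_WEIGHTS.get(w, 0.0) for w in words)

def occlusion_attribution(words):
    """Attribute to each word the drop in score when that word alone is removed."""
    base = score(words)
    return {
        w: base - score(words[:i] + words[i + 1:])
        for i, w in enumerate(words)
    }

prompt = ["please", "process", "my", "urgent", "refund"]
print(occlusion_attribution(prompt))
```

The result says “refund” and “urgent” drove this particular score, but, like any test, it says nothing about prompts nobody thought to try.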
In a deep neural net, we can imagine corner cases in the form of weights (parameters) that have little influence on the questions asked during evaluation but that turn out to matter for other important questions. No matter how good a job we do explaining which parts of a question influenced the answer, we can’t be sure that future answers won’t have rather different, problematic explanations. This example is inspired by adversarial examples and backdoor attacks in machine learning, where realistic failures would involve many weights being moderately off, though I simplify in the diagram to a single problematic weight.
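The single-weight scenario can be reproduced in miniature with a hypothetical three-weight linear classifier (all numbers invented for illustration). On every evaluation input the third feature is zero, so a per-feature attribution reports that the large third weight contributes nothing; an untested input that activates that feature flips the decision.

```python
# Toy "corner case hidden in a weight": a linear classifier whose third weight never
# matters on the evaluation inputs but dominates on an untested trigger input.
# All weights and inputs are made up for illustration.

weights = [1.0, -1.0, 100.0]  # 100.0 plays the role of the single problematic weight

def predict(x):
    """Return 'approve' if the weighted sum is positive, else 'deny'."""
    total = sum(w * xi for w, xi in zip(weights, x))
    return "approve" if total > 0 else "deny"

def contributions(x):
    """Per-feature contribution w_i * x_i, a simple attribution for a linear model."""
    return [w * xi for w, xi in zip(weights, x)]

# Evaluation inputs never activate feature 2, so its contribution is always zero there.
eval_inputs = [[0.2, 0.9, 0.0], [0.8, 0.1, 0.0], [0.5, 0.4, 0.0]]
for x in eval_inputs:
    print(x, predict(x), contributions(x))

# An untested "trigger" input activates feature 2 and overrides everything else.
trigger = [0.0, 1.0, 0.05]
print(trigger, predict(trigger), contributions(trigger))
```

Every explanation produced during evaluation is accurate, and none of them warns about the trigger.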
Continuing the march through analogues of classic software-quality techniques, we find interesting parallels with compiler verification as I discussed it previously. For compilers, programs that translate between computer languages, the goal is to confirm that translation preserves behavior (meaning). I explained that one approach is certifying compilation, where every run of the compiler outputs not just a translated program but also a certificate of some kind that translation was carried out correctly. One very flexible kind of certificate is a mathematical proof. Many teams are now exploring that style of certifying explanation, not just for its “obvious” application in getting AI to do math but also for getting AI to write correct code, which could be accompanied by proofs of correctness. It remains a niche approach today (e.g., there’s no similar mechanism applied by LLMs to explain arbitrary responses), though a variety of more targeted projects involve different notions of certificates and their checkers.
I can give one U.S.-centric analogy for the nature of certificates. U.S. taxpayers need to submit annual tax returns where they don’t just declare how much they owe in taxes but also lay out, sometimes in excruciating detail across many forms, the calculations that justify their answers. A certificate is like that kind of “showing your work,” referencing mechanized rules instead of the United States tax code. A given tax return impinges on just a tiny slice of the total tax code, allowing relatively cheap auditing of any given tax return, though a search for a taxpayer’s ideal tax-return strategy could explore many parts of the law that turn out not to be relevant (i.e., some could be used but not in ways that reduce tax owed).
The following diagram shows how a single defect in the model leads both to a wrong answer and a flaw in the certificate. A checker that flags the latter helps us avoid proceeding with the former, though in a way that leaves us without a clear alternative.
I should also briefly mention LLM-associated techniques called chain-of-thought, where models are guided through spelling out intermediate steps in finding their answers; and self-consistency, producing multiple chains of thought and choosing as “winner” the answer appearing most frequently. To the extent these techniques write natural language and remain subject to the vagaries of LLM randomness, they still don’t help produce strong guarantees. Variants using formal languages converge toward generating formal proofs.
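The voting step of self-consistency is simple enough to sketch directly. The sampled final answers below are hard-coded hypothetical values rather than real model outputs, and breaking ties by first occurrence is an arbitrary choice for this sketch.

```python
from collections import Counter

def self_consistency(final_answers):
    """Majority vote over the final answers of several sampled chains of thought.
    Returns the winning answer and the fraction of samples that agreed with it.
    Ties go to the answer encountered first."""
    winner, votes = Counter(final_answers).most_common(1)[0]
    return winner, votes / len(final_answers)

# Hypothetical final answers extracted from five sampled chains of thought.
samples = ["42", "41", "42", "42", "40"]
print(self_consistency(samples))  # ('42', 0.6)
```

A 60% agreement rate may raise confidence, but nothing here checks the reasoning behind “42,” which is why voting alone stops short of a guarantee.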
Certifying algorithms (and we could charitably view chain-of-thought as a special case) have common strengths and weaknesses. The key strength is that it is often much easier to demonstrate that a single answer is correct than to demonstrate that a system only ever outputs correct answers. This benefit can translate into lower engineering costs to build a decision engine. A counteracting weakness is the inherent potential for unpleasant surprises: a given run of the certifying algorithm may eventually output an invalid certificate, which fails independent checking. For instance, a certificate may be a math proof carried out in a series of steps, and one of the steps turns out not to follow from the previous deductions, after all. Then what is the user to do? The result is practically indistinguishable from the decision program running forever or crashing, neither of which tends to go over well with users. The system winds up in an indeterminate state, unable to commit to an answer. Considering AI broadly, it is not always the case that we have a safe default action to take, when a decision engine fails to make a recommendation. For instance, we may be driving an aerial vehicle in tricky weather, dependent on constant clever decision-making to avoid a literal crash. In those settings, reaching an indeterminate state is itself a failure.
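The asymmetry between finding and checking shows up even in a toy problem. In this sketch, the question is whether n is composite, and the certificate is a nontrivial factor of n; integer factoring is a stand-in chosen for illustration, and the function and exception names are hypothetical. A defective solver produces an answer the checker rejects, leaving us with exactly the indeterminate state described above.

```python
# Certifying algorithm plus independent checker for a toy problem:
# "is n composite?", where a nontrivial factor of n serves as the certificate.
# Function and exception names are hypothetical illustrations.

class IndeterminateAnswer(Exception):
    """Raised when a certificate fails checking: we have no answer we can trust."""

def find_factor(n):
    """Certifying search (trial division): return a nontrivial factor of n, or None."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return None

def check_factor(n, d):
    """Independent checker: one division, however long the search for d took."""
    return 1 < d < n and n % d == 0

def certified_factor(n, solver=find_factor):
    """Accept the solver's certificate only if it checks; otherwise refuse to answer."""
    d = solver(n)
    if d is None:
        return None  # solver found no factor; this toy offers no certificate for that claim
    if not check_factor(n, d):
        raise IndeterminateAnswer(f"{d} does not certify that {n} is composite")
    return d

print(certified_factor(221))  # 13, since 221 = 13 * 17

def buggy_solver(n):
    """Hypothetical flawed solver standing in for a model with a defect."""
    return 7

try:
    certified_factor(221, solver=buggy_solver)
except IndeterminateAnswer as err:
    print("indeterminate:", err)
```

The checker caught the bad answer, but it cannot supply a good one; the caller still has to decide what to do next.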
For many reasonable notions of certificates, it remains unclear whether deep learning will evolve to produce them at all, for problems that are sufficiently complex and novel. Sometimes finding the certificate is itself the hard part, e.g., looking for proofs of longstanding mathematical conjectures. Even if deep learning can be scaled to produce a certain kind of certificate, it may take unreasonably long to do so. For instance, one easy upgrade for such an engine is to add a loop that checks certificates coming out, restarting the system every time checking fails (perhaps with new prompting about what went wrong last time, to help avoid a repeat). In the worst case, such a system will run forever on hard-enough inputs. Even if it does find a certifiable answer eventually, computational cost can easily balloon with this approach, and these retry costs should be considered a special case of what I highlighted in a previous post about how top-level “agentic loops” can lead to very long delays in producing final answers. Our earlier example of control software for an aerial vehicle is a good one, where we may be able to afford neither long delays nor fallback to simple defaults; and it is not hard to find other domains where faster specialized answers at least provide clear economic value.
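The check-and-retry loop might look like the following sketch. The generator is a hypothetical stand-in that proposes scripted guesses at a factor of 221 (claiming 221 is composite), the checker is a cheap divisibility test, and the attempt budget is an assumption added so the loop cannot run forever.

```python
# Sketch of a check-and-retry loop around an untrusted answer generator.
# `generate` stands in for a hypothetical model call; the attempt budget is an assumption
# (without one, the loop could run forever on inputs it never certifies).

def retry_until_certified(generate, check, max_attempts):
    """Return (answer, certificate, attempts_used); answer is None if the budget runs out."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        answer, certificate = generate(feedback)
        if check(answer, certificate):
            return answer, certificate, attempt
        # Tell the next attempt what went wrong, hoping to avoid a repeat.
        feedback = f"attempt {attempt}: certificate {certificate!r} failed checking"
    return None, None, max_attempts

def make_guesser(n, guesses):
    """Hypothetical stand-in for a model: claims n is composite, citing the next guessed factor."""
    remaining = iter(guesses)
    def generate(feedback):
        return n, next(remaining)  # a real system would condition on `feedback`
    return generate

def is_factor(n, d):
    """Cheap independent checker: d certifies that n is composite."""
    return 1 < d < n and n % d == 0

print(retry_until_certified(make_guesser(221, [5, 7, 17]), is_factor, max_attempts=3))
print(retry_until_certified(make_guesser(221, [5, 7, 11]), is_factor, max_attempts=3))
```

The attempt count is the cost to watch: each failed certificate costs a full generation step before any answer appears, and in the second call the budget runs out with no answer at all.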
The counterpart from compiler verification is certified compilers, where a mathematical proof shows that a compiler behaves properly for all possible inputs. Constructing such a proof can be much harder, but it saves us from worrying about unpleasant surprises on new inputs. Compiler verification has a decent analogy in mechanistic interpretability for AI, which reverse-engineers machine-learning models into human-meaningful explanations, but the foundations of that field are still being established. In the meantime, we might also ask whether other approaches to decision-making are better suited to proven guardrails.
Our goal will be to avoid needing to catch reasoning errors after the fact at all. Instead, we want to take advantage of system structure to make reasoning errors impossible.
Good Old-Fashioned AI
Deep learning and its closest relatives are contrasted with good old-fashioned AI (GOFAI), a cheeky name for symbolic AI. Whatever we call it, this older approach is centered on formal logic and other symbolic ways of representing reasoning, more in the style we’re used to from working out math derivations on blackboards. A particularly influential style is expert systems, which apply domain-specific logical rules to solve new problems. It turns out that style can avoid the challenges we just surveyed around getting stuck in indeterminate states or facing expensive checking of answers. You’ve probably heard about how thoroughly rule-based systems have gone out of style, and we’ll come back to that infamous history, but let me first review what kind of technology we are talking about. Later posts will get into relevant and interesting developments since expert systems fell out of favor, including advancements in programming tools and hardware, plus the chance to take advantage of statistical machine learning where its weaknesses are less relevant.
This kind of expert system is based on rules that deduce new facts from those already known. For instance, here is a set of rules that could be applied by venture capitalists to decide on valuations for AI startups.
RULE: Add $1M to the valuation for every occurrence of the word "agentic" in the pitch deck.
RULE: If person P has a LinkedIn profile that lists employment at company C, then consider that P worked for C.
RULE: If a cofounder has worked at a major AI company, add $5M to the valuation.
RULE: If the product has probability R of destroying humanity, add $10M/(1 - R) to the valuation.
To run such a logic program, we can start with a set of known facts (like the contents of cofounders’ LinkedIn profiles) and keep deducing new facts via our rules until no more follow. At the end, we sum up all the incremental amounts that have been deduced.
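The deduce-until-nothing-new procedure is naive forward chaining, and it fits in a few dozen lines. This sketch encodes facts as tuples and each rule as a function from the current facts to the facts it derives. The fact shapes, the sample startup (“alice,” “bob,” “BigAI”), and the reading that the cofounder rule fires once per qualifying cofounder are all assumptions for illustration.

```python
# Minimal forward-chaining sketch of the four valuation rules.
# Facts are tuples; each rule maps the current fact set to the facts it can deduce.
# Fact shapes, names ("alice", "BigAI"), and numbers are hypothetical illustrations.

def rule_agentic(facts):
    # RULE: add $1M for every occurrence of "agentic" in the pitch deck.
    return {("add", ("agentic", f[1]), 1_000_000)
            for f in facts if f[0] == "deck_word" and f[2] == "agentic"}

def rule_linkedin(facts):
    # RULE: a LinkedIn profile of P listing company C means P worked for C.
    return {("worked_for", f[1], f[2]) for f in facts if f[0] == "linkedin"}

def rule_ai_alum(facts):
    # RULE: add $5M if a cofounder worked at a major AI company
    # (assumed here to apply once per qualifying cofounder).
    cofounders = {f[1] for f in facts if f[0] == "cofounder"}
    majors = {f[1] for f in facts if f[0] == "major_ai_company"}
    return {("add", ("ai_alum", f[1]), 5_000_000)
            for f in facts
            if f[0] == "worked_for" and f[1] in cofounders and f[2] in majors}

def rule_doom(facts):
    # RULE: add $10M / (1 - R), where R is the probability of destroying humanity.
    return {("add", ("doom",), 10_000_000 / (1 - f[1]))
            for f in facts if f[0] == "destroy_prob"}

RULES = [rule_agentic, rule_linkedin, rule_ai_alum, rule_doom]

def forward_chain(facts, rules):
    """Apply every rule repeatedly until no new facts follow (a fixpoint)."""
    facts = set(facts)
    while True:
        new = set().union(*(rule(facts) for rule in rules)) - facts
        if not new:
            return facts
        facts |= new

def valuation(facts):
    """Sum all the incremental amounts that have been deduced."""
    return sum(f[2] for f in facts if f[0] == "add")

words = "An agentic platform for agentic enterprises".lower().split()
known = {("deck_word", i, w) for i, w in enumerate(words)}
known |= {
    ("cofounder", "alice"), ("cofounder", "bob"),
    ("linkedin", "alice", "BigAI"), ("linkedin", "bob", "Acme Bank"),
    ("major_ai_company", "BigAI"),
    ("destroy_prob", 0.5),
}
print(valuation(forward_chain(known, RULES)))  # 27000000.0
```

Production rule engines index facts and avoid re-deriving old conclusions on every pass, but the fixpoint idea is the same.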
It is readily apparent how a simpler explainability approach applies to this kind of reasoning system. Every rule can be scrutinized by experts in advance of deploying a system. In practice, the rules would be written in some programming language rather than English, to further avoid potential ambiguity. Notations of formal logic are typically used. However, the narrative above of how one “obviously” executes a logic program works pretty well to explain how the real thing operates.
One way to trust outputs of a rule-based system follows certifying compilation: have every execution output a trace of how the rules were used to reach a conclusion. The example I’ve used here is a little more complicated than necessary from this perspective, but let’s simplify by saying (1) the goal of the system is to certify a minimum valuation for a startup, and (2) we have no rules that add negative amounts to the tally. Then a certificate is a list of rule instances: each element is one rule from the knowledge base, along with further traces explaining how we deduce its premises. For instance, we can justify a use of the rule about cofounder-from-major-AI-company by including a trace demonstrating that the cofounder worked for a particular company. That nested trace may itself invoke another rule and need to include further traces, but eventually the process bottoms out in referencing only facts that we assumed are true. The point is that, while in general it may be complex to reach a conclusion and produce its trace, it is simple and cheap to check a trace.
The story is quite similar to what we’ve considered for my own specialty of formal verification and its applications to recursive self-improvement and code-sharing by agents. Indeed, formal verification, including automated theorem proving, is an example of an area that has been so successful as to be ejected from the popular conception of “AI,” following the old maxim that the “AI” category only includes reasoning tasks that seem sufficiently unsolved today.
Thinking back to our discussion of performance bottlenecks inherent to deep learning, we can see an appealing advantage for rule-based systems beyond explainability. A key principle we relied on was that the latency, or time it takes to get the complete answer to a question, is proportional to the critical path length of a system, measuring the longest sequence of steps that necessarily must occur in sequence. Some conclusions of a rule-based system can be justified with shallow traces, meaning that we don’t need to use rules whose premises are justified with rules whose premises appeal to rules… to too great a depth. In principle, with a good parallel implementation, the critical path is determined by the depth of the trace. In other words, answers with shallow proofs should be findable quickly, with convincing evidence for their truth. The same expert system could still be ready to run for longer on more complex questions, taking time proportional to that complexity.
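To make the distinction between total work and critical path concrete, this sketch measures two things about a trace: its size (total rule applications, roughly the cost of handling it sequentially) and its depth (the critical path, assuming independent premises are handled in parallel on enough processors and each rule step takes unit time). The trace encoding is a hypothetical one chosen for illustration.

```python
# Sketch: work vs. critical path for a rule-based trace.
# Hypothetical encoding: ("assumed", fact) or ("rule", rule_name, conclusion, [subtraces]).

def trace_size(trace):
    """Total number of rule applications: the work a sequential checker does."""
    if trace[0] == "assumed":
        return 0
    return 1 + sum(trace_size(t) for t in trace[3])

def trace_depth(trace):
    """Longest chain of rule applications: the critical path if independent
    premises are handled in parallel (assuming enough processors)."""
    if trace[0] == "assumed":
        return 0
    return 1 + max((trace_depth(t) for t in trace[3]), default=0)

# A wide, shallow trace: one step combining 100 independently justified premises.
wide = ("rule", "combine", "goal",
        [("rule", "step", f"fact{i}", [("assumed", f"base{i}")]) for i in range(100)])

# A narrow, deep trace: a chain of 10 steps, each depending on the previous one.
deep = ("assumed", "base")
for i in range(10):
    deep = ("rule", "step", f"fact{i}", [deep])

print(trace_size(wide), trace_depth(wide))  # 101 2
print(trace_size(deep), trace_depth(deep))  # 10 10
```

The wide trace involves ten times the work of the deep one yet has a critical path of only two steps, which is the sense in which shallow proofs can be found and checked quickly on parallel hardware.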
A Discredited Approach?
Now I can return to confronting how thoroughly expert systems have gone out of fashion. A major AI winter starting in the 1980s was centered on disillusionment with expert systems. Then we had a period when roughly all AI approaches were widely viewed with skepticism, followed by the explosion of deep learning in the 2010s, putting expert systems at even more of a relative disadvantage in the popular imagination. Maybe I should feel sheepish, then, to reveal the truth about this blog: a major theme is going to be the increasingly compelling argument for returning to this style of automated decision-making (sometimes in cooperation with other techniques like deep learning). There are two major apparent obstacles in the way of achieving good results.
First, there has been massive full-stack investment in deep learning and related techniques, leaving competitors with a lot of catching up to do. Consider:
Foundational work on algorithms in the domain
AI hardware accelerators like GPUs, as well as the programming tools that complement them, which together provide amazing parallel performance
Wide distribution of expertise in this kind of computing
Even wider familiarity with how to get good results prompting generative-AI systems
Second, statistical machine learning has shown amazing results in so many domains where approaches based on symbolic logic stalled. If we randomly sample populations like business users for their most-important problems where they want help from artificial intelligence, we mostly get answers where generative AI is way ahead of other methods today. The burden of proof is on someone boosting another approach, to show how it can be at all competitive.
I’ll take a first shot at that argument in my next post. The key is to zoom out and apply a codesign approach, looking beyond the normal scopes in which AI problems are defined. Later posts will return to performance engineering of whole computing stacks for systems based on symbolic logic.