Automated olympiad solving: towards a roadmap



May 14, 2024 (last updated May 21, 2024)

This note summarizes progress the author has made toward automated solving of olympiad-style math problems, building on earlier work on symbolic provers applied to geometry problems (Meringue), and also proposes a roadmap for solving domains other than geometry.

The geometry domain-specific knowledge in the Meringue prover is encoded in a single-file library of slightly over 200 lines. The library is written in a logical form language, and defines logical relations and logical rules (axioms, theorems) relevant to olympiad-style geometry.

The geometry library was constructed by hand, and should be considered a work in progress; it is essentially the minimum viable geometry library that can be used to define a non-trivial test case problem (a reduction of IMO 2009 P2), as well as to find a proof for that same problem.

Meringue is not an end-to-end olympiad geometry solver, in that the prover cannot directly accept a natural language geometry problem and return a natural language (LaTeX) proof. A parser from natural language geometry to logical forms, Praline, was prototyped; LLMs can also be prompted or fine-tuned to translate from natural language math to logical forms.

Whether using LLMs or an exact parser, the approach outlined above resembles a compiler architecture at a high level. First, a front-end translates the natural language input into an intermediate representation (i.e. logical forms). Then, a back-end consumes the logical-form intermediate representation and emits an output, e.g. either a formal or natural language proof, via a general prover equipped with a library of knowledge. Over the course of running the front-end and the back-end, such a prover internally builds a symbolic model of the problem.
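The two-stage pipeline above can be sketched in a few lines of Python. Everything here is illustrative: the function names, the tuple-based logical form representation, and the one-rule "library" are the author of this sketch's assumptions, not the actual Meringue or Praline APIs.

```python
# Hypothetical sketch of the compiler-like pipeline; names and the
# tuple-based IR are illustrative, not the actual Meringue/Praline APIs.

def front_end(natural_language: str) -> list:
    """Translate a natural language problem into logical forms (the IR).

    Here the IR is a list of (relation, args...) tuples; a real front-end
    would be an exact parser or a prompted/fine-tuned LLM.
    """
    # Toy example: recognize one fixed sentence.
    if natural_language == "M is the midpoint of segment AB":
        return [("midpoint", "M", "A", "B")]
    raise ValueError("unparsed input")

def back_end(ir, library):
    """Consume the IR and derive new facts by running a prover equipped
    with a library of knowledge (relations, axioms, theorems)."""
    derived = set(ir)
    for rule in library:
        derived |= rule(derived)
    return derived

# A one-rule "library": a midpoint lies on the line through the endpoints.
def midpoint_implies_collinear(facts):
    return {("collinear", m, a, b)
            for (rel, m, a, b) in facts if rel == "midpoint"}

ir = front_end("M is the midpoint of segment AB")
proof_state = back_end(ir, [midpoint_implies_collinear])
# proof_state now contains ("collinear", "M", "A", "B")
```

The point of the sketch is the separation of concerns: the front-end owns natural language, the back-end owns deduction, and the IR is the only contract between them.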

The compiler-like approach described above, applied here to geometry, raises some natural questions about whether it might generalize to the other three categories of olympiad math problems (algebra, number theory, combinatorics):

1. What was the manual process by which the geometry library was constructed?
2. Can this approach be generalized to domains other than geometry?
3. Can a geometry library, or a library of knowledge in other domains, be automatically constructed?

Manually constructing an olympiad geometry library



The short answer is that the original geometry library was constructed from the author’s own knowledge of geometry, and from reading various notes and other reference materials made specifically for olympiad geometry preparation. The result was a grab bag of relations, axioms, and theorems.

There is a background assumption that natural language mathematical content cleanly translates to a logical form representation. At least for olympiad geometry, this assumption held up well in practice.

How were the axioms and theorems chosen? For theorems, a shortcut is to look at some example proofs and see what sorts of theorems are used. Note that this does not introduce a “pretraining-on-the-test-set” effect: the theorems used in proofs are generally the same as those encountered in texts and other prep materials, i.e. they are in-distribution with respect to a textbook-like dataset. Moreover, simply possessing a list of theorems used in proofs does not help with the hard part of proving, which is selecting which theorems to apply and when to apply them; at most it narrows the search to the set of theorems known to be relevant for the problem (e.g. only geometry theorems for a geometry problem).

The axioms chosen were drawn from a mishmash of axiomatizations of Euclidean geometry. However, there was one set of axioms we found useful to introduce that is not normally presented explicitly. These axioms operate on a half-plane relation between a point and a line. Technically, such a half-plane relation could be emulated via the betweenness relation that appears in some standard axiomatizations of Euclidean geometry. However, the half-plane relation is useful for concisely specifying the orientation of otherwise ambiguous diagrams.
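The intended semantics of the half-plane relation can be made concrete with a short numeric sketch: the sign of a cross product tells which side of a directed line a point lies on. (The prover treats the relation purely symbolically; the function name and coordinate realization here are this sketch's assumptions.)

```python
# Sketch of a half-plane relation between a point and a directed line,
# realized numerically: the sign of a cross product distinguishes the
# two sides of directed line AB.

def half_plane(p, a, b):
    """Return +1 / -1 for the two open half-planes of directed line AB,
    and 0 if P lies on the line."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return (cross > 0) - (cross < 0)

# Fixing the half-plane of a point disambiguates a diagram's orientation:
assert half_plane((0, 1), (0, 0), (1, 0)) == +1   # above the x-axis
assert half_plane((0, -1), (0, 0), (1, 0)) == -1  # below the x-axis
assert half_plane((2, 0), (0, 0), (1, 0)) == 0    # on the line
```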

Generalization to other domains



It is known (though perhaps not commonly so) that, of the IMO problem categories, geometry is potentially the “easiest” to solve. One reason is that IMO geometry can be brute-forced through a method called “bashing” or “dumbassing”: basically, translating a geometry problem into complex coordinates and calculating a solution to the derived set of equations.
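A tiny taste of bashing, using a standard textbook setup rather than anything from Meringue: place the circumcircle at the unit circle, so the circumcenter is 0, the centroid is (a+b+c)/3, and the orthocenter is a+b+c. Collinearity of these three points (the Euler line) then falls out of a numeric check instead of a synthetic proof.

```python
# "Bashing" the Euler line with complex coordinates: for a triangle
# inscribed in the unit circle, circumcenter O = 0, centroid G =
# (a+b+c)/3, orthocenter H = a+b+c, and O, G, H are collinear.
import cmath

def collinear(z1, z2, z3, eps=1e-9):
    """Three complex points are collinear iff (z2-z1)/(z3-z1) is real."""
    if abs(z3 - z1) < eps:
        return True
    return abs(((z2 - z1) / (z3 - z1)).imag) < eps

# An arbitrary triangle on the unit circle.
a, b, c = (cmath.exp(1j * t) for t in (0.3, 1.9, 4.0))
o = 0                  # circumcenter
g = (a + b + c) / 3    # centroid
h = a + b + c          # orthocenter
assert collinear(o, g, h)  # the Euler line
```

The synthetic content of the theorem collapses into an algebraic identity, which is exactly why bashing is such an effective brute-force strategy for this category.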

(To be clear, the forward-chaining prover approach implemented in Meringue does not do the bashing approach. Instead, Meringue finds proofs via a purely logical approach: by starting from assumptions, applying axioms and theorems, and making constructions, all while only using the geometry knowledge encoded in the given library.)
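A forward-chaining loop of the kind described above can be sketched minimally as follows. This is a generic sketch, not the actual Meringue implementation; the rule representation and the "parallel" toy library are assumptions made for illustration.

```python
# Minimal forward-chaining sketch: starting from assumptions, repeatedly
# apply rules from the library until the goal appears or nothing new can
# be derived. (Illustrative only; not the Meringue implementation.)

def forward_chain(assumptions, rules, goal, max_rounds=100):
    facts = set(assumptions)
    for _ in range(max_rounds):
        new = set()
        for rule in rules:
            new |= rule(facts) - facts
        if not new:
            break
        facts |= new
        if goal in facts:
            return True, facts
    return goal in facts, facts

# Toy library: symmetry and transitivity of a "parallel" relation.
def symmetry(facts):
    return {("para", b, a) for (r, a, b) in facts if r == "para"}

def transitivity(facts):
    para = {(a, b) for (r, a, b) in facts if r == "para"}
    return {("para", a, c) for (a, b) in para for (b2, c) in para
            if b == b2 and a != c}

proved, _ = forward_chain(
    [("para", "l1", "l2"), ("para", "l2", "l3")],
    [symmetry, transitivity],
    goal=("para", "l3", "l1"))
# proved is True after two rounds of saturation
```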

There is a second (perhaps even less commonly known) reason that geometry is potentially the “easiest” category of IMO problems, which is that key parts of the logical structure of most IMO geometry problems are propositional. That is, the assumptions in the problem and the goal to be proved are all quantifier-free logical propositions. (Note that the axioms and theorems in the geometry library do contain universal quantifiers.)

A partly propositional problem structure greatly simplifies the design of a minimum viable prover that can find proofs of non-trivial geometry problems. Such a minimum viable prover is essentially what Meringue aims to be.

On the other hand, such a minimum viable prover is not equipped to prove an arbitrary statement in first-order logic. In particular, proving first-order goals with (universal) quantifiers is not supported in the current implementation of Meringue. First-order goals and terms with quantifiers do appear in other IMO categories, including algebra and number theory. At the very least, a roadmap is needed for fully supporting first-order logic.

Backward-chaining provers are naturally equipped with the appropriate semantics of first-order unification, and so can prove first-order goals with quantifiers. So, one possible future generalization to other domains may be a hybrid approach that combines features of forward- and backward-chaining. It is also possible to design a forward-chaining prover alone with first-order unification, though such a design would need higher-order deduction rules.
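First-order unification, the mechanism that lets a backward-chaining prover match a quantified rule head against a goal, is small enough to sketch directly. The variable convention (strings starting with `?`) and tuple terms are this sketch's assumptions; the occurs check is omitted for brevity.

```python
# Sketch of first-order unification. Variables are strings starting
# with '?'; compound terms are tuples. (Occurs check omitted.)

def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def walk(t, subst):
    # Follow variable bindings to their current value.
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def unify(t1, t2, subst=None):
    """Return a substitution making t1 and t2 equal, or None on failure."""
    subst = {} if subst is None else subst
    t1, t2 = walk(t1, subst), walk(t2, subst)
    if t1 == t2:
        return subst
    if is_var(t1):
        return {**subst, t1: t2}
    if is_var(t2):
        return {**subst, t2: t1}
    if isinstance(t1, tuple) and isinstance(t2, tuple) and len(t1) == len(t2):
        for a, b in zip(t1, t2):
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None

# Match a universally quantified rule head against a concrete goal:
s = unify(("midpoint", "?x", "?y"), ("midpoint", "M", ("seg", "A", "B")))
# s == {"?x": "M", "?y": ("seg", "A", "B")}
```

The substitution returned by a successful match is exactly what a backward-chaining prover threads through the rule's premises to generate subgoals.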

Another approach would involve the translation of first-order goals and terms into logically equivalent propositional goals via quotation and unquotation [Q81]. In fact, we have successfully performed small-scale experiments on finding proofs of abstract algebra problems using a simple implementation of this quotation-based approach.

There are other domains, including combinatorics, which will need extensions to the logical language in order to be effectively formalized. The relevant extensions are likely to include imperative effects and first-class logical contexts.

Going beyond the design of the logical framework, there are also extra-logical aspects of problem-solving that will need to be addressed. For example, plugging in small constants or guessing solutions are common techniques that do not seem strictly “logical” but are more “heuristic,” yet are also likely to be necessary to efficiently solve problems within a time limit.

Closely related to the synthesis of logical and heuristic methods is the very broad notion of “learned reasoning,” of which LLMs are the most representative example today. We expect fusions of LLMs with solvers to be a very promising direction.

Automated construction of libraries



Could the construction of formal libraries for provers (in general domains) also be scaled up?

The near-term answer is to directly use existing LLMs, or LLM-based systems, to distill formal relations, rules, assumptions, and goals from mathematical texts; this is the auto-formalization approach.

An earlier note described an exact parser from natural language math into logical forms. Whereas pretrained LLMs trade greater recall across many tasks and domains for lesser (though still impressive) precision, a parser accepts narrower recall in exchange for greater precision on its domain of validity. The parser described earlier can narrowly translate natural language statements of geometry problems into a logical form syntax capable of expressing propositional assumptions and proof goals.
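The precision/recall trade-off of an exact parser can be illustrated with a toy pattern-based fragment. The patterns and output shapes below are purely illustrative; the real Praline parser is far richer than a pair of regular expressions.

```python
# Toy illustration of an exact (pattern-based) parser from natural
# language geometry to logical forms. Illustrative only; not Praline.
import re

PATTERNS = [
    (re.compile(r"^(\w+) is the midpoint of (\w)(\w)$"),
     lambda m: ("midpoint", m.group(1), m.group(2), m.group(3))),
    (re.compile(r"^(\w)(\w) is parallel to (\w)(\w)$"),
     lambda m: ("para", ("line", m.group(1), m.group(2)),
                        ("line", m.group(3), m.group(4)))),
]

def parse(sentence):
    """Translate one sentence, or fail loudly: high precision on the
    covered fragment, with no recall at all outside it."""
    for pattern, build in PATTERNS:
        m = pattern.match(sentence)
        if m:
            return build(m)
    raise ValueError(f"out of domain: {sentence!r}")

assert parse("M is the midpoint of AB") == ("midpoint", "M", "A", "B")
```

An out-of-domain sentence raises an error rather than producing a plausible-but-wrong logical form, which is precisely the behavior that distinguishes an exact parser from an LLM translator.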

Might such a parsing approach scale? As the original parser was handwritten, a natural next step could be to apply an LLM-based system to improve the parser, in a process analogous to the dataset-based, test-driven development followed by the human author. Whether one should expect such an approach to work depends on one’s view of LLM capabilities and scaling laws.

Acknowledgments



This note is the author’s attempt to answer questions posed during discussions with David Shin.

References



[Q81] Willard Van Orman Quine. Mathematical Logic (1981).