Tackling fundamental logic: A very hard automated deduction challenge (free for all)

#logic #challenge #computerscience #algorithms

Earlier this year, I created an open-ended logic puzzle that turned out to be quite a demanding challenge for automated deduction. It concerns formal proofs and is relevant to structural proof theory, a research field in mathematical logic.

The challenge is so hard that only three people managed to contribute improving solutions since March this year, when I started and shared it in logic research communities. So, if you are willing to put your creativity and advanced programming skills to the test, look no further!

The goal is to find shorter proofs than everybody else towards certain theorems, which are themselves axioms of popular — historically relevant — proof systems.

The catch is that each proof can only use one minimal single axiom of classical propositional logic over connectives → ("implies") and ¬ ("not"), and proofs are based on condensed detachment (modus ponens + unification) as an only rule of inference, which requires a lot of clever automation at this point.

Why it is hard

The underlying proof systems being explored are so-called Hilbert systems, which are notorious for having large and difficult-to-find proofs, while being very easy to define. Using systems with only single axioms, each of which is also rather long (21 symbols per axiom), makes the proofs extra tedious. But these systems are also special: According to modern research there are only seven such minimal proof systems axiomatizing classical propositional logic, which is the theoretical foundation of mathematical logic and thus modern science!

Some relevant shortest known proofs currently have thousands of steps. To illustrate what this means, I will show you a very short proof of only 13 steps, which proves "A1" — the theorem ψ→(φ→ψ) (also written CpCqp in normal Polish notation) — from Meredith's single axiom:

The proof's compact "D"-representation is DDD11D1D1D111, for axiom (1) ((((ψ→φ)→(¬χ→¬ξ))→χ)→τ)→((τ→ψ)→(ξ→ψ)). It essentially describes the evaluation of D(D(D(1,1),D(1,D(1,D(1,1)))),1), where 1 means first axiom, and D means condensed detachment (with conditional as first argument and antecedent as second argument).

Its formula-based representation — here numbers in formulas are variables — is:

1. ((((2→3)→(¬4→¬5))→4)→6)→((6→2)→(5→2))  (1)
2. ((((¬0→¬0)→(¬(¬0→¬((1→0)→7))→¬¬(¬0→¬0)))→(¬0→¬((1→0)→7)))→0)→((0→¬0)→(¬(¬0→¬0)→¬0))  (1)
3. (((((¬0→¬0)→(¬(¬0→¬((1→0)→7))→¬¬(¬0→¬0)))→(¬0→¬((1→0)→7)))→0)→((0→¬0)→(¬(¬0→¬0)→¬0)))→((((0→¬0)→(¬(¬0→¬0)→¬0))→(¬0→¬0))→(((1→0)→7)→(¬0→¬0)))  (1)
4. (((0→¬0)→(¬(¬0→¬0)→¬0))→(¬0→¬0))→(((1→0)→7)→(¬0→¬0))  (D):2,3
5. ((((0→¬0)→(¬(¬0→¬0)→¬0))→(¬0→¬0))→(((1→0)→7)→(¬0→¬0)))→(((((1→0)→7)→(¬0→¬0))→0)→(0→0))  (1)
6. ((((1→0)→7)→(¬0→¬0))→0)→(0→0)  (D):4,5
7. (((((1→0)→7)→(¬0→¬0))→0)→(0→0))→(((0→0)→(1→0))→(0→(1→0)))  (1)
8. ((0→0)→(1→0))→(0→(1→0))  (D):6,7
9. ((((0→(1→0))→(¬(¬0→¬(((((2→3)→(¬4→¬5))→4)→6)→((6→2)→(5→2))))→¬1))→(¬0→¬(((((2→3)→(¬4→¬5))→4)→6)→((6→2)→(5→2)))))→0)→((0→0)→(1→0))  (1)
10. (((((0→(1→0))→(¬(¬0→¬(((((2→3)→(¬4→¬5))→4)→6)→((6→2)→(5→2))))→¬1))→(¬0→¬(((((2→3)→(¬4→¬5))→4)→6)→((6→2)→(5→2)))))→0)→((0→0)→(1→0)))→((((0→0)→(1→0))→(0→(1→0)))→((((((2→3)→(¬4→¬5))→4)→6)→((6→2)→(5→2)))→(0→(1→0))))  (1)
11. (((0→0)→(1→0))→(0→(1→0)))→((((((2→3)→(¬4→¬5))→4)→6)→((6→2)→(5→2)))→(0→(1→0)))  (D):9,10
12. (((((2→3)→(¬4→¬5))→4)→6)→((6→2)→(5→2)))→(0→(1→0))  (D):8,11
13. 0→(1→0)  (D):1,12

How to deal with all this

The challenge takes place in the discussion forum to a GitHub repository of a tool which can help with many of the basic tasks. For example, it creates the above proof simply from the command

pmGenerator -c -n -s CCCCCpqCNrNsrtCCtpCsp --parse DDD11D1D1D111 -u -j -1

where

-c -n -s means "configure the proof system with formulas in normal Polish notation based on string …",
CCCCCpqCNrNsrtCCtpCsp is Meredith's single axiom in normal Polish notation,
--parse DDD11D1D1D111 -u means "convert the D-proof DDD11D1D1D111 into a formula-based representation (in infix notation; unicode based)", and
-j -1 avoids splitting the proof into multiple smaller parts.

The problem's level of complexity clearly makes assistance of computers and clever coding a necessity in order to successfully participate in this challenge. While my tool also has a proof compression feature, I tend to test new features on the challenge before publication, so I already used it in some of the best ways I could think of. To accomplish more than my tool's capabilities allow, you would have to write your own code.

Room for improvement

Note that I am currently testing a new (yet unpublished) feature of formula synthesizing proof compression, which allows me to make some major improvements on the current 126171-step 🥇 results from Aug 15, 2024. I am already down to 26859 proof steps total^✻, but my computations will take a while longer, probably a few weeks. This shows there is still quite some air for anyone who wishes to immortalize themselves on that leaderboard!

You might still feel overwhelmed and wonder what is the point of me posting this here, since this seems far beyond the capabilities of most developers. I can assure you the overwhelmedness is normal — human brains and formal proofs do not go well together. Formal proofs mainly serve automation of knowledge generation and verification by machines, i.e. evading human error. The point of me advertising it at this point and in different places is increasing its value — so that nobody can rightfully say I wouldn't have invited lots of potential challengers!

Also note that in order to capture the top spot, you need to reduce only one of the existing 49 shortest known proofs by at least a single step.

If you are interested, I wish you luck and lots of fun! Feel free to ask for help in the project's forum.

^✻_{Last update: Jan 06, 2025}

DEV Community

Tackling fundamental logic: A very hard automated deduction challenge (free for all)

Why it is hard

How to deal with all this

Room for improvement

Links

Top comments (0)

Read next

AI Jobs Without a Computer Science Degree: How to Start

Bridging the Gap: How the Data Link Layer Keeps Your Network Connected

2364. Count Number of Bad Pairs

1726. Tuple with Same Product