Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Group theory crash #519

Closed
wisnesky opened this issue Feb 11, 2025 · 23 comments
Closed

Group theory crash #519

wisnesky opened this issue Feb 11, 2025 · 23 comments

Comments

@wisnesky
Copy link

Hi all, I've been experimenting with encoding equational theories in the sense of universal algebra into Egglog. For example, group theory. I understand that I could enter a complete set of re-write rules as discovered by Knuth-Bendix completion but I wanted to see what would happen if I just use the usual 3 equation encoding:

(datatype G
 (id)
 (inv G)
 (plus G G)
)

(rewrite (plus (plus a b) c) (plus a (plus b c)))
(rewrite (plus a (plus b c)) (plus (plus a b) c))

(rewrite (plus a (id)) a)
(rewrite a (plus a (id))) //crash

(rewrite (plus (inv a) a) (id))
(rewrite (id) (plus (inv a) a)) //error

I understand why the erroneous rule is flagged - its right hand side refers to the variable a that is not bound on the left, and the crash is on a kind of rule that is usually disallowed (variable -> non-variable). Here is the message:

[INFO ] Welcome to Egglog REPL! (build: 0.4.0_2025-02-11_)
...
(rewrite a (plus a (id)))
thread 'main' panicked at src/actions.rs:80:69:
called `Option::unwrap()` on a `None` value

I'm wondering - can Egglog be used to semi-decide ground equality of arbitrary equational theories, such as group theory? This paper: https://arxiv.org/pdf/2501.02413 makes it seem like the answer is 'yes' and so I'm trying to figure out how to do, and wondering if the above approach is correct or if there are other approaches.

@yihozhang
Copy link
Collaborator

Right, rule like a -> (plus a (id)) is supported in egg, but not directly in egglog. As egglog translates rules to Datalog/chase-like rules, all variables need to be bound. A standard workaround is to use a universe relation:

(datatype G
 (id)
 (inv G)
 (plus G G)
)

(relation universe (G))
(rule ((= e (id))) ((universe e)))
(rule ((= e (inv a))) ((universe e)))
(rule ((= e (plus a b))) ((universe e)))

(birewrite (plus (plus a b) c) (plus a (plus b c)))

(birewrite (plus a (id)) a :when ((universe a)))

(birewrite (plus (inv a) a) (id) :when ((universe a)))
;; which gets translate to two (slightly inefficient) rules
;; (rule ((= e (plus (inv a) a)) (universe a)) ;; we don't need (universe a) here
;;       ((union a (id))))
;; (rule ((= e (id)) (universe a)) 
;;       ((union a (plus (inv a) a))))
      

@wisnesky
Copy link
Author

wisnesky commented Feb 11, 2025 via email

@yihozhang
Copy link
Collaborator

In this case, (id) has not been populated in the "database", and checking (= (id) (id)) implicitly checks the existence of (id), so it failed. It would work if we manually insert (id) into the database. I also made a slightly more involved example here.

(datatype G
 (id)
 (inv G)
 (plus G G)
)

(relation universe (G))
(rule ((= e (id))) ((universe e)))
(rule ((= e (inv a))) ((universe e)))
(rule ((= e (plus a b))) ((universe e)))

(birewrite (plus (plus a b) c) (plus a (plus b c)))

(birewrite (plus a (id)) a :when ((universe a)))
(birewrite (plus (id) a) a :when ((universe a)))

(birewrite (plus (inv a) a) (id) :when ((universe a)))
(birewrite (plus a (inv a)) (id) :when ((universe a)))

;; ab=1 --> b=a^-1
(rule ((= (id) (plus a b)))
      ((union b (inv a))
       (union a (inv b))))


;; need to populate (id) in the e-graph first
(id)
(check (= (id) (id)))

(constructor a () G)
(constructor b () G)

(let iab (inv (plus (a) (b))))
(let ibia (plus (inv (b)) (inv (a))))

(run 5)

(check (= iab ibia))

@wisnesky
Copy link
Author

wisnesky commented Feb 11, 2025 via email

@yihozhang
Copy link
Collaborator

Yes! The issue I had in my previous program is that it didn't put a and b into the universe relation. We don't actually need the extra rule

(datatype G
 (id)
 (inv G)
 (plus G G)
)

(relation universe (G))
(rule ((= e (id))) ((universe e)))
(rule ((= e (inv a))) ((universe e)))
(rule ((= e (plus a b))) ((universe e)))

(birewrite (plus (plus a b) c) (plus a (plus b c)))

(birewrite (plus a (id)) a :when ((universe a)))
(birewrite (plus (id) a) a :when ((universe a)))

(birewrite (plus (inv a) a) (id) :when ((universe a)))
(birewrite (plus a (inv a)) (id) :when ((universe a)))

(constructor a () G)
(constructor b () G)
(universe (a))
(universe (b))

(push)
(union (id) (plus (a) (b)))
(run 5)
(check (= (a) (inv (b))))
(pop)

(push)
(let iab (inv (plus (a) (b))))
(let ibia (plus (inv (b)) (inv (a))))
(run 5)
(check (= iab ibia))
(pop)

@wisnesky
Copy link
Author

wisnesky commented Feb 11, 2025 via email

@yihozhang
Copy link
Collaborator

yihozhang commented Feb 13, 2025

how does this work with multiple sorts? For example, if I wanted to use a multi-sorted structure such as a category instead of a group, where the constructors depend on two sorts, Ob and Hom, how would the syntax change?

Is this something you have in mind?

(sort Ob)
(sort (Hom a b) for ((a b : Ob)))

(function id () (Hom a b) for ((a b : Ob)))

Assuming I understand your question correctly, I think this is hard in current egglog, since egglog does not have real parametric/dependent types. Real parametric/dependent types introduce problems since each function is supposed to be backed by a database table, and it's not clear when to instantiate a parametric function (like id in the above case) with concrete types.

I'm guessing Egglog can semi-decide entailment of cartesian theories (skolemized horn theories)

Does it suffice to show EqSat subsumes Skolem chase (which we show in the semantics paper)?

  1. can this solution to the ground word problem be lifted to the full word problem?

I think egglog can prove forall c, c+a = c+b by skomizing c to an uninterpreted constant. For entailment, if you have forall c, c+a = c+b as an axiom and check if a+d() = b+d() holds (for Skolem constant d()), would that just work?

@wisnesky
Copy link
Author

wisnesky commented Feb 13, 2025 via email

@yihozhang
Copy link
Collaborator

something like this for the action of a group on a set

Yeah egglog supports multiple sorts. There are also many examples in the web demo.

(datatype*
 (G 
  (id) 
  (inv G)
  (plus G G))
 (A
  (action G A)))

in general, there is no way to turn an arbitrary semi-decision procedure for ground equality in universal algebra into a semi-decision procedure for non-ground equality

I never thought of it! This sounds interesting. Yes EqSat can semi-decide the ground word problem but I kinda just take the full word problem for granted. Let me know if you make any new discoveries about the connection between EqSat and the full word problem.

@wisnesky
Copy link
Author

wisnesky commented Feb 18, 2025 via email

@yihozhang
Copy link
Collaborator

yihozhang commented Feb 19, 2025

This is interesting. I actually don't know! Maybe another example is that given a signature containing an empty/singleton sort, how can egglog prove for all x,y, x=y

(datatype S (id))
(check ((= x y)) ;; ??

I don't think egglog can directly prove this.

Here is one idea using denial constraint:

(datatype S (id))
(relation U (S))
(U (id))

(rule (
 (U s)
 (U t)
 (!= s t)
) (
 ;; alternatively, put something in a denial database)
 (panic "the statement is false")
))

For the denial rule to be sound, all the axiomatic rules need to saturate first, which is unlikely when the universal model is not finitely representable. So yeah, I'm not quite sure.

Another idea is to encode a more powerful logic in egglog?

(datatype E
 (Forall Var E)
 (Exists Var E)
 (And E E)
  ...
)

I don't know if this is realistic though.

@wisnesky
Copy link
Author

wisnesky commented Feb 19, 2025 via email

@wisnesky
Copy link
Author

Is there a modern egglog equivalent to "saturate" from some previous papers? I'm now trying to hook up Egglog as a chase engine / initial model finder instead of theorem prover, so I'm trying to run rules until they reach a fixed point, even at risk of divergence. The examples I've found such as towers of Hanoi just choose a really big number and use 'run'. Thanks again!

@yihozhang
Copy link
Collaborator

Yes! You can do (run-schedule (saturate (run))). The examples are outdated.

@wisnesky
Copy link
Author

wisnesky commented Feb 22, 2025 via email

@wisnesky
Copy link
Author

wisnesky commented Feb 22, 2025 via email

@yihozhang
Copy link
Collaborator

what does egglog do with what the chase would call contradictions? For example, if one and two are distinct constants of sort Nat, and we chase with a rule such as “one = two”, the desired behavior is usually for the chase to fail, rather than form a quotient where one = two are equated, as would happen in building an initial model. Can egglog be made to fail in such cases?

In chase's terminology, every sort value in egglog is a "marked null", so unioning one and two does not cause an error. Instead, users wrote rules like this

(rule (
  ((= (True) (False))
) (
  (panic "error")
))

(rule (
  (= (Num x) (Num y))
  (!= x y)
) (
  (extract x)
  (extract y)
  (panic "error")
))

In 1:32-38: (= x y)
Unbound function =

Ah right! To say two things are equivalent in the RHS, egglog uses the union keyword.

(rule (
  (= (name x) (name y))
) (
  (union x y)
))

This also needs to be a rule than rewrite. A rewrite of the form (rewrite t1 t2) is equivalent to

(rule ((= x t1))
(
  (let y t2)
  (union x y)
))

@wisnesky
Copy link
Author

Understood: re: the chase using only labeled nulls. BTW, in "Fast Left Kan Extension using the Chase": https://arxiv.org/abs/2205.02425 we found a fairness condition (condition 3 in Proposition 3 on p. 22) required for chase sequences implementing left kan extensions that is similar to yours, although so far we haven't been able to prove the conditions equivalent. Left Kan extensions can be used very directly to construct initial models.

Also, I wanted to report that "birewrite" on ground terms does not seem to trigger "union", which I thought was surprising. The egglog below sets up a database table called Person with an attribute (and primary key) called name of type dom, and two people g1 and g2:

(datatype*
(Person
(g1)
(g2))
(dom
(name Person))
)
(g1)
(g2)

(relation univPerson (Person))
(relation univdom (dom))

(rule ((= var0 (g1))) ((univPerson var0)))
(rule ((= var0 (g2))) ((univPerson var0)))
(rule ((= var0 (name var1))) ((univdom var0)))
(rule ((= (name x) (name y))) ((union x y)) )

If I use:

(union (name (g1)) (name (g2)))

the result is one Person, as expected. But if I use:

(birewrite (name (g1)) (name (g2)))

then egglog still reports two people.

@yihozhang
Copy link
Collaborator

yihozhang commented Feb 27, 2025

Sorry was/am really swamped in a deadline push!

Fast Left Kan Extension using the Chase

This looks interesting! I'll make sure to read it when I'm freer

then egglog still reports two people.

Ah, I think this is because you don't (run) the rules. This is one weird thing about egglog if you come from the Chase/datalog world :)

edit: another common reason is maybe you haven't populated (name (g1)) or (name (g2)) in your database

@wisnesky
Copy link
Author

Sorry, the 1 vs 2 sized thing is after running to saturation. Here's the two full programs:

(datatype*
(Person
(g1)
(g2))
(dom
(name Person))
)
(g1)
(g2)

(relation univPerson (Person))
(relation univdom (dom))

(rule ((= var0 (g1))) ((univPerson var0)))
(rule ((= var0 (g2))) ((univPerson var0)))
(rule ((= var0 (name var1))) ((univdom var0)))
(rule ((= (name x) (name y))) ((union x y)) )

(birewrite (name (g1)) (name (g2)))

(run-schedule (saturate (run)))

(print-size univPerson)

gives size two but

(datatype*
(Person
(g1)
(g2))
(dom
(name Person))
)
(g1)
(g2)

(relation univPerson (Person))
(relation univdom (dom))

(rule ((= var0 (g1))) ((univPerson var0)))
(rule ((= var0 (g2))) ((univPerson var0)))
(rule ((= var0 (name var1))) ((univdom var0)))
(rule ((= (name x) (name y))) ((union x y)) )

(union (name (g1)) (name (g2)))

(run-schedule (saturate (run)))

(print-size univPerson)

gives size one. The only difference between the two is "union" vs "birewrite".

I was able to hook up egglog to CQL as a chase engine for both EGDs and TGDs using "union" for all ground equations and it seems to be working ok so far, but I'm still not clear why union and bi-rewrite would result in different models after saturation.

@yihozhang
Copy link
Collaborator

I see it now! I think it's because in your first program, (name (g1)) is never populated, so the rule doesn't fire. Populating it works. See this demo

@wisnesky
Copy link
Author

wisnesky commented Feb 28, 2025 via email

@yihozhang
Copy link
Collaborator

Great! I'm going to close this thread for now. We have an active Zulip where you can ask questions! https://egraphs.zulipchat.com/#narrow/stream/375765-egglog

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants