Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal to improve the regex circuit #26

Open
katat opened this issue Jan 7, 2023 · 5 comments
Open

Proposal to improve the regex circuit #26

katat opened this issue Jan 7, 2023 · 5 comments

Comments

@katat
Copy link
Contributor

katat commented Jan 7, 2023

This is probably related to #27.

Rational

The regex circuit seems playing a key role in the content verification in zk email. I believe it is unique in the circuit market at this stage and would provide great value to the circom community if the regex circuit compilation can be encapsulated as a standalone circuit library.

To make the regex circuits usable for wider audience, there would be some changes needed as outlined in the following.

Features

Restructure counting for the matches

For example, assuming there are more than one match in the input message signal, then the output reveal signal would be like [0,0,0,0,1,1,1,1,0,2,2,2,2,0,0,0,3,3,3,0].

The sample reveal signal serves for two purposes.

  1. Embed information for the counting.
  2. Embed information for locations for nth matches.

With these two embedded information, the reveal array signal can be further processed to provide additional data points as outline in the next feature for different verification needs.

Additional data points

Given an input parameter for the nth match, it should output the following signals

  • start position
  • end position
  • matched value

For example, given the input is 2 and the original message is [a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t], the output signal values would be

  • start position: 9
  • end position: 13
  • matched value: [j,k,l,m]

Implementation

  1. It is likely to consider together with Streamline compilation for new regexes #27. My current thought is to refactor the regex circuit compiler to include the counting restructure outlined in the feature above, and create a helper circuit for the features outlined in additional data points.

  2. For the regex circom compilation Streamline compilation for new regexes #27, it currently uses a python script. If there is no particular reason for the preference over python language, I think it might be helpful to convert the script to JS as it is consistent with the parts of project for easier maintenance and testing.

  3. Test coverage. The tests, similar to how the 0xparc-ecdsa does, would be very helpful for the development process. With the tests in place, the concerns can be isolated and imrove the efficiency of circuit compilation for testing. Overall, this would help improve the development process cycle in practice.

@Divide-By-0
Copy link
Member

Looks good, we can discuss more tomorrow! We have code for start and end position (along with matched text) in email.circom, perhaps it can be moved here, outputs chosen can be toggled via template parameter).

Tests and restrurcturing would be awesome -- would love to hear what else you think could use some restructuring. Feel free to convert to JS! That wilk also help for in browser regex circuit generation if we ever decide to make that client side only.

@katat
Copy link
Contributor Author

katat commented Jan 12, 2023

would love to hear what else you think could use some restructuring

My current thoughts are more for the ease of circuit usage, trying to make it align with how the regex is being used in conventional programming languages.

Based on my understanding of the code, to use regex circuit, the client has to calculate the position index of the matches text and provide it as an input signal for the regex circuit to locate and process the matched text.

With the restructure, I hope to make the regex circuit easier by only providing message input and the nth match input signal (or template parameter). Then the output signals will include start/end/text/matched# for the circuit of outer layer to verify if applicable.

Keep in mind, it might not be necessary to have the counted array as illustrated above. It is more for a intermediate purpose.

Feel free to convert to JS! That wilk also help for in browser regex circuit generation if we ever decide to make that client side only.

I have done the JS conversion on my side and will refine the commands for regex compilation.
Let me know if you want to create a standalone repo for the regex circuit lib. I will push the code there.

@Divide-By-0
Copy link
Member

Sounds great! I've added you to https://github.com/zk-email-verify/zk-regex/invitations.

@katat katat mentioned this issue Jan 14, 2023
Merged
@katat
Copy link
Contributor Author

katat commented Jan 29, 2023

Thanks for the arrangement. @Divide-By-0

I think now zk-regex got a skeleton and is ready to be integrated back into this repo. There are a couple of targets for the integration:

  1. zk-regex now has a CLI to generate circuits for a downgraded regular expression(such as not yet supported character class, but will do in the near future). The README would be updated on how to use the new regex circuit compiler.

  2. There will be changes to the regex circuit input and output signals to simplify the interaction with regex circuits in the email.circom. For example, these lines of code would be encapsulated in the regex circuits instead. The newly compiled regex circuits would include the input signals input_msg, nth_match (nth match would be optional), output signals start_position, end_position, revealed_text for nth_match(default 1st match). Overall, the email.circom code relating to regex verification would be simplified via the use of new regex circuits generated from zk-regex.

  3. Tests will be set up to cover the email.circom circuit.

Meanwhile, I think it would be necessary to benchmark the email.circom circuit to see if there are any significant overheads introduced by the refactor. I am new to this area, happy to know if there are any ways to do the benchmark from your experiences.

Please feel free to let me know if any other items are also reasonable to be included in this batch of work.

@katat
Copy link
Contributor Author

katat commented Feb 3, 2023

In the latest commit, the zk-regex is now able to encapsulate the circuit better and simplify the email.circom circuit as mentioned in the 2 point above. I think with these circuit simplifications from the zk-regex, email.circom would be easier to be abstract for different use cases.

If you get a chance, please help check to see if the code is optimal.

@Divide-By-0 Divide-By-0 transferred this issue from zkemail/zk-email-verify Oct 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

No branches or pull requests

2 participants