Plonky3 keccak optimization #2089

Open
wants to merge 42 commits into
base: main

qwang98
Collaborator

@qwang98 qwang98 commented Nov 13, 2024

Issue
#1832 is ready to be merged, but it currently uses 100 links to the add_sub machine to increment the memory pointer, which is quite inefficient for proving. The increment_ptr constraint API doesn't work in #1832, as explained in issue #2022: it succeeds at the witness generation stage but fails at the proving or verification stage, depending on which prover is used.

Solution Implemented
This PR implements the same increment-pointer logic without relying on increment_ptr, by putting the constraints directly in the Keccak main machine. At a high level, instead of creating dedicated carry and inverse columns (which are used to check, via an is-zero test, whether the next memory address will overflow), I moved these values to the second and the penultimate rows of each 24-row block. Because addr_h[50] and addr_l[50] only use the first and last row of each block, I put the inverse values one row below the first row and one row above the last row of addr_l[50], and the carry values in the corresponding rows of addr_h[50]. This should be more efficient than the increment_ptr API in the context of this machine, because we avoid creating 50 inverse columns and 50 carry columns. It also solves the prior issue with increment_ptr mentioned in #1832 by only enabling the is-zero check constraint in the proper rows.

Current Status
Witness generation is functional, and proving works with the Halo2 mock prover when using dynamic vadcop with the dynamic machines limited to min_degree: 1024, max_degree: 2048 (without this limit, the process is killed after a very long run). Proving doesn't work with Plonky3 (see CI error). The failure comes from the polynomial commitment scheme, apparently while committing to polynomial evaluations. I never reached a constraint error with Plonky3, which is what blocked our prior increment_ptr solution, but this could simply mean that we didn't reach the stage where constraint errors are reported. Previously, Plonky3 errored out at the verification stage.

I think this means that the implementation is correct, but I'm not sure how to debug the Plonky3 error. It seems we might be able to make this work just by setting some external size parameters, so I'd appreciate some insights.

Here's my Plonky3 error:

Running `target/release/powdr pil test_data/std/keccakf16_memory_test.asm --force --field bb --prove-with plonky3`
[00:02:23 (ETA: 00:00:00)] ████████████████████ 100% - 102469 rows/s, 1639k identities/s, 100% progress
thread 'main' panicked at /Users/steve/.cargo/git/checkouts/plonky3-2a8c63bcd5ef83b0/2192432/fri/src/two_adic_pcs.rs:186:9:
assertion failed: lde.height() >= domain.size()

Additional Questions
Do we have a CLI option to reuse a previously generated witness in later runs? I couldn't find an example in our official docs. This has been costing quite a bit of time, since the witness was already fine and I only wanted to test the prover with different parameters.

@qwang98 qwang98 changed the title from "Plonky3 keccak memory new new" to "Plonky3 keccak optimization" Nov 14, 2024
@qwang98 qwang98 requested a review from georgwiese November 14, 2024 01:54
@qwang98
Collaborator Author

qwang98 commented Nov 14, 2024

This PR is also based on a different commit than #1832, so a few review comments addressed in that PR are not addressed here, as it's quite non-trivial to port them. I'd wait for that PR to be merged and then merge main into this one.

@georgwiese
Collaborator

Can be rebased now :)

The error you have looks like a constraint has too high a degree. The error message is terrible, so I just opened #2091 to fix it.

@qwang98
Collaborator Author

qwang98 commented Nov 14, 2024

> Can be rebased now :)
>
> The error you have looks like a constraint has too high a degree. The error message is terrible, so I just opened #2091 to fix it.

Thanks, that makes sense. It must be the degree-4 constraint I added, then (https://github.com/powdr-labs/powdr/pull/2089/files#diff-f9fcf1597f4c7e16da854fd1ded4cbcfed738cc44655382ec0cec5140701f8e9R67-R69), because prior versions with degree-3 constraints worked.

However, the Plonky3 docs are confusing here: they seem to say that arbitrarily high degrees are supported, so in theory a degree-4 constraint should work, just at a proving time cost: https://docs.polygon.technology/learn/plonky3/examples/rangecheck/#constraint-degree

For folks writing Plonky3 circuits: are we concerned about the degree of constraints, and what is the max degree we are targeting so far? @leonardoalt

@georgwiese
Collaborator

So, currently in Plonky3, there is a dependency between the "blow up" parameter of FRI (which we set to 2) and the maximum degree (for this parameter, it would be 3). Otherwise this assertion fails. From the TODO comment above the assertion I conclude that it doesn't have to be like this, but right now it is...
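The relationship between the blow-up and the degree bound can be sketched as follows. This is a rough model, not Plonky3's actual code: it assumes that a blow-up of 2 corresponds to `log_blowup = 1`, that the quotient domain is the trace domain scaled by `degree - 1` rounded up to a power of two, and that the failing assertion compares that size with the height of the low-degree extension. The function name and parameters are hypothetical.

```python
def lde_is_large_enough(log_trace_height: int, log_blowup: int, constraint_degree: int) -> bool:
    """Model of `lde.height() >= domain.size()` under the assumptions above."""
    assert constraint_degree >= 2
    # ceil(log2(constraint_degree - 1)), computed with integers only.
    log_quotient_degree = (constraint_degree - 2).bit_length()
    lde_height = 1 << (log_trace_height + log_blowup)
    quotient_domain_size = 1 << (log_trace_height + log_quotient_degree)
    return lde_height >= quotient_domain_size


if __name__ == "__main__":
    for degree in (2, 3, 4):
        print(degree, lde_is_large_enough(10, 1, degree))
```

Under this model, degree 3 passes with a blow-up of 2 and degree 4 does not, which matches the behaviour described in this thread.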

But a degree bound of 3 seems to be a typical choice (our parameters are inspired by Plonky3's examples), so I think we should make it work for a bound of 3. You can always make it work by introducing more witness columns.

BTW, in the future, this should be done automatically (see section "Intermediate polynomials can be turned into witness polynomials to reduce the constraint degree" in #2009). But currently, it still needs to be done manually.

@qwang98
Collaborator Author

qwang98 commented Nov 14, 2024

> So, currently in Plonky3, there is a dependency between the "blow up" parameter of FRI (which we set to 2) and the maximum degree (for this parameter, it would be 3). Otherwise this assertion fails. From the TODO comment above the assertion I conclude that it doesn't have to be like this, but right now it is...
>
> But a degree bound of 3 seems to be a typical choice (our parameters are inspired by Plonky3's examples), so I think we should make it work for a bound of 3. You can always make it work by introducing more witness columns.
>
> BTW, in the future, this should be done automatically (see section "Intermediate polynomials can be turned into witness polynomials to reduce the constraint degree" in #2009). But currently, it still needs to be done manually.

That fully makes sense now. Technically I can make it work by also placing data in the third row and the third-to-last row of each block, but I might need hints to make witgen more efficient. I might also just use more columns if needed.
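The "more columns" option amounts to materializing a sub-product in a helper witness column. A minimal sketch, using made-up column names (`flag`, `a`, `b`, `c`, `h`) and the Goldilocks modulus purely for illustration, not the actual constraints of this PR:

```python
P = 2**64 - 2**32 + 1  # Goldilocks modulus, assumed here only for illustration


def degree4_constraint(flag: int, a: int, b: int, c: int) -> bool:
    """Single constraint flag * a * b * c = 0, total degree 4."""
    return (flag * a * b * c) % P == 0


def fill_helper(a: int, b: int) -> int:
    """Witness generation for the hypothetical helper column h."""
    return (a * b) % P


def split_constraints(flag: int, a: int, b: int, c: int, h: int) -> bool:
    """Same statement, expressed with two lower-degree constraints.

    h - a * b = 0       (degree 2)
    flag * h * c = 0    (degree 3)
    """
    return (h - a * b) % P == 0 and (flag * h * c) % P == 0


if __name__ == "__main__":
    for row in [(1, 0, 5, 7), (1, 2, 3, 4), (0, 2, 3, 4)]:
        h = fill_helper(row[1], row[2])
        print(row, degree4_constraint(*row), split_constraints(*row, h))
```

With an honestly filled helper column, both formulations accept exactly the same rows; the cost is one extra witness column per split.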

…d carry; correct calculation of first round values; however witgen stopped by the bug 'we have been stuck in the same row for 100 rounds'; i suspect that if we set this parameter higher, we can solve the bug, but am not sure how to do that
@qwang98
Collaborator Author

qwang98 commented Nov 26, 2024

@georgwiese Sorry for the delay, but this is ready for a final-ish review, with one caveat.

The issue was to reduce the degree-4 constraint required by the increment-pointer-by-4 logic in the main Keccak machine. My final solution adds 50 helper columns to calculate intermediate values, while putting as many intermediate values as possible into empty rows of other columns. The caveat is that this solution triggers the witgen error "In witness generation for block machine, we have been stuck in the same row for 100 rounds". I printed the round counter in the witgen trace and found that, in the first row, each inverse calculation takes one more round; I'm not sure why. The inverse is needed because we have to test whether addr_l - 0xfffc is zero, which determines whether to add a carry to the next addr_h. You can pull the output_max_rounds_102.txt trace file in this branch to inspect it yourself; the snippet below shows the computation of a very large number for the inverse (which I store in the next row of addr_l), and this repeats many times. To be exact, the first row takes 102 rounds, so I changed the parameter MAX_ROUNDS_PER_ROW_DELTA from 100 to 102, and the circuit works. Another solution that might work is to calculate these inverses using hints, but this would require rotations in hints, which I'm not sure we've implemented yet.

I also initially tried a more optimized version, which places the intermediate values for the increment-by-4 logic in the first three rows of the address columns. This has the benefit of shaving off the 50 helper columns. I implemented this in another branch, #2146, found that witgen won't work, and proposed further solutions there. I concluded that this is probably something to be done only in the future.

current_round_count: 33
    Updates from: main_keccakf16_memory::step_flags[0] * (main_keccakf16_memory::helper[16] - (main_keccakf16_memory::addr_l[16]' * (main_keccakf16_memory::addr_l[16] - 65532) - 1)) = 0;
      => main_keccakf16_memory::addr_l[16] (Row 1) = 12528784674138808148
    Updates from: main_keccakf16_memory::step_flags[0] * (main_keccakf16_memory::addr_h[16]' + main_keccakf16_memory::addr_l[16]' * (main_keccakf16_memory::addr_l[16] - 65532) - 1) = 0;
      => main_keccakf16_memory::addr_h[16] (Row 1) = 0
    Updates from: main_keccakf16_memory::step_flags[0] * (main_keccakf16_memory::addr_h[16]' * main_keccakf16_memory::addr_l[17] + (1 - main_keccakf16_memory::addr_h[16]') * (main_keccakf16_memory::addr_l[17] - main_keccakf16_memory::addr_l[16] - 4)) = 0;
      => main_keccakf16_memory::addr_l[17] (Row 0) = 68
    Updates from: main_keccakf16_memory::step_flags[0] * (main_keccakf16_memory::addr_h[17] - main_keccakf16_memory::addr_h[16] - main_keccakf16_memory::addr_h[16]') = 0;
      => main_keccakf16_memory::addr_h[17] (Row 0) = 0
Query addr=0x44, step=27, write: false, value: (main_keccakf16_memory::preimage[33] main_keccakf16_memory::preimage[32])
Memory read: addr=0x44, step=27, value=0x0
    Updates from: main_keccakf16_memory::sel[0] * main_keccakf16_memory::step_flags[0] $ [0, main_keccakf16_memory::addr_h[17], main_keccakf16_memory::addr_l[17], main_keccakf16_memory::time_step, main_keccakf16_memory::preimage[33], main_keccakf16_memory::preimage[32]] is main_memory::selectors[19] $ [main_memory::m_is_write, main_memory::m_addr_high, main_memory::m_addr_low, main_memory::m_step_high * 65536 + main_memory::m_step_low, main_memory::m_value1, main_memory::m_value2];
      => main_keccakf16_memory::preimage[33] (Row 0) = 0
      => main_keccakf16_memory::preimage[32] (Row 0) = 0
current_round_count: 34
    Updates from: main_keccakf16_memory::step_flags[0] * main_keccakf16_memory::helper[17] * (main_keccakf16_memory::addr_l[17] - 65532) = 0;
      => main_keccakf16_memory::helper[17] (Row 0) = 0
    Updates from: main_keccakf16_memory::step_flags[0] * (main_keccakf16_memory::preimage[32] - main_keccakf16_memory::a[32]) = 0;
      => main_keccakf16_memory::a[32] (Row 0) = 0
    Updates from: main_keccakf16_memory::step_flags[0] * (main_keccakf16_memory::preimage[33] - main_keccakf16_memory::a[33]) = 0;
      => main_keccakf16_memory::a[33] (Row 0) = 0
    Updates from: (main_keccakf16_memory::preimage[32]' - main_keccakf16_memory::preimage[32]) * (1 - (main_keccakf16_memory::step_flags[23] + main_keccakf16_memory::is_last)) = 0;
      => main_keccakf16_memory::preimage[32] (Row 1) = 0
    Updates from: (main_keccakf16_memory::preimage[33]' - main_keccakf16_memory::preimage[33]) * (1 - (main_keccakf16_memory::step_flags[23] + main_keccakf16_memory::is_last)) = 0;
      => main_keccakf16_memory::preimage[33] (Row 1) = 0
current_round_count: 35
    Updates from: main_keccakf16_memory::step_flags[0] * (main_keccakf16_memory::helper[17] - (main_keccakf16_memory::addr_l[17]' * (main_keccakf16_memory::addr_l[17] - 65532) - 1)) = 0;
      => main_keccakf16_memory::addr_l[17] (Row 1) = 216692322335632032
    Updates from: main_keccakf16_memory::step_flags[0] * (main_keccakf16_memory::addr_h[17]' + main_keccakf16_memory::addr_l[17]' * (main_keccakf16_memory::addr_l[17] - 65532) - 1) = 0;
      => main_keccakf16_memory::addr_h[17] (Row 1) = 0
    Updates from: main_keccakf16_memory::step_flags[0] * (main_keccakf16_memory::addr_h[17]' * main_keccakf16_memory::addr_l[18] + (1 - main_keccakf16_memory::addr_h[17]') * (main_keccakf16_memory::addr_l[18] - main_keccakf16_memory::addr_l[17] - 4)) = 0;
      => main_keccakf16_memory::addr_l[18] (Row 0) = 72
    Updates from: main_keccakf16_memory::step_flags[0] * (main_keccakf16_memory::addr_h[18] - main_keccakf16_memory::addr_h[17] - main_keccakf16_memory::addr_h[17]') = 0;
      => main_keccakf16_memory::addr_h[18] (Row 0) = 0
Query addr=0x48, step=27, write: false, value: (main_keccakf16_memory::preimage[39] main_keccakf16_memory::preimage[38])
Memory read: addr=0x48, step=27, value=0x0
    Updates from: main_keccakf16_memory::sel[0] * main_keccakf16_memory::step_flags[0] $ [0, main_keccakf16_memory::addr_h[18], main_keccakf16_memory::addr_l[18], main_keccakf16_memory::time_step, main_keccakf16_memory::preimage[39], main_keccakf16_memory::preimage[38]] is main_memory::selectors[20] $ [main_memory::m_is_write, main_memory::m_addr_high, main_memory::m_addr_low, main_memory::m_step_high * 65536 + main_memory::m_step_low, main_memory::m_value1, main_memory::m_value2];
      => main_keccakf16_memory::preimage[39] (Row 0) = 0
      => main_keccakf16_memory::preimage[38] (Row 0) = 0
current_round_count: 36
    Updates from: main_keccakf16_memory::step_flags[0] * main_keccakf16_memory::helper[18] * (main_keccakf16_memory::addr_l[18] - 65532) = 0;
      => main_keccakf16_memory::helper[18] (Row 0) = 0
    Updates from: main_keccakf16_memory::step_flags[0] * (main_keccakf16_memory::preimage[38] - main_keccakf16_memory::a[38]) = 0;
      => main_keccakf16_memory::a[38] (Row 0) = 0
    Updates from: main_keccakf16_memory::step_flags[0] * (main_keccakf16_memory::preimage[39] - main_keccakf16_memory::a[39]) = 0;
      => main_keccakf16_memory::a[39] (Row 0) = 0
    Updates from: (main_keccakf16_memory::preimage[38]' - main_keccakf16_memory::preimage[38]) * (1 - (main_keccakf16_memory::step_flags[23] + main_keccakf16_memory::is_last)) = 0;
      => main_keccakf16_memory::preimage[38] (Row 1) = 0
    Updates from: (main_keccakf16_memory::preimage[39]' - main_keccakf16_memory::preimage[39]) * (1 - (main_keccakf16_memory::step_flags[23] + main_keccakf16_memory::is_last)) = 0;
      => main_keccakf16_memory::preimage[39] (Row 1) = 0
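The address constraints visible in the trace above can be replayed outside of powdr. The sketch below is an illustration only: it drops the `step_flags[0]` multiplier, uses plain variable names instead of the column rotations (`inverse` stands for `addr_l[i]'`, `carry` for `addr_h[i]'`), and assumes the Goldilocks field, which is consistent with the inverse printed in the trace.

```python
P = 2**64 - 2**32 + 1  # Goldilocks modulus (assumption, consistent with the trace values)
LAST_WORD = 0xFFFC     # 65532, the low limb value at which adding 4 overflows


def inv_or_zero(x: int) -> int:
    """Field inverse via Fermat's little theorem, with 0 mapped to 0."""
    x %= P
    return pow(x, P - 2, P) if x else 0


def increment_by_4(addr_h: int, addr_l: int):
    """Return (next_h, next_l, inverse, carry, helper) for one address step."""
    diff = (addr_l - LAST_WORD) % P
    inverse = inv_or_zero(diff)
    carry = (1 - inverse * diff) % P    # 1 iff addr_l == 0xfffc, else 0
    helper = (inverse * diff - 1) % P   # equals -carry in the field
    next_l = 0 if carry else addr_l + 4
    next_h = addr_h + carry
    return next_h, next_l, inverse, carry, helper


def constraints_hold(addr_h, addr_l, next_h, next_l, inverse, carry, helper) -> bool:
    """Check the five identities from the trace, without the step flag."""
    diff = (addr_l - LAST_WORD) % P
    checks = [
        helper * diff,
        helper - (inverse * diff - 1),
        carry + inverse * diff - 1,
        carry * next_l + (1 - carry) * (next_l - addr_l - 4),
        next_h - addr_h - carry,
    ]
    return all(c % P == 0 for c in checks)


if __name__ == "__main__":
    print(increment_by_4(0, 64))
    print(increment_by_4(0, LAST_WORD))
```

For `addr_l = 64` this yields the next low limb 68 and the inverse 12528784674138808148, the value shown for `addr_l[16] (Row 1)` in the trace. The sketch does not model overflow of the high limb.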

@qwang98 qwang98 marked this pull request as ready for review December 6, 2024 19:17
@qwang98
Collaborator Author

qwang98 commented Dec 6, 2024

FYI this can be merged now. @georgwiese

@qwang98 qwang98 force-pushed the plonky3-keccak-memory-new-new branch from 1b8c8ca to 096560e Compare December 6, 2024 19:26