feat: index historical votes #38
while testing this historical votes piece a bit further in my local environment, i'm observing that proposals that are rejected do not get saved by the indexer in our proposals table. this seems to be because for that reason, historical rejected proposals are currently not supported with the indexer. consequently, we are unable to store historical votes for rejected proposals. is this a problem we need to solve in a separate issue? /cc @blushi @ryanchristo
Could you tell how you tested this in detail? I believe such proposals should get pruned eventually and emitting
thanks @blushi i think the max exec period explains why i'm seeing this. it looks like that defaults to two weeks. i only created the rejected proposal a couple days ago, so that would explain it for my case. i'm going to re-test this with an example @ryanchristo gave me where you can lower the exec time
so this example was just about adjusting the min exec time. that said, it's something to be aware of when testing the contents of the indexer database.
Looking good. Haven't tested yet. A couple questions/comments.
Successfully tested the vote is not stored if the proposal has not yet been indexed and the vote is stored after the proposal is indexed: https://github.com/regen-network/indexer/actions/runs/6042565087/job/16397918758?pr=41
In addition to adding comments, can you make sure we add
with the latest commit i've changed the architecture of the index votes process to solve a few problems. i've also dropped the relationship between the proposals table and the votes. last but not least i added a new function for figuring out which events to process (
that's why i needed to add this new function. another important difference between
Should we address the multiple chains issue separately? I think there is more work to be done and I'm not sure one process for all chains is the best approach.
Also updated tests, which look good: https://github.com/regen-network/indexer/actions/runs/6099955306/job/16553243546
all_chain_nums = [
    record[0] for record in gen_records(cur, "select num from chain;")
]
max_block_heights = {
    chain_num: max_block_height
    for chain_num, max_block_height in gen_records(
        cur,
        "select chain_num, MAX(block_height) from votes group by chain_num;",
    )
}
logger.debug(f"{all_chain_nums=}")
logger.debug(f"{max_block_heights=}")
for chain_num in all_chain_nums:
    if chain_num not in max_block_heights:
        max_block_heights[chain_num] = 0
logger.debug(f"{max_block_heights=}")
for chain_num, max_block_height in max_block_heights.items():
    logger.debug(f"{chain_num=} {max_block_height=}")
Maybe this is ok for now but this should either be extracted to reuse in other processes or we should rethink how we are running processes, i.e. whether we want to run separate processes for separate chains.
Also noticing we no longer make use of the _chain_num parameter with this addition.
Not sure if we should address this issue (#33) separately. Any reason for adding it to this pull request?
> Maybe this is ok for now but this should either be extracted to reuse in other processes or we should rethink how we are running processes, i.e. whether we want to run separate processes for separate chains.
we can figure out a way to extract it. maybe we can do that as a part of refactoring indexing credit class issuers to use the same method (new_events_to_process)
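One possible shape for such an extracted helper, as a minimal sketch and not the code in this PR. The chain.num, votes.chain_num and votes.block_height columns come from the snippet under review; the function name max_indexed_heights, the demo data, and the use of sqlite3 (chosen only so the example runs without a Postgres instance) are assumptions. It folds the default-to-zero loop into the query with a left join and coalesce.

```python
import sqlite3


def max_indexed_heights(conn):
    """Return {chain_num: highest indexed vote block height, or 0 if none}."""
    rows = conn.execute(
        "select chain.num, coalesce(max(votes.block_height), 0) "
        "from chain left join votes on votes.chain_num = chain.num "
        "group by chain.num order by chain.num;"
    ).fetchall()
    # Each row is a (chain_num, max_block_height) pair.
    return dict(rows)


def demo_heights():
    # Hypothetical in-memory data: chain 1 has votes, chain 2 has none.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        create table chain (num integer primary key);
        create table votes (chain_num integer, block_height integer);
        insert into chain (num) values (1), (2);
        insert into votes (chain_num, block_height) values (1, 10), (1, 42);
        """
    )
    return max_indexed_heights(conn)
```

Any process that needs a per-chain starting height could call the same function instead of repeating the two queries and the loop.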
> Not sure if we should address this issue (#33) separately. Any reason for adding it to this pull request?
it’s not really added to this PR. it’s just that this PR will index votes in a forward looking way. if we want to index multiple chains in the same processes, this process will still work for that. (as do all the other processes we’ve written to date).
> Also noticing we no longer make use of the _chain_num parameter with this addition.
this is an unused parameter for all the indexing processes so far i.e.
harmless to leave it for now IMO
> it’s not really added to this PR. it’s just that this PR will index votes in a forward looking way
It's not included in the other processes, e.g. index_proposals.py, index_class_issuers, etc. We are adding a loop here that does not exist in the other processes, and which seems more relevant to #33 than #22. This pull request is already addressing two issues. It would be nice to focus on one issue at a time.
> if we want to index multiple chains in the same processes, this process will still work for that
As mentioned in the review comment, I'm not sure one process for all chains is the best approach. It might be easier to run multiple instances of the process, one for each chain. We have a single environment variable for a regen node. It is diverging from the current architecture and it's not required for indexing votes as far as I can tell.
> this is an unused parameter for all the indexing processes so far i.e.
Why was it added in the first place? Is there a future use case for it?
> > this is an unused parameter for all the indexing processes so far i.e.
> Why was it added in the first place? Is there a future use case for it?
Same question here. Maybe it could be useful if we want to run multiple instances of the process for each chain?
Adding the loop on max_block_heights.items() is needed for the new new_events_to_process logic. But the previous loop on all_chain_nums could potentially be avoided if we use the _chain_num param instead and distribute this logic into different processes.
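To make that suggestion concrete, here is a hedged sketch of the per-chain variant: each process instance receives its own chain number and looks up only that chain's highest indexed vote height, so no loop over all chains is needed. The votes.chain_num and votes.block_height columns come from the snippet under review; the function name, the demo data, and the use of sqlite3 (only so the example is runnable on its own) are assumptions, not the indexer's actual code.

```python
import sqlite3


def max_indexed_height_for_chain(conn, chain_num):
    """Highest vote block height indexed for one chain, or 0 if none."""
    # sqlite3 uses "?" placeholders; a Postgres driver such as psycopg uses "%s".
    row = conn.execute(
        "select coalesce(max(block_height), 0) from votes where chain_num = ?;",
        (chain_num,),
    ).fetchone()
    # An aggregate without "group by" always yields exactly one row.
    return row[0]


def make_demo_db():
    # Hypothetical in-memory data: only chain 1 has indexed votes.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        create table votes (chain_num integer, block_height integer);
        insert into votes (chain_num, block_height) values (1, 10), (1, 42);
        """
    )
    return conn
```

With one instance per chain, the chain number would be supplied by whatever launches the process, and the chain table would not need to be read at all for this step.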
> Why was it added in the first place? Is there a future use case for it?
I honestly forgot why I put it there...
> Maybe it could be useful if we want to run multiple instances of the process for each chain?
But yes indeed, it can be useful for this.
> Adding the loop on max_block_heights.items() is needed for the new new_events_to_process logic. But the previous loop on all_chain_nums could potentially be avoided if we use the _chain_num param instead and distribute this logic into different processes.
Yes, I think a good follow up task will be to simplify this as you've described here:
This is also something that @ryanchristo and I discussed last week, so good that we are all in agreement it's a good thing to do.
the way all the indexing processes are set up, is such that if we were to index multiple chains, we can successfully index events from multiple chains. the method i implemented in this PR was so that this stays true for all of the indexing processes. the indexing processes in general support that by paying attn to
We have one environment variable for a single regen node, meaning yes it supports multiple chains but only if you run multiple instances, which I do think is the right approach. Adding a loop in the
Per discussion, I think we should consider other solutions to the voter address not being available in the event as well as how we configure and support multiple chains. We should also make sure that we smooth out any inconsistencies now that we are introducing a new approach to how we index votes (i.e. different than how we index class issuers).
I tested a single chain, a single proposal, and multiple votes with #41. Not necessarily thorough testing. You can see the logs for each service (i.e. db, indexer, etc.) separately here.
Nice work overall! Not an easy set of problems to solve.
Closes: #22
Closes: #37 (I decided to add the fix for this here, since it's small and potentially needed in this PR)