feat: index historical votes #38

wgwz · 2023-08-23T00:10:19Z

Closes: #22
Closes: #37 (I decided to add the fix for this here, since it's small and potentially needed in this PR)

sql/V1_12__votes.sql

index_votes.py

wgwz · 2023-08-29T20:09:24Z

while testing this historical votes piece a bit further in my local environment, i'm observing that proposals that are rejected, do not get saved by the indexer in our proposals table. this seems to be because cosmos.group.v1.EventProposalPruned does not get emitted for rejected proposals. is that intended? am i wrong?

for that reason, historical rejected proposals are currently not supported with the indexer. consequently, we are unable to store historical votes for rejected proposals. is this a problem we need to solve in a separate issue? /cc @blushi @ryanchristo

blushi · 2023-08-30T08:14:09Z

while testing this historical votes piece a bit further in my local environment, i'm observing that proposals that are rejected, do not get saved by the indexer in our proposals table. this seems to be because cosmos.group.v1.EventProposalPruned does not get emitted for rejected proposals. is that intended? am i wrong?

for that reason, historical rejected proposals are currently not supported with the indexer. consequently, we are unable to store historical votes for rejected proposals. is this a problem we need to solve in a separate issue? /cc @blushi @ryanchristo

Could you tell how you tested this in details? I believe such proposals should get pruned eventually and emitting cosmos.group.v1.EventProposalPruned at voting period + max exec period: https://github1s.com/cosmos/cosmos-sdk/blob/HEAD/x/group/keeper/keeper.go#L370

wgwz · 2023-08-30T12:02:01Z

Could you tell how you tested this in details? I believe such proposals should get pruned eventually and emitting cosmos.group.v1.EventProposalPruned at voting period + max exec period: https://github1s.com/cosmos/cosmos-sdk/blob/HEAD/x/group/keeper/keeper.go#L370

thanks @blushi i think the max exec period explains why i'm seeing this. it looks like that defaults to two weeks. i only created the rejected proposal a couple days ago, so that would explain it for my case.

i'm going to re-test this with an example @ryanchristo gave me where you can lower the exec time

wgwz · 2023-08-30T17:22:50Z

i'm going to re-test this with an example @ryanchristo gave me where you can lower the exec time

so this example was just about adjusting the min exec time.
as mentioned in stand-up today, we'd have to customize the max exec time locally (by adjusting this value if i'm not mistaken), so i'll just assume it works and move on.

that said, it's something to be aware of when testing the contents of the indexer database.
a rejected proposal and it's votes will not appear in the database until max exec period has passed.

ryanchristo

Looking good. Haven't tested yet. A couple questions/comments.

utils.py

index_votes.py

ryanchristo · 2023-08-31T21:33:36Z

Successfully tested the vote is not stored if the proposal has not yet been indexed and the vote is stored after the proposal is indexed: https://github.com/regen-network/indexer/actions/runs/6042565087/job/16397918758?pr=41

ryanchristo · 2023-08-31T21:50:50Z

In addition to adding comments, can you make sure we add V1_12__votes.sql to the all migrations script?

sql/V1_12__votes.sql

wgwz · 2023-09-06T01:48:15Z

with the latest commit i've changed the architecture of the index votes process to solve a few problems.
most importantly, properly handling the fact that we need to get all votes for a proposal with the current implementation of EventVote (which only specifies proposal_id and not the voter and option).

i've also dropped the relationship between the proposals table and the votes.
this just means we can't traverse that relationship in graphql as i had mentioned in a previous comment.
that should be fine, we'll still have the ability to write graphql that queries the votes table.

last but not least i added a new function for figuring out which events to process (new_events_to_process).
this is a slightly different version of the already existing events_to_process function.
the difference being that instead of finding all msg_events of a particular type that have yet to be indexed, it finds only the newest events that occurred since the last time it ran.

events_to_process returns absolutely all events that have not been processed.
this is great when you want to have all historical data at every point in time (i.e. retirements (might not be the best example)).
but it's not great when you only care about having "most recent state".
i.e. for votes we only care about the votes of a proposal "now".
and we wouldn't want to see the state of the votes as of two weeks ago (time = block_height).

that's why i needed to add this new function new_events_to_process.
i mentioned to @blushi in our 1-1 today that we should actually convert the index_class_issuers process to use this function.
because the same thing applies for class issuers, we only care about the state of the class issuers "now".
the current class issuers indexing process is inefficient for this reason.
we're storing the historical state of the class issuers, and using a flag latest to mark the most recent state
i'll open up an issue for this.

another important difference between new_events_to_process and events_to_process is resiliency (not sure this is the best word).
if you make use of events_to_process, you always get all events of a particular type at any point in time.
i.e. if i were to delete a retirement from the retirements table at some old block height, the next time the indexer for retirements runs, it would re-index that retirement without any further manual interactions.
Whereas with new_events_to_process, it wouldn't work like that because it only ever pays attention to the events that occurred since the last indexed block height.
in this particular example, you would have to delete all retirements after the initial deleted retirement to get the whole thing re-indexed.

index_votes.py

ryanchristo

Should we address the multiple chains issue separately? I think there is more work to be done and I'm not sure one process for all chains is the best approach.

Also updated tests, which look good: https://github.com/regen-network/indexer/actions/runs/6099955306/job/16553243546

sql/V1_12__votes.sql

ryanchristo · 2023-09-06T17:06:15Z

index_votes.py

+        all_chain_nums = [
+            record[0] for record in gen_records(cur, "select num from chain;")
+        ]
+        max_block_heights = {
+            chain_num: max_block_height
+            for chain_num, max_block_height in gen_records(
+                cur,
+                "select chain_num, MAX(block_height) from votes group by chain_num;",
+            )
+        }
+        logger.debug(f"{all_chain_nums=}")
+        logger.debug(f"{max_block_heights=}")
+        for chain_num in all_chain_nums:
+            if chain_num not in max_block_heights.keys():
+                max_block_heights[chain_num] = 0
+        logger.debug(f"{max_block_heights=}")
+        for chain_num, max_block_height in max_block_heights.items():
+            logger.debug(f"{chain_num=} {max_block_height=}")


Maybe this is ok for now but this should either be extracted to reuse in other processes or we should rethink how we are running processes, i.e. whether we want to run separate processes for separate chains.

Also noticing we no longer make use of the _chain_num parameter with this addition.

Not sure if we should address this issue (#33) separately. Any reason for adding it to this pull request?

Maybe this is ok for now but this should either be extracted to reuse in other processes or we should rethink how we are running processes, i.e. whether we want to run separate processes for separate chains.

we can figure out a way to extract it. maybe we can do that as a part of refactoring indexing credit class issuers to use the same method (new_events_to_process)

Not sure if we should address this issue (#33) separately. Any reason for adding it to this pull request?

it’s not really added to this PR. it’s just that this PR will index votes in a forward looking way. if we want to index multiple chains in the same processes, this process will still work for that. (as do all the other processes we’ve written to date).

Also noticing we no longer make use of the _chain_num parameter with this addition.

this is an unused parameter for all the indexing processes so far i.e.

https://github.com/regen-network/indexer/blob/main/index_retires.py#L7

harmless to leave it for now IMO

it’s not really added to this PR. it’s just that this PR will index votes in a forward looking way

It's not included in the other processes, e.g. index_proposals.py, index_class_issuers, etc. We are adding a loop here that does not exist in the other processes, and which seems more relevant to #33 than #22. This pull request is already addressing two issues. It would be nice to focus on one issue at a time.

if we want to index multiple chains in the same processes, this process will still work for that

As mentioned in the review comment, I'm not sure one process for all chains is the best approach. It might be easier to run multiple instances of the process, one for each chain. We have a single environment variable for a a regen node. It is diverging from the current architecture and its not required for indexing votes as far as I can tell.

this is an unused parameter for all the indexing processes so far i.e.

Why was it added in the first place? Is there a future use case for it?

this is an unused parameter for all the indexing processes so far i.e.

Why was it added in the first place? Is there a future use case for it?

Same question here. Maybe it could be useful if we want to run multiple instances of the process for each chain?

Adding the loop on max_block_heights.items() is needed for the new new_events_to_process logic. But the previous loop on all_chain_nums could potentially be avoided if we use the _chain_num param instead and distribute this logic into different processes.

Why was it added in the first place? Is there a future use case for it?

I honestly forgot why I put it there...

Maybe it could be useful if we want to run multiple instances of the process for each chain?

But yes indeed, it can be useful for this.

Adding the loop on max_block_heights.items() is needed for the new new_events_to_process logic. But the previous loop on all_chain_nums could potentially be avoided if we use the _chain_num param instead and distribute this logic into different processes.

Yes, I think a good follow up task will be to simplify this as you've described here:

#44

This is also something that @ryanchristo and I discussed last week, so good that we are all in agreement it's a good thing to do.

sql/V1_12__votes.sql

wgwz · 2023-09-07T16:31:32Z

Should we address the multiple chains issue separately? I think there is more work to be done and I'm not sure one process for all chains is the best approach.

the way all the indexing processes are set up, is such that if we were to index multiple chains, we can successfully index events from multiple chains. the method i implemented in this PR was so that this stays true for all of the indexing processes.

the indexing processes in general support that by paying attn to chain_num and having that as key in the db tables.

ryanchristo · 2023-09-07T17:53:23Z

the way all the indexing processes are set up, is such that if we were to index multiple chains

We have one environment variable for a single regen node, meaning yes it supports multiple chains but only if you run multiple instances, which I do think is the right approach. Adding a loop in the index_votes.py script to support multiple chains seems to diverge from the existing architecture.

ryanchristo

Per discussion, I think we should consider other solutions to the voter address not being available in the event as well as how we configure and support multiple chains. We should also make sure that we smooth out any inconsistencies now that we are introducing a new approach to how we index votes (i.e. different than how we index class issuers).

I tested a single chain. single proposal, and multiple votes with #41. Not necessarily thorough testing. You can see the logs for each service (i.e. db, indexer, etc.) separately here.

Nice work overall! Not an easy set of problems to solve.

wgwz marked this pull request as ready for review August 29, 2023 01:59

wgwz marked this pull request as draft August 29, 2023 01:59

wgwz force-pushed the kyle/22-index-votes branch 3 times, most recently from 78b4b59 to a123734 Compare August 29, 2023 17:53

wgwz changed the title ~~feat(WIP): index votes~~ feat: index historical votes Aug 29, 2023

wgwz commented Aug 29, 2023

View reviewed changes

sql/V1_12__votes.sql Outdated Show resolved Hide resolved

wgwz commented Aug 29, 2023

View reviewed changes

index_votes.py Outdated Show resolved Hide resolved

wgwz commented Aug 29, 2023

View reviewed changes

index_votes.py Outdated Show resolved Hide resolved

wgwz requested a review from a team August 29, 2023 18:11

wgwz marked this pull request as ready for review August 29, 2023 18:11

wgwz force-pushed the kyle/22-index-votes branch from b2f888d to eb57e69 Compare August 29, 2023 19:39

wgwz added 3 commits August 30, 2023 13:24

feat: index historical votes

3bc720f

fix: add REGEN_IS_ARCHIVE_NODE config value with default and helper

b8eddf0

fix: use is_archive_node in fetch_votes_by_proposal

208c702

wgwz force-pushed the kyle/22-index-votes branch from eb57e69 to 208c702 Compare August 30, 2023 17:27

ryanchristo reviewed Aug 31, 2023

View reviewed changes

utils.py Show resolved Hide resolved

index_votes.py Outdated Show resolved Hide resolved

index_votes.py Show resolved Hide resolved

wgwz added 3 commits August 31, 2023 18:59

chore: comment about votes_by_proposal

f169e57

fix: add V1_12__votes.sql to migrations script

2e1cb8a

fix: break from votes loop if proposal is not yet pruned

af2fcd9

ryanchristo reviewed Aug 31, 2023

View reviewed changes

sql/V1_12__votes.sql Outdated Show resolved Hide resolved

wgwz marked this pull request as draft September 1, 2023 20:42

refactor: only insert each vote once

6f29a66

wgwz force-pushed the kyle/22-index-votes branch from a822d46 to 6f29a66 Compare September 6, 2023 01:47

wgwz marked this pull request as ready for review September 6, 2023 01:52

wgwz requested a review from ryanchristo September 6, 2023 01:52

ryanchristo reviewed Sep 6, 2023

View reviewed changes

index_votes.py Show resolved Hide resolved

ryanchristo reviewed Sep 6, 2023

View reviewed changes

ryanchristo approved these changes Sep 7, 2023

View reviewed changes

chore: apply sql formatting

27846dc

wgwz mentioned this pull request Sep 12, 2023

Simplify logic in index votes process #44

Closed

wgwz merged commit 927ddb3 into main Sep 12, 2023

wgwz deleted the kyle/22-index-votes branch September 12, 2023 20:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: index historical votes #38

feat: index historical votes #38

wgwz commented Aug 23, 2023 •

edited

Loading

wgwz commented Aug 29, 2023

blushi commented Aug 30, 2023

wgwz commented Aug 30, 2023 •

edited

Loading

wgwz commented Aug 30, 2023

ryanchristo left a comment

ryanchristo commented Aug 31, 2023

ryanchristo commented Aug 31, 2023

wgwz commented Sep 6, 2023

ryanchristo left a comment •

edited

Loading

ryanchristo Sep 6, 2023

ryanchristo Sep 6, 2023

ryanchristo Sep 6, 2023

wgwz Sep 6, 2023 •

edited

Loading

wgwz Sep 6, 2023

wgwz Sep 7, 2023

ryanchristo Sep 7, 2023

blushi Sep 12, 2023

wgwz Sep 12, 2023

wgwz commented Sep 7, 2023 •

edited

Loading

ryanchristo commented Sep 7, 2023

ryanchristo left a comment

feat: index historical votes #38

feat: index historical votes #38

Conversation

wgwz commented Aug 23, 2023 • edited Loading

wgwz commented Aug 29, 2023

blushi commented Aug 30, 2023

wgwz commented Aug 30, 2023 • edited Loading

wgwz commented Aug 30, 2023

ryanchristo left a comment

Choose a reason for hiding this comment

ryanchristo commented Aug 31, 2023

ryanchristo commented Aug 31, 2023

wgwz commented Sep 6, 2023

ryanchristo left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

wgwz Sep 6, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

wgwz commented Sep 7, 2023 • edited Loading

ryanchristo commented Sep 7, 2023

ryanchristo left a comment

Choose a reason for hiding this comment

wgwz commented Aug 23, 2023 •

edited

Loading

wgwz commented Aug 30, 2023 •

edited

Loading

ryanchristo left a comment •

edited

Loading

wgwz Sep 6, 2023 •

edited

Loading

wgwz commented Sep 7, 2023 •

edited

Loading