
Goal: onboard Kwil as Node Operator #393

Closed
16 tasks done
zolotokrylin opened this issue Jul 12, 2024 · 45 comments


@zolotokrylin
Contributor Author

zolotokrylin commented Jul 12, 2024

@truflation/team-tsn-kwil

Hey guys, shall we start onboarding you as a TSN node operator?
We already have these stats displayed and would like to have as many partners running our node as possible.

@brennanjl
Collaborator

Yeah we definitely can.

I'm not totally sure if this is the right time to do it, but this is ultimately Truflation's decision to make. My two cents: the node software needed for TSN v1 is still under heavy development. The notice function (which doesn't exist in Kwil's main branch yet, only in Truflation's code) is one example. There are additional functions (like date functions) that I am adding this week (outlined by @MicBun here #164 (comment)).

Right now, since Truflation is the only node operator, it is much easier to ship network-breaking changes when these get added. However, if there were multiple node operators, we would need to coordinate with them each time, which would slow down development.

Ultimately this is your team's decision, and we're more than happy to go along with whatever you choose. I'm simply raising this in case your team hasn't already considered these factors.

On the other side of the coin, running a distributed testnet is always very educational, and is a great dry-run if we are trying to launch soon. We run a distributed testnet for each release prior to release, and it always uncovers something unexpected.

@zolotokrylin
Contributor Author

@brennanjl thank you for your detailed reply.
I agree that TSN is under heavy development. When do you think would be a better time for external node operators to run our software? We are caught between a rock and a hard place: we need more node operators to dry-run TSN and for marketing purposes (the more nodes, the better), but at the same time, we want to move quickly with our breaking updates.

@brennanjl
Collaborator

brennanjl commented Jul 15, 2024

@zolotokrylin For dry-running TSN before a public launch, I think setting up a testnet 3-4 weeks ahead of the release should be enough. Internally, we set up our testnet 1 week before release, but TSN will obviously need extra coordination, hence the longer 3-4 week window.

The marketing angle is an interesting one that I'm not really equipped to answer. An option could be to run a stable version using what already exists and onboard node operators that way, and have that be distinctly separate from the development that is occurring. This won't be as useful as a "network dry-run", since it won't be a full deployment and it will still retain a lot of validating power for the core TSN team (thus, not making it representative of a true mainnet), but it should help with the marketing angle and community engagement.

If this is of interest, I can put together a more in-depth explanation of how it can work. There are some things we can do to allow you to onboard any number of operators easily, while not constraining your network throughput and liveness to their hardware and connections.


@zolotokrylin
Contributor Author

zolotokrylin commented Jul 17, 2024

@srust99, what's your take here?

I agree with @brennanjl that it is better to start coordinating with the node operators 3-4 weeks before our first public (production) release. This release is when we serve customers through the TSN API and replace our website dashboard API with the TSN API.

However, as you would like to onboard as many node operators as soon as possible, please let me know your decision here.
This decision will also signal to the @truflation/team-tsn-kwil team when they should start running nodes on their side.

Thank you.

P.S. Meanwhile, I will continue inviting node operators to our GitHub repo.

@srust99

srust99 commented Jul 17, 2024

Can we have a quick call tomorrow, @zolotokrylin?

@zolotokrylin
Contributor Author

@srust99, I just sent an invitation for the call.

@zolotokrylin
Contributor Author

@srust99 Please ping me in DM when you are available.

@markholdex
Collaborator

@zolotokrylin did you have a call with Stefan about this goal? Any logs?

@zolotokrylin
Contributor Author

zolotokrylin commented Jul 23, 2024

Yes. Sorry I didn't leave the log. Here it is: we will push to onboard node operators when we are 3-4 weeks away from the TSN production release, basically after these goals:

@zolotokrylin
Contributor Author

hey guys (@truflation/team-tsn-kwil)
Could you please start deploying the node? We are planning an official soft launch with our close node-operator partners and will make a marketing campaign out of it.
Please deploy and connect to our network. Cheers!

@outerlook
Contributor

We might want a shared configuration file for our chain data, correct? As @truflation/team-tsn-kwil is the expert on the configuration, please let me know if sharing a list of node addresses and attaching a subdomain to them is enough for us to connect. E.g. node-0.tsn.staging.test.truflation.com

And is this the only port that needs to be public for them to communicate?

  • P2P Port 26656

We also have other ports, but they can be public or protected based on our own requirements, correct?

  • CometBFT RPC: we use 26657, used by the indexer
  • RPC Port: we use 8484, but it is already behind the KGW
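Since only the P2P port needs to be reachable from outside, a quick way to confirm a firewall or load-balancer change took effect is to try a TCP connection to it from another machine. A minimal sketch using only the standard library; the function name is hypothetical, and a successful handshake only shows the port is open, not that a healthy node is answering behind it.

```python
import socket


def is_tcp_port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake with host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves DNS names as well as plain IPs,
        # then performs a full TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and DNS resolution failures.
        return False


# Example (hypothetical host from the subdomain pattern above; 26656 is the P2P port):
# is_tcp_port_reachable("node-0.tsn.staging.test.truflation.com", 26656)
```

Running the same check against 26657 and 8484 from outside the private network should return False if those ports are meant to stay protected.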

@KwilLuke
Contributor

KwilLuke commented Aug 6, 2024

Hey guys, this is super exciting!

please let me know if sharing a list of node addresses and attaching a subdomain to them is enough for us to connect.

Could you clarify what you mean by this? Do you mean each partner runs a node in their own cloud account, but you have truflation DNS records for each node?

@brennanjl
Collaborator

@outerlook you will need a few things in order for external parties to connect. There is a tutorial for doing this here, but I'll go through it below.

What You Need

You will need to publish two things for somebody else to connect to the network:

  1. The node ID and IP address of a node they can connect to. You can get this from a local node using kwil-admin, following the instructions here. The IP address needs to be for the p2p port (by default, 26656). This is the only port you need to expose publicly.
  2. The genesis file of your network, which can be found in ~/.kwild/genesis.json

Users will need somewhere to copy these from; they then use them to run their tsn binary.
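The node ID and P2P address are usually combined into a single `id@host:port` entry, the comma-separated format CometBFT uses for its `persistent_peers` setting. A minimal sketch for assembling that string from a published node list; the `Peer` type and helper are hypothetical, and the 40-hex-character ID check assumes CometBFT-style node IDs.

```python
import re
from dataclasses import dataclass

# Assumption: node IDs are CometBFT-style, i.e. the hex-encoded 20-byte
# address of the node key (40 lowercase hex characters).
_NODE_ID_RE = re.compile(r"^[0-9a-f]{40}$")


@dataclass(frozen=True)
class Peer:
    node_id: str
    host: str           # public IP address or DNS name
    port: int = 26656   # default P2P port

    def address(self) -> str:
        """Render this peer as id@host:port, rejecting malformed IDs early."""
        if not _NODE_ID_RE.match(self.node_id):
            raise ValueError(f"unexpected node ID format: {self.node_id!r}")
        return f"{self.node_id}@{self.host}:{self.port}"


def persistent_peers(peers: list) -> str:
    """Join peers into the comma-separated string used for persistent_peers."""
    return ",".join(p.address() for p in peers)
```

Validating IDs while building the string catches copy-paste mistakes before a joining node tries to dial a bad address.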

My Recommendation

To make onboarding and future evolution easier, I'd recommend creating a public Git repo that contains this information, as well as some scripts. Within this repo, you would have:

  • The genesis.json file.
  • A node ID and IP (or a list of several) that users can connect to.
  • Scripts for downloading the tsn binary, as well as kwil-admin (which will be necessary for the setup process).
  • Scripts for setting up the node and joining (a nice-to-have).

You could also publish your own tsn release binaries to this repo, since your main development repo is still private.
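One possible addition to such a repo (an assumption, not something proposed in this thread) is a published SHA-256 checksum of genesis.json, so operators can confirm they downloaded the exact file before starting a node. A minimal sketch; the helper names are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def genesis_digest(path) -> str:
    """Return the SHA-256 hex digest of the raw genesis file bytes."""
    data = Path(path).read_bytes()
    json.loads(data)  # fail early on a truncated or corrupted download
    # Hash the raw bytes rather than re-serialized JSON: nodes load the exact file.
    return hashlib.sha256(data).hexdigest()


def verify_genesis(path, expected_hex: str) -> bool:
    """Compare a local genesis file with a checksum published in the setup repo."""
    return genesis_digest(path) == expected_hex.strip().lower()
```

Hashing the raw bytes means even whitespace differences are detected, which is the safe choice when every node must agree on the same genesis.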

@outerlook
Contributor

outerlook commented Aug 6, 2024

@KwilLuke

Do you mean each partner runs a node in their own cloud account, but you have truflation DNS records for each node?

Exactly, just to keep each node decoupled from a dedicated public IP address, having better control over it.

@brennanjl

Thanks for the tips. I agree that a shared repo would be good.

Just to keep everyone on the same page, these are the tasks we must perform and add to our pipeline:

  • Generate a genesis.json file for the network
  • Store our node's private keys somewhere and make them static (a list that can grow if we need more)
  • Adapt our deployment pipeline to use these configs and generate the rest
  • Make our deployment upgrades graceful instead of deleting all instances and redeploying (Goal: graceful TSN upgrade mechanism #213)

Currently, we generate everything dynamically from scratch with kwil-admin setup commands.

@KwilLuke
Contributor

KwilLuke commented Aug 7, 2024

Do you mean each partner runs a node in their own cloud account, but you have truflation DNS records for each node

Exactly, just to keep each node decoupled from a dedicated public IP address, having better control over it.

One concern I have with this approach is that if Truflation has a DNS record pointed to a third party's node, and that third party shuts down their node, Truflation could unknowingly have orphaned DNS records. Depending on the record type, this is a significant vulnerability.

It probably makes sense for each node operator to handle assigning domain names to their respective nodes. (e.g. https://truflation.kwil.com, https://truflation.northwestnodes.com). That allows each party to be responsible for their DNS management.

You can maintain a list of node IDs and IP addresses that new nodes should first connect to when joining the network.
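If that list records both a DNS name and the IP it is expected to point at, a periodic check can flag entries that stopped resolving or drifted to a different address. A minimal sketch; the entry shape (`host` and `ip` keys) is a hypothetical format, and this cannot prove the operator still controls the machine behind a matching IP.

```python
import socket


def resolve_ipv4(host: str) -> set:
    """Return the set of IPv4 addresses `host` resolves to (empty if it does not resolve)."""
    try:
        infos = socket.getaddrinfo(host, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return set()
    return {info[4][0] for info in infos}


def check_bootstrap_list(entries: list) -> list:
    """Return human-readable problems for entries whose DNS name no longer matches the recorded IP."""
    problems = []
    for entry in entries:
        ips = resolve_ipv4(entry["host"])
        if not ips:
            problems.append(f"{entry['host']}: does not resolve")
        elif entry["ip"] not in ips:
            problems.append(f"{entry['host']}: resolves to {sorted(ips)}, expected {entry['ip']}")
    return problems
```

An empty result means every name still points where the list says it should.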

@outerlook
Contributor

@KwilLuke sorry, I misunderstood the previous question. Yes, the intention is to have DNS records only for nodes we control, not for yours or any other third party's.

The point was just to check whether a DNS name works in place of an IP address in the configuration. But thanks for checking that too 🙏

@outerlook
Contributor

@zolotokrylin @markholdex
There seems to be a clear objective for this issue. Should we create a spec, or can we start defining sub-issues and tackling them?

@markholdex
Collaborator

@outerlook if the specs are clear, feel free to begin and define the problems.

@MicBun
Contributor

MicBun commented Aug 12, 2024

Hi @brennanjl, I'm trying to use this step to set up genesis config according to https://docs.kwil.com/docs/admin/setup#updating-genesis-config-with-initial-sqlite-data

kwil-admin setup genesis-hash [--genesis GENESIS] [DBDIR]

but it looks like the command genesis-hash is wrong here. Do you mind checking if this step is correct in order to start a node with a pre-configured genesis file?

kwil-admin setup genesis-hash --genesis deployments/network/staging/genesis.json
Error: unknown flag: --genesis
unknown flag: --genesis

kwil-admin version

 Version:       0.8.1+release
 Git commit:    8880c1923ca34a97f2ca2e30357484efa6bf8a63
 Built:         2024-06-11T21:48:51Z
 API version:
 Go version:    go1.21.0
 OS/Arch:       linux/amd64

Edit: here I found another doc about modifying genesis:
https://docs.kwil.com/docs/ref/kwil-admin/setup/peer#examples

kwil-admin setup peer --root-dir ./kwil-node --genesis /path/to/genesis.json --peers
Error: unknown flag: --peers
unknown flag: --peers

Adding --peers at the end causes the error above, but without --peers the command succeeds.
It may not be documented, but the command also overwrites config.toml settings such as persistent_peers, which becomes an empty string. Is that expected?
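Until that behavior is clarified, one workaround is to re-apply the peer list to config.toml after running setup peer. A minimal sketch, assuming the generated file contains exactly one `persistent_peers = "..."` line (the key name comes from this comment; the exact section layout is an assumption) and that the path, such as ./kwil-node/config.toml, is whatever --root-dir pointed at. The helper names are hypothetical.

```python
import re
from pathlib import Path

# Matches a single-line TOML string assignment: persistent_peers = "..."
_PEERS_LINE = re.compile(r'^(\s*persistent_peers\s*=\s*)"[^"]*"', re.MULTILINE)


def set_persistent_peers(config_text: str, peers: str) -> str:
    """Return config_text with the persistent_peers value replaced by `peers`."""
    # A function replacement avoids backslash-escape surprises in `peers`.
    new_text, count = _PEERS_LINE.subn(lambda m: f'{m.group(1)}"{peers}"', config_text)
    if count != 1:
        raise ValueError(f"expected exactly one persistent_peers line, found {count}")
    return new_text


def patch_config_file(path, peers: str) -> None:
    """Rewrite a config.toml on disk in place."""
    p = Path(path)
    p.write_text(set_persistent_peers(p.read_text(), peers))
```

Refusing to patch when zero or several matches are found avoids silently editing the wrong key.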

@brennanjl
Collaborator

@MicBun it appears we have some outdated docs. Thanks for flagging.

While we get those updated, I'll help you out manually. You said that you are trying to set up a genesis config, however based on the steps you provided, it seems like you are trying to join an existing network. Can you confirm which one you are doing?

@MicBun
Contributor

MicBun commented Aug 12, 2024

Can you confirm which one you are doing?

@brennanjl I'm trying to run a node from a pre-configured genesis file. Previously, it was set up automatically with testnet.

@brennanjl
Collaborator

Ah, then yes, you are using the right command:

kwil-admin setup peer --genesis path/to/genesis.json --root-dir ./output/path

Then to run it:

kwild --root-dir ./output/path
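For deployment pipelines, the same two steps can be scripted. A minimal sketch that builds exactly the argument lists shown above and runs them in order; the helper names are hypothetical, and a TSN release may ship its own node binary in place of kwild, so adjust the executable names to match your release.

```python
import shutil
import subprocess


def setup_peer_argv(genesis: str, root_dir: str) -> list:
    """kwil-admin setup peer --genesis <file> --root-dir <dir>"""
    return ["kwil-admin", "setup", "peer", "--genesis", genesis, "--root-dir", root_dir]


def run_node_argv(root_dir: str) -> list:
    """kwild --root-dir <dir>"""
    return ["kwild", "--root-dir", root_dir]


def init_and_start(genesis: str, root_dir: str) -> subprocess.Popen:
    """Initialize the node directory from a genesis file, then start the node in the background."""
    for tool in ("kwil-admin", "kwild"):
        if shutil.which(tool) is None:
            raise FileNotFoundError(f"{tool} not found on PATH")
    subprocess.run(setup_peer_argv(genesis, root_dir), check=True)  # raises if setup fails
    return subprocess.Popen(run_node_argv(root_dir))
```

Passing argument lists instead of a shell string avoids quoting problems with paths that contain spaces.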

This is bad documentation on our end; we will fix it ASAP.

@markholdex
Collaborator

@outerlook @MicBun what is the ETA for all the preparations on our side?

@brennanjl when do you think you can have the node up and running?

@rsoury mentioned this issue on Aug 14, 2024
27 tasks
@brennanjl
Collaborator

@markholdex whenever the setup repo is done, we will set it up! If it's ready now, we can do it tomorrow!

@MicBun
Contributor

MicBun commented Aug 14, 2024

what is the ETA for all the preparations on our side?

We are close to done. Once everything is set up and we can run TSN nodes from a static genesis, we will begin testing.
Hopefully, it will be done by the end of this week, and testing can happen on Monday or Tuesday.

@MicBun
Contributor

MicBun commented Aug 15, 2024

log
TSN binaries have been released: https://github.com/truflation/tsn/releases/tag/v1.1.6
TSN has been deployed and is using the static genesis, and the databases have also been migrated.

Although the guide is still in progress, it is available at:
trufnetwork/truf-node-operator#3

@markholdex
Collaborator

log TSN binaries have been released: https://github.com/truflation/tsn/releases/tag/v1.1.6 TSN has been deployed and is using static genesis, databases also have been migrated.

Although the guide is still in progress, it is available at: truflation/tsn-node-operator#3

@MicBun @outerlook does this mean we are pretty much ready to pass the Goal to Kwil?

@outerlook
Contributor

Yes, @truflation/team-tsn-kwil can already find the information needed to connect to our nodes on that branch.

And please share any feedback that you have during this process 🙏

@KwilLuke
Contributor

KwilLuke commented Aug 15, 2024

@outerlook - just left a review on trufnetwork/truf-node-operator#3.

Overall, it is in the right direction. I ran into some issues with the persistent peers; however, I was eventually able to sync a sentry (read-only) node against the TSN testnet nodes using their IP addresses.

We will plan on deploying our node and upgrading it to a validator once trufnetwork/truf-node-operator#3 is merged. Does that work for everyone?

@markholdex
Collaborator

@KwilLuke sounds good. We'll let you know once completed.

@outerlook
Contributor

outerlook commented Aug 19, 2024

@KwilLuke trufnetwork/truf-node-operator#3 is complete. But we're also changing the domain today to remove test.

i.e.

- staging.node-2.tsn.test.truflation.com
+ staging.node-2.tsn.truflation.com

If you haven't started yet, please wait for my signal so that you connect to the new domain from the beginning. The signal will come soon.

@KwilLuke
Contributor

@outerlook sounds good, we will wait for your signal!

@zolotokrylin
Contributor Author

What are the outstanding Problems? I can see in the description of this Goal that everything is "closed".

@jchappelow
Contributor

The Kwil team node started syncing a few hours ago. I'll check on it soon and create the validator join request once it's ready. I will report back with the public key.

@jchappelow
Contributor

jchappelow commented Aug 20, 2024

All ready to become a validator.

$  kwil-admin validators list-join-requests
Pending join requests (2 approvals needed):
 Candidate                                                        | Power | Approvals | Expiration
------------------------------------------------------------------+-------+-----------+------------
 dc2240baa54023b06a3cfa95a5d0c8fccff6f5cd6b918e91bc9f4788cc3fc2cc |     1 |         0 | 82775

The node list will be updated in trufnetwork/truf-node-operator#5. I will take that out of draft as soon as I assign a DNS record to the node.
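When several operators join, it can help to watch pending requests from a script. A minimal sketch that parses the table printed by kwil-admin validators list-join-requests; it relies only on the output layout shown in this comment (pipe-separated columns, hex candidate key, "(N approvals needed)" header), which may change between releases, and the helper names are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class JoinRequest:
    candidate: str   # hex-encoded validator public key
    power: int
    approvals: int
    expiration: int


def parse_join_requests(output: str):
    """Return (approvals_needed, [JoinRequest, ...]) from the CLI's table output."""
    m = re.search(r"\((\d+) approvals needed\)", output)
    needed = int(m.group(1)) if m else 0
    requests = []
    for line in output.splitlines():
        cols = [c.strip() for c in line.split("|")]
        # Data rows have four pipe-separated columns and a hex candidate key;
        # the header ("Candidate") and the dashed separator are skipped.
        if len(cols) == 4 and re.fullmatch(r"[0-9a-fA-F]+", cols[0]):
            requests.append(JoinRequest(cols[0], int(cols[1]), int(cols[2]), int(cols[3])))
    return needed, requests


def remaining_approvals(needed: int, request: JoinRequest) -> int:
    """How many more validator approvals this request still requires."""
    return max(needed - request.approvals, 0)
```

For the output above, this yields one request with 0 of 2 approvals, so 2 more approvals are needed.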

@KwilLuke
Contributor

@outerlook - just in case it isn't clear, our node is now set up, and we are waiting for you to approve the validator request.

@outerlook
Contributor

Approved! Thank you @KwilLuke @jchappelow

@KwilLuke
Contributor

@outerlook Awesome! Should this issue be closed, or do you want to keep it open for additional tracking?

@outerlook
Contributor

https://staging.tsn.truflation.com/v0/chain/nodes/count

{
  "ok": true,
  "data": {
    "validators": 3,
    "non_validators": 0
  }
}

It's done. Good job to all here! ❤️
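The node-count endpoint above is a convenient health signal after onboarding each operator. A minimal sketch that fetches and parses it; the URL is the staging one from this comment and may have changed since, and the response shape is inferred only from the sample shown, so treat both as assumptions. The helper names are hypothetical.

```python
import json
from urllib.request import urlopen

NODES_COUNT_URL = "https://staging.tsn.truflation.com/v0/chain/nodes/count"


def parse_node_count(body: str):
    """Return (validators, non_validators) from the endpoint's JSON body."""
    payload = json.loads(body)
    if not payload.get("ok"):
        raise RuntimeError(f"endpoint reported failure: {payload!r}")
    data = payload["data"]
    return data["validators"], data["non_validators"]


def fetch_node_count(url: str = NODES_COUNT_URL, timeout: float = 10.0):
    """Fetch and parse the node count over HTTPS."""
    with urlopen(url, timeout=timeout) as resp:
        return parse_node_count(resp.read().decode("utf-8"))
```

With the response shown above, parse_node_count returns (3, 0).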

@markholdex
Collaborator

@srust99 FYI
