feat(services): reworked Madara services for better cancellation control #405

Merged
merged 32 commits into from
Dec 16, 2024
Changes from 29 commits
Commits
630cb59
feat(service): it compiles (fuck fuck fuck fuck fuck...)
Trantorian1 Dec 2, 2024
7323542
feat(service): got services running normally again
Trantorian1 Dec 2, 2024
da45d87
feat(telemetry): refactored telemetry service to be able to be restarted
Trantorian1 Dec 2, 2024
ce2f8f0
feat(rpc): rpc now supports new service architecture
Trantorian1 Dec 3, 2024
62dd2fd
feat(l1_sync): it should now be possible to restart l1 sync service
Trantorian1 Dec 3, 2024
d451b0a
feat(l2_sync): it should now be possible to restart l2 sync service
Trantorian1 Dec 3, 2024
d05d9b6
feat(services): got global cancellation to work again!
Trantorian1 Dec 3, 2024
3169195
docs(service): started work on service documentation
Trantorian1 Dec 4, 2024
ee8a7ca
docs(services): added section on service status requests
Trantorian1 Dec 4, 2024
bd2c879
docs(services): added more docs to some service structures
Trantorian1 Dec 4, 2024
1b48772
docs(services): added docs for `ServiceMonitor`
Trantorian1 Dec 4, 2024
2eca30a
feat(clippy): clippy no longer complains and fixed tests
Trantorian1 Dec 4, 2024
cd58d07
Merge branch 'main' into feat/services
Trantorian1 Dec 4, 2024
91716f4
feat(services): added handling for `SIGTERM`
Trantorian1 Dec 4, 2024
b1e4e1e
feat(services): added deffered start to conflicting services during w…
Trantorian1 Dec 5, 2024
4117d54
feat(admin): updated adming rpc methods to include all service restar…
Trantorian1 Dec 5, 2024
bf32cc1
docs(warp): updated the docs on database migration now it is zero dow…
Trantorian1 Dec 5, 2024
c2c7655
docs(links): fixed some links
Trantorian1 Dec 5, 2024
c71a08c
docs(admin): updated the list of admin methods
Trantorian1 Dec 5, 2024
f76b5ee
chore(changelog)
Trantorian1 Dec 5, 2024
8383ad1
fix(test): all tests now pass locally
Trantorian1 Dec 5, 2024
3ae5478
fix(admin): bundled all service status methods into one
Trantorian1 Dec 6, 2024
647895f
feat(warp): warp updates now actually work for sequencer nodes
Trantorian1 Dec 9, 2024
44a4c0d
feat(service): service id is now behind a trait
Trantorian1 Dec 9, 2024
5cd6550
docs(service): added a fair bit of docs on how to create your own ser…
Trantorian1 Dec 9, 2024
78c0874
refactor(service): ctx.cancelled in tokio::select is now ctx.run_unti…
Trantorian1 Dec 10, 2024
8c3aed4
Merge branch 'main' into feat/services
Trantorian1 Dec 10, 2024
6148d87
fix(lint)
Trantorian1 Dec 10, 2024
a13810f
fix(me): stupid
Trantorian1 Dec 10, 2024
47be1c5
Merge branch 'main' into feat/services
Trantorian1 Dec 12, 2024
5f81948
fix(comments)
Trantorian1 Dec 12, 2024
780776f
fix(lint)
Trantorian1 Dec 12, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -2,6 +2,7 @@

## Next release

- feat(services): reworked Madara services for better cancellation control
- feat(block_production): continue pending block on restart
- feat(mempool): mempool transaction saving on db
- feat(mempool): mempool transaction limits
5 changes: 4 additions & 1 deletion Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 3 additions & 0 deletions Cargo.toml
@@ -192,6 +192,9 @@ num-bigint = "0.4"
primitive-types = "0.12"
rand = "0.8"
indoc = "2"
proc-macro2 = "1.0.86"
quote = "1.0.26"
syn = { version = "2.0.39", features = ["full"] }
reqwest = { version = "0.12", features = ["blocking", "json"] }
rstest = "0.18"
serde = { version = "1.0", default-features = false, features = ["std"] }
220 changes: 160 additions & 60 deletions README.md
@@ -33,11 +33,14 @@ Madara is a powerful Starknet client written in Rust.
- [Supported JSON-RPC Methods](#supported-json-rpc-methods)
- [Madara-specific JSON-RPC Methods](#madara-specific-json-rpc-methods)
- [Example of Calling a JSON-RPC Method](#example-of-calling-a-json-rpc-method)
- 📚 [Database Migration with Zero Downtime](#-database-migration-with-zero-downtime)
- [Warp Update](#warp-update)
- [Achieving Zero Downtime](#achieving-zero-downtime)
- [Running without `--warp-update-sender`](#running-without---warp-update-sender)
- ✅ [Supported Features](#-supported-features)
- [Starknet Compliant](#starknet-compliant)
- [Feeder-Gateway State Synchronization](#feeder-gateway-state-synchronization)
- [State Commitment Computation](#state-commitment-computation)
- [Database Migration](#database-migration)
- 💬 [Get in touch](#-get-in-touch)
- [Contributing](#contributing)
- [Partnerships](#partnerships)
@@ -113,7 +116,7 @@ cargo run --release -- \
--name Madara \
--sequencer \
--base-path /var/lib/madara \
--preset test \
--preset sepolia \
--l1-endpoint ${ETHEREUM_API_URL}
```

@@ -126,7 +129,7 @@ cargo run --release -- \
--name Madara \
--devnet \
--base-path /var/lib/madara \
--preset test
--preset sepolia
```

> [!NOTE]
@@ -426,16 +429,11 @@ are exposed on a separate port **9943** unless specified otherwise with
<details>
<summary>Status Methods</summary>

| Method | About |
| -------------------- | ---------------------------------------------------- |
| `madara_ping` | Return the unix time at which this method was called |
| `madara_shutdown` | Gracefully stops the running node |
| `madara_rpcDisable` | Disables user-facing rpc services |
| `madara_rpcEnable` | Enables user-facing rpc services |
| `madara_rpcRestart` | Restarts user-facing rpc services |
| `madara_syncDisable` | Disables l1 and l2 sync services |
| `madara_syncEnable` | Enables l1 and l2 sync services |
| `madara_syncRestart` | Restarts l1 and l2 sync services |
| Method | About |
| ----------------- | ---------------------------------------------------- |
| `madara_ping` | Return the unix time at which this method was called |
| `madara_shutdown` | Gracefully stops the running node |
| `madara_service` | Sets the status of one or more services |

</details>
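
As a rough illustration of how these admin methods might be called, the sketch below builds a JSON-RPC 2.0 request body and posts it to the admin port. The helper names `rpc_payload` and `admin_call`, the `ADMIN_URL` variable, and the assumption that the admin server accepts requests at the root path of port 9943 are illustrative and not taken from the Madara documentation. Only parameterless methods such as `madara_ping` are covered, since the parameters of `madara_service` are not described in this section.

```bash
# Hypothetical helper: prints a JSON-RPC 2.0 request body for a
# parameterless method.
rpc_payload() {
  printf '{"jsonrpc":"2.0","method":"%s","params":[],"id":1}' "$1"
}

# Assumption: the admin server listens on localhost:9943 and accepts
# requests at the root path. Override ADMIN_URL if your setup differs.
ADMIN_URL="${ADMIN_URL:-http://localhost:9943}"

# Hypothetical helper: posts the request body to the admin endpoint.
admin_call() {
  curl --silent --location "$ADMIN_URL" \
    --header 'Content-Type: application/json' \
    --data "$(rpc_payload "$1")"
}

# Usage, with a node running with admin methods enabled:
#   admin_call madara_ping
```

Keeping the payload construction separate from the network call makes it easy to inspect the request body before sending it.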

@@ -544,67 +542,61 @@ into the subscription stream:
Where `you-subscription-id` corresponds to the value of the `subscription` field
which is returned with each websocket response.

## ✅ Supported Features
## 📚 Database Migration with Zero Downtime

[⬅️ back to top](#-madara-starknet-client)

### Starknet compliant

Madara is compliant with the latest `v0.13.2` version of Starknet and `v0.7.1`
JSON-RPC specs. You can find out more about this in the [interactions](#-interactions)
section or at the official Starknet [JSON-RPC specs](https://github.com/starkware-libs/starknet-specs).

### Feeder-Gateway State Synchronization

Madara supports its own implementation of the Starknet feeder gateway, which
allows nodes to synchronize state from each other at much faster speeds than
a regular sync.

> [!NOTE]
> Starknet does not currently have a specification for its feeder-gateway
> protocol, so despite our best efforts at output parity, you might still notice
> some discrepancies between official feeder gateway endpoints and our own
> implementation. Please let us know if you encounter this by
> [raising an issue](https://github.com/madara-alliance/madara/issues/new/choose)

### State Commitment Computation

Madara supports merkelized state verification through its own implementation of
Besu Bonsai Merkle Tries. See the [bonsai lib](https://github.com/madara-alliance/bonsai-trie).
You can read more about Starknet Block structure and how it affects state
commitment [here](https://docs.starknet.io/architecture-and-concepts/network-architecture/block-structure/).

### Database Migration

When migrating to a newer version of Madara, you might need to update your
database. Instead of re-synchronizing the entirety of your chain's state from
genesis, you can use Madara's **warp update** feature.

> [!NOTE]
> Warp update requires an already synchronized _local_ node with a working
> database.
### Warp Update

Warp update requires a working database source for the migration. If you do not
already have one, you can use the following command to generate a sample
database:

```bash
cargo run --release -- \
cchudant marked this conversation as resolved.
--name madara \
--network mainnet \
--full \
--l1-endpoint https://*** \
--n-blocks-to-sync 1000 `# Only synchronize the first 1000 blocks` \
--stop-on-sync `# ...and shutdown the node once this is done`
```

To begin the database migration, you will need to start an existing node with
To begin the database migration, you will need to start your node with
[admin methods](#madara-specific-json-rpc-methods) and
[feeder gateway](#feeder-gateway-state-synchronization) enabled. This will be
the _source_ of the migration. You can do this with the `--warp-update-sender`
[preset](#4.-presets):
[preset](#4-presets):

```bash
cargo run --releasae -- \
--name Sender \
--full \ # This also works with other types of nodes
--network mainnet \
--warp-update-sender
cargo run --release -- \
--name Sender \
--full `# This also works with other types of nodes` \
--network mainnet \
--warp-update-sender \
--l1-sync-disabled `# We disable sync, for testing purposes` \
Member

"replace this argument with your --l1-endpoint parameter"

Collaborator Author

Will add more info about this. The goal for this demonstration though is to stop on sync though, else the sender node would keep synchronizing and the update would take forever

--l2-sync-disabled
```

> [!TIP]
> Here, we have disabled sync for testing purposes, so the migration only
> synchronizes the blocks that were already present in the source node's
> database. In a production usecase, you most likely want the source node to
> keep synchronizing with an `--l1-endpoint`, that way when the migration is
> complete the receiver is fully up-to-date with any state that might have been
> produced by the chain _during the migration_.

You will then need to start a second node to synchronize the state of your
database:

```bash
cargo run --releasae -- \
cargo run --release -- \
--name Receiver \
--base-path /tmp/madara_new \ # Where you want the new database to be stored
--base-path /tmp/madara_new `# Where you want the new database to be stored` \
--full \
--network mainnet \
--l1-endpoint https://*** \
@@ -615,12 +607,120 @@ This will start generating a new up-to-date database under `/tmp/madara_new`.
Once this process is over, the warp update sender node will automatically
shut down while the warp update receiver will take its place.

> [!WARNING]
> As of now, the warp update receiver has its rpc disabled, even after the
> migration process has completed. This will be fixed in the future, so that
> services that would otherwise conflict with the sender node will automatically
> start after the migration has finished, allowing for migrations with 0
> downtime.
> [!NOTE]
> You might already have noticed this line which appears at the end of the sync:
> `📱 Running JSON-RPC server at 127.0.0.1:9944 ...`. More about this in the
> next section

### Achieving Zero Downtime

Suppose you are an RPC service provider and your node is also running an RPC
server and exposing it to your clients: if you have to shut it down or restart
it for the duration of a migration this will result in downtime for your service
and added complexity in setting up redundancies.
Member

well you can just resync a new node, i think it makes more sense to talk about sequencers here

Collaborator Author

Yea fair, though this is still a bit faster than that due to being on a local network. It would also be quite easy to make this a lot faster "in the future ™️ " by computing the state root once the migration has completed, instead of at each block which is what we do on a normal sync. Something to keep in mind.


The main issue is that it is not possible for multiple nodes to expose their
services on the same port, so our receiver cannot start its rpc service if the
sender node already has it active. Madara fixes this issue thanks to its
microservice architecture which allows for deferred starts: when the sender has
Member

this is kubernetes' concern, that's not microservices here

rolling updates in kubernetes don't work like this, the nodes dont expose their ports to the same one in k8s
in proper k8s apps you would do their rolling update thing or even canary where you slowly switch your reverse proxy from one version of the app to another

that's not the problem we're trying to solve here right?

Member

also the way you're doing it isnt technically "no downtime" i believe

Member

i dont understand why we would want no downtime updates for full nodes, this is definitely not the right layer to do it
i thought """no downtime""" upgrades would be something for sequencers, so that they can keep producing blocks after a migration - that's not a problem you can easily solve using existing stuff right?

Collaborator Author

Trantorian1 Dec 6, 2024

Zero downtime in this sense refers to the fact the receiver seamlessly takes the place of the sender, hence no downtime relating to having to restart the node. I'm not very familiar with v8.

Collaborator Author

Also sequencer migrations are supported as well now (mempool is still not transferred though)

shut down, the receiver will automatically start any potentially conflicting
services, seamlessly taking its place.

To test this out, run the following command before and after the sender has
shut down:

> [!IMPORTANT]
> If you have already run a node with `--warp-update-receiver` following the
> examples above, remember to delete its database with `rm -rf /tmp/madara_new`.

```bash
curl --location 'localhost:9944'/v0_7_1/ \
--header 'Content-Type: application/json' \
--data '{
"jsonrpc": "2.0",
"method": "rpc_methods",
"params": [],
"id": 1
}' | jq --sort-keys
```

By default, the sender has its rpc server enabled, but this keeps working even
_after_ it has shut down. This is because the receiver has taken its place.
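
To observe the handover more closely, one option is to probe the RPC endpoint repeatedly while the migration runs and count how many probes fail. The sketch below is only an illustration: `probe` and `watch_rpc` are hypothetical helper names, and the one-second timeout and probing interval are arbitrary choices. It reuses the `rpc_methods` request and the `localhost:9944/v0_7_1/` address from the example above.

```bash
# Hypothetical helper: succeeds if the JSON-RPC endpoint at URL answers
# an `rpc_methods` request within one second.
probe() {
  curl --silent --fail --max-time 1 --output /dev/null \
    --header 'Content-Type: application/json' \
    --data '{"jsonrpc":"2.0","method":"rpc_methods","params":[],"id":1}' \
    "$1"
}

# Hypothetical helper: watch_rpc URL ATTEMPTS [INTERVAL_SECONDS]
# Probes URL ATTEMPTS times, prints one line per attempt, then prints
# the total number of failed probes.
watch_rpc() {
  local url="$1" attempts="$2" interval="${3:-1}"
  local i=1 failures=0
  while [ "$i" -le "$attempts" ]; do
    if probe "$url"; then
      echo "attempt $i: up"
    else
      echo "attempt $i: down"
      failures=$((failures + 1))
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "failed probes: $failures"
}

# Usage, while the sender and receiver are running:
#   watch_rpc 'localhost:9944/v0_7_1/' 600 1
```

A non-zero count of failed probes would indicate a window during which neither node was serving requests on that port.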

### Running without `--warp-update-sender`

Up until now we have had to start a node with `--warp-update-sender` to start
a migration, but this is only a [preset](#4-presets). In a production
environment, you can start your node with the following arguments and achieve
the same results:

```bash
cargo run --release -- \
--name Sender \
--full `# This also works with other types of nodes` \
--network mainnet \
--feeder-gateway-enable `# The source of the migration` \
--gateway-port 8080 `# Default port, change as required` \
--rpc-admin `# Used to shutdown the sender after the migration` \
--rpc-admin-port 9943 `# Default port, change as required` \
--l1-sync-disabled `# We disable sync, for testing purposes` \
--l2-sync-disabled
```

`--warp-update-receiver` doesn't override any cli arguments but is still needed
on the receiver end to start the migration. Here is an example of using it with
custom ports:

> [!IMPORTANT]
> If you have already run a node with `--warp-update-receiver` following the
> examples above, remember to delete its database with `rm -rf /tmp/madara_new`.

```bash
cargo run --release -- \
--name Receiver \
--base-path /tmp/madara_new `# Where you want the new database to be stored` \
--full \
--network mainnet \
--l1-endpoint https://*** \
--warp-update-port-rpc 9943 `# Same as set with --rpc-admin-port on the sender` \
--warp-update-port-fgw 8080 `# Same as set with --gateway-port on the sender` \
--feeder-gateway-enable \
--warp-update-receiver
```

Using this setup and adding any other arguments you need to the warp update
Member

but what about when you have to close the node to remove that warp-update-receiver argument

Collaborator Author

I'm confused. The node does not have to be closed if it is called with --warp-update-receiver: this is a cli flag which does not set any other arguments, so once the warp update has completed your node will run normally as per the other arguments that were passed to it. There is no need close the node to remove the --warp-update-receiver argument.

sender and receiver, you can migrate your node in a production environment with
_zero downtime_ on any externally facing services.
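
Since the receiver's `--warp-update-port-rpc` and `--warp-update-port-fgw` must match the sender's `--rpc-admin-port` and `--gateway-port`, it may help to define the ports once and derive both argument lists from them. The following is a minimal sketch under that assumption: `sender_args`, `receiver_args`, `ADMIN_PORT`, `GATEWAY_PORT` and `L1_ENDPOINT` are hypothetical names, and only flags shown in the examples above are used.

```bash
# Ports shared by both nodes (defaults taken from the examples above).
ADMIN_PORT="${ADMIN_PORT:-9943}"
GATEWAY_PORT="${GATEWAY_PORT:-8080}"

# Prints the sender's arguments, one per line. The sync-disabling flags
# from the test example are left out, so add them back if you need them.
sender_args() {
  printf '%s\n' \
    --name Sender --full --network mainnet \
    --feeder-gateway-enable --gateway-port "$GATEWAY_PORT" \
    --rpc-admin --rpc-admin-port "$ADMIN_PORT"
}

# Prints the receiver's arguments, one per line. L1_ENDPOINT must be set
# by the caller, since this sketch does not assume any particular URL.
receiver_args() {
  printf '%s\n' \
    --name Receiver --base-path /tmp/madara_new --full --network mainnet \
    --l1-endpoint "${L1_ENDPOINT:?set L1_ENDPOINT to your L1 RPC URL}" \
    --warp-update-port-rpc "$ADMIN_PORT" \
    --warp-update-port-fgw "$GATEWAY_PORT" \
    --feeder-gateway-enable --warp-update-receiver
}

# Usage (relies on word splitting, so values must not contain spaces):
#   cargo run --release -- $(sender_args)
#   L1_ENDPOINT=... cargo run --release -- $(receiver_args)
```

Deriving both commands from the same two variables avoids a mismatch between the ports the sender exposes and the ports the receiver connects to.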

## ✅ Supported Features

[⬅️ back to top](#-madara-starknet-client)

### Starknet compliant

Madara is compliant with the latest `v0.13.2` version of Starknet and `v0.7.1`
JSON-RPC specs. You can find out more about this in the [interactions](#-interactions)
section or at the official Starknet [JSON-RPC specs](https://github.com/starkware-libs/starknet-specs).

### Feeder-Gateway State Synchronization

Madara supports its own implementation of the Starknet feeder gateway, which
allows nodes to synchronize state from each other at much faster speeds than
a regular sync.

> [!NOTE]
> Starknet does not currently have a specification for its feeder-gateway
> protocol, so despite our best efforts at output parity, you might still notice
> some discrepancies between official feeder gateway endpoints and our own
> implementation. Please let us know if you encounter this by
> [raising an issue](https://github.com/madara-alliance/madara/issues/new/choose)

### State Commitment Computation

Madara supports merkelized state commitments through its own implementation of
Besu Bonsai Merkle Tries. See the [bonsai lib](https://github.com/madara-alliance/bonsai-trie).
You can read more about Starknet Block structure and how it affects state
commitment [here](https://docs.starknet.io/architecture-and-concepts/network-architecture/block-structure/).
Member

state commitments


## 💬 Get in touch
