
reduce amount of data loaded during atx validation #6326

Draft: wants to merge 41 commits into develop

dshulyak (Contributor) commented Sep 12, 2024

Earlier this week I shared the results of filesystem profiling with stack traces.
The read pattern may vary, but in general reads are roughly 10-15x writes: for every 2 GB of ATXs written, go-spacemesh loads 20-30 GB.

It looks like this:

[image: pprof profile of filesystem reads]

In this change I applied several simple optimizations that follow directly from the pprof profile shared above. All of them are in the last commit, which I made on top of v1.6.8.

  1. Cache PoET proofs: they are usually shared between ATXs and quite large, so loading them for every ATX is wasteful.
  2. Identities are stored without a `rowid`, which makes the `IsMalicious` check inefficient because of how SQLite stores clustered indexes. In addition, `IsMalicious` is run twice in close succession.
  3. The ATX downloader was doing a lot of repetitive updates, each of which required a read before the write.
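One way to avoid the read-before-write pattern from item 3 is to coalesce updates in memory. Repeated updates to the same ATX are merged, and each ATX is written once on flush. This is a minimal sketch, not the code in this PR. `ATXID`, the `update` record and the `flush` callback (standing in for the database write) are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// ATXID is a hypothetical stand-in for an ATX identifier.
type ATXID string

// update is a hypothetical per-ATX record built up by several updates.
type update struct {
	Received bool
	Peers    int
}

// coalescer buffers updates so that repeated updates to the same ATX
// result in one write instead of one read-modify-write each.
type coalescer struct {
	pending map[ATXID]update
	flush   func(ATXID, update) error // hypothetical database write
}

func newCoalescer(flush func(ATXID, update) error) *coalescer {
	return &coalescer{pending: make(map[ATXID]update), flush: flush}
}

// Apply merges fn into the buffered state for id without touching the database.
func (c *coalescer) Apply(id ATXID, fn func(*update)) {
	u := c.pending[id]
	fn(&u)
	c.pending[id] = u
}

// Flush writes every buffered ATX once, in key order, then clears the buffer.
func (c *coalescer) Flush() error {
	ids := make([]ATXID, 0, len(c.pending))
	for id := range c.pending {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	for _, id := range ids {
		if err := c.flush(id, c.pending[id]); err != nil {
			return err
		}
	}
	c.pending = make(map[ATXID]update)
	return nil
}

func main() {
	writes := 0
	c := newCoalescer(func(id ATXID, u update) error {
		writes++
		fmt.Printf("write %s: %+v\n", id, u)
		return nil
	})
	// Three updates to "a" and one to "b" turn into two writes.
	for i := 0; i < 3; i++ {
		c.Apply("a", func(u *update) { u.Peers++ })
	}
	c.Apply("b", func(u *update) { u.Received = true })
	if err := c.Flush(); err != nil {
		panic(err)
	}
	fmt.Println("writes:", writes)
}
```

The trade-off is that buffered state is lost if the node stops before `Flush`, so this only fits data that can be re-derived or re-downloaded.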

This reduced the read rate significantly, to ~7.5 GB per 2 GB of writes.
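The numbers above amount to a read-amplification ratio (bytes read per byte written). A tiny helper makes the before/after comparison explicit using only the figures quoted in this description.

```go
package main

import "fmt"

// readAmplification returns how many bytes are read per byte written.
func readAmplification(readGB, writtenGB float64) float64 {
	return readGB / writtenGB
}

func main() {
	// Figures quoted in this PR description (per 2 GB of ATXs written).
	fmt.Printf("before (low):  %.2fx\n", readAmplification(20, 2))
	fmt.Printf("before (high): %.2fx\n", readAmplification(30, 2))
	fmt.Printf("after:         %.2fx\n", readAmplification(7.5, 2))
}
```

The ratio drops from 10-15x to 3.75x. Total reads shrink by roughly 2.7x to 4x (20/7.5 and 30/7.5).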

[image: read profile after the optimizations]

The remaining problems for ATX validation are:

  1. There are still a lot of reads to check maliciousness or validate the ATX, e.g. loading the previous, commitment, and positioning ATXs.
  2. ATXs themselves require a lot of index updates. The profile below shows specifically how many B-tree updates an ATX insertion requires.
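For item 1, one option is to collect every ATX referenced by a batch of incoming ATXs (previous, commitment and positioning) and load each distinct one only once. This is a minimal sketch under that assumption, not the approach taken in this PR. The `ATX`/`ATXID` types and the `load` function (standing in for the database query) are hypothetical.

```go
package main

import "fmt"

// ATXID is a hypothetical stand-in for an ATX identifier.
type ATXID string

// ATX holds only the references relevant here (hypothetical fields).
type ATX struct {
	ID          ATXID
	Prev        ATXID // empty for an initial ATX
	Commitment  ATXID
	Positioning ATXID
}

// loadReferenced loads every ATX referenced by batch exactly once, even when
// several ATXs (or several fields of one ATX) point at the same ATX.
func loadReferenced(batch []ATX, load func(ATXID) (ATX, error)) (map[ATXID]ATX, error) {
	refs := make(map[ATXID]ATX)
	for _, a := range batch {
		for _, id := range []ATXID{a.Prev, a.Commitment, a.Positioning} {
			if id == "" {
				continue // no reference
			}
			if _, ok := refs[id]; ok {
				continue // already loaded
			}
			ref, err := load(id)
			if err != nil {
				return nil, fmt.Errorf("load %s: %w", id, err)
			}
			refs[id] = ref
		}
	}
	return refs, nil
}

func main() {
	loads := 0
	load := func(id ATXID) (ATX, error) {
		loads++
		return ATX{ID: id}, nil
	}
	batch := []ATX{
		{ID: "x1", Prev: "p1", Commitment: "c", Positioning: "p1"},
		{ID: "x2", Prev: "p2", Commitment: "c", Positioning: "p1"},
	}
	refs, err := loadReferenced(batch, load)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(refs), "distinct ATXs,", loads, "loads for 6 references")
}
```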

[image: B-tree updates during ATX insertion]

dshulyak and others added 30 commits July 8, 2024 11:05
This will reduce total gossipsub traffic by 8x.
The committee upgrade is scheduled to take effect from layer 105_720 (July 15, 2024, 10:00:00 AM UTC).


Co-authored-by: Bartosz Różański
Co-authored-by: Jedrzej Nowak
## Motivation

By integrating go-http-metrics, we can gain detailed insights into the performance and behavior of our APIs.
[backport] api: Add go-http-metrics to collect API metrics (#6099)
[backport] api: v2alpha1: Add vesting, vault and drain vault contents to Transaction (#6112)
* Upgrade to support 8.0 Mio ATXs (#6115)

## Motivation

Upgrade limits to allow for up to 8.0 Mio ATXs

* Update CHANGELOG
## Motivation

The ATX syncer was observed to hang when a peer serves an invalid ATX. It counted only specific errors as failed requests instead of every failed request, so it never reached the configured request limit.
* cache poet certifier info (#6107)

## Motivation

To avoid querying the same poet `/v1/info` endpoint many times in the 1:N setups.

* Update CHANGELOG

---------

Co-authored-by: Bartosz Różański
* fix(fetch): close completed channel once (#6152)

## Motivation

The completed channel can be closed multiple times, which may cause a panic. This fix uses a `sync.Once` to handle this edge case.
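The `sync.Once` pattern looks roughly like this. This is a minimal sketch, not the fetch package's code. The `request` type and its `complete` method are hypothetical stand-ins for whatever owns the completed channel.

```go
package main

import (
	"fmt"
	"sync"
)

// request is a hypothetical stand-in for a fetch request whose completed
// channel may be signalled from several code paths.
type request struct {
	completed chan struct{}
	closeOnce sync.Once
}

func newRequest() *request {
	return &request{completed: make(chan struct{})}
}

// complete closes the channel at most once. Later calls are no-ops instead
// of panicking with "close of closed channel".
func (r *request) complete() {
	r.closeOnce.Do(func() { close(r.completed) })
}

func main() {
	r := newRequest()
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ { // e.g. success, timeout and cancellation racing
		wg.Add(1)
		go func() {
			defer wg.Done()
			r.complete()
		}()
	}
	wg.Wait()
	<-r.completed // does not block: the channel is closed
	fmt.Println("completed channel closed exactly once")
}
```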

* Update CHANGELOG.md

---------

Co-authored-by: acud
* ATX handler rejects invalid ATXs on pubsub lvl (#6142)

In order to drop peers that send invalid ATXs, the handler must return `pubsub.ErrValidationReject`.

* Update CHANGELOG.md

---------

Co-authored-by: Bartosz Różański
* chore: improve mempool

* Cleanup Changelog

---------

Co-authored-by: Matthias
## Motivation

The merge tool currently copies over all files in the identities directory (including files like `.DS_Store` and `desktop.ini`). Since those are likely to be present in both locations, this can block the merge tool until they are manually deleted.

This PR fixes this issue.
* use zap in syncer package (#6121)

## Motivation

Part of the effort to switch from log to zap.

* use zap in proposals package (#6146)

## Motivation

Part of the effort to migrate from log to zap.

* use zap in mesh package (#6147)

## Motivation

Part of the effort to migrate from log to zap.

* use zap in the fetch package (#6109)

## Motivation

Part of the effort to switch from log to zap.

* reducing log spam, shortening error messages in logs (#6128)

Closes #5887

There are two main issues:
1. Too many Info-level (and higher) logs are emitted too frequently
2. Some logs are too long

* update changelog
* Upload api swagger-ui on release (#6088)

## Motivation

Automate the process of building and deploying the spacemeshos/api Swagger UI with each go-spacemesh release, ensuring consistent and up-to-date documentation.

Co-authored-by: Matthias

* Fix api-swagger-ui workflow trigger (#6182)

## Motivation

Fix the api-swagger-ui workflow trigger.

---------

Co-authored-by: andres-spacemesh
* configure poet /v1/info cache ttl in presets (#6198)

## Motivation

For the cache to work, it needs a non-zero TTL.

* cache poet's /v1/pow_params with TTL (#6199)

## Motivation

Similarly to /v1/info, we query /v1/pow_params very often (once per submit per node ID). As the contents returned from this endpoint change rarely (once per epoch), it makes sense to cache the result.

* update changelog
## Motivation

Fall back to PoW if the node cannot recertify after PoET registration fails with 401 (Unauthorized).
The `singleflight` package allows us to simplify the code.
[backport] api: v2alpha1: Add labels_per_unit to NetworkService.Info endpoint (#6213)
fasmat and others added 10 commits August 12, 2024 23:01
* Speed up ATX cache warmup (#6241)

## Motivation

Speeding up the in-memory ATX cache warmup, which is especially slow on HDDs.

Co-authored-by: Jedrzej Nowak

* Update CHANGELOG

* Fix compiler error

---------

Co-authored-by: Bartosz Różański
Co-authored-by: Jedrzej Nowak

The removed check served no purpose. The actual contextual validation is implemented in `StoreAtx()`.

Co-authored-by: Bartosz Różański
* Fix response data slice too small (#6248)

## Motivation

The response message object needs to be increased in size to allow 8.0 Mio ATXs to be processed by the node.

* Update CHANGELOG
…action list. (#6269) (#6277)

* api: v2alpha1: Use subquery instead of left join for transaction list. (#6269)

Removed the LEFT JOIN from IterateTransactionsOps because it made queries too slow.
Instead, a subquery with SELECT is used to retrieve tx ids for addresses used in transactions.

Added CustomQuery field to sql builder to allow more complex expressions.
Fixed transaction test generator.

* Update CHANGELOG
* chore(txs): silence logs (#6278)

## Motivation

Reducing log-levels for mempool transaction processing.

* Update changelog

---------

Co-authored-by: Matthias
…r in 1:n (#6282)

* Update time measurement of metrics for proposal builder in 1:n (#6268)

## Motivation

This updates how timing metrics for the proposal builder are calculated.

* Update CHANGELOG
* Use ATXData during tortoise init (#6279)

## Motivation

Instead of reading information about malicious identities from the DB during tortoise init, use ATX data.

* Update Changelog
wip

telemetry

untx malicious check

add coalescing in background

redundant work

wip

all the stuff

more

sort atxs before insertion

do single pass to find prev and other current

serialize all access to poets, so that at most one reader active

save commitment_atx
dshulyak force-pushed the reduce-amount-of-data-loaded-during-atx-validation branch from b9e746b to f1a6929 on September 18, 2024 08:04
spacemesh-bors bot pushed a commit that referenced this pull request Sep 20, 2024
Follow-up for #6326.

In the current code, PoET proofs are fetched for every ATX submitted to the node. They are relatively large (~140 KB) and account for a sizeable chunk (25%) of all reads executed on the ATX handler code path.

They are also a perfect case for LRU caching: they are mostly reused, and the set is replaced from epoch to epoch.

Basic stats for recent epochs:

```
select round_id, count(*), max(length(poet)) from poets group by round_id;
25|42|146200
26|45|145936
27|45|145738
28|45|145903
```
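Given these stats, an LRU cache in front of the proof query can be sketched as follows. This is a minimal, single-goroutine sketch built on `container/list`, not the cache used in the commit; a real handler would need a mutex or a concurrent LRU library. `lru`, `getProof`, the string keys, the `load` callback and the capacity of 64 are hypothetical.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is one cached proof; key is a hypothetical proof reference.
type entry struct {
	key   string
	proof []byte
}

// lru is a fixed-capacity least-recently-used cache (not safe for
// concurrent use).
type lru struct {
	capacity int
	ll       *list.List               // front = most recently used
	items    map[string]*list.Element // key -> element holding *entry
}

func newLRU(capacity int) *lru {
	return &lru{capacity: capacity, ll: list.New(), items: make(map[string]*list.Element)}
}

func (c *lru) Get(key string) ([]byte, bool) {
	if el, ok := c.items[key]; ok {
		c.ll.MoveToFront(el)
		return el.Value.(*entry).proof, true
	}
	return nil, false
}

// Add inserts or refreshes key and evicts the least recently used entry
// when the cache is over capacity.
func (c *lru) Add(key string, proof []byte) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).proof = proof
		c.ll.MoveToFront(el)
		return
	}
	c.items[key] = c.ll.PushFront(&entry{key: key, proof: proof})
	if c.ll.Len() > c.capacity {
		oldest := c.ll.Back()
		c.ll.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

// getProof serves the proof from the cache and calls load (e.g. a database
// query) only on a miss.
func (c *lru) getProof(key string, load func(string) ([]byte, error)) ([]byte, error) {
	if p, ok := c.Get(key); ok {
		return p, nil
	}
	p, err := load(key)
	if err != nil {
		return nil, err
	}
	c.Add(key, p)
	return p, nil
}

func main() {
	loads := 0
	load := func(string) ([]byte, error) {
		loads++
		return make([]byte, 146_200), nil // largest proof size in round 25
	}
	c := newLRU(64)
	// 1000 ATXs referencing the ~45 proofs of one round.
	for i := 0; i < 1000; i++ {
		if _, err := c.getProof(fmt.Sprintf("proof-%d", i%45), load); err != nil {
			panic(err)
		}
	}
	fmt.Println("database loads:", loads)
}
```

With ~45 proofs of ~146 KB per round, keeping a whole round in memory costs roughly 6.6 MB.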