Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Optional async decoupling for secondary writes and reworked E2E metric assertions #182

Merged
merged 20 commits into from
Oct 26, 2024

Conversation

epociask
Copy link
Collaborator

@epociask epociask commented Oct 12, 2024

Fixes Issue

Related to #164 but doesn't fix it.

Fixes #113 and #96

Changes proposed

  • Optional async decoupling for write routines to reduce dispersal latencies when writing to EigenDA. User can optionally configure the number of go routines.
  • Decompose router abstraction into primary router with access to a secondary one which manages secondary writing/reading
  • Remove stats struct profiling in-favor of in-memory implementation of Metricer interface for E2E assertions
  • Add retry logic for secondary insertions
  • Decomposed tests into cleaner and more concise modules
  • Added async benchmark function

Benchmarking

As expected, latency decreases when using async writes to secondary storage.

0 threads:

     240	   6362160 ns/op	  412053 B/op	     624 allocs/op

5 threads:

     758	   1508104 ns/op	  218021 B/op	     620 allocs/op

10 threads:

    1742	    914375 ns/op	  167984 B/op	     619 allocs/op

Note to reviewers

Copy link
Collaborator

@samlaf samlaf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the refactor! Think there are ways to make the API for secondary struct easier to understand and use though..! I'm not a big fan of async APIs. See my reasoning here: https://www.notion.so/eigen-labs/Bls-Agg-Service-2-0-synchronous-API-98abef46040a48fc8d044e7de0781839

README.md Outdated
Comment on lines 211 to 215
Unit tests can be ran via invoking `make test`. Please make sure to have all test containers downloaded locally before running via:
```
docker pull redis
docker pull minio
```
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this really needed? I would think testcontainer would still pull the images when attempting to run them if they are not present locally?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

server/load_store.go Outdated Show resolved Hide resolved
store/router.go Outdated Show resolved Hide resolved
store/secondary.go Outdated Show resolved Hide resolved
store/secondary.go Outdated Show resolved Hide resolved
store/secondary.go Outdated Show resolved Hide resolved
server/load_store.go Outdated Show resolved Hide resolved
store/router.go Outdated Show resolved Hide resolved
@epociask epociask requested a review from samlaf October 18, 2024 02:30
@epociask epociask changed the title chore: Better abstract secondary storage feat: Optional async decoupling for secondary writes and rework metrics Oct 18, 2024
@epociask epociask changed the title feat: Optional async decoupling for secondary writes and rework metrics feat: Optional async decoupling for secondary writes and reworked E2E metric assertions Oct 18, 2024
README.md Show resolved Hide resolved
commitments/mode.go Show resolved Hide resolved
metrics/memory.go Show resolved Hide resolved
metrics/memory.go Show resolved Hide resolved
metrics/memory.go Outdated Show resolved Hide resolved
metrics/memory.go Outdated Show resolved Hide resolved
store/store.go Outdated Show resolved Hide resolved
Comment on lines 24 to 33
type ISecondary interface {
AsyncEntry() bool
Enabled() bool
Topic() chan<- PutNotify
CachingEnabled() bool
FallbackEnabled() bool
HandleRedundantWrites(ctx context.Context, commitment []byte, value []byte) error
MultiSourceRead(context.Context, []byte, bool, func([]byte, []byte) error) ([]byte, error)
WriteSubscriptionLoop(ctx context.Context)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably works. I personally feel like we're getting wayyyy to complex way too early, for something that's super simple: we only have s3 and redis options, but now we have all these indirection layers, caching vs fallback, interfaces over interfaces. Makes it very hard to understand the code, review it, find bugs, benchmark/profile and optimize, etc. Let's merge this, but I'd like to try to keep the complexity lower in the future, or at least think through ways of simplifying the architecture.

@epociask epociask requested a review from samlaf October 21, 2024 01:20
Copy link
Collaborator

@samlaf samlaf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Have a few small nits but want to get this PR merged asap!

e2e/setup.go Outdated
@@ -113,6 +111,7 @@ type Cfg struct {
UseS3Caching bool
UseRedisCaching bool
UseS3Fallback bool
WriteThreadCount int
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prob makes the comment above ("at most one of the below options should be true") not true anymore. Maybe update the comment to say "below 3/4 options"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e2e/setup.go Show resolved Hide resolved
e2e/setup.go Show resolved Hide resolved
e2e/setup.go Outdated
Comment on lines 335 to 363
// StringOrByte ... constraint that generic type is either "string" or "[]byte"
type StringOrByte interface {
string | []byte
}

func randStr(n int) string {
var letterRunes = []rune("abcdefghijklmnopqrstuvwxyz")
b := make([]rune, n)
for i := range b {
b[i] = letterRunes[rand.Intn(len(letterRunes))]
}
return string(b)
}

// Rand generates either a random string or byte slice depending on the type T
func Rand[T StringOrByte](length int) T {
str := randStr(length)

// Use type switch to return the correct type
var result any
switch any(new(T)).(type) {
case *string:
result = str
case *[]byte:
result = []byte(str)
}

return result.(T)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think this is overcomplicating things and not necessary. Generics typically when behavior is the same, but types are different. Here we just want 2 different things. Why not just have 2 separate functions RandByte and RandString? Super straightforward, follows stdlib pattern. At your call site if I see the generic I'm not sure what's happening so I need to click through and then parse this complex function, for something that should be completely trivial and uninteresting.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -33,6 +33,7 @@ func withMetrics(
// we assume that every route will set the status header
versionByte, err := parseVersionByte(w, r)
if err != nil {
recordDur(w.Header().Get("status"), string(mode), string(versionByte))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This w.Header().Get("status") thing is buggy, I have an issue about this somewhere. But let's keep it like this for now and fix it in another PR.

@epociask epociask merged commit 705f740 into main Oct 26, 2024
7 checks passed
@epociask epociask deleted the epociask--chore-reabstract-router branch October 26, 2024 01:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove stats profiling in favor of prometheus
2 participants