Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sandersaarond/mysql model metrics #98

Draft
wants to merge 20 commits into
base: main
Choose a base branch
from

Conversation

SandersAaronD
Copy link
Collaborator

@SandersAaronD SandersAaronD commented Sep 27, 2024

WIP

Meant to transition writing and reading model metrics from loki to mysql, currently has a writer implemented and the reader function is there but not attached to an endpoint.

TODO for this:

  • make sure the reader function works, and serves an endpoint
  • have the UI query that endpoint instead of loki
  • switch the python client to write to this (format was changed slightly to allow for debouncing so that we don't end up with too many http connections open at once for frequent logging jobs)

Unclear if this can be done as part of this PR, but: I have a dangling concern about authentication, where it isn't clear to me if tenantID will always be populated correctly in production, since our local setup just defaults to 0, making this hard to test locally. Either we need to find a way to test this locally or we won't know about that until we deploy it.

@SandersAaronD SandersAaronD force-pushed the sandersaarond/mysql-model-metrics branch from 53d8df6 to e068d1e Compare October 1, 2024 15:41
@SandersAaronD SandersAaronD marked this pull request as draft October 2, 2024 20:31
Copy link
Contributor

@annanay25 annanay25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left an initial round of comments, nice work so far!

StackID uint64 `json:"stack_id" gorm:"not null;primaryKey"`
ProcessID uuid.UUID `json:"process_id" gorm:"type:char(36);not null;primaryKey;foreignKey:ProcessID;references:ID"` // Foreign key
MetricName string `json:"metric_name" gorm:"size:32;not null;primaryKey"`
StepName string `json:"step_name" gorm:"size:32;not null;primaryKey"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not entirely obvious what StepName is and why we need it. We should add comments explaining the fields.

Comment on lines +18 to +24
// Add a custom hook if necessary for additional logic.
// Example: AfterCreate hook for custom logic
func (m *ModelMetrics) AfterCreate(tx *gorm.DB) error {
// Custom logic after creating a metric entry
tx.Logger.Info(tx.Statement.Context, "AfterCreate hook called for ModelMetrics")
return nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is being used?

Comment on lines +24 to +44
type testApp struct {
App
}

func (a *testApp) db(ctx context.Context) *gorm.DB {
return a.App._db
}

func setupTestDB(t *testing.T) (*gorm.DB, func()) {
db, err := gorm.Open(sqlite.Open("file::memory:?cache=shared"), &gorm.Config{})
require.NoError(t, err)

err = db.AutoMigrate(&model.Process{}, &model.ModelMetrics{})
require.NoError(t, err)

return db, func() {
sqlDB, err := db.DB()
require.NoError(t, err)
sqlDB.Close()
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we reuse the code in api_test.go? I think most of this boilerplate is set up already.

)

type ModelMetrics struct {
StackID uint64 `json:"stack_id" gorm:"not null;primaryKey"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use TenantID as that's the consistent key we are using in other structs? Or can we update the other structs to use StackID as a uint64?

}

// Iterate over the metrics and build the series data
var response GetModelMetricsResponse
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's some indentation mismatch in this file from this point on

Comment on lines +19 to +27
// Incoming format is an array of these
type ModelMetricsSeries struct {
MetricName string `json:"metric_name"`
StepName string `json:"step_name"`
Points []struct {
Step uint32 `json:"step"`
Value json.Number `json:"value"`
} `json:"points"`
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we storing metrics as individual rows (one row each for a step and value) in mySQL? Why not store an array like in this struct? Is it not performant? Curious if there was some benchmarking done to choose the former.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, it would also limit us from compressing the points in the future which could end up with a lot more disk and network usage. Do we ever not pull back all of the points at once?

MetricValue string `json:"metric_value" gorm:"size:64;not null"`

Process Process `gorm:"foreignKey:ProcessID;references:ID"` // Relationship definition
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also track timestamp for each step? i feel that might be helpful in tracking time between training runs, performance tracking, etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants