DATA-3441 Update data export command #4596
base: main
Conversation
Removed myself and added team-data since I unfortunately won't be able to get to this today
Nice work on this!
cli/app.go
Outdated
{
	Name:      "tabular",
	Usage:     "download tabular data",
	UsageText: createUsageText("data export tabular", []string{dataFlagDestination, "part-id", "component-name", "method"}, true),
[nit] resource name here too?
Ahh!!!
filePath := utils.ResolveFile(dataFileName)
_, err = os.ReadFile(filePath)
test.That(t, err, test.ShouldNotBeNil)
test.That(t, err, test.ShouldBeError, fmt.Errorf("open %s: no such file or directory", filePath))
[super-duper nit] I think we only need the test.ShouldBeError; that also handles the case where the error is nil, I believe.
},
&cli.StringFlag{
	Name:  "start",
	Usage: "ISO-8601 timestamp indicating the start of the interval",
Should this be RFC 3339? I see that's what we're using for the time layout. (I know both are hella similar)
I didn't change what we were already using, but I suspect that we chose it to be a little more flexible with user input
cli/data.go
Outdated
	}
	end = timestamppb.New(t)
}
if start != nil || end != nil {
Wouldn't this be fine to always do as opposed to conditionally?
Not a huge deal either way, I believe. Removed it for you!
if err := os.MkdirAll(filepath.Join(dst, metadataDir), 0o700); err != nil {

// Periodically flush to keep buffer size down.
if *numWrites == uint64(10_000) {
I think this should be greater than instead of an exact match.
Not opposed, but curious to hear your reasoning
// tabularData downloads binary data matching filter to dst.
func (c *viamClient) tabularData(dst string, filter *datapb.Filter, limit uint) error {
// tabularData downloads unified tabular data and metadata for the requested data source and interval to the specified destination.
func (c *viamClient) tabularData(dest string, request *datapb.ExportTabularDataRequest) error {
Just to make sure I'm understanding this section properly: you have a go func so that we can fetch while we're still writing previous results to a file, right?
Do you mean on line 699?
Yes
Then yes! So we can fetch a new result while writing the previous result is in progress.
	}
	mdIndex++
}
dataRowChan := make(chan []byte)
What are your thoughts on making this a buffered channel? The reason I'm asking: let's say disk IO is slow; we could fetch more results while waiting on that. (I realize that disk IO will probably never be the limiting factor, but curious what your thoughts are.)
If disk IO is slow, we'll be constrained by that anyways. If we run into problems, perhaps we can explore doing something, but I'm of the mind to leave this as-is for now
This PR replaces our old `data export` command with two separate subcommands: `data export binary` and `data export tabular`. The logic powering `data export binary` is the same; `data export tabular` was essentially rewritten and now uses `ExportTabularData`. This method:

- `--destination`
- Uses `ExportTabularData` to query for a specific data source within an optional interval.
- `data.ndjson`

If there is an error, the export process will be attempted up to 5 times. If there is still an error, the `data.ndjson` file is removed.

Automated Testing:

- Updated `TestTabularDataByFilterAction` to work with the new setup.
- `data.ndjson`

Manual Testing:

- Ran the export command locally after setting my base-url to my deployed `ExportTabularData` branch.
- Queried for the same data source as in DATA-3440, producing a `data.ndjson` file (30.7 MB).
- Also tested with interval start/end, just interval start, and just interval end. Confirmed that correct data ranges were returned ✅