DATA-3441 Update data export command #4596

Open
wants to merge 29 commits into
base: main
from

Conversation

@katiepeters katiepeters commented Dec 3, 2024

This PR replaces our old data export command with two separate subcommands: data export binary and data export tabular. The logic powering data export binary is unchanged. data export tabular was essentially rewritten and now uses ExportTabularData.

The rewritten data export tabular method:

  • Checks that the user is logged in.
  • Creates any missing directories in the specified --destination.
  • Utilizes ExportTabularData to query for a specific data source within an optional interval.
  • Writes the results to data.ndjson.

If an error occurs, the export is attempted up to 5 times. If it still fails, the data.ndjson file is removed.

Automated Testing:

  • Updated the existing tests for TestTabularDataByFilterAction to work with the new setup.
  • Added a new error test case to test for retries and removal of data.ndjson.

Manual Testing:
Ran the export command locally after pointing my base-url at my deployed ExportTabularData branch.

Queried for the same data source as in DATA-3440:

--part-id=cfc5404e-e269-425d-b1f9-ad7ce18790e9 --resource-name=globetrotter --resource-subtype=rdk:component:movement_sensor --method=Position
  • Successfully created data.ndjson file (30.7 MB).
  • Finished in ~1 min, 40 sec.

Also tested with interval start/end, just interval start, and just interval end. Confirmed that correct data ranges were returned ✅

@viambot added the "safe to test" label (This pull request is marked safe to test from a trusted zone) Dec 3, 2024
@katiepeters katiepeters marked this pull request as ready for review December 10, 2024 23:04
@dmhilly dmhilly requested review from a team and n0nick and removed request for dmhilly and a team December 11, 2024 16:24
dmhilly commented Dec 11, 2024

Removed myself and added team-data since I unfortunately won't be able to get to this today

app/data_client.go Outdated (resolved)
@vijayvuyyuru vijayvuyyuru left a comment

Nice work on this!

cli/app.go Outdated
{
Name: "tabular",
Usage: "download tabular data",
UsageText: createUsageText("data export tabular", []string{dataFlagDestination, "part-id", "component-name", "method"}, true),
Member

[nit] resource name here too?

Member Author

Ahh!!!

filePath := utils.ResolveFile(dataFileName)
_, err = os.ReadFile(filePath)
test.That(t, err, test.ShouldNotBeNil)
test.That(t, err, test.ShouldBeError, fmt.Errorf("open %s: no such file or directory", filePath))
Member

[super-duper nit] I think we only need the test.ShouldBeError check; I believe that also handles the case where the error is nil.

},
&cli.StringFlag{
Name: "start",
Usage: "ISO-8601 timestamp indicating the start of the interval",
Member

Should this be RFC 3339? I see that's what we're using for the time layout. (I know both are hella similar.)

Member Author

I didn't change what we were already using, but I suspect that we chose it to be a little more flexible with user input
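
To make the distinction in this thread concrete: Go's `time.RFC3339` layout accepts only a subset of what ISO 8601 permits. The sketch below assumes the flag value is parsed with `time.Parse` and `time.RFC3339`, as the reviewer's comment about the time layout suggests; `parseTimestamp` is a hypothetical helper, not a function from this PR.

```go
package main

import (
	"fmt"
	"time"
)

// parseTimestamp parses s using Go's RFC 3339 layout.
func parseTimestamp(s string) (time.Time, error) {
	return time.Parse(time.RFC3339, s)
}

func main() {
	inputs := []string{
		"2024-12-03T15:04:05Z",      // RFC 3339, UTC
		"2024-12-03T15:04:05-05:00", // RFC 3339, numeric offset
		"2024-12-03",                // ISO 8601 date only, not RFC 3339
		"20241203T150405Z",          // ISO 8601 basic format, not RFC 3339
	}
	for _, in := range inputs {
		t, err := parseTimestamp(in)
		if err != nil {
			fmt.Printf("%q rejected: %v\n", in, err)
			continue
		}
		fmt.Printf("%q accepted as %s\n", in, t.UTC().Format(time.RFC3339))
	}
}
```

Under that assumption, the first two inputs parse and the last two are rejected, so a usage string that says ISO-8601 promises slightly more than the parser accepts.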

cli/data.go Outdated
}
end = timestamppb.New(t)
}
if start != nil || end != nil {
Member

Wouldn't this be fine to always do as opposed to conditionally?

Member Author

Not a huge deal either way, I believe. Removed it for you!

if err := os.MkdirAll(filepath.Join(dst, metadataDir), 0o700); err != nil {

// Periodically flush to keep buffer size down.
if *numWrites == uint64(10_000) {
Member

I think this should be greater than instead of an exact match.

Member Author

Not opposed, but curious to hear your reasoning

- // tabularData downloads binary data matching filter to dst.
- func (c *viamClient) tabularData(dst string, filter *datapb.Filter, limit uint) error {
+ // tabularData downloads unified tabular data and metadata for the requested data source and interval to the specified destination.
+ func (c *viamClient) tabularData(dest string, request *datapb.ExportTabularDataRequest) error {
Member

Just to make sure I'm understanding this section properly: you have a go func so that we can fetch while we're still writing previous results to a file, right?

Member Author

Do you mean on line 699?

Member

Yes

Member Author

@katiepeters katiepeters Dec 12, 2024

Then yes! So we can get a new result while the previous result is still being written.

}
mdIndex++
}
dataRowChan := make(chan []byte)
Member

What are your thoughts on making this a buffered channel? The reason I'm asking: let's say disk IO is slow; we can fetch more requests while waiting on that. (I realize that disk IO will probably never be the limiting factor, but I'm curious what your thoughts are.)

Member Author

If disk IO is slow, we'll be constrained by that anyways. If we run into problems, perhaps we can explore doing something, but I'm of the mind to leave this as-is for now
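
For reference, the change being discussed is a one-argument difference in `make`. With an unbuffered channel the producer blocks on every send until the writer receives; with a capacity of n it can run up to n results ahead. A toy sketch (the capacity of 3 is arbitrary and `sendsBeforeBlocking` is a hypothetical helper):

```go
package main

import "fmt"

// sendsBeforeBlocking reports how many sends on ch complete when nothing is
// receiving, i.e. how far ahead a producer could get before it would block.
func sendsBeforeBlocking(ch chan []byte) int {
	n := 0
	for {
		select {
		case ch <- []byte("row"):
			n++
		default:
			return n
		}
	}
}

func main() {
	fmt.Println(sendsBeforeBlocking(make(chan []byte)))    // 0: every send waits for a receiver
	fmt.Println(sendsBeforeBlocking(make(chan []byte, 3))) // 3: the buffer absorbs three rows
}
```

As the author notes, a buffer only helps with bursts: if the disk is persistently slower than the stream, the buffer fills and the producer blocks again.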
