Skip to content

Conversation

@aldenh-viam
Copy link
Contributor

@aldenh-viam aldenh-viam commented Oct 28, 2025

shellRPCFileReadCopier / shellFileCopyWriter in services/shell/copy_rpc.go uses a gRPC stream to upload a large file in chunks directly into the specified filename. If there is a network blip midway through the transfer, the partial file is left on the destination. If this happens during a viam module reload, RDK's reconfigure will repeatedly try loading the corrupted modulename.tar.gz and failing.

This change makes shellRPCFileReadCopier write files to a temporary filename.download and rename it to filename only if the stream completes without error.

Note: this Copy method also supports directories, but if my understanding is correct, it only creates empty directories for future files, so it doesn't need the same write-to-temp-and-rename logic.

cc @michaellee1019 @cheukt: team discussed offline and we believe we don't want to skip failed packages and proceed with regular startup to avoid the possibility of the robot getting stuck in a bad config state.

@aldenh-viam aldenh-viam requested a review from a team October 28, 2025 20:27
@viambot viambot added the safe to test This pull request is marked safe to test from a trusted zone label Oct 28, 2025
@michaellee1019
Copy link
Member

LGTM @aldenh-viam. I like the idea of ensuring the file copy is completed before naming the file the expected name. That will fix the state we are seeing.

Copy link
Member

@cheukt cheukt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a test for this?

} else {
// Since file may be streamed (see copyFile in copy_rpc.go), write to a temp file (filename.download) and rename after the download
// has completed. This way a temporary network blip won't leave a corrupted partial file in the expected location.
// Technically the temp file can be deleted upon any Copy error, but it will also be clobbered next time we retry.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's do a best effort clean up here, would suck if the partially downloaded module is very large and we just leave it hanging around

@viambot viambot added safe to test This pull request is marked safe to test from a trusted zone and removed safe to test This pull request is marked safe to test from a trusted zone labels Oct 30, 2025
@viambot viambot added safe to test This pull request is marked safe to test from a trusted zone and removed safe to test This pull request is marked safe to test from a trusted zone labels Oct 30, 2025
@viambot viambot added safe to test This pull request is marked safe to test from a trusted zone and removed safe to test This pull request is marked safe to test from a trusted zone labels Nov 3, 2025
@aldenh-viam aldenh-viam requested a review from cheukt November 3, 2025 19:29
@viambot viambot added safe to test This pull request is marked safe to test from a trusted zone and removed safe to test This pull request is marked safe to test from a trusted zone labels Nov 3, 2025
Copy link
Member

@cheukt cheukt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - thanks for adding the test!

@aldenh-viam aldenh-viam merged commit 05f8f94 into viamrobotics:main Nov 4, 2025
18 checks passed
@aldenh-viam aldenh-viam deleted the push-trlktntwqurw branch November 4, 2025 14:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

safe to test This pull request is marked safe to test from a trusted zone

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants