RSDK-8598 - Replace cache after sync #4343

Merged
cheukt merged 22 commits into main from replace-cache-after-sync
Sep 5, 2024

Conversation

cheukt
Member

@cheukt cheukt commented Sep 3, 2024

going to pare down changes so that the first set of changes will be a strict improvement while we continue discussions for a more complete solution

@viambot viambot added the safe to test This pull request is marked safe to test from a trusted zone label Sep 3, 2024
@cheukt cheukt marked this pull request as ready for review September 3, 2024 21:21
@dgottlieb
Member

Saw this failure showing up in tests, e.g. TestGizmo:

2024-09-03T21:28:52.8014035Z panic: runtime error: invalid memory address or nil pointer dereference [recovered]
2024-09-03T21:28:52.8015289Z 	panic: runtime error: invalid memory address or nil pointer dereference
2024-09-03T21:28:52.8016400Z [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x19f644b]
2024-09-03T21:28:52.8017073Z 
2024-09-03T21:28:52.8017278Z goroutine 28 [running]:
2024-09-03T21:28:52.8017842Z testing.tRunner.func1.2({0x22a2a40, 0x35dcc90})
2024-09-03T21:28:52.8018771Z 	/usr/lib/go-1.21/src/testing/testing.go:1545 +0x3f7
2024-09-03T21:28:52.8019457Z testing.tRunner.func1()
2024-09-03T21:28:52.8020193Z 	/usr/lib/go-1.21/src/testing/testing.go:1548 +0x716
2024-09-03T21:28:52.8020898Z panic({0x22a2a40?, 0x35dcc90?})
2024-09-03T21:28:52.8021636Z 	/usr/lib/go-1.21/src/runtime/panic.go:920 +0x270
2024-09-03T21:28:52.8022806Z go.viam.com/rdk/config.(*Config).StoreToCache(0xc00020da00)
2024-09-03T21:28:52.8023658Z 	/__w/rdk/rdk/config/config.go:269 +0x16b
2024-09-03T21:28:52.8025070Z go.viam.com/rdk/robot/impl.(*localRobot).reconfigure(0xc00017a8c0, {0x2963538, 0x4b238e0}, 0xc00020da00, 0x0?)
2024-09-03T21:28:52.8026635Z 	/__w/rdk/rdk/robot/impl/local_robot.go:1201 +0x4b4
2024-09-03T21:28:52.8027587Z go.viam.com/rdk/robot/impl.(*localRobot).Reconfigure(...)
2024-09-03T21:28:52.8028863Z 	/__w/rdk/rdk/robot/impl/local_robot.go:1159
2024-09-03T21:28:52.8031505Z go.viam.com/rdk/robot/impl.newWithResources({0x2963538?, 0x4b238e0}, 0xc00020da00, 0x0, {0x2982e00?, 0xc000512580}, {0x0, 0x0, 0xc0004fbbd0?})
2024-09-03T21:28:52.8032917Z 	/__w/rdk/rdk/robot/impl/local_robot.go:562 +0x287b
2024-09-03T21:28:52.8033927Z go.viam.com/rdk/robot/impl.New(...)
2024-09-03T21:28:52.8035554Z 	/__w/rdk/rdk/robot/impl/local_robot.go:586
2024-09-03T21:28:52.8037578Z go.viam.com/rdk/examples/customresources/demos/remoteserver_test.TestGizmo(0xc00017dba0)
2024-09-03T21:28:52.8039241Z 	/__w/rdk/rdk/examples/customresources/demos/remoteserver/server_test.go:47 +0x356
2024-09-03T21:28:52.8040499Z testing.tRunner(0xc00017dba0, 0x25abe28)
2024-09-03T21:28:52.8042335Z 	/usr/lib/go-1.21/src/testing/testing.go:1595 +0x262

@cheukt
Member Author

cheukt commented Sep 4, 2024

@maximpertsov @dgottlieb tests are passing and this is properly ready for review now. Also tested manually with both local and cloud configs, in both online and offline startup cases.

config/config.go Outdated
@@ -238,6 +244,30 @@ func (c Config) FindComponent(name string) *resource.Config {
	return nil
}

// SetUnprocessedConfig sets unprocessedConfig with a copy of the config passed in.
func (c *Config) SetUnprocessedConfig(cfg *Config) error {
	cpy, err := cfg.CopyOnlyPublicFields()

Member

I'm assuming json.MarshalIndent only writes public fields and therefore we're confident CopyOnlyPublicFields is the only thing we need to pull out into a copy?

Having SetUnprocessedConfig do the marshalling would be obviously equivalent to me. But happy to just get an OK and not worry about it.

Member Author

yep, it's the same.

definitely could marshal earlier as well, went ahead and did that
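
For reference, the point about public fields can be checked in isolation. The sketch below is a minimal illustration, not code from this PR: `robotConfig`, its fields, and `marshalPublic` are hypothetical stand-ins for the real `Config` type. It relies only on the documented behavior of `encoding/json`, which skips unexported struct fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// robotConfig is a hypothetical stand-in for the real Config type.
type robotConfig struct {
	Name     string `json:"name"`
	Revision string `json:"revision"`
	// secret is unexported, so encoding/json never reads or writes it.
	secret string
}

// marshalPublic returns the indented JSON form of cfg.
// Only the exported fields of cfg appear in the output.
func marshalPublic(cfg robotConfig) ([]byte, error) {
	return json.MarshalIndent(cfg, "", "  ")
}

func main() {
	out, err := marshalPublic(robotConfig{Name: "gizmo", Revision: "r1", secret: "hidden"})
	if err != nil {
		panic(err)
	}
	// The printed JSON contains "name" and "revision" but no "secret" key.
	fmt.Println(string(out))
}
```

Because the marshalled bytes already contain only the exported fields, marshalling at set time and copying only public fields should capture the same data, which matches the "it's the same" answer above.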

@@ -1174,13 +1174,6 @@ func (r *localRobot) applyLocalModuleVersions(cfg *config.Config) {
}

func (r *localRobot) reconfigure(ctx context.Context, newConfig *config.Config, forceSync bool) {
	r.configRevisionMu.Lock()

Member

Is the consequence here that we now only report a newer config revision after package downloading succeeds?

Also I don't see the configRevisionMu being locked. Is that important?

Member Author

the missing lock is a miss, added it back in.

thinking about it more, the config revision should update at the top of the function so users know the new config was pulled and is being processed. In theory, it may make sense to roll back the revision if we exit reconfiguration because a download fails, but I also think it's ok to keep the new revision as a sign that the new config did get loaded. whether we revert the revision or not, it will be unclear what state robot reconfiguration is in, so maybe we need to add more information in there

Member

I know @maximpertsov and I talked about how to represent "mixed" state when a new revision is being applied/doesn't apply completely.

> so maybe we need to add more information in there

I agree the only path to do better is to add more information. And I agree it's best to do that thinking/work in a separate ticket.
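
To make the trade-off in this thread concrete, here is a rough sketch of recording the revision under a lock at the top of reconfiguration and tracking separately whether it was fully applied. Everything in it is an assumption for illustration: `revisionTracker`, its `applied` flag, `errDownload`, and this `reconfigure` helper are hypothetical and are not the RDK's types or the follow-up design.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errDownload is a hypothetical error used to simulate a failed package download.
var errDownload = errors.New("package download failed")

// revisionTracker is a hypothetical type; its fields are not taken from the RDK.
type revisionTracker struct {
	mu       sync.Mutex
	revision string
	applied  bool // true once the current revision has been fully applied
}

// begin records the new revision before any work starts.
func (t *revisionTracker) begin(rev string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.revision = rev
	t.applied = false
}

// finish marks the current revision as fully applied.
func (t *revisionTracker) finish() {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.applied = true
}

// status returns the last revision seen and whether it was fully applied.
func (t *revisionTracker) status() (string, bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.revision, t.applied
}

// reconfigure records rev first, then runs download. On failure the new
// revision is kept (not rolled back) and applied stays false.
func reconfigure(t *revisionTracker, rev string, download func() error) error {
	t.begin(rev)
	if err := download(); err != nil {
		return fmt.Errorf("reconfigure to %s: %w", rev, err)
	}
	t.finish()
	return nil
}

func main() {
	t := &revisionTracker{}
	err := reconfigure(t, "r2", func() error { return errDownload })
	rev, applied := t.status()
	fmt.Println(rev, applied, err)
}
```

The extra flag is one possible form of the "more information" mentioned above: a reader can tell that a revision was loaded without assuming it was applied.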

config/config.go Outdated
Comment on lines 74 to 76
// unprocessedConfig stores the unprocessed version of the config that will be cached.
// This version is kept because the config is changed as it moves through the system.
unprocessedConfig *Config

Member

[minor] maybe we can shorten this field name - conceptually an "unprocessed" config is the marshaled config struct, right?

Suggested change
- // unprocessedConfig stores the unprocessed version of the config that will be cached.
- // This version is kept because the config is changed as it moves through the system.
- unprocessedConfig *Config
+ // raw stores the unprocessed version of the config that will be cached.
+ // This version is kept because the config is changed as it moves through the system.
+ raw *Config

Member Author

changed the name to toCache to be clearer, what do you think?

Member

yep makes sense
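
The reason for keeping a separate copy to cache, as the field comment puts it, is that the config changes as it moves through the system. A minimal sketch of that idea follows, assuming a simplified hypothetical `config` type with a `toCache` byte slice. The names `setToCache` and `cached` and the "injected-default" component are illustrative only and do not reflect the actual RDK implementation.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// config is a hypothetical, simplified stand-in for the real Config type.
type config struct {
	Components []string `json:"components"`

	// toCache holds the marshalled snapshot taken before processing.
	toCache []byte
}

// setToCache stores a marshalled snapshot of cfg on c.
func (c *config) setToCache(cfg *config) error {
	md, err := json.MarshalIndent(cfg, "", "  ")
	if err != nil {
		return err
	}
	c.toCache = md
	return nil
}

// cached returns the stored snapshot, or an error if none was set.
func (c *config) cached() ([]byte, error) {
	if c.toCache == nil {
		return nil, errors.New("no snapshot set")
	}
	return c.toCache, nil
}

func main() {
	cfg := &config{Components: []string{"arm"}}
	if err := cfg.setToCache(cfg); err != nil {
		panic(err)
	}

	// Later processing mutates the live config; the snapshot is unaffected.
	cfg.Components = append(cfg.Components, "injected-default")

	snap, err := cfg.cached()
	if err != nil {
		panic(err)
	}
	fmt.Println(string(snap))
}
```

Storing bytes rather than a pointer to another struct means later mutation of the live config cannot leak into what gets written to the cache.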

@@ -238,6 +245,29 @@ func (c Config) FindComponent(name string) *resource.Config {
	return nil
}

// SetToCache sets toCache with a marshalled copy of the config passed in.
func (c *Config) SetToCache(cfg *Config) error {

Member

[minor] this method is public to accommodate its usage in config/watcher_test.go, right? or are we planning to use it in other packages as well? If the former, I suggest we make this method private and define a public version of it in config/export_test.go. This seems to be a common pattern that the golang standard library uses to export test-only functions that are used across different packages and _test modules.

(This is a general pattern we can use around our code-base too if we like it)

Member

@dgottlieb dgottlieb Sep 4, 2024

I think being public here is fine -- even if for the sake of testing. Data tried playing games with keeping things private but selectively publishing things for testing, and it became a huge mess.

I think if we want to be aggressive about making things private and paying a code complexity/readability cost elsewhere, we should start making tangible measurements for when a public method creates a problem.

Member Author

will leave as is for now

Member

> I think being public here is fine -- even if for the sake of testing. Data tried playing games with keeping things private but selectively publishing things for testing, and it became a huge mess.

What made it a mess? Was it unclear what methods were actually public vs public for testing only? If so, I think we can clarify that with some simple sign-posts like requiring any test-only method to take a testing.TB interface or something like that. But in any case I don't feel too strongly about this. Happy to leave as is.

Member

> What made it a mess? Was it unclear what methods were actually public vs public for testing only?

No, worse. The indirection resulted in a bunch of interfaces. And finding out which method was getting called (in testing and in production) was impossible without knowing how the magic worked.

If I wanted to pick a single file as the starting point, I'd say it's this file.

What is a DMService? What is a ManagerConstructor? Following the ManagerConstructor definition, what is a datasync.Manager? Is the *datamanager.builtin.builtin type a datamanager.internal.DMService? Is it a DMService if testing is not enabled? Does that even matter?

Member

woah, that file adds a test-only global? that's a bit confusing...

Member

@maximpertsov maximpertsov left a comment

this is definitely an improvement! i have one small suggestion to minimize the config's public API, but otherwise looks good.

@cheukt cheukt merged commit c4e7142 into viamrobotics:main Sep 5, 2024
19 checks passed
@cheukt cheukt deleted the replace-cache-after-sync branch September 5, 2024 14:54
Labels
safe to test This pull request is marked safe to test from a trusted zone
4 participants