Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filter default properties in json middlewares instead of filter fn #6

Open
wants to merge 20 commits into
base: syncback-merge
Choose a base branch
from

Conversation

kennethloeffler
Copy link

@kennethloeffler kennethloeffler commented Sep 11, 2024

This PR:

  • Removes default property filtering from filter_properties_preallocated
  • Lifts some code out of each json middleware into a new function, populate_unresolved_properties, which is now responsible for filtering default properties

I think this makes sense since only JSON middlewares care about omitting default properties from their file representations. However, this means that more instances may have to be hashed and compared. I'm not totally sure how this will impact performance - it'd be great to profile this on a large project!

These changes allow JSON middlewares to correctly remove properties when property values change to their defaults. Previously, properties at default values would be unilaterally filtered out during hashing, meaning that syncback would produce incorrect results when properties were set back to default values, like in rojo-rbx#937 (comment).

src/snapshot_middleware/mod.rs Outdated Show resolved Hide resolved
Comment on lines 422 to 426
log::warn!(
"Rojo cannot serialize the property {}.{name} in JSON files.\n\
If this is not acceptable, resave the Instance at '{}' manually as an RBXM or RBXMX.",
new_inst.class, snapshot.get_new_inst_path(new_inst.referent())
)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this warning also provide the path of the file where the instance is defined?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can actually include SharedString values in JSON now because of rojo-rbx/rbx-dom#414. I just forgot to change it.

We can probably just remove this warning altogether.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mind if we tackle that in a future PR? Making that change effects several tests and I'd rather not clutter this PR with them

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we can handle that in a different PR. A bit out of scope for this one either way.

Copy link
Author

@kennethloeffler kennethloeffler Sep 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we allow rbx-dom's rbx_reflector to write defaults for SharedString properties (which is now possible after rojo-rbx/rbx-dom#414), then we won't have to alter any test data

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's a compelling reason to not have rbx-dom write SharedString defaults now. Seems like a good change.

src/snapshot_middleware/mod.rs Outdated Show resolved Hide resolved
@kennethloeffler
Copy link
Author

This also needs tests, I'll get to that shortly

@kennethloeffler
Copy link
Author

I've added a test that contains an input file with some instances that have all but one property set to their default values. In the output project, I've defined properties which should be removed from whatever JSON file describing the instance, since the input file has them at their defaults.

Copy link
Member

@Dekkonot Dekkonot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change does cause the project_reserialize test to reserialize default.project.json, which is counter to the whole point of the test. I realize the name isn't entirely clear though, so that's on me. The test is confirming that project files don't reserialize when they don't have to, so it shouldn't include the default.project.json. The file in the test's output probably needs to be changed to account for this PR.

Otherwise, I think this is a good change and it simplifies the middleware logic a fair bit and makes it more consistent. I like it.

Comment on lines 422 to 426
log::warn!(
"Rojo cannot serialize the property {}.{name} in JSON files.\n\
If this is not acceptable, resave the Instance at '{}' manually as an RBXM or RBXMX.",
new_inst.class, snapshot.get_new_inst_path(new_inst.referent())
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can actually include SharedString values in JSON now because of rojo-rbx/rbx-dom#414. I just forgot to change it.

We can probably just remove this warning altogether.

src/snapshot_middleware/mod.rs Outdated Show resolved Hide resolved
src/snapshot_middleware/mod.rs Outdated Show resolved Hide resolved
src/snapshot_middleware/mod.rs Outdated Show resolved Hide resolved
@Dekkonot
Copy link
Member

Dekkonot commented Sep 19, 2024

...For some reason GitHub includes my responses both in my review and on your comments above. Some of the above comments are confusing without the conversation chain. Thanks GitHub.

@kennethloeffler
Copy link
Author

I've refactored again, so that the new function is responsible for creating the property and attribute maps. This further simplifies the call sites, at the cost of having to create a new property BTreeMap for project middlewares rather than mutating, and making the new function slightly new complicated.

I still need to address project reserialization - project_node_should_reserialize is slightly wrong in both my version, and the previous one

@Dekkonot
Copy link
Member

Dekkonot commented Oct 8, 2024

Outside of the tests this looks fine to me!

@kennethloeffler
Copy link
Author

Okay, I think I've fixed project_node_should_reserialize to work in the context of this PR - it involves comparing the instance properties to the node properties, and reserializing if either the project node's value is equal to the default value, or if the instance's value is not equal to the default value, ignoring attributes. This should catch cases where node properties are at default values and thus should be removed from the node, and where instance properties are at non-defaults, but not defined in the project node, and thus should be added to the node.

Sorry for the delay on this!

src/snapshot_middleware/project.rs Outdated Show resolved Hide resolved
src/snapshot_middleware/project.rs Outdated Show resolved Hide resolved
@kennethloeffler
Copy link
Author

kennethloeffler commented Oct 15, 2024

Alright, I've slightly reworked the second pass over properties in project_node_should_reserialize, and added some comments explaining why the second pass is necessary + the edge cases around unknown classes/properties

Copy link
Member

@Dekkonot Dekkonot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. It's a more elegant way of handling this.

Any last qualms or concerns you can think of before I merge it?

@kennethloeffler
Copy link
Author

The only outstanding concern I personally have is about performance, but since I'm not familiar with this code or the cost of the extra hashes and comparisons, and don't have a very large project to profile, I'll leave it up to your judgment.

@Dekkonot
Copy link
Member

Dekkonot commented Oct 24, 2024

So, I ran this across Adopt Me to test whether it resulted in a significant slowdown. The good news is that it doesn't so the performance cost doesn't seem to exist.

The bad news is that this change seems to unintentionally triggering reserialization for every model.json in Adopt Me, every time. I'll investigate the underlying cause soon, but we can't merge it as-is. 3000 changed files every time we run syncback on Adopt Me is a no-go, haha.

@kennethloeffler
Copy link
Author

Unfortunate, but not entirely unexpected - thanks for taking the time!

@kennethloeffler
Copy link
Author

The bad news is that this change seems to unintentionally triggering reserialization for every model.json in Adopt Me, every time. I'll investigate the underlying cause soon, but we can't merge it as-is. 3000 changed files every time we run syncback on Adopt Me is a no-go, haha.

Having a hard time reproducing this - I tried with a Part defined in a JSON model. Just making sure, but this is happening for JSON models (not project files), and this new implementation is causing actual changes in the JSON models? I'm curious to know what the changes are, if these JSON models have any commonality, etc.

@Dekkonot
Copy link
Member

The implementation isn't causing changes in these models, it's causing them to reserialize. This means that the list of files that changed (and get written to the disk) is massive, despite no actual changes occurring. This confuses git (it marks these files as changed until add them to the index) but more importantly it means you can't actually audit the changes before confirming them.

As an example, here's syncback ran off of syncback-merge over Adopt Me using a local file:
image

And here's the same workspace using syncback from this branch:
image

Almost all of these files are model.json or meta.json files.

@Dekkonot
Copy link
Member

Looking at it, it seems like this is just a consequence of how this is implemented. The check whether something needs to have syncback ran on it is done in the main loop, but at that point there will be a mismatch between JSONs and the model as read from the file system, so it obviously triggers syncback on them. It doesn't happen for projects because there's an internal "should node reserialize" check for them, but it would if there wasn't.

This is new behavior because the hashing does filtering ahead of time using the same property filtering that JSON files used, but this PR changed property_filter::filter_properties_preallocated to not check default properties, so hashes now include the default value properties.

We can fix this by either restoring that behavior in hashing (a couple ways to do that but not very complicated either way) or by adding custom "should reserialize" behavior to the other JSON representations. I am inclined to just fix hashing, since it'll be less complicated and keep code in one place.

@kennethloeffler
Copy link
Author

kennethloeffler commented Nov 1, 2024

This is new behavior because the hashing does filtering ahead of time using the same property filtering that JSON files used, but this PR changed property_filter::filter_properties_preallocated to not check default properties, so hashes now include the default value properties.

We can fix this by either restoring that behavior in hashing (a couple ways to do that but not very complicated either way) or by adding custom "should reserialize" behavior to the other JSON representations. I am inclined to just fix hashing, since it'll be less complicated and keep code in one place.

The problem is that when default properties are unilaterally excluded from hashes (the previous behavior), then it is impossible for syncback to set properties to default values, if that's the only change for a given instance in the input file. So for example, if we have the following instance in a JSON model:

{
  "className": "Part",
  "properties": {
    "Size": [
      1.0,
      2.0,
      3.0
    ],
    "Anchored": true,
  }
}

and we have an input file where Part.Anchored for this instance is false (but every other property stays the same), then running syncback won't do anything, leaving Part.Anchored at true, because the new value (which happens to be at the default) gets filtered out during the hashing step. Is there another way of considering default properties during hashing you're thinking of?

@Dekkonot
Copy link
Member

Dekkonot commented Nov 2, 2024

The way I had in mind was that we could have a check to see if a value was equal to the default value, and if so just leave it out of the hash. This adds a bit more overhead to the hashing process but it was essentially overhead that was already present prior to this change so I'm fine with it.

@kennethloeffler
Copy link
Author

kennethloeffler commented Nov 2, 2024

Apologies if I'm misunderstanding, but isn't that what syncback hashing currently does, producing the undesirable behavior I described here which this PR is trying to fix? See here in property_filter::filter_properties_preallocated:

if let Some(class_data) = class_data {
let defaults = &class_data.default_properties;
for (name, value) in &inst.properties {
if predicate(name, value) {
continue;
}
if let Some(default) = defaults.get(name.as_str()) {
if !variant_eq(value, default) {
allocation.push((name, value));
}
} else {
allocation.push((name, value));
}
}

...which is later called by hash::hash_inst_filtered:
fn hash_inst_filtered<'inst>(
project: &Project,
inst: &'inst Instance,
prop_list: &mut Vec<(&'inst str, &'inst Variant)>,
) -> Hasher {
filter_properties_preallocated(project, inst, prop_list);
hash_inst_prefilled(inst, prop_list)
}

...so I'm not quite sure what to do.

Maybe one way we could help this is by implementing rojo-rbx/rbx-dom#385 using the "WeakDom populates defaults" strategy. That way, instances in the old tree (which are generated directly from the Rojo project, and therefore are currently missing any properties left unspecified) would have defaults populated, and will have a better chance of hashing identically to instances in the new tree.

@Dekkonot
Copy link
Member

Dekkonot commented Nov 11, 2024

Sorry for the delay in getting back to you on this!

Yeah, looking at it again that does seem to be the exact behavior we're trying to fix here. Hmm. I suppose that probably means we'll need to go with the lame version, where we have 'should reserialize' functions for the JSON middleware instead. Frustrating, but at least it gives us the opportunity to do it in one place instead of isolated per JSON middleware, right?

Populating the default properties of Instances would probably also help, but I can't go "this'll work if we change something upstream" because right now it just doesn't work.

@kennethloeffler
Copy link
Author

Yep, I've done some work in that direction already, so I'll get started on finalizing what I have. We can revisit the strategy here once rojo-rbx/rbx-dom#385 is further along, but I agree that we should just get it working first.

@kennethloeffler
Copy link
Author

kennethloeffler commented Nov 13, 2024

I've added some logic to decide whether a JSON model should reserialize.

Testing this is harder than it should be: syncback seems to think that instances described by JSON models included directly in a $path entry come from the project file, which means I have add nesting for the "changed class name" test case.

Additionally, syncback doesn't seem to play nice at all with JSON models with children, and wants to syncback JSON model children as if they were either described in the project file, or underneath a directory. So, I can't really test those code paths. Fixing this is out of scope for this PR, but we should do further testing and figure out what's going wrong.

Meta files should hopefully be much simpler, since they cannot describe children.

@kennethloeffler
Copy link
Author

Added some similar logic to meta files, and it is much simpler, hooray! I'm not sure if this needs another test, because the "all middleware" test seems to adequately exercise this code.

@Dekkonot
Copy link
Member

Dekkonot commented Nov 15, 2024

New logic looks good, and it works, though this branch still causes 887 files to attempt to reserialize for Adopt Me. If I let it continue, only 4 of them actually do change.

There are also some surprise JSON models in there, so I'll investigate those, but I suspect the rest of these are still just issues caused by hashing. We could fix them the same way but it does beg the question of why we even bother with the hashing if we're manually checking every Instance with default properties we want to filter anyway (this will be basically anything that's not an RBXM).

@kennethloeffler
Copy link
Author

What kinds of files are those 887 made up of? Are they still JSON models, or other kinds? Any more information from which we could create isolated test cases will help. I'm also curious what the 4 actual changes were - maybe it's the implementation removing JSON properties that were found to be at default values (which we'd expect)?

wrt the value of hashing: the more I think about it, the more this entire issue around default properties seems caused by rbx-dom not having enough parity with Roblox's implementation a la rojo-rbx/rbx-dom#385. I think it's highly likely that if rbx-dom starts populating default properties, then we'll start seeing the hashes match up when there are no logical changes, allowing us to once again skip all of this later work.

If it doesn't cause too much of a performance hit right now, then I don't see too much of a problem. Maybe there's another way to avoid the defect of sometimes dropping change-to-defaults, but I can't say I've thought of it yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants