Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wb_supervisor_field_get_sf_node can cause OSError #4535

Closed
Justin-Fisher opened this issue May 13, 2022 · 6 comments
Closed

wb_supervisor_field_get_sf_node can cause OSError #4535

Justin-Fisher opened this issue May 13, 2022 · 6 comments
Labels
bug Something isn't working
Milestone

Comments

@Justin-Fisher
Copy link
Contributor

Describe the Bug
Calling wb_supervisor_field_get_sf_node (hereafter abbreviated as get_sf_node) on some SFNode fields causes "OSError: exception: access violation reading 0xFFFFFFFFFFFFFFFF" when called via Python ctypes. get_sf_node usually works fine, including on SFNode fields within the internal structure of protos. However, I have experienced this problem where there are nested protos, e.g. an UnevenTerrain proto whose exposed appearance is a Sand proto.

Steps to Reproduce
Create an UnevenTerrain proto with a Sand proto in its exposed appearance field, and give the Sand proto's textureTransform a scale like (3, 3). Now use wb_superviser_... functions to traverse down inside the UnevenTerrain proto's hidden structure to find its shadow copy of that textureTransform.

Put briefly: if ut is the UnevenTerrain proto, the SFNode field that causes the error is ut.children[0].children[0].appearance.textureTransform.

Put less briefly: start at the UnevenTerrain, use get_proto_field to get its hidden children field, use get_mf_node to get the 0th child in that field (a Transform node), use get_field to get that node's own children field, use get_mf_node again to get its 0th child (a Shape node), use get_field again to get its appearance field, use get_sf_node to get its content (a shadowy copy of the Sand proto), use get_field to get its textureTransform field, and call get_sf_node on that textureTransform field.

Actual behavior
The python supervisor will crash, saying "OSError: exception: access violation reading 0xFFFFFFFFFFFFFFFF".

Expected behavior
This should return a wbNodeRef for the hidden TextureTransform subnode, just as calling get_sf_node on other fields within the hidden parts of a proto returns wbNodeRefs for other proto subnodes. You can see that Webots knows that this TextureTransform is there, and that it has the given scale (3, 3), by using export from any intervening node in the above path.

If, for some reason, Webots really can't return an actual node for nested protos like this, then it should return a null wbNodeRef rather than crashing.

Discussion
The address mentioned in the error, 0xFFFFFFFFFFFFFFFF, is an alternative representation of -1, which itself is the value that wb_supervisor_node_get_id returns for all non-exposed proto subnodes. One hypothesis is that, for some reason, get_sf_node is blindly using such an id when it shouldn't be. I'm not sure why it does that in this case, but doesn't do it in other cases, but my guess is that it has something to do with the nested protos. I have tested similarly automatedly traversing the internal structure of all protos in a fairly complex world, and this is the only instance that has caused errors. I think this is also the only instance in that world where a proto internally re-uses another proto that was given in one of its exposed fields, so my guess is that has something to do with causing the problem.

@ad-daniel ad-daniel self-assigned this May 16, 2022
@ad-daniel
Copy link
Contributor

Oddly enough, if navigating through the tree directly (i.e. following the sequence as expressed through the PROTO headers (since everything is exposed), it actually works correctly.

  WbNodeRef uneven_terrain = wb_supervisor_node_get_from_def("UT");
  WbFieldRef appearance = wb_supervisor_node_get_field(uneven_terrain, "appearance");
  WbNodeRef sand = wb_supervisor_field_get_sf_node(appearance);
  WbFieldRef texture_transform = wb_supervisor_node_get_field(sand, "textureTransform");
  WbNodeRef texture_transform_node = wb_supervisor_field_get_sf_node(texture_transform);
  WbFieldRef scale = wb_supervisor_node_get_field(texture_transform_node, "scale");
  const double *values = wb_supervisor_field_get_sf_vec2f(scale);
  printf("scale values: %f %f\n", values[0], values[1]);

If navigating within the PROTO body as you mentioned:

  WbNodeRef uneven_terrain = wb_supervisor_node_get_from_def("UT");
  WbNodeRef transform = wb_supervisor_node_get_from_proto_def(uneven_terrain, "UNEVEN_TERRAIN_ELEVATION_GRID");
  WbFieldRef children = wb_supervisor_node_get_field(transform, "children");
  WbNodeRef shape = wb_supervisor_field_get_mf_node(children, 0);
  WbFieldRef appearance = wb_supervisor_node_get_field(shape, "appearance");
  WbNodeRef sand = wb_supervisor_field_get_sf_node(appearance);
  WbFieldRef texture_transform = wb_supervisor_node_get_field(sand, "textureTransform");
  WbNodeRef texture_transform_node = wb_supervisor_field_get_sf_node(texture_transform);

the last line does indeed fail and it triggers this assert, which doesn't appear to have anything to do with anything, since we are not dealing with devices

@ad-daniel ad-daniel added the bug Something isn't working label May 19, 2022
@ad-daniel ad-daniel added this to the R2022a-rev1 milestone May 19, 2022
@ad-daniel
Copy link
Contributor

Ok, turns out it's complicated and messy, but whether it's wrong or how it could be improved I'm uncertain.
Basically what happens is that we have a structure like this:

#VRML_SIM R2022a utf8
PROTO test [
]
{
  DEF UT UnevenTerrain {
    appearance Sand {
      textureTransform TextureTransform {
        scale 3 3
      }
    }
  }
}

or alternatively like this:

#VRML_SIM R2022a utf8
PROTO test [
  field SFNode  appearance  Sand { textureTransform TextureTransform { scale 3 3 } }
]
{
  DEF UT UnevenTerrain {
    appearance IS appearance
  }
}

Fundamentally it doesn't matter, they are equivalent but the first shows better that we're dealing with fields internal to a PROTO.

The problem stems from the second to last request, not actually the last one, in fact we ask the supervisor for:

WbFieldRef texture_transform = wb_supervisor_node_get_field(sand, "textureTransform"); <---
WbNodeRef texture_transform_node = wb_supervisor_field_get_sf_node(texture_transform);

But since we're dealing with internal fields, it should've been requested using the wb_supervisor_node_get_proto_field API function, not wb_supervisor_node_get_field one. Using the other function is not technically wrong either, since due to the nestedness, there exists both internal and external textureTransform fields.

As to why it crashes, is because depending on which function we use, the parent node Sand gets flagged as either internal or external, and a discrepancy emerges when we request the the WbNodeRef associated with this field. The reason is that with respect to the Sand PROTO, textureTransform is the 0th exported field (parameter). Whereas, at the same time, Sand is internally speaking nothing more than a PBRAppearance, and it's 0th field is baseColor (textureTransform is the 15th). Which brings us to the crash. The libcontroller requests a SF_NODE associated with the field textureTransform but due to which function has been used to define texture_transform (and specifically on whether it gets flagged as internal or external), what Webots sends back is a SF_VEC3 (i.e. the 0th field of a PBR, namely baseColor).

So basically it's a mess, but although messy I'm not entirely certain it's actually broken. Very obscure and easily prone to mistakes absolutely, but I'm not certain it's fundamentally wrong based on what the controller requests relative to the node tree in question.

@Justin-Fisher
Copy link
Contributor Author

Justin-Fisher commented May 21, 2022

I think that since it crashes, that means it's actually broken. It may be that the decision will be that one shouldn't be able to fetch values from such fields, but if so, it should print a warning and return null rather than crashing.

Anyway, I've also run into similar bugs where protos are nested inside a hidden MF field rather than a hidden SF field. E.g. a standard RectangleArena proto has a hidden children field. I can successfully get a wbNodeRef for its 0th Node (which apparently is supposed to be a Floor proto), and can successfully get a wbFieldRef for that node's exposed name field, but when I attempt to use get_sf_string on that name field, I get the same OSError again, even though I'm quite sure that the wbFieldRef that I'm passing seems valid (e.g., it correctly reports that its type is SFString). I guess this is probably another case where it is somehow confusing internal and external fields of nested protos?

@omichel omichel modified the milestones: R2022a-rev1, R2022b Jun 13, 2022
@ad-daniel ad-daniel modified the milestones: R2022b, R2022b-rev1 Jun 22, 2022
@ad-daniel ad-daniel modified the milestones: R2022b-rev1, R2023a, R2023a-rev1 Nov 10, 2022
@brettle
Copy link
Contributor

brettle commented Sep 13, 2024

@CoolSpy3, since you've recently been knee-deep in the proto code, do you have a sense for whether your PR #6613 might have fixed this, or if not, do you have any ideas on how to best address it?

@CoolSpy3
Copy link
Contributor

CoolSpy3 commented Sep 13, 2024

@CoolSpy3, since you've recently been knee-deep in the proto code, do you have a sense for whether your PR #6613 might have fixed this, or if not, do you have any ideas on how to best address it?

I doubt that this would've been addressed by #6613. I tried to leave as much of the existing code intact as possible. However, based on the above comments, my guess is that this might've been fixed by #5774. If it wasn't I'd have to do some more detailed debugging to figure stuff out. My guess would be that is_proto_internal flag on WbFieldStructPrivate/WbNodeStructPrivate is getting incorrectly set or is not being properly tracked somewhere.

@CoolSpy3
Copy link
Contributor

I can reproduce this issue on R2022a, but not R2023b, so it's probably been resolved (maybe by the aforementioned PR). If this pops back up feel free to reopen the issue :D

Test files:
test_4535.zip

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Development

No branches or pull requests

6 participants