
[REVIEW]: scene_synthesizer: A Python Library for Procedural Scene Generation in Robot Manipulation #7561

Open
editorialbot opened this issue Dec 3, 2024 · 28 comments
Labels
Python · review · Shell · TeX · Track: 5 (DSAIS) Data Science, Artificial Intelligence, and Machine Learning

Comments

@editorialbot

editorialbot commented Dec 3, 2024

Submitting author: @clemense (Clemens Eppner)
Repository: https://github.com/NVlabs/scene_synthesizer
Branch with paper.md (empty if default branch):
Version: 1.11.4
Editor: @crvernon
Reviewers: @AlexanderFabisch, @Mechazo11
Archive: Pending

Status


Status badge code:

HTML: <a href="https://joss.theoj.org/papers/d32b27b7e95469107bf1bbe3bdb2683a"><img src="https://joss.theoj.org/papers/d32b27b7e95469107bf1bbe3bdb2683a/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/d32b27b7e95469107bf1bbe3bdb2683a/status.svg)](https://joss.theoj.org/papers/d32b27b7e95469107bf1bbe3bdb2683a)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@AlexanderFabisch & @Mechazo11, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review.
First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. If you have any questions or concerns, please let @crvernon know.

Please start on your review when you are able, and be sure to complete it within the next six weeks at the very latest.

Checklists

📝 Checklist for @AlexanderFabisch

📝 Checklist for @Mechazo11

@editorialbot

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf

@editorialbot

Software report:

github.com/AlDanial/cloc v 1.90  T=0.30 s (272.2 files/s, 299333.6 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
XML                             15              0              0          61681
Python                          37           4159           4639          18698
TeX                              2             25              2            249
reStructuredText                17            327            672            248
CSS                              1             42             17            148
Markdown                         4             30              0            139
Bourne Shell                     1              5              3             62
YAML                             2              3              0             29
TOML                             1              5             12             22
HTML                             1              1              0             13
make                             1              3              0             11
Dockerfile                       1              6             17              3
-------------------------------------------------------------------------------
SUM:                            83           4606           5362          81303
-------------------------------------------------------------------------------

Commit count by author:

    18	Clemens Eppner

@editorialbot

Paper file info:

📄 Wordcount for paper.md is 533

✅ The paper includes a Statement of need section

@editorialbot

License info:

✅ License found: Apache License 2.0 (Valid open source OSI approved license)

@editorialbot

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

✅ OK DOIs

- 10.1109/icra40945.2020.9196910 is OK
- 10.21105/joss.04901 is OK
- 10.1109/cvpr.2019.00100 is OK
- 10.1109/iccv.2019.00943 is OK
- 10.1109/cvpr46437.2021.00447 is OK
- 10.1109/cvpr42600.2020.01111 is OK
- 10.1109/cvpr52688.2022.00373 is OK
- 10.1109/cvpr52729.2023.01215 is OK
- 10.1109/cvpr52733.2024.00593 is OK
- 10.1109/iccv51070.2023.00727 is OK
- 10.1109/icra48891.2023.10161528 is OK
- 10.1109/iros.2018.8594495 is OK
- 10.1109/cvpr52733.2024.01539 is OK

🟡 SKIP DOIs

- No DOI given, and none found for title: A Procedural World Generation Framework for System...
- No DOI given, and none found for title: Habitat 2.0: Training Home Assistants to Rearrange...
- No DOI given, and none found for title: ProcTHOR: Large-Scale Embodied AI Using Procedural...
- No DOI given, and none found for title: RoboCasa: Large-Scale Simulation of Everyday Tasks...
- No DOI given, and none found for title: DiffuScene: Scene Graph Denoising Diffusion Probab...
- No DOI given, and none found for title: RoboPoint: A Vision-Language Model for Spatial Aff...
- No DOI given, and none found for title: Motion Policy Networks
- No DOI given, and none found for title: Imitating Task and Motion Planning with Visuomotor...
- No DOI given, and none found for title: Scenic: A Language for Scenario Specification and ...
- No DOI given, and none found for title: M2T2: Multi-Task Masked Transformer for Object-cen...
- No DOI given, and none found for title: Fidgit: An ungodly union of GitHub and Figshare
- No DOI given, and none found for title: trimesh
- No DOI given, and none found for title: Techniques for training machine learning models us...

❌ MISSING DOIs

- None

❌ INVALID DOIs

- None

@crvernon

crvernon commented Dec 3, 2024

👋 @clemense, @AlexanderFabisch, and @Mechazo11 - This is the review thread for the paper. All of our communications will happen here from now on.

Please read the "Reviewer instructions & questions" in the first comment above.

Both reviewers have checklists at the top of this thread (in that first comment) with the JOSS requirements. As you go over the submission, please check any items that you feel have been satisfied. There are also links to the JOSS reviewer guidelines.

The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention #7561 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.

We aim for the review process to be completed within about 4-6 weeks, but please make a start well ahead of this: JOSS reviews are by their nature iterative, and any early feedback you can provide to the author will be very helpful in meeting this schedule.

@editorialbot

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

@AlexanderFabisch

AlexanderFabisch commented Dec 3, 2024

Review checklist for @AlexanderFabisch

Conflict of interest

  • I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

General checks

  • Repository: Is the source code for this software available at https://github.com/NVlabs/scene_synthesizer?
  • License: Does the repository contain a plain-text LICENSE or COPYING file with the contents of an OSI approved software license?
  • Contribution and authorship: Has the submitting author (@clemense) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
  • Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines?
  • Data sharing: If the paper contains original data, data are accessible to the reviewers. If the paper contains no original data, please check this item.
  • Reproducibility: If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item.
  • Human and animal research: If the paper contains original data from research on human subjects or animals, does it comply with JOSS's human participants research policy and/or animal research policy? If the paper contains no such data, please check this item.

Functionality

  • Installation: Does installation proceed as outlined in the documentation?
  • Functionality: Have the functional claims of the software been confirmed?
  • Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

Software paper

  • Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
  • A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
  • State of the field: Do the authors describe how this software compares to other commonly-used packages?
  • Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
  • References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

Comments on the paper:

  • The difference between this procedural scene generator and others seems to be that it works for physical simulations. It is not clear, however, what makes this use case specifically hard. I'm honestly curious: what are the challenges that this software overcomes?
  • In addition to this, the summary of features and functionality could be a bit more detailed. The documentation is really awesome and I think the paper could summarize it a bit on a conceptual level.
  • Comparison to similar software packages: I wonder how we would quantify progress in procedural scene generation for physics engines. This seems to be one of the first projects of this kind. What if I wanted to write my own software? How could I claim that it works better? Would we measure time to generate scenes (computational cost)? Is there a way to quantify the quality of the result?
  • Example use cases: Again, I am curious how the software was used in these use cases. Could you provide a little bit more context so that I don't have to read all the papers?
  • Extensions: It might be interesting for the reader to see how the library could be extended and customized for non-kitchen use cases. This could either be part of the paper or described in the documentation.
  • References: some conferences (CVPR, NeurIPS) are abbreviated. I'd recommend writing the full names because not everybody might know these conferences.

@AlexanderFabisch

I am done with my review. I think the tool is a nice contribution and it's easy to use. I opened several issues in the repository that have all been quickly addressed by @clemense. I have some comments on the paper, which you will find below my checklist.

@clemense

clemense commented Dec 9, 2024

@crvernon What's the process regarding "Comments on the paper"? Should I just answer them in this thread here?

@crvernon

crvernon commented Dec 9, 2024

@clemense - you can either incorporate the changes directly in the paper if you agree with the suggestions (or if not) discuss why you didn't feel the suggested changes were appropriate here in this thread. Thanks!

@Mechazo11

Mechazo11 commented Dec 14, 2024

Review checklist for @Mechazo11

Conflict of interest

  • I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

General checks

  • Repository: Is the source code for this software available at https://github.com/NVlabs/scene_synthesizer?
  • License: Does the repository contain a plain-text LICENSE or COPYING file with the contents of an OSI approved software license?
  • Contribution and authorship: Has the submitting author (@clemense) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
  • Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines?
  • Data sharing: If the paper contains original data, data are accessible to the reviewers. If the paper contains no original data, please check this item.
  • Reproducibility: If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item.
  • Human and animal research: If the paper contains original data from research on human subjects or animals, does it comply with JOSS's human participants research policy and/or animal research policy? If the paper contains no such data, please check this item.

Functionality

  • Installation: Does installation proceed as outlined in the documentation?
  • Functionality: Have the functional claims of the software been confirmed?
  • Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
  • Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1. Contribute to the software 2. Report issues or problems with the software 3. Seek support

Software paper

  • Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
  • A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
  • State of the field: Do the authors describe how this software compares to other commonly-used packages?
  • Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
  • References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

@Mechazo11

Hi @crvernon, I am done with my review. I left one ticket in the scene_synthesizer repo about adding a simpler example to the README.md file that quickly showcases the capabilities of the package.

@crvernon

Thanks @Mechazo11! I see @AlexanderFabisch is still working through some topics as well.

@AlexanderFabisch

Hi @crvernon, the first iteration of my review is completed. I am waiting for @clemense 's response and changes to complete the review.

@crvernon

Thank you @AlexanderFabisch!

@clemense

clemense commented Jan 7, 2025

@AlexanderFabisch Thank you for your comments! Here are my answers:

The difference between this procedural scene generator and others seems to be that it works for physical simulations. It is not clear, however, what makes this use case specifically hard. I'm honestly curious: what are the challenges that this software overcomes?

That's a great question and my answer might not be satisfying: Academic robotics research is largely driven by PhD theses. Although people in general tend to agree that data is the (most) important ingredient in the deep learning era, it is impossible to get a PhD by focusing solely on data generation. Instead, data generation is a necessary nuisance, and the "intellectual" / scientifically respected part is the neural network architecture, training process, and resulting application. Case in point: the preprint of this work was rejected by arXiv (!) for not being "scholarly" enough.
scene_synthesizer enables the combination of existing datasets and its own procedural generators to create scene descriptions for robot learning in simulation. I don't know of any other available tool that does this. As mentioned in the related work section, there are finite datasets or procedural systems that are tightly integrated into bigger frameworks, but IMO they lack the extensibility and reusability that scene_synthesizer offers.
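For illustration, here is a rough conceptual sketch of that workflow. To avoid misrepresenting the library's API, it uses plain trimesh (which scene_synthesizer builds on) rather than scene_synthesizer itself; the asset path and parameter values are placeholders.

```python
# Conceptual sketch only - not the scene_synthesizer API. It shows the idea of
# combining procedurally generated geometry with an asset from an existing mesh
# dataset in a single exported scene description, using plain trimesh.
import numpy as np
import trimesh

rng = np.random.default_rng(seed=0)
scene = trimesh.Scene()

# Procedural part: a table top whose dimensions are sampled per scene.
width, depth = rng.uniform(0.8, 1.4), rng.uniform(0.6, 1.0)
table_top = trimesh.creation.box(extents=[width, depth, 0.04])
scene.add_geometry(
    table_top,
    node_name="table_top",
    transform=trimesh.transformations.translation_matrix([0.0, 0.0, 0.76]),
)

# Dataset part: an existing mesh asset placed at a random pose on the table.
mug = trimesh.load("assets/mug.obj", force="mesh")  # placeholder path
pose = trimesh.transformations.translation_matrix(
    [rng.uniform(-width / 2, width / 2), rng.uniform(-depth / 2, depth / 2), 0.80]
)
scene.add_geometry(mug, node_name="mug", transform=pose)

# Export a scene description for downstream consumers.
scene.export("random_scene.glb")
```

scene_synthesizer adds the pieces missing from this toy sketch - support-surface sampling, placement logic, articulated procedural assets, and export to simulator-ready formats - but the dataset-plus-procedural combination is the core idea.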

In addition to this, the summary of features and functionality could be a bit more detailed. The documentation is really awesome and I think the paper could summarize it a bit on a conceptual level.

The paper has a "Features & Functionality" section with high-level descriptions. I'd rather not duplicate the more detailed documentation in the paper since code documentation is better kept in conjunction with code.

Comparison to similar software packages: I wonder how we would quantify progress in procedural scene generation for physics engines. This seems to be one of the first projects of this kind. What if I wanted to write my own software. How could I claim that it works better? Would we measure time to generate scenes (computational cost)? Is there a way to quantify the quality of the result?

This is a good question, but again very tough to answer. IMO it depends on the downstream task. If the ultimate task is to learn a cat detector vs. a bipedal running policy, the metrics will be very different. I think that no single metric can capture all of these use cases. What I see a lot in the current wave of LLM-driven scene generators is, e.g., using the CLIP score of rendered images as a metric - but it's obvious that this already has a number of shortcomings.

Example use cases: Again, I am curious how the software was used in these use cases. Could you provide a little bit more context so that I don't have to read all the papers?

The software was used in the following ways:

  • “Imitating Task and Motion Planning with Visuomotor Transformers”: To generate scenes with tasks for putting objects inside a microwave and shelf. The tasks were then solved with a Task-and-Motion-Planning framework, and the solutions used as data for learning visuo-motor policies.
  • “Motion Policy Networks”: To generate cabinet scenes with robot start and goal configurations. These scenarios were used in a motion planner and the resulting collision-free trajectories (together with the scene geometry as a point cloud) were used to train a neural network that imitates the trajectories, thus learning to plan collision-free motions.
  • “CabiNet: Scaling Neural Collision Detection for Object Rearrangement with Procedural Scene Generation”: Cabinet and tabletop scenes are used to generate colliding and collision-free configurations. From these examples, a network is trained to predict collisions. The network is then used in conjunction with a planner to solve pick-and-place problems.
  • “M2T2: Multi-Task Masked Transformer for Object-Centric Pick and Place”: Similar dataset to the previous work.
  • “RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics”: Kitchen scenes with random objects on countertops and inside drawers and cabinets are used to render images and caption them with spatial descriptions.

Extensions: It might be interesting for the reader to see how the library could be extended and customized for non-kitchen use cases. This could either be part of the paper or described in the documentation.

That's right, the only kitchen-specific things in the library are the procedural scenes and some of the procedural assets. The reason there's more kitchen-specific content in the library is that existing scene generators and scene datasets often avoid kitchens (and focus on other parts of an apartment, bedrooms etc.) due to the highly constrained nature of kitchen furniture and layouts. Currently, the paper doesn't limit itself to kitchen scenes - and the examples in the docs are also not kitchen-specific.

References: some conferences (CVPR, NeurIPS) are abbreviated. I'd recommend to write the full name because not everybody might know these conferences.

I changed the references and wrote the full names of these conferences.

@AlexanderFabisch

That's a great question and my answer might not be satisfying: Academic robotics research is largely driven by PhD theses. Although people in general tend to agree that data is the (most) important ingredient in the deep learning era, it is impossible to get a PhD by focusing solely on data generation. Instead, data generation is a necessary nuisance, and the "intellectual" / scientifically respected part is the neural network architecture, training process, and resulting application. Case in point: the preprint of this work was rejected by arXiv (!) for not being "scholarly" enough.
scene_synthesizer enables the combination of existing datasets and its own procedural generators to create scene descriptions for robot learning in simulation. I don't know of any other available tool that does this. As mentioned in the related work section, there are finite datasets or procedural systems that are tightly integrated into bigger frameworks, but IMO they lack the extensibility and reusability that scene_synthesizer offers.

Maybe there was a misunderstanding on my side: I thought there were similar software packages that are optimized for visual scenes, but not for physical simulations. I assumed it based on this part of the statement of need: "purely generative models still lack the ability to create scenes that can be used in physics simulator [..]. Other procedural pipelines either focus on learning visual model". My question is how that differs. What differentiates a scene generator for physical simulation from a scene generator for visual "simulation"?

The preprint of this work was rejected by arXiv (!) for not being "scholarly" enough.

That's crazy. It's more of an engineering task, but it is foundational work in today's robotics research. However, to give it a more scientific character, I believe you should focus a bit on how we can evaluate scene generators, so that other people have a way to quantify improvement. That's why I asked for this. Maybe it's too much to ask for in this paper though.

The software was used the following ways:

* “Imitating Task and Motion Planning with Visuomotor Transformers”: To generate scenes with tasks for putting objects inside a microwave and shelf. The tasks were then solved with a Task-and-Motion-Planning framework, and the solutions used as data for learning visuo-motor policies.

* “Motion Policy Networks”: To generate cabinet scenes with robot start and goal configurations. These scenarios were used in a motion planner and the resulting collision-free trajectories (together with the scene geometry as a point cloud) were used to train a neural network that imitates the trajectories, thus learning to plan collision-free motions.

* “CabiNet: Scaling Neural Collision Detection for Object Rearrangement with Procedural Scene Generation”: Cabinet and tabletop scenes are used to generate colliding and collision-free configurations. From these examples, a network is trained to predict collisions. The network is then used in conjunction with a planner to solve pick-and-place problems.

* “M2T2: Multi-Task Masked Transformer for Object-Centric Pick and Place”: Similar dataset to the previous work.

* “RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics”: Kitchen scenes with random objects on countertops and inside drawers and cabinets are used to render images and caption them with spatial descriptions.

I'd suggest adding that to the paper.

That's right, the only kitchen-specific things in the library are the procedural scenes and some of the procedural assets. The reason there's more kitchen-specific content in the library is that existing scene generators and scene datasets often avoid kitchens (and focus on other parts of an apartment, bedrooms etc.) due to the highly constrained nature of kitchen furniture and layouts. Currently, the paper doesn't limit itself to kitchen scenes - and the examples in the docs are also not kitchen-specific.

... and this as well.

@clemense

clemense commented Jan 8, 2025

Maybe there was a misunderstanding on my side: I thought there were similar software packages that are optimized for visual scenes, but not for physical simulations. I assumed it based on this part of the statement of need: "purely generative models still lack the ability to create scenes that can be used in physics simulator [..]. Other procedural pipelines either focus on learning visual model". My question is how that differs. What differentiates a scene generator for physical simulation from a scene generator for visual "simulation"?

Ah, got it. The physical simulation needs things like collision geometry, mass information, center of mass, friction, restitution, and articulation information (joints and their properties, damping, limits, maximum efforts, velocities, etc.).
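To make that concrete, here is a minimal sketch of the geometry-derived part of that information, computed with trimesh for a toy box (not scene_synthesizer code; the density value is an arbitrary assumption). The contact and articulation parameters cannot be derived from geometry at all and have to be authored explicitly in the scene description.

```python
# Minimal sketch (not scene_synthesizer code): the extra information a physics
# simulator needs beyond a visual mesh, illustrated with trimesh on a toy box.
import trimesh

visual = trimesh.creation.box(extents=[0.3, 0.2, 0.1])

# Collision geometry is usually a simplified shape, e.g. the convex hull.
collision = visual.convex_hull

# Inertial properties follow from an assumed density (500 kg/m^3 is arbitrary).
visual.density = 500.0
mass = visual.mass               # scalar mass
com = visual.center_mass         # 3-vector center of mass
inertia = visual.moment_inertia  # 3x3 inertia tensor about the center of mass

# Friction, restitution, joint types/limits, damping, and maximum efforts or
# velocities are not derivable from the mesh; they must be authored in the
# exported scene description for the simulator to use them.
print(mass, com, inertia.shape, len(collision.faces))
```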

That's crazy. It's more of an engineering task, but it is foundational work in today's robotics research. However, to give it a more scientific character, I believe you should focus a bit on how we can evaluate scene generators, so that other people have a way to quantify improvement. That's why I asked for this. Maybe it's too much to ask for in this paper though.

I'm a bit torn on this. Again, it's complicated since scene generation has different objectives, depending on the downstream application/task. The software itself doesn't provide any metrics or support to evaluate scene generators. This is still (and IMO will be for a long time) a fuzzy research area. Also, JOSS explicitly states that "Your paper must not focus on new research results accomplished with the software." and that the software "supports the functioning of research instruments or the execution of research experiments". Keeping the distinction between research software and research results is - IMO - best served by not bloating the paper with random musings about potential metrics and rankings.

I'd suggest adding that to the paper.

Done!

... and this as well.

Done!

@AlexanderFabisch

@editorialbot generate pdf

@editorialbot

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

@AlexanderFabisch

Ah, got it. The physical simulation needs things like collision geometry, mass information, center of mass, friction, restitution, and articulation information (joints and their properties, damping, limits, maximum efforts, velocities, etc.).

Could you add that to the paper as well? I think then my review is finished!

@clemense

clemense commented Jan 8, 2025

Ok, I just added this. Thanks!

@AlexanderFabisch

@editorialbot generate pdf

@editorialbot

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

@AlexanderFabisch

The paper looks good. Thanks @clemense

@crvernon My review is done.

@clemense

clemense commented Jan 9, 2025

I also added the example asked for by @Mechazo11 to the README.md and closed the associated ticket. Thanks!

@crvernon I don't see any open requests from the reviewers. Let me know if I need to do anything else. Thanks!

@clemense

@crvernon Let me know if I can/need to do anything else to push this over the finish line. Thank you!
