
About visualization of data distributions on different sources #3

Open
pILLOW-1 opened this issue Sep 28, 2024 · 2 comments

Comments


pILLOW-1 commented Sep 28, 2024

Hi, great work!
I have two questions about the visualization of data distributions on different sources in Fig. 1.
Q1: Is the generated data visualized here learned from the corresponding data source? For example, in the first row, the data from Stable Diffusion is learned on LVIS train, right?
Q2: Based on the idea that generative data can expand the data distribution the model can learn, is it possible for a generative model trained solely on one domain to generate data from another domain (e.g., a model trained on the training set generating data similar to the testing set)?
Looking forward to your answers!

@pILLOW-1
Author

An addition to Q1: in the second row, is the data from DeepFloyd learned on LVIS val?

@leaf1170124460
Collaborator

leaf1170124460 commented Sep 30, 2024

Hi, @pILLOW-1.

Thank you for your interest in our work!

Regarding Q1: There is no "learned from the corresponding data source" relationship between the two data sources in each row. Each subplot represents the visualization of embeddings after dimension reduction of data from the respective source. For example, the LVIS train subplot shows the embeddings of all instances in LVIS train after dimension reduction, while the Stable Diffusion subplot shows the embeddings of the data generated by Stable Diffusion after dimension reduction.
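In case it helps clarify what each subplot shows, here is a minimal sketch of the "embeddings plus dimension reduction" step described above. Everything here is a hypothetical stand-in: the embedding arrays are random placeholders for per-instance features (e.g., from an image encoder), and PCA via SVD is used only as one example of dimension reduction (the actual method and embedding model used in the figure may differ).

```python
import numpy as np

def reduce_to_2d(embeddings):
    """Project (n_samples, dim) embeddings to 2-D via PCA (SVD).

    Each data source is reduced independently, mirroring the idea that
    each subplot visualizes one source's embeddings after reduction.
    """
    centered = embeddings - embeddings.mean(axis=0)
    # Top-2 right singular vectors are the principal axes.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

# Hypothetical per-instance embeddings for two sources.
rng = np.random.default_rng(0)
lvis_train_emb = rng.normal(size=(500, 512))          # e.g., LVIS train instances
stable_diffusion_emb = rng.normal(size=(500, 512))    # e.g., generated instances

pts_train = reduce_to_2d(lvis_train_emb)
pts_gen = reduce_to_2d(stable_diffusion_emb)
print(pts_train.shape, pts_gen.shape)  # (500, 2) (500, 2)
```

The resulting 2-D points for each source can then be scatter-plotted side by side, one subplot per source, as in the figure.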

Regarding Q2: In DiverGen, we did not retrain or fine-tune the pre-trained generative models. We only used the open-source pre-trained weights for data generation. However, we think your suggestion is intriguing, and we may explore it further if time allows.

Hope this clears up your confusion, and feel free to reach out if you have any more questions!
