StyleNet: Non Reproducible Results #9

Open
tusharkr opened this issue Nov 13, 2020 · 15 comments

Comments

@tusharkr

tusharkr commented Nov 13, 2020

I would like to categorically state that the paper "StyleNet: Generating Attractive Visual Captions with Styles" from Microsoft is non-reproducible. This conclusion is not based only on the code in this repo: our own extensive experiments have led us to believe that this paper is a work of fiction. We also contacted the lead authors, Chuang Gan and Zhe Gan, but did not get any reasonable explanation of why this architecture does not work. It is unfortunate that this paper also has a significant number of citations. At this point, how it was accepted at CVPR remains a big question.
Also, the new dataset described in the paper is not available in full. Only part of it is available, which makes this task even more questionable.
Overall, I would ask readers who stumble across this not to waste their time reproducing this paper!!

@WuRong1997

Thank you for telling us. I had just decided to base my work on StyleNet and try to reproduce it. You have saved me time.

@njucckevin

Thanks for the reminder. But could you explain why you think the dataset is not available? I took a look at the dataset and it seems fine. Maybe you think it is difficult to distinguish the romantic captions from the humorous ones?

@tusharkr
Author

According to the paper, the dataset should contain 10k images. In reality, only 7k images are available. We confronted the author about this, and he did not give any specific reason why the remaining 3k are missing. Moreover, there are 3 captions per image for the neutral style, whereas there is only 1 caption per image for the humorous style. This makes training impossible. This is why I have categorically stated that this paper is just a work of fiction.

@njucckevin

I got it, thank you.

@tusharkr
Author

You are welcome. I have spent close to 6 months trying to reproduce this paper. After we asked a couple of confrontational questions, the authors stopped responding. I would suggest not wasting your time on this or any similar paper written by the first author of this paper.

@Doragd

Doragd commented Mar 30, 2021

First of all, I want to point out that this repo is not the official repo. In fact, a lot of work follows this paper, focusing on handling limited paired stylized data through unpaired training.

@tusharkr
Author

Two points:
Firstly, it does not matter whether this is the official repo or not. Technically, the paper is non-reproducible and the architecture simply does not work. Since this link is where most researchers end up (in fact, I have a mail in which the second author himself asked me to try this repo), it is good to tell them in advance that not only this repo but the paper itself is non-reproducible.
Secondly, the fact that others were inspired by this design (or that other papers refer to this paper) does not guarantee the reproducibility of this paper.

@Doragd

Doragd commented Mar 30, 2021

Thanks for your quick and kind reply. I am devoting myself to reproducing this paper, or at least to reproducing the performance on the limited 7k data that is now public. This result has been reported by MSCap (CVPR '19).
image

@Doragd

Doragd commented Mar 30, 2021

YOU ARE RIGHT! I also doubt the results in this paper. The following picture shows my rough result for the romantic style. By the way, I think I have fixed some bugs in this repo.
image

@tusharkr
Author

Good to know that you are trying to reproduce it.
However, on our side, we fixed all the bugs in this repo. We also wrote the code from scratch by reading the paper. In the end, we wasted 7 months trying all possible combinations, but we could not reproduce even a partial result. That is why I am stating that this paper is just fiction. It is an insult to the CVPR tradition. I still wonder how the authors were able to convince the CVPR reviewers.

@njucckevin

Wait... @Doragd So you think the FlickrStyle10K (in fact, 7K) dataset is feasible for stylized image captioning, but the results in StyleNet are exaggerated?
And by the way, what is the result in your picture? I have read MSCap, but it contains no similar result.

@Doragd

Doragd commented Mar 30, 2021

@njucckevin First of all, the 7k data is somewhat feasible for training a stylized image captioning model. But in my opinion, StyleNet, which depends on only four stylized parameter matrices, cannot learn to express style, especially given its strange training method. My result is rather rough, and I will refine it soon. Feel free to contact me to obtain my refined version.

@njucckevin

I got it. Thanks~

@Cathyttt

Cathyttt commented Apr 9, 2021

@Doragd Hi, I'm also trying to reproduce the StyleNet model, but after reading this issue I'm wondering whether it's worth spending time on it. May I see the code and results for your version? Thanks.

@Doragd

Doragd commented Apr 17, 2021

> @Doragd Hi, I'm also trying to reproduce the StyleNet model, but after reading this issue I'm wondering whether it's worth spending time on it. May I see the code and results for your version? Thanks.

Please contact me by e-mail next week.
