StyleNet: Non Reproducible Results #9
Comments
Thank you for telling us. I just decided to start my work based on StyleNet and try to reproduce it. You helped me save my time.
Thanks for the reminder. But could you explain why you think the dataset is not usable? I took a look at the dataset and it seems fine; maybe you think it's difficult to distinguish the romantic and humorous captions?
The dataset should contain 10k images according to the paper. However, in reality only 7k images are available. We confronted the author regarding this and he did not give any specific reason as to why the 3k images are missing. Moreover, there are 3 captions per image for the neutral style, whereas there is only 1 caption per image for the humorous style. This makes training impossible. This is why I have categorically stated that this paper is just a work of fiction.
I got it, thank you.
You are welcome. I have spent close to six months trying to reproduce this paper. After a couple of confronting questions, the authors stopped responding. I would suggest not wasting your time on this or any similar paper written by the first author of this paper.
First of all, I want to point out that this repo is not the official repo. Also, there is a lot of follow-up work on this paper that addresses the limited stylized paired data through unpaired training.
Two points, |
Good to know that you are trying to reproduce it,
Wait... @Doragd So you think the FlickrStyle10K (in fact, 7K) dataset is feasible for stylized image captioning, but the result in StyleNet is exaggerated?
@njucckevin First of all, 7k samples are somewhat feasible for training a stylized image captioning model, but in my opinion, StyleNet, which only depends on four stylized parameter matrices, cannot learn to express style, especially given its strange training method. My result is rather rough, and I will refine it soon. Feel free to contact me to obtain my refined version.
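For context on the "stylized parameter matrices" mentioned above: StyleNet factors each LSTM input weight matrix as W = U·S_k·V, where U and V are shared across styles and only the small S_k is style-specific. A minimal numpy sketch of that factoring (shapes, names, and the matrix initialization here are illustrative assumptions, not the official code):

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, factor, inp = 8, 4, 6

# Shared factors U and V are trained on all data;
# only the small style-specific matrix S_k is swapped per style.
U = rng.standard_normal((hidden, factor))
V = rng.standard_normal((factor, inp))
S = {
    "factual": rng.standard_normal((factor, factor)),
    "humorous": rng.standard_normal((factor, factor)),
}

def styled_weight(style: str) -> np.ndarray:
    # Reconstruct the full input-to-hidden weight for one style.
    return U @ S[style] @ V

x = rng.standard_normal(inp)
h_factual = styled_weight("factual") @ x
h_humorous = styled_weight("humorous") @ x
```

The criticism in the comment above is that these few S_k matrices are the only style-dependent capacity in the model, which the commenter believes is too little to actually express a caption style.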
I got it. Thanks~
@Doragd Hi, I'm also trying to reproduce the StyleNet model, but after reading this issue I'm wondering if it's worth spending time on. May I see your code and the results of your version? Thanks.
Please contact me by e-mail next week.
I would like to categorically state that the paper "StyleNet: Generating Attractive Visual Captions with Styles" from Microsoft is non-reproducible. This is based not only on the code in this repo; our own extensive experiments have led us to believe that this paper is simply a work of fiction. We have also contacted the lead authors Chuang Gan & Zhe Gan. However, we did not get any reasonable explanation of why this architecture does not work. It is unfortunate to see that this paper also has significant citations. At this point, how it was accepted at CVPR remains a big question.
Also, the new dataset mentioned in the paper is not available in its entirety. Only part of it is available, which makes this task even more questionable.
Overall, I would request readers stumbling across this not to waste their time reproducing this paper!!