Dataset used in ACM MM'17 paper "Learning Fashion Compatibility with Bidirectional LSTMs" [paper] [code]
This dataset is also available on Google Drive. The original images can be downloaded from Kaggle.
A cleaned version of this dataset is available: Cleaned Maryland
You may also be interested in the newer A100 dataset, which measures the aesthetic ability of an AI model.
Author: Xintong Han
Contact: [email protected]
Polyvore.com is a popular fashion website, where users can create and upload outfit data. This website has been acquired by SSense, and the URLs in the dataset are no longer available.
Download and decompress polyvore.tar.gz.
This dataset contains 21,889 outfits from polyvore.com, of which 17,316 are for training, 1,497 for validation, and 3,076 for testing. The train, validation, and test outfits are in {train,valid,test}_no_dup.json, respectively.
Each JSON item has the following information:
{
"name": Name of the outfit,
"views": Number of views of the outfit,
"items": [
Fashion items in the outfit.
{
"index": Index of the fashion item in this outfit on Polyvore,
"name": Description of the fashion item,
"price": Price of the fashion item (usually in US dollars),
"likes": Number of likes of the item,
"image": Image url of the item,
"categoryid": Category ID of the item,
},
{
...
},
...
],
"image": Outfit image url,
"likes": Number of likes of the outfit,
"date": Upload date of the outfit,
"set_url": Outfit url,
"set_id": Outfit ID,
"desc": Outfit description.
}
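As a rough illustration of working with these fields, the sketch below loads one split file and derives per-item identifiers and category counts. It assumes each split file is a single JSON array of outfit objects shaped as described above; the helper names (`load_outfits`, `item_names`, `category_counts`) are hypothetical and not part of the released code.

```python
import json
from collections import Counter


def load_outfits(path):
    """Load one split file (e.g. train_no_dup.json).

    Assumption: the file holds a JSON array of outfit objects.
    """
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def item_names(outfit):
    """Build SetID_ItemIndex identifiers for every item in one outfit."""
    return [f"{outfit['set_id']}_{item['index']}" for item in outfit["items"]]


def category_counts(outfits):
    """Count how often each categoryid occurs across a list of outfits."""
    return Counter(item["categoryid"] for o in outfits for item in o["items"])
```

For example, `category_counts(load_outfits("train_no_dup.json")).most_common(10)` would list the ten most frequent category IDs in the training split, provided the file layout matches the assumption above.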
The image URLs are no longer available, but the images can be accessed on an unofficial Kaggle page. That archive contains the images of 33,375 outfits, which include all 21,889 outfits in the Polyvore dataset. The other ~11k outfits were uploaded more than 3 years ago; we were concerned that they are out of fashion, so we do not use them.
category_id.txt contains the mapping between category ID and category name. Thanks Zhenyu for providing it!
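The layout of category_id.txt is not specified here. The sketch below assumes one mapping per line, with the numeric category ID first and the category name (possibly containing spaces) after it, separated by whitespace. Check the file before relying on this; `parse_category_lines` is a hypothetical helper.

```python
def parse_category_lines(lines):
    """Map category ID -> category name.

    Assumption: each non-empty line is '<id> <name>', split on the
    first run of whitespace. Adjust if the real file differs.
    """
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(None, 1)
        name = parts[1] if len(parts) > 1 else ""
        mapping[int(parts[0])] = name
    return mapping
```

With a file on disk, this could be called as `parse_category_lines(open("category_id.txt", encoding="utf-8"))`.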
fill_in_the_blank_test.json contains the questions used to evaluate the fill-in-the-blank fashion recommendation task. Each question uses the following format:
{
"question": Fashion item sequence to form the question,
"answers": Multiple choice set to choose from,
"blank_position": The blank position to be filled in.
},
The name of a fashion item is SetID_ItemIndex, e.g., 119704139_1 is the fashion item with "index" 1 in the outfit with "set_id" 119704139. The first answer in "answers" is the correct one (i.e., the original fashion item in the outfit).
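The naming convention and the "first answer is correct" rule can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: `split_item_name`, `correct_answer`, and `fitb_accuracy` are hypothetical helpers, and `choose` stands for any user-supplied function that picks one entry from a question's "answers". The sketch does not interpret "blank_position".

```python
def split_item_name(name):
    """Split 'SetID_ItemIndex' into (set_id, item_index) as integers."""
    set_id, index = name.rsplit("_", 1)
    return int(set_id), int(index)


def correct_answer(question):
    """The ground truth is, by the dataset's convention, the first answer."""
    return question["answers"][0]


def fitb_accuracy(questions, choose):
    """Fraction of questions for which choose(question) is the correct answer."""
    if not questions:
        return 0.0
    hits = sum(1 for q in questions if choose(q) == correct_answer(q))
    return hits / len(questions)
```

Because the correct answer always comes first, a model under evaluation should score or shuffle the candidates without using their order; otherwise `choose` could trivially return `q["answers"][0]`.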
fashion-compatibility-prediction.txt contains ~7,000 outfits, of which 4,000 are incompatible and 3,000 are compatible.
In each line, the first number indicates compatibility (1 is compatible, 0 is not), followed by the sequence of fashion items that make up the outfit.
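A line of this file can be read along these lines. The sketch assumes the label and the item names are separated by whitespace, which is not stated explicitly here; `parse_compat_line` and `load_compat` are hypothetical helpers.

```python
def parse_compat_line(line):
    """Return (label, items) for one line.

    Assumption: tokens are whitespace-separated; the first token is
    the 0/1 label and the rest are SetID_ItemIndex item names.
    """
    tokens = line.split()
    return int(tokens[0]), tokens[1:]


def load_compat(lines):
    """Parse every non-empty line into a (label, items) pair."""
    return [parse_compat_line(line) for line in lines if line.strip()]
```

Passing an open file handle for fashion-compatibility-prediction.txt to `load_compat` yields a list of labeled outfits that can be fed to a compatibility scorer.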
- These outfits were crawled around 02/19/2017, so you can estimate the upload date of an outfit from its "date" field.
- For outfits that contain too many fashion items, we keep only the first 8 items.
- We removed fashion items with a non-fashion "categoryid", such as backgrounds, text, and decorations. As a result, the indices of items in an outfit may not be consecutive.
If this dataset helps your research, please cite our paper:
@inproceedings{han2017learning,
author = {Han, Xintong and Wu, Zuxuan and Jiang, Yu-Gang and Davis, Larry S},
title = {Learning Fashion Compatibility with Bidirectional LSTMs},
booktitle = {ACM Multimedia},
year = {2017},
}
There are several other datasets crawled from Polyvore.com:
- Mining Fashion Outfit Composition Using An End-to-End Deep Learning Approach on Set Data. [paper] [dataset]
- NeuroStylist: Neural Compatibility Modeling for Clothing Matching. [paper] [dataset]
- Learning Type-Aware Embeddings for Fashion Compatibility. [paper] [dataset] [cleaned dataset]
- How Good Is Aesthetic Ability of a Fashion Model? [paper] [dataset]