This project classifies item names on an e-commerce platform into item categories.
● result_online.csv: Contains the processed item IDs, item names, and a category label formed by concatenating item_category with item_category2.
- Balance item categories by downsampling or upsampling each category.
- Word segmentation using jieba.
- Extract features from item names with a TF-IDF vectorizer.
- Dimensionality reduction: keep only the 100 best features per category.
- Train test split (0.8 for training, 0.2 for testing).
- Train XGBClassifier.
- Evaluate the model with offline data.
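
The balancing and train/test split steps above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the function names `balance_categories` and `train_test_split_rows`, the `target` parameter, and the sample rows are all hypothetical, and the real pipeline may resample differently.

```python
import random
from collections import Counter, defaultdict


def balance_categories(rows, target, seed=0):
    """Resample (item_name, category) rows so each category has `target` rows.

    Hypothetical helper: larger categories are downsampled without
    replacement, smaller ones are upsampled by drawing extra rows
    with replacement.
    """
    rng = random.Random(seed)
    by_category = defaultdict(list)
    for name, category in rows:
        by_category[category].append((name, category))

    balanced = []
    for category in sorted(by_category):
        group = by_category[category]
        if len(group) >= target:
            # Downsample: pick `target` distinct rows.
            balanced.extend(rng.sample(group, target))
        else:
            # Upsample: keep every original row, then add random repeats.
            extra = rng.choices(group, k=target - len(group))
            balanced.extend(group + extra)
    return balanced


def train_test_split_rows(rows, train_fraction=0.8, seed=0):
    """Shuffle rows and split them into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Made-up sample data: 6 "phone" rows and 2 "book" rows.
rows = [(f"phone_item_{i}", "phone") for i in range(6)]
rows += [(f"book_item_{i}", "book") for i in range(2)]

balanced = balance_categories(rows, target=4)
counts = Counter(category for _, category in balanced)
train, test = train_test_split_rows(balanced, train_fraction=0.8)
```

After this runs, `counts` holds the same number of rows for every category, and `train` and `test` hold roughly 80% and 20% of the balanced rows. Note that upsampling before the split, as sketched here, can place copies of the same item in both sets.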