You Too Can Become a King of Secondhand Trading!

Members

김아경	김현욱	김황대	박상류	정재현	최윤성

Github	Github	Github	Github	Github	Github

Project Overview

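Goals
1. Classify a listing into a category from its product image and title using a multimodal classification model
2. Generate hashtags that increase a listing's exposure using generation and extraction models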
Models
1. Category classification model built on EfficientNet-b0 and a BERT classifier
2. Hashtag extraction model using Elasticsearch and TF-IDF
3. Hashtag generation model based on skt/kogpt2-base-v2, fine-tuned on the project data
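The TF-IDF half of the extraction model in item 2 can be illustrated with a minimal sketch: score every token of a listing body by term frequency times inverse document frequency over a corpus of other listings, and keep the top-scoring tokens as hashtags. Words that appear in nearly every listing (such as "for" and "sale") get a low weight, while distinctive product words rise to the top. This is not the project's code: the function name `tfidf_hashtags` is hypothetical, whitespace tokenization is an assumption (real Korean text would need a morphological analyzer or the project's own vocabulary), the smoothed IDF formula is borrowed from scikit-learn's default, and the Elasticsearch step is not shown.

```python
from __future__ import annotations

import math
from collections import Counter


def tfidf_hashtags(doc_tokens: list[str], corpus: list[list[str]], top_k: int = 5) -> list[str]:
    """Return the top_k tokens of doc_tokens ranked by TF-IDF, formatted as hashtags.

    Hypothetical helper; tokens are assumed to be pre-split (e.g. by whitespace).
    """
    # Document frequency: number of corpus documents containing each token at least once.
    df = Counter()
    for tokens in corpus:
        df.update(set(tokens))
    n_docs = len(corpus)

    # Term frequency inside the listing we want hashtags for.
    tf = Counter(doc_tokens)
    total = len(doc_tokens)

    scores = {}
    for tok, count in tf.items():
        # Smoothed IDF (same form as scikit-learn's smooth_idf=True), so a token
        # missing from the corpus still gets a finite weight.
        idf = math.log((1 + n_docs) / (1 + df[tok])) + 1
        scores[tok] = (count / total) * idf

    # Highest score first; ties are broken alphabetically so output is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ["#" + tok for tok, _ in ranked[:top_k]]


if __name__ == "__main__":
    corpus = [
        "iphone 12 mini for sale".split(),
        "galaxy buds for sale".split(),
        "macbook pro for sale".split(),
    ]
    doc = "iphone 12 mini iphone case for sale".split()
    print(tfidf_hashtags(doc, corpus, top_k=3))  # ['#iphone', '#case', '#12']
```

Generic listing words ("for", "sale") occur in every corpus document, so their IDF collapses to 1 and they fall below the product-specific terms.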
Data
- Data crawled from Bunjang (번개장터) (category: electronics)
Contributors
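- 김아경: extraction model design, text data preprocessing
- 김현욱: image data preprocessing, classification model validation
- 김황대: generation model design, Streamlit app design
- 박상류: generation model design, text data preprocessing
- 정재현: data crawling, Elasticsearch design and implementation
- 최윤성: project manager, classification model design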

Getting Started

Install requirements

  # Install requirements
  cd code
  pip install -r requirements.txt

Hardware

The following specs were used to create the original solution.

Ubuntu 18.04.5 LTS
Intel(R) Xeon(R) Gold 5120 CPU @ 2.20GHz
NVIDIA Tesla V100-SXM2-32GB

Code Structure

├── code/                   
│   ├── crawl
│   │   └── bunjang_crawl.py
│   │
│   ├── multimodal-clf
│   │   ├── configs
│   │   │   ├── data/secondhad-goods.yaml
│   │   │   └── model/mobilenetv3_kluebert.yaml
│   │   ├── src
│   │   │   ├── augmentation
│   │   │   │   ├── methods.py
│   │   │   │   ├── policies.py
│   │   │   │   └── transforms.py
│   │   │   ├── utils
│   │   │   │   ├── common.py
│   │   │   │   └── data.py
│   │   │   ├── dataloader.py
│   │   │   ├── model.py
│   │   │   └── traniner.py
│   │   └── train.py
│   │   
│   ├── prototype
│   │   ├── models/mmclf
│   │   │   ├── best.pt
│   │   │   ├── config.yaml
│   │   │   ├── mmclf.py
│   │   │   ├── special_tokens_map.json
│   │   │   ├── tokenizer_config.json
│   │   │   ├── tokenizer.json
│   │   │   └── vocab.txt
│   │   ├── app.py
│   │   └── inference.py
│   │   
│   ├── text_extraction
│   │   ├── es_api.py
│   │   ├── make_vocab.py
│   │   └── text_extraction.py
│   │
│   ├── text_generation
│   │   ├── arguments.py
│   │   ├── data.py
│   │   ├── hashtag_preprocess.py
│   │   ├── inference.py
│   │   ├── preprocess.py
│   │   └── train.py                  
│   │
│   ├── requirements.txt
│   └── README.md
│
└── data/es_data                     
    └── vocab_space_ver2.txt

Detail

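Classifying a listing's category from its product image and title with a multimodal classifier
- The user-provided image and product title are each classified separately, and the final category is decided by soft voting over the two predictions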
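Soft voting means averaging the class-probability distributions produced by the image and text classifiers and picking the category with the highest averaged probability, rather than taking a majority vote over hard labels. The sketch below shows only that combination step. It is an illustration, not the project's implementation: the function names are hypothetical, the raw scores are assumed to be unnormalized logits over the same ordered category list, and the equal 0.5/0.5 weighting is an assumption because the README does not state the weights.

```python
from __future__ import annotations

import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_vote(
    image_logits: Sequence[float],
    text_logits: Sequence[float],
    image_weight: float = 0.5,  # assumed weighting; not given in the README
) -> tuple[int, list[float]]:
    """Combine two classifiers by weighted averaging of their probabilities (hypothetical helper)."""
    p_img = softmax(image_logits)
    p_txt = softmax(text_logits)
    if len(p_img) != len(p_txt):
        raise ValueError("both classifiers must score the same category list")
    # Weighted average of the two distributions; the weights sum to 1,
    # so the result is still a probability distribution.
    probs = [image_weight * a + (1 - image_weight) * b for a, b in zip(p_img, p_txt)]
    # Predicted category = index of the highest averaged probability.
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


if __name__ == "__main__":
    categories = ["phone", "laptop", "camera"]
    # The image model leans "phone", the title model is confident about "laptop".
    best, probs = soft_vote([2.0, 1.0, 0.1], [0.5, 2.5, 0.0])
    print(categories[best], [round(p, 3) for p in probs])
```

Because the title model is far more confident than the image model in this example, the averaged distribution favors "laptop" even though the image model alone would pick "phone"; that confidence-awareness is the usual reason to prefer soft voting over hard voting.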
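Generating hashtags that increase a listing's exposure with generation and extraction models
- Hashtags are extracted from the listing body using TF-IDF term weighting and Elasticsearch
- Hashtags are generated with a GPT-2-based model fine-tuned on about 100,000 real listing titles, bodies, and hashtags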
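Fine-tuning a causal language model such as GPT-2 on (title, body, hashtags) triples usually means serializing each triple into one text sequence, with the hashtags placed last so the model learns to continue a title-and-body prompt with tags. The sketch below shows one possible serialization plus a parser for the generated continuation. The markers `<title>`, `<body>`, `<tags>` and the `</s>` end marker are assumptions for illustration, not the format used in `text_generation/preprocess.py`; the helper names are hypothetical, and loading and calling the model itself is omitted.

```python
from __future__ import annotations

import re

# Hypothetical section markers; the project's actual prompt format is not documented here.
TITLE, BODY, TAGS = "<title>", "<body>", "<tags>"
EOS = "</s>"  # assumed end-of-sequence marker


def build_training_text(title: str, body: str, hashtags: list[str]) -> str:
    # One training sample: context first, hashtags last, then the end marker,
    # so the model learns to stop after emitting the tags.
    tags = " ".join("#" + t.lstrip("#") for t in hashtags)
    return f"{TITLE} {title} {BODY} {body} {TAGS} {tags}{EOS}"


def build_prompt(title: str, body: str) -> str:
    # At inference time the model sees everything up to the tags marker
    # and is asked to generate the rest.
    return f"{TITLE} {title} {BODY} {body} {TAGS}"


def parse_hashtags(generated: str, max_tags: int = 10) -> list[str]:
    # Ignore anything after the end marker, then collect unique #tags in order.
    # \w matches Unicode word characters, so Hangul tags are captured too.
    text = generated.split(EOS, 1)[0]
    tags: list[str] = []
    for tag in re.findall(r"#(\w+)", text):
        if tag not in tags:
            tags.append(tag)
    return tags[:max_tags]


if __name__ == "__main__":
    # "iPhone 12 mini" / "battery life is good" / tags "iPhone", "mini"
    print(build_training_text("아이폰 12 미니", "배터리 효율 좋아요", ["아이폰", "#미니"]))
    print(parse_hashtags(" #아이폰 #미니 #아이폰 #애플</s> #스팸"))
```

Keeping the prompt an exact prefix of the training text means the fine-tuned model sees the same layout at inference as during training; deduplicating and truncating the output guards against a model that repeats tags or runs past the end marker.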
Demo video: YouTube

boostcampaitech2/final-project-level3-nlp-16

Folders and files

Latest commit

History

Repository files navigation

당신도 중고 거래왕이 될 수 있습니다!

Table of Contents

Members

Project Overview

Getting Started

Hardware

Code Structure

Detail

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 6

Languages

Packages