NAVER Cafe Crawler

NAVER Cafe Crawler using pandas, tqdm, Selenium, BeautifulSoup4

Caution

This crawler was created for educational and training purposes related to data analysis.
Users bear full legal responsibility for any use of this crawler.

Requirements

  • pandas
  • tqdm
  • Selenium
  • BeautifulSoup4

Install them using pip.

pip install pandas tqdm selenium beautifulsoup4

Configuring Config.py

Set your NAVER ID and password.

user_id = ''
user_pw = ''

Specify the name and ID of the NAVER Cafe you want to crawl.

cafe_name = ''
cafe_id = 0
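
These credentials are presumably used to sign in to NAVER with Selenium before crawling. The snippet below is a minimal sketch of such a login step, not necessarily how this repository implements it; the login URL, the id/pw field IDs, the log.login button ID, and the JavaScript workaround for NAVER's automated-input detection are all assumptions.

# Hedged sketch: signing in to NAVER with Selenium using the values from Config.py.
# The URL, element IDs, and JavaScript workaround are assumptions, not this repository's code.
import time

from selenium import webdriver
from selenium.webdriver.common.by import By

import Config

driver = webdriver.Chrome()
driver.get('https://nid.naver.com/nidlogin.login')

# Fill the credential fields via JavaScript; NAVER often rejects plain send_keys() as automated input.
driver.execute_script(
    "document.getElementById('id').value = arguments[0];"
    "document.getElementById('pw').value = arguments[1];",
    Config.user_id, Config.user_pw,
)
driver.find_element(By.ID, 'log.login').click()  # assumed ID of the login button
time.sleep(2)  # give the session time to be established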

Running Prepare.py

Specify each NAVER Cafe menu ID and the number of pages to collect, as (menu ID, number of pages) pairs.

Note

For example, if you enter 15 as the number of pages, it will crawl from page 1 to page 15.

menu_id_page = [
    (100, 15),
]

Run Prepare.py to generate a file named [Menu ID]_link.csv, which contains the links to be crawled.

python3 Prepare.py
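
For context, here is a rough sketch of what this link-collection step could look like with Selenium, BeautifulSoup4, pandas, and tqdm. The ArticleList.nhn URL pattern, the cafe_main iframe, and the a.article selector are assumptions for illustration; the actual Prepare.py may differ.

# Hedged sketch: collecting article links for each (menu ID, page count) pair.
# URL pattern, iframe name, and CSS selector are assumptions, not this repository's code.
import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from tqdm import tqdm

import Config

menu_id_page = [
    (100, 15),
]

list_url = ('https://cafe.naver.com/ArticleList.nhn'
            '?search.clubid={cafe_id}&search.menuid={menu_id}&search.page={page}')

driver = webdriver.Chrome()  # a logged-in session (see the login sketch above) may be required

for menu_id, page_count in menu_id_page:
    links = []
    for page in tqdm(range(1, page_count + 1)):
        driver.get(list_url.format(cafe_id=Config.cafe_id, menu_id=menu_id, page=page))
        driver.switch_to.frame('cafe_main')  # the article list is rendered inside this iframe
        soup = BeautifulSoup(driver.page_source, 'html.parser')
        for anchor in soup.select('a.article'):  # assumed selector for post links
            links.append('https://cafe.naver.com' + anchor['href'])
        driver.switch_to.default_content()
    pd.DataFrame({'link': links}).to_csv(f'{menu_id}_link.csv', index=False)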

Running Crawling.py

Provide the list of [Menu ID]_link.csv files to crawl.

file_name_list = [
    '100_link.csv',
]

Run Crawling.py to perform crawling.

python3 Crawling.py

It will save:

  • Post contents in [Menu ID]_content.csv
  • Comments in [Menu ID]_comment.csv
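
As a rough illustration of this step, the sketch below reads each link CSV, loads every post, and writes the two output files with pandas. The cafe_main iframe and the title, body, and comment selectors are assumptions for illustration; the actual Crawling.py may parse different elements and columns.

# Hedged sketch: crawling each link listed in the [Menu ID]_link.csv files.
# Selectors, iframe name, and output columns are assumptions, not this repository's code.
import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from tqdm import tqdm

file_name_list = [
    '100_link.csv',
]

driver = webdriver.Chrome()  # a logged-in session (see the login sketch above) may be required

for file_name in file_name_list:
    menu_id = file_name.split('_')[0]
    links = pd.read_csv(file_name)['link']

    contents, comments = [], []
    for link in tqdm(links):
        driver.get(link)
        driver.switch_to.frame('cafe_main')  # the post body is rendered inside this iframe
        soup = BeautifulSoup(driver.page_source, 'html.parser')

        title = soup.select_one('h3.title_text')          # assumed title selector
        body = soup.select_one('div.se-main-container')   # assumed body selector
        contents.append({'link': link,
                         'title': title.get_text(strip=True) if title else '',
                         'content': body.get_text('\n', strip=True) if body else ''})

        for comment in soup.select('span.text_comment'):  # assumed comment selector
            comments.append({'link': link, 'comment': comment.get_text(strip=True)})
        driver.switch_to.default_content()

    pd.DataFrame(contents).to_csv(f'{menu_id}_content.csv', index=False)
    pd.DataFrame(comments).to_csv(f'{menu_id}_comment.csv', index=False)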
