NAVER Cafe Crawler using pandas, tqdm, Selenium, and BeautifulSoup4
Caution
This crawler was created for educational and training purposes related to data analysis.
Users bear full legal responsibility for any use of this crawler.
- pandas
- tqdm
- Selenium
- BeautifulSoup4
Install them with pip:
pip install pandas tqdm selenium beautifulsoup4
Set your NAVER ID and password.
user_id = ''
user_pw = ''
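For reference, here is a minimal sketch of how these credentials could be used to log in with Selenium. The login URL and the element IDs (id, pw, log.login) are assumptions about NAVER's login form, not values taken from this crawler.

```python
# Hypothetical login sketch: the URL, element IDs, and the JavaScript
# form-filling workaround are assumptions, not taken from this repo.
from selenium import webdriver
from selenium.webdriver.common.by import By

user_id = ''
user_pw = ''

driver = webdriver.Chrome()
driver.get('https://nid.naver.com/nidlogin.login')

# Fill the form via JavaScript, since NAVER may block simulated keystrokes.
driver.execute_script(
    "document.getElementById('id').value = arguments[0];"
    "document.getElementById('pw').value = arguments[1];",
    user_id, user_pw,
)
driver.find_element(By.ID, 'log.login').click()
```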
Specify the name and ID of the NAVER Cafe you want to crawl.
cafe_name = ''
cafe_id = 0
Specify the menu IDs of the NAVER Cafe and the number of pages to collect for each, as (menu ID, number of pages) pairs.
Note
If you enter 15 as the number of pages, pages 1 through 15 will be crawled.
menu_id_page = [
(100, 15),
]
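Each pair expands to one article-list URL per page. The sketch below assumes NAVER Cafe's ArticleList.nhn query parameters (search.clubid, search.menuid, search.page); the actual URL scheme used by Prepare.py may differ.

```python
# Hypothetical sketch of turning (menu ID, page count) pairs into
# per-page article-list URLs. The query parameters are assumptions
# about NAVER Cafe's URL scheme, not taken from this repo.
cafe_id = 0
menu_id_page = [(100, 15)]

BASE = 'https://cafe.naver.com/ArticleList.nhn'

for menu_id, last_page in menu_id_page:
    for page in range(1, last_page + 1):
        url = (f'{BASE}?search.clubid={cafe_id}'
               f'&search.menuid={menu_id}&search.page={page}')
        print(url)
```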
Run Prepare.py to generate a file named [Menu ID]_link.csv, which contains the links to be crawled.
python3 Prepare.py
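Roughly, the Prepare step parses each listing page and writes the collected article links to a CSV. The sketch below assumes a hypothetical a.article link selector and a "link" column name; the selectors in Prepare.py may differ.

```python
# Hypothetical sketch of the Prepare step: extract article links from one
# listing page and save them to "<menu_id>_link.csv". The "a.article"
# selector and the "link" column name are assumptions, not from this repo.
import pandas as pd
from bs4 import BeautifulSoup

def extract_links(page_source: str) -> list[str]:
    soup = BeautifulSoup(page_source, 'html.parser')
    return ['https://cafe.naver.com' + a['href']
            for a in soup.select('a.article')]

def save_links(menu_id: int, links: list[str]) -> None:
    pd.DataFrame({'link': links}).to_csv(f'{menu_id}_link.csv', index=False)
```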
Provide the list of [Menu ID]_link.csv files to crawl.
file_name_list = [
('100_link.csv'),
]
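The crawler then reads the links back from these files. A minimal sketch, assuming each CSV has a "link" column as in the Prepare sketch above:

```python
# Hypothetical sketch: load every prepared link file into one list of URLs.
# The "link" column name is an assumption, not taken from this repo.
import pandas as pd

file_name_list = [('100_link.csv')]

links = []
for file_name in file_name_list:
    links.extend(pd.read_csv(file_name)['link'].tolist())
```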
Run Crawling.py to perform the crawling.
python3 Crawling.py
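At a high level, the crawling step opens each link, switches into the cafe's content iframe, and parses the post body and comments. The sketch below assumes the "cafe_main" iframe name and hypothetical CSS selectors; the actual selectors used by Crawling.py may differ.

```python
# Hypothetical sketch of the crawling step: open each article with Selenium,
# switch into the content iframe, and parse the post and its comments with
# BeautifulSoup. The iframe name and CSS selectors are assumptions.
from bs4 import BeautifulSoup
from tqdm import tqdm

def crawl_articles(driver, links):
    contents, comments = [], []
    for link in tqdm(links):
        driver.get(link)
        driver.switch_to.frame('cafe_main')  # posts are rendered in an iframe
        soup = BeautifulSoup(driver.page_source, 'html.parser')
        title = soup.select_one('h3.title_text')
        body = soup.select_one('div.se-main-container')
        contents.append({
            'link': link,
            'title': title.get_text(strip=True) if title else '',
            'content': body.get_text(strip=True) if body else '',
        })
        for c in soup.select('span.text_comment'):
            comments.append({'link': link, 'comment': c.get_text(strip=True)})
        driver.switch_to.default_content()
    return contents, comments
```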
It will save:
- Post contents in [Menu ID]_content.csv
- Comments in [Menu ID]_comment.csv
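Once the run finishes, the results can be inspected with pandas. The column access below is only illustrative; the actual schema is whatever Crawling.py writes.

```python
# Example of loading the output files for a quick look.
import pandas as pd

content_df = pd.read_csv('100_content.csv')
comment_df = pd.read_csv('100_comment.csv')
print(content_df.head())
print(comment_df.head())
```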