
A fully automated web scraping solution that includes scheduling automatic launches through GitHub Actions, using Puppeteer with Node.js to collect data, and storing the data on GitHub Pages for public access.


pulipulichen/crawler-Course-Example



crawler-Course-Example: Teaching Material for Automatically Scheduled Web Crawling

A fully automated web scraping solution: GitHub Actions schedules and triggers each run, Puppeteer on Node.js collects the data, and GitHub Pages publishes the results for others to use.

Techniques

  • Node.js
  • GitHub Actions: a DevOps solution for automatically scheduled execution.
  • Puppeteer: a browser automation tool for Node.js.
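
As a rough illustration of the Puppeteer step, the sketch below opens a page in headless Chrome and collects its links. It is not the repository's actual crawler: the function names `extractLinks` and `scrapeLinks`, the `a` selector, and the `--no-sandbox` launch flag (often used on CI runners) are all assumptions, and it presumes `npm install puppeteer` has been run.

```javascript
// Hypothetical helper: turns anchor elements into plain records.
// It runs inside the browser page via page.$$eval, so it must not
// reference any variables from the surrounding Node.js scope.
function extractLinks(anchors) {
  return anchors
    .map((a) => ({ title: (a.textContent || "").trim(), href: a.href }))
    .filter((link) => link.title !== "");
}

// Hypothetical entry point: launches a browser, loads `url`,
// and returns every non-empty link found on the page.
async function scrapeLinks(url) {
  // Loaded lazily; assumes the `puppeteer` package is installed.
  const { default: puppeteer } = await import("puppeteer");
  // `--no-sandbox` is an assumption for CI environments; drop it locally if not needed.
  const browser = await puppeteer.launch({ args: ["--no-sandbox"] });
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: "networkidle2" });
    return await page.$$eval("a", extractLinks);
  } finally {
    // Always release the browser, even if navigation fails.
    await browser.close();
  }
}
```

Calling `scrapeLinks("https://example.com/")` (a placeholder URL) would resolve to an array of `{ title, href }` objects. Keeping the extraction logic in a separate function makes it possible to check it without starting a browser.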

Slide

Citation

Chen, Y.-T. (2024). Crawler-Course-Example (20240518.210053) [Computer software]. Zenodo. https://doi.org/10.5281/ZENODO.11214115


Memo

Finally, remember to store the data using the following fields:

  • id: required; every record must have an id.
  • dc.title
  • dc.creator
  • dc.subject
  • dc.description
  • dc.publisher
  • dc.contributor
  • dc.date: converting to ISO format (such as ISO 8601) is recommended.
  • dc.type
  • dc.format
  • dc.identifier
  • dc.source
  • dc.language
  • dc.relation
  • dc.coverage
  • dc.rights
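
The field list above can be sketched as a small CSV writer. This is a minimal illustration, not the repository's own code: the names `FIELDS`, `csvEscape`, `toIsoDate`, and `toCsv` are hypothetical, the column order simply follows the list, and the date handling assumes the scraped value is something JavaScript's `Date` can parse.

```javascript
// Column order follows the field list in this memo.
const FIELDS = [
  "id", "dc.title", "dc.creator", "dc.subject", "dc.description",
  "dc.publisher", "dc.contributor", "dc.date", "dc.type", "dc.format",
  "dc.identifier", "dc.source", "dc.language", "dc.relation",
  "dc.coverage", "dc.rights",
];

// Quote a value when it contains a comma, a quote, or a line break.
function csvEscape(value) {
  const text = value === undefined || value === null ? "" : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Convert a parseable date to an ISO 8601 UTC timestamp;
// values that cannot be parsed are returned unchanged.
function toIsoDate(value) {
  const date = new Date(value);
  return Number.isNaN(date.getTime()) ? String(value) : date.toISOString();
}

// Build the CSV text: one header line, then one line per record.
function toCsv(records) {
  const lines = [FIELDS.join(",")];
  for (const record of records) {
    if (record.id === undefined || record.id === null || record.id === "") {
      throw new Error("Every record must have an id");
    }
    const cells = FIELDS.map((field) => {
      const value = record[field];
      const hasValue = value !== undefined && value !== null && value !== "";
      // Only dc.date is normalized; other fields are written as given.
      return csvEscape(field === "dc.date" && hasValue ? toIsoDate(value) : value);
    });
    lines.push(cells.join(","));
  }
  return lines.join("\n");
}
```

Fields missing from a record are written as empty cells, so every row has the same number of columns. Rejecting records without an `id` at write time surfaces the problem during the scheduled run rather than in downstream consumers.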

API

https://pulipulichen.github.io/crawler-Course-Example/data.csv
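
A consumer could read this file roughly as follows. The sketch assumes Node.js 18 or later (for the built-in `fetch`) and that the first line of `data.csv` is a header row; the actual columns of the published file are not documented here, and `parseCsv` and `loadRecords` are hypothetical names.

```javascript
const DATA_URL = "https://pulipulichen.github.io/crawler-Course-Example/data.csv";

// Minimal CSV parser: handles quoted fields, doubled quotes, and CRLF/LF.
// Returns one object per data row, keyed by the header row (assumed present).
function parseCsv(text) {
  const rows = [];
  let row = [];
  let field = "";
  let inQuotes = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"' && text[i + 1] === '"') { field += '"'; i++; } // escaped quote
      else if (ch === '"') { inQuotes = false; }                    // closing quote
      else { field += ch; }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      row.push(field); field = "";
    } else if (ch === "\n" || ch === "\r") {
      if (ch === "\r" && text[i + 1] === "\n") i++;                 // treat CRLF as one break
      row.push(field); field = "";
      rows.push(row); row = [];
    } else {
      field += ch;
    }
  }
  // Flush the last row when the text does not end with a line break.
  if (field !== "" || row.length > 0) { row.push(field); rows.push(row); }
  // Skip blank lines, then map each row onto the header names.
  const [header = [], ...body] = rows.filter((r) => r.length > 1 || r[0] !== "");
  return body.map((cells) =>
    Object.fromEntries(header.map((name, i) => [name, cells[i] ?? ""]))
  );
}

// Download the published CSV and parse it into records.
async function loadRecords(url = DATA_URL) {
  const response = await fetch(url);
  if (!response.ok) throw new Error(`Request failed: ${response.status}`);
  return parseCsv(await response.text());
}
```

For example, `loadRecords().then((records) => console.log(records.length))` would print the number of rows currently published.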
