wachuuu/webcrawlerjs

webcrawlerjs

Web crawling and scraping of morele.net using Node.js

This simple crawler/scraper starts in the Lego category of the morele.net shop, finds links to other pages, and extracts data from them. Here I use it to calculate the price of a single brick in each Lego set.
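
The two per-page steps described above (following links and computing the price of one brick) could look roughly like this. It is a minimal sketch, not the code in main.js: `pricePerPiece` and `normalizeLink` are hypothetical helpers, and the product fields (`name`, `price`, `pieces`) are assumed names, since the README does not say how the scraper stores its data.

```javascript
// Hypothetical product shape: { name: string, price: number, pieces: number }
// Returns the price of a single brick, or null when the data can't produce one.
function pricePerPiece(product) {
  const { price, pieces } = product;
  if (!Number.isFinite(price) || !Number.isFinite(pieces) || pieces <= 0) {
    return null; // missing/invalid data; also avoids dividing by zero
  }
  return price / pieces;
}

// Resolves a (possibly relative) href against the page it was found on and keeps
// only http(s) links on morele.net or its subdomains; returns null otherwise.
function normalizeLink(href, pageUrl) {
  let url;
  try {
    url = new URL(href, pageUrl);
  } catch {
    return null; // malformed href
  }
  const onShop = url.hostname === "morele.net" || url.hostname.endsWith(".morele.net");
  if (!["http:", "https:"].includes(url.protocol) || !onShop) return null;
  url.hash = ""; // "#fragment" variants point at the same page
  return url.href;
}

// A 500-piece set costing 200 works out to 0.4 per brick
console.log(pricePerPiece({ name: "Example set", price: 200, pieces: 500 })); // 0.4
```

Dropping fragments and off-site links before queuing keeps a crawler from revisiting the same page under different URLs or wandering away from the shop.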

After scraping all subpages or reaching the visit limit, the program sorts the extracted data and writes it to the products.json file. The Lego sets with the lowest price-per-piece ratio are at the top of the file.
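
The sort-and-save step might look like the following minimal sketch. The helper names and the product fields (`name`, `price`, `pieces`) are assumptions for illustration; only the products.json file name comes from this README.

```javascript
const fs = require("fs");

// Hypothetical product shape: { name: string, price: number, pieces: number }
// Price per brick, or Infinity when it can't be computed, so such products sort last.
function brickPrice(product) {
  return product.pieces > 0 && Number.isFinite(product.price)
    ? product.price / product.pieces
    : Infinity;
}

// Returns a new array sorted by ascending price per brick (cheapest bricks first).
// The input array is copied, so the caller's data is not reordered.
function sortByPricePerPiece(products) {
  return [...products].sort((a, b) => {
    const ra = brickPrice(a);
    const rb = brickPrice(b);
    if (ra === rb) return 0;
    return ra < rb ? -1 : 1;
  });
}

// Sorts the scraped products and writes them as pretty-printed JSON.
function writeProducts(products, file = "products.json") {
  const sorted = sortByPricePerPiece(products);
  fs.writeFileSync(file, JSON.stringify(sorted, null, 2));
  return sorted;
}
```

In this sketch, products with an unknown piece count are kept but pushed to the end of the file rather than dropped, so nothing scraped is silently lost.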

I've implemented a queuing system that processes pages asynchronously, but with a concurrency limit. Every request can uncover new subpages to visit, so the concurrency limit ensures that we don't request all of them at once and end the process.
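
The queuing idea can be sketched without any libraries. This is an illustration under assumptions rather than the repository's implementation: `crawl` and the caller-supplied `fetchLinks(url)` are hypothetical, and the default limits (5 concurrent requests, 100 visits) are arbitrary.

```javascript
// Crawls breadth-first from startUrl with at most `concurrency` requests in flight
// and at most `maxVisits` pages visited. `fetchLinks(url)` is a hypothetical
// caller-supplied async function that scrapes a page and resolves to the URLs it found.
// Resolves to the number of pages visited.
function crawl(startUrl, fetchLinks, { concurrency = 5, maxVisits = 100 } = {}) {
  const queue = [startUrl];
  const seen = new Set([startUrl]); // URLs already queued or visited
  let visited = 0;
  let active = 0;

  return new Promise((resolve) => {
    const next = () => {
      // Finished when nothing is running and nothing more may be started
      if (active === 0 && (queue.length === 0 || visited >= maxVisits)) {
        resolve(visited);
        return;
      }
      // Start tasks until the concurrency limit, the visit limit, or an empty queue stops us
      while (active < concurrency && queue.length > 0 && visited < maxVisits) {
        const url = queue.shift();
        visited++;
        active++;
        Promise.resolve()
          .then(() => fetchLinks(url)) // also catches synchronous throws
          .then((links) => {
            for (const link of links) {
              if (!seen.has(link)) {
                seen.add(link);
                queue.push(link); // newly discovered page waits for a free slot
              }
            }
          })
          .catch((err) => console.error(`Failed to crawl ${url}: ${err.message}`))
          .finally(() => {
            active--;
            next(); // a slot freed up: start more work or finish
          });
      }
    };
    next();
  });
}
```

In a real run, `fetchLinks` would download the page, push any product data into a shared array, and return the links it found (ideally filtered to the shop), for example `crawl(startUrl, fetchLinks, { concurrency: 3, maxVisits: 200 }).then(...)`, where `startUrl` is the category page to begin from.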

Setup

Install the packages using npm:

$> npm install

Run the script:

$> node main.js
