webcrawlerjs

Web Crawling and then Scraping morele.net using Node.js

This simple crawler/scraper starts in the Lego category of the morele.net shop, follows links to other pages, and extracts data from them. Here I use it to calculate the price of a single brick in each Lego set.
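The price-per-brick calculation can be sketched roughly as follows. This is only an illustration: the helper name `pricePerPiece` and the field names `price` and `pieces` are assumptions, not necessarily what main.js uses.

```javascript
// Hypothetical helper: `price` and `pieces` are assumed field names
// for a scraped product, not taken from main.js.
function pricePerPiece(product) {
  const { price, pieces } = product;
  // Skip products where either value is missing or unusable,
  // e.g. a listing that does not state the number of pieces.
  if (!Number.isFinite(price) || !Number.isInteger(pieces) || pieces <= 0) {
    return null;
  }
  return price / pieces;
}

// Usage with made-up numbers (not real shop data):
console.log(pricePerPiece({ name: 'Example set', price: 100, pieces: 400 }));
```

Returning `null` instead of throwing lets the caller filter out incomplete listings before sorting.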

After scraping all subsites or exceeding the visit limit, the program sorts the extracted data and writes it to the products.json file. The Lego sets with the lowest price-per-piece ratio appear at the top of the file.
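A minimal sketch of the sort-and-save step, assuming each product object carries a numeric `pricePerPiece` field. The function names and the field name are hypothetical; only the products.json file name comes from this README.

```javascript
const fs = require('fs');

// Returns a new array sorted ascending by the assumed `pricePerPiece` field,
// so the cheapest-per-brick sets come first. The input array is not mutated.
function sortByPricePerPiece(products) {
  return [...products].sort((a, b) => a.pricePerPiece - b.pricePerPiece);
}

// Writes the sorted products as pretty-printed JSON.
// `filePath` defaults to products.json, the output file named above.
function saveProducts(products, filePath = 'products.json') {
  const sorted = sortByPricePerPiece(products);
  fs.writeFileSync(filePath, JSON.stringify(sorted, null, 2));
}
```

Sorting a copy keeps the in-memory results untouched in case they are needed after the file is written.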

I've implemented a queuing system that processes data asynchronously, but with a concurrency limit. Every request can reveal new subsites to visit; the concurrency limit ensures that we don't request all of them at once and end the process.
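The idea of a queue with a concurrency limit and a visit limit can be sketched as below. This is not the code from main.js: `crawl`, `visit`, `concurrency`, and `maxVisits` are hypothetical names, and `visit` stands in for whatever function fetches a page and returns the links found on it.

```javascript
// Minimal sketch of a crawl queue with a concurrency limit and a visit limit.
// `visit` is a hypothetical function that takes a URL and returns (or resolves
// to) an array of newly discovered URLs. Resolves to the number of pages visited.
function crawl(startUrl, visit, { concurrency = 5, maxVisits = 100 } = {}) {
  const queue = [startUrl];          // URLs waiting to be visited
  const seen = new Set([startUrl]);  // URLs already queued, to avoid repeats
  let active = 0;                    // requests currently in flight
  let visited = 0;                   // requests started so far

  return new Promise((resolve) => {
    const next = () => {
      // Finished: nothing in flight and nothing more may be started.
      if (active === 0 && (queue.length === 0 || visited >= maxVisits)) {
        resolve(visited);
        return;
      }
      // Start new requests only while below both limits.
      while (active < concurrency && queue.length > 0 && visited < maxVisits) {
        const url = queue.shift();
        active += 1;
        visited += 1;
        Promise.resolve()
          .then(() => visit(url))
          .then((links) => {
            for (const link of links) {
              if (!seen.has(link)) {
                seen.add(link);
                queue.push(link);
              }
            }
          })
          .catch(() => {}) // a failed page is skipped in this sketch
          .finally(() => {
            active -= 1;
            next(); // a slot freed up, try to start more work
          });
      }
    };
    next();
  });
}
```

Each finished request frees one slot and immediately pulls the next URL from the queue, so at most `concurrency` requests run at any moment no matter how many links a page reveals.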

Setup

Install packages using npm

$> npm install

Run script

$> node main.js
