dangrasso/crawler-tripadvisor


Crawler TripAdvisor

A focused crawler in Java for extracting reviews from TripAdvisor.

Final project for the Web Information Management course, June 2012.

Detailed project information and the evaluation can be found in the docs/ folder, in the PDF presentation eng_crawler_tripadvisor.pdf.

Running the crawler

Compile and run it/thecrawlers/crawler/CrawlHandler with the following arguments:

  • numberOfCrawlers
  • rootFolder (the folder that will contain intermediate crawl data), for example "data/crawl/"
  • timeDelay (time delay between requests in milliseconds)
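
As an illustration of how these three arguments might be read and validated, here is a minimal sketch in Java. It is not the project's actual code: the class name CrawlArgs, its fields, and the validation rules are assumptions made for this example, and only the argument names and their order come from the list above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical holder for the three command-line arguments listed above.
// This is an illustrative sketch, not code taken from CrawlHandler.
final class CrawlArgs {
    final int numberOfCrawlers;   // number of crawler instances to start
    final Path rootFolder;        // folder for intermediate crawl data
    final long timeDelayMillis;   // delay between requests, in milliseconds

    private CrawlArgs(int numberOfCrawlers, Path rootFolder, long timeDelayMillis) {
        this.numberOfCrawlers = numberOfCrawlers;
        this.rootFolder = rootFolder;
        this.timeDelayMillis = timeDelayMillis;
    }

    // Parses the arguments in the order: numberOfCrawlers, rootFolder, timeDelay.
    // Throws IllegalArgumentException on a wrong count or an invalid value
    // (NumberFormatException is a subclass of IllegalArgumentException).
    static CrawlArgs parse(String[] args) {
        if (args.length != 3) {
            throw new IllegalArgumentException(
                "Expected 3 arguments: <numberOfCrawlers> <rootFolder> <timeDelay>");
        }
        int crawlers = Integer.parseInt(args[0]);
        Path root = Paths.get(args[1]);
        long delay = Long.parseLong(args[2]);
        if (crawlers < 1) {
            throw new IllegalArgumentException("numberOfCrawlers must be at least 1");
        }
        if (delay < 0) {
            throw new IllegalArgumentException("timeDelay must not be negative");
        }
        return new CrawlArgs(crawlers, root, delay);
    }
}
```

With this sketch, the argument values 4, "data/crawl/" and 200 (chosen here only as an example) would describe four crawlers writing to data/crawl/ with a 200 ms pause between requests.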

Warning

This version supports crawling TripAdvisor as it was in June 2012. Because the crawler is focused and depends on TripAdvisor's page structure, which changes over time, it will eventually produce parsing errors and need to be updated.

About

A focused crawler for extracting review data from Tripadvisor.it
