Final project for the Web Information Management course, June 2012.
Detailed project information and the evaluation can be found in the docs/ folder, in the PDF presentation eng_crawler_tripadvisor.pdf.
Compile the project and run it/thecrawlers/crawler/CrawlHandler with the following arguments:
- numberOfCrawlers
- rootFolder (the folder that will contain intermediate crawl data, for example "data/crawl/")
- timeDelay (time delay between requests in milliseconds)
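
As an illustration of how these three arguments fit together, here is a minimal sketch of argument validation. It is not the project's actual code: the class name CrawlArgs, its fields, and the checks are hypothetical, and it assumes the arguments are passed positionally in the order listed above.

```java
// Hypothetical helper, not part of this project: parses and validates the
// three command-line arguments, assuming they arrive in the listed order.
final class CrawlArgs {
    final int numberOfCrawlers;   // number of concurrent crawlers
    final String rootFolder;      // folder for intermediate crawl data
    final long timeDelayMillis;   // delay between requests, in milliseconds

    private CrawlArgs(int numberOfCrawlers, String rootFolder, long timeDelayMillis) {
        this.numberOfCrawlers = numberOfCrawlers;
        this.rootFolder = rootFolder;
        this.timeDelayMillis = timeDelayMillis;
    }

    static CrawlArgs parse(String[] args) {
        if (args.length != 3) {
            throw new IllegalArgumentException(
                "expected: <numberOfCrawlers> <rootFolder> <timeDelay>");
        }
        // parseInt/parseLong throw NumberFormatException (a subclass of
        // IllegalArgumentException) when the text is not a valid number.
        int crawlers = Integer.parseInt(args[0]);
        long delay = Long.parseLong(args[2]);
        if (crawlers < 1) {
            throw new IllegalArgumentException("numberOfCrawlers must be at least 1");
        }
        if (delay < 0) {
            throw new IllegalArgumentException("timeDelay must not be negative");
        }
        return new CrawlArgs(crawlers, args[1], delay);
    }

    public static void main(String[] args) {
        // Example values only; pick a delay that is polite to the target site.
        CrawlArgs parsed = parse(new String[] {"4", "data/crawl/", "1000"});
        System.out.println(parsed.numberOfCrawlers + " crawlers, root "
            + parsed.rootFolder + ", delay " + parsed.timeDelayMillis + " ms");
    }
}
```

Rejecting bad input before any crawler starts avoids leaving half-written data in rootFolder.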
This version supports crawling Tripadvisor as it was in June 2012. Because the crawler is focused on a specific page structure and Tripadvisor's pages keep evolving, it will eventually start reporting parsing errors and will need to be updated.