This is a simple Java parser that extracts the URL, title, description, and links from a search engine results page (in this case, yandex.ru) and downloads the first 10 pages into the results directory.
pistonsky/yandex-1
The program uses regular expressions to parse the HTML of the results page.

To launch it, type:

    java Parser keywords to search

Replace "keywords to search" with your search string.

Sample output:

    Document 1:
    http://abc.com
    Welcome to abc.com
    The greatest website in the world!
    abc.com

    Document 2:
    ...

It will also download 10 pages to the results folder. The files will be named 1.html, 2.html and so on.

If the program throws an error, yandex.ru has most likely returned a captcha page instead of the search engine results page (SERP).
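The regex-based extraction described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual Parser class: the class name SerpRegexSketch, the Result type, and above all the `<li class="result">` markup are assumptions made up for the example. The real yandex.ru HTML is different and changes over time, so the pattern would need to be adapted. The sketch extracts only the URL, title, and description, and treats an empty match list as a hint that a captcha page was returned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SerpRegexSketch {

    // ASSUMPTION: hypothetical result markup, not the real yandex.ru HTML.
    private static final Pattern RESULT = Pattern.compile(
        "<li class=\"result\">\\s*<a href=\"([^\"]+)\">(.*?)</a>\\s*<p>(.*?)</p>\\s*</li>",
        Pattern.DOTALL);

    /** One parsed search result (hypothetical type for this sketch). */
    public static final class Result {
        public final String url;
        public final String title;
        public final String description;

        Result(String url, String title, String description) {
            this.url = url;
            this.title = title;
            this.description = description;
        }
    }

    /** Returns at most the first 10 results matched in the given HTML. */
    public static List<Result> parse(String html) {
        List<Result> results = new ArrayList<>();
        Matcher m = RESULT.matcher(html);
        while (results.size() < 10 && m.find()) {
            results.add(new Result(m.group(1), m.group(2).trim(), m.group(3).trim()));
        }
        return results;
    }

    /** Formats results in the "Document N:" layout of the sample output. */
    public static String format(List<Result> results) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < results.size(); i++) {
            Result r = results.get(i);
            sb.append("Document ").append(i + 1).append(":\n")
              .append(r.url).append('\n')
              .append(r.title).append('\n')
              .append(r.description).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Inline sample page so the sketch runs without network access.
        String html = "<ul>"
            + "<li class=\"result\"><a href=\"http://abc.com\">Welcome to abc.com</a>"
            + "<p>The greatest website in the world!</p></li>"
            + "</ul>";
        List<Result> results = parse(html);
        if (results.isEmpty()) {
            // No matches usually means the page is not a SERP (for example, a captcha).
            System.err.println("No results matched; the page may be a captcha.");
            return;
        }
        System.out.print(format(results));
    }
}
```

Regular expressions are brittle against markup changes, so checking for an empty result list before using the results gives a clearer failure than an exception thrown later.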