Releases: peterbencze/serritor
Serritor 2.1.1
- Fix bug where crawl seeds were fed to the frontier twice, resulting in incorrect crawl stats
- Fix bug where crawl stats were not reset when the crawler was restarted after its state was restored
- Update dependency versions
Serritor 2.1.0
This release includes new features, improvements and changes to the existing API.
Changes in a nutshell:
- Add helper class for finding text in response content
- Refactor UrlFinder
- Modify HTTP client so that it uses the same user-defined HTTP proxy as Selenium
- Ignore authentication cookie when cookie authentication is not enabled
- Use MutableCapabilities instead of DesiredCapabilities when configuring the browser
Serritor 2.0.0
This major release includes a number of new features, bug fixes and changes to the existing API.
Changes in a nutshell:
- Add internal proxy server to overcome Selenium limitations (for example, no access to response headers)
- Add onBrowserInit callback to configure the browser before the crawling begins
- Always call onStop even if an unhandled exception is thrown
- Rename callbacks
- Add detailed logging
- Use SLF4J instead of the built-in logger
- Add web API feature
... and more
Serritor 1.6.0
This release adds support for specifying custom callbacks for crawl events.
Serritor 1.5.0
This release includes bug fixes and a number of enhancements and new features.
Major changes in a nutshell:
- Change the access modifier of the stop method
- Add the possibility to download files
- Add the possibility to retrieve response content type
- Fix browser compatibility check exception when using HtmlUnitDriver
- Add default URL finder creation method
- Remove Selenium cookie synchronization
- Add support for loading config from previously saved state
- Add static methods for creating crawl requests with the default config
Serritor 1.4.0
This release includes a number of bug fixes and improvements.
Serritor 1.3.1
This release includes a new feature and changes to the existing API.
Changes in a nutshell:
- Change how the crawler is configured:
  - Add CrawlerConfigurationBuilder for building CrawlerConfiguration instances
  - Pass the configuration to the crawler's constructor
- Add the possibility to download the file in the onNonHtmlResponse callback
Please check the Wiki for more information.
Serritor 1.3.0
This release includes new features, improvements and changes to the existing API.
New features in a nutshell:
- Crawl domains: restrict crawling to the specified domains
- Crawl delay mechanisms: determine the delay between consecutive requests
- URL finder: finds URLs in HTML sources using regular expressions
Please check the Wiki for more information.
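
To illustrate the ideas behind the URL finder and crawl domains, here is a minimal, self-contained sketch in plain Java. It does not use Serritor's API: the class name `UrlFinderSketch`, the method names and the regular expression are all assumptions made for this example, and the Wiki remains the reference for the library's actual classes.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of Serritor.
public class UrlFinderSketch {

    // Deliberately simple pattern (an assumption for this sketch): an absolute
    // http(s) URL running up to the next whitespace, quote or angle bracket.
    private static final Pattern URL_PATTERN =
            Pattern.compile("https?://[^\\s\"'<>]+");

    // Returns every substring of the HTML source that matches URL_PATTERN.
    static List<String> findUrls(String html) {
        List<String> urls = new ArrayList<>();
        Matcher matcher = URL_PATTERN.matcher(html);
        while (matcher.find()) {
            urls.add(matcher.group());
        }
        return urls;
    }

    // Returns true if the URL's host equals one of the allowed domains
    // or is a subdomain of one of them. Domains are expected in lowercase.
    static boolean isInAllowedDomain(String url, Set<String> allowedDomains) {
        String host;
        try {
            host = URI.create(url).getHost();
        } catch (IllegalArgumentException e) {
            return false; // not a syntactically valid URI
        }
        if (host == null) {
            return false;
        }
        String normalizedHost = host.toLowerCase(Locale.ROOT);
        for (String domain : allowedDomains) {
            if (normalizedHost.equals(domain) || normalizedHost.endsWith("." + domain)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String html = "<a href=\"https://example.com/a\">A</a> "
                + "<a href='http://other.org/b'>B</a>";
        Set<String> allowedDomains = Set.of("example.com");

        for (String url : findUrls(html)) {
            System.out.println(url + " allowed=" + isInAllowedDomain(url, allowedDomains));
        }
    }
}
```

The subdomain check compares against `"." + domain` rather than using a bare `endsWith(domain)`, so a host such as `notexample.com` is not mistaken for `example.com`.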
Serritor 1.2.1
This release includes minor fixes and improvements, including changes to the API. Please check the Wiki for more information.
Serritor 1.2
This release includes new features, bug fixes and major API modifications. Please check the documentation for more information.