v0.11.0
We are pleased to announce version 0.11.0 of ACHE Crawler! Besides several technical improvements, we are really glad to announce the very first ACHE release under the Apache License 2 (APLv2).
Following is a detailed log of the major changes since the last version:
- Removed dependency on Weka and reimplemented all machine-learning code using SMILE.
- Added option to skip cross-validation on
ache buildModel
command - Added option to configure max number of features on
ache buildModel
command - Changed license from GNU GPL to Apache 2.0
- Added tool (ache run ReplayCrawl) to replay old crawls using a new configuration file
- Added near-duplicate page detection using min-hashing and LSH
- Support ELASTIC format in Kafka data format (issue #155)
- Upgrade react-scripts to get rid of vulnerable transitive dependency (hoek:4.2.0)
- Upgrade npm version to 5.8.0 on gradle build script
- Changed
smile
target page classifier to use Platt's scaling only when the
parameter 'relevance_threshold' is provided in thepageclassifier.yml
file. - Added Ansible scripts for automatic deployment
- Added RocksDB-based target repository (RocksDBTargetRepository)
- Fixed bug in ache-dashboard that prevented reloading search page on the browser
page refresh (issue #163) - Support Elasticsearch 6.x (issue #158)