Skip to content

v0.11.0

Compare
Choose a tag to compare
@aecio aecio released this 01 Jun 18:56
· 214 commits to master since this release

We are pleased to announce version 0.11.0 of ACHE Crawler! Besides several technical improvements, we are really glad to announce the very first ACHE release under the Apache License 2 (APLv2).

Following is a detailed log of the major changes since the last version:

  • Removed dependency on Weka and reimplemented all machine-learning code using SMILE.
  • Added option to skip cross-validation on ache buildModel command
  • Added option to configure max number of features on ache buildModel command
  • Changed license from GNU GPL to Apache 2.0
  • Added tool (ache run ReplayCrawl) to replay old crawls using a new configuration file
  • Added near-duplicate page detection using min-hashing and LSH
  • Support ELASTIC format in Kafka data format (issue #155)
  • Upgrade react-scripts to get rid of vulnerable transitive dependency (hoek:4.2.0)
  • Upgrade npm version to 5.8.0 on gradle build script
  • Changed smile target page classifier to use Platt's scaling only when the
    parameter 'relevance_threshold' is provided in the pageclassifier.yml file.
  • Added Ansible scripts for automatic deployment
  • Added RocksDB-based target repository (RocksDBTargetRepository)
  • Fixed bug in ache-dashboard that prevented reloading search page on the browser
    page refresh (issue #163)
  • Support Elasticsearch 6.x (issue #158)