Crawling strategy improvements and native logging

@sibiryakov sibiryakov released this 01 Jun 14:20
· 566 commits to master since this release

Here is the change log:

  • Unicode-related crashes with the latest SQLAlchemy are fixed,
  • a canonical solver friendly to corporate websites has been added,
  • the crawling strategy concept has evolved: an arbitrary URL can now be added to the queue (with a transparent state check), and FrontierManager is available on construction,
  • strategy worker code was refactored,
  • a default state was introduced for links generated during crawling strategy operation,
  • Frontera's own logging was dropped in favor of Python's native logging,
  • the logging system is now configured from a file by means of logging.config,
  • partitions can now be assigned to instances from the command line,
  • improved test coverage from @Preetwinder.
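
For readers unfamiliar with file-based configuration of Python's native logging, here is a minimal sketch using only the standard library's logging.config.fileConfig. The logger name "crawler", the handler and formatter names, and the helper function are hypothetical and chosen for illustration; the release notes do not say which Frontera setting points at the configuration file, so that part is left out.

```python
import logging
import logging.config
import os
import tempfile

# Hypothetical configuration in the INI format understood by
# logging.config.fileConfig. All section and logger names are illustrative.
LOGGING_INI = """
[loggers]
keys=root,crawler

[handlers]
keys=console

[formatters]
keys=simple

[logger_root]
level=WARNING
handlers=console

[logger_crawler]
level=DEBUG
handlers=console
qualname=crawler
propagate=0

[handler_console]
class=StreamHandler
level=DEBUG
formatter=simple
args=(sys.stderr,)

[formatter_simple]
format=%(asctime)s %(name)s %(levelname)s %(message)s
"""


def configure_logging_from_text(ini_text):
    """Write ini_text to a temporary file and load it with fileConfig."""
    with tempfile.NamedTemporaryFile("w", suffix=".ini", delete=False) as f:
        f.write(ini_text)
        path = f.name
    try:
        # Keep loggers created before this call enabled.
        logging.config.fileConfig(path, disable_existing_loggers=False)
    finally:
        os.remove(path)


configure_logging_from_text(LOGGING_INI)
log = logging.getLogger("crawler")
log.debug("logging configured from file")
```

In a real deployment the configuration would live in a regular file on disk rather than a temporary one; the temporary file only keeps the sketch self-contained.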

Enjoy!