Releases: digicademy/hydra-scraper

v0.9.1

04 Nov 20:21
  • Updated nfdicore/cto structure with altered prepare parameter

v0.9.0

04 Nov 13:18
  • Full rewrite with a modular architecture
  • Any combination of Feed and FeedElement
  • Support for RDF (schema.org), XML (CMIF, LIDO), Beacon, ZIP ingest
  • Log but accept missing feed elements
  • Less memory hoarding with large datasets
  • Look-up routine for authority files
  • Single template to generate nfdicore/cto triples
  • Template adapted to current nfdicore/cto version
  • Automatically create ARK IDs for nfdicore/cto
  • Prep work for further serialisations such as DCAT
  • New command-line interface and argument parsing
  • A -quiet option prevents reporting intermediate progress
  • Provide optional OCI (Podman/Docker) container set-up
  • Observe rules laid out in robots.txt files
  • Recognise http and https namespaces in schema.org sources
  • Provide log files for scraping runs
  • Switch to httpx
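
As an illustration of the robots.txt item above, the following is a minimal sketch of how a scraper can check a URL against robots.txt rules before requesting it. It is not taken from hydra-scraper: the function name `is_allowed`, the user agent `example-bot`, and the sample rules are all hypothetical, and the sketch uses Python's standard-library `urllib.robotparser` rather than whatever the project uses internally.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for this illustration.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for candidate in (
        "https://example.org/feed/1",
        "https://example.org/private/1",
    ):
        print(candidate, is_allowed(EXAMPLE_ROBOTS, "example-bot", candidate))
```

In a real scraping run the rules would be downloaded once per host and cached, so that each feed element only costs a local lookup.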

v0.8.4

22 Oct 12:22
c374798
Merge pull request #1 from digicademy/lido-to-cgif

Add LIDO-to-triple conversion

v0.8.3

08 Oct 15:02
Update version number

v0.8.2

05 Oct 09:12
Fix content negotiation
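
For readers unfamiliar with the term, content negotiation means that the client states which serialisations it prefers in the `Accept` header and the server picks one of them. The sketch below shows one way to build such a request for RDF data. It is an assumption-laden illustration, not the project's code: the function name `build_rdf_request`, the default list of media types, and the quality-value scheme are hypothetical, and the standard-library `urllib.request` is used only to keep the example self-contained.

```python
from urllib.request import Request


def build_rdf_request(
    url: str,
    preferred: tuple[str, ...] = (
        "text/turtle",
        "application/ld+json",
        "application/rdf+xml",
    ),
) -> Request:
    """Build a request whose Accept header ranks RDF media types by preference."""
    parts = []
    for rank, mime in enumerate(preferred):
        if rank == 0:
            # The first type carries the implicit default quality of 1.0.
            parts.append(mime)
        else:
            # Later types get decreasing q-values, never below 0.1.
            quality = max(1.0 - 0.1 * rank, 0.1)
            parts.append(f"{mime};q={quality:.1f}")
    return Request(url, headers={"Accept": ", ".join(parts)})


if __name__ == "__main__":
    request = build_rdf_request("https://example.org/resource/1")
    print(request.get_header("Accept"))
```

The request is only constructed here, not sent, so the header can be inspected without network access.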