Changelog

0.9.2-pre

  • Retry fetching remote files in case of 5xx responses
  • Use file size to decide between RDFLib and pyoxigraph

0.9.1

  • Updated nfdicore/cto structure with altered prepare parameter

0.9.0

  • Full rewrite with a modular architecture
  • Any combination of Feed and FeedElement
  • Support for RDF (schema.org), XML (CMIF, LIDO), Beacon, ZIP ingest
  • Log but accept missing feed elements
  • Less memory hoarding with large datasets
  • Look-up routine for authority files
  • Single template to generate nfdicore/cto triples
  • Template adapted to current nfdicore/cto version
  • Automatically create ARK IDs for nfdicore/cto
  • Prep work for further serialisations such as DCAT
  • New command-line interface and argument parsing
  • A -quiet option suppresses reporting of intermediate progress
  • Provide optional OCI (Podman/Docker) container set-up
  • Observe rules laid out in robots.txt files
  • Recognise http and https namespaces in schema.org sources
  • Provide log files for scraping runs
  • Switch to httpx

0.8.4

  • Provide infrastructure for CGIF filters
  • Add ability to read triples from LIDO files

0.8.3

  • Rename -source_url_type to -content_type
  • Add option to harvest from file dump
  • Bring back option to compile CSV table from scraped data
  • Implement URL composition feature for Beacon files

0.8.2

  • Add code of conduct
  • Use descriptive command-line arguments
  • Add option to filter resource downloads by string
  • Add optional content negotiation
  • Test everything against the CVMA API