Releases: digicademy/hydra-scraper
v0.9.1
v0.9.0
- Full rewrite with a modular architecture
- Any combination of Feed and FeedElement
- Support for RDF (schema.org), XML (CMIF, LIDO), Beacon, ZIP ingest
- Log but accept missing feed elements
- Less memory hoarding with large datasets
- Look-up routine for authority files
- Single template to generate `nfdicore/cto` triples
- Template adapted to current `nfdicore/cto` version
- Automatically create ARK IDs for `nfdicore/cto`
- Prep work for further serialisations such as DCAT
- New command-line interface and argument parsing
- A `-quiet` option prevents reporting intermediate progress
- Provide optional OCI (Podman/Docker) container set-up
- Observe rules laid out in `robots.txt` files
- Recognise `http` and `https` namespaces in schema.org sources
- Provide log files for scraping runs
- Switch to `httpx`
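
To illustrate what observing `robots.txt` rules can look like in practice, here is a minimal sketch using Python's standard-library `urllib.robotparser`. It is not taken from the hydra-scraper code base; the helper name `is_allowed`, the user agent `example-bot`, and the example rules and URLs are all assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # Parse the rules from a string instead of fetching them over the network
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, for illustration only
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    # Hypothetical URLs; a scraper would run this check before each request
    for url in ("https://example.org/feed/1", "https://example.org/private/1"):
        print(url, is_allowed(EXAMPLE_ROBOTS, "example-bot", url))
```

A scraper following this pattern would skip, and ideally log, any URL for which the check returns `False`.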
v0.8.4
Merge pull request #1 from digicademy/lido-to-cgif: Add LIDO-to-triple conversion
v0.8.3
Update version number
v0.8.2
Fix content negotiation