- Retry fetching remote files in case of 5xx responses
- Use file size to decide between RDFLib and pyoxigraph
- Updated nfdicore/cto structure with altered prepare parameter
- Full rewrite with a modular architecture
- Any combination of Feed and FeedElement
- Support for RDF (schema.org), XML (CMIF, LIDO), Beacon, ZIP ingest
- Log but accept missing feed elements
- Less memory hoarding with large datasets
- Look-up routine for authority files
- Single template to generate nfdicore/cto triples
- Template adapted to current nfdicore/cto version
- Automatically create ARK IDs for nfdicore/cto
- Prep work for further serialisations such as DCAT
- New command-line interface and argument parsing
- A -quiet option prevents reporting intermediate progress
- Provide optional OCI (Podman/Docker) container set-up
- Observe rules laid out in robots.txt files
- Recognise http and https namespaces in schema.org sources
- Provide log files for scraping runs
- Switch to httpx
- Provide infrastructure for CGIF filters
- Add ability to read triples from LIDO files
- Rename -source_url_type to -content_type
- Add option to harvest from file dump
- Bring back option to compile CSV table from scraped data
- Implement URL composition feature for Beacon files
- Add code of conduct
- Use descriptive command-line arguments
- Add option to filter resource downloads by string
- Add optional content negotiation
- Test everything against the CVMA API
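
To illustrate the robots.txt entry above, here is a minimal sketch of how a harvester might filter candidate URLs against a site's rules. It uses only Python's standard `urllib.robotparser` and is not taken from the project's code: the function name `allowed_urls`, the user agent `example-harvester`, the sample rules, and the `example.org` URLs are all assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real run would fetch it from the target host.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed_urls(urls, robots_txt, user_agent="example-harvester"):
    """Return only the URLs that the given robots.txt permits for user_agent."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


# Hypothetical candidate URLs collected from a feed.
urls = [
    "https://example.org/feed.json",
    "https://example.org/private/dump.zip",
]
permitted = allowed_urls(urls, ROBOTS_TXT)
print(permitted)
```

Parsing the rules from a string keeps the sketch independent of the HTTP client, so the same check could sit in front of whichever download routine is in use.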