How to model the last crawl date #38
Unanswered
saumier asked this question in Data Model
Replies: 1 comment
-
The current implementation of Capacitor and Footlight, as of March 2022, uses http://schema.org/lastReviewed to store the date of the last crawl of the WebPage.
-
Data clients may ask two questions about the data in Artsdata: when was the event last modified, and when was it last crawled? How should the model answer these questions?
When a data client such as a calendar system queries Artsdata for all events that have been modified since a certain date, the WebPage property schema:dateModified can be used. This approximation is usually sufficient, since the entire event can be retrieved even if only a couple of its properties have actually changed. An alternative would be to use schema:Observation to record dateModified on a specific node property.
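As a sketch, such a client query could look like the following SPARQL. The graph shape is an assumption: each event is linked to its source WebPage via schema:mainEntityOfPage, and the IRIs and cutoff date are illustrative, not actual Artsdata identifiers.

```sparql
PREFIX schema: <http://schema.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

# Find events whose source WebPage was modified on or after a given date.
SELECT ?event ?modified WHERE {
  ?event a schema:Event ;
         schema:mainEntityOfPage ?page .
  ?page schema:dateModified ?modified .
  FILTER (?modified >= "2022-03-01"^^xsd:date)
}
```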
But how should the notion of "last crawl date" be modelled? schema:sdDatePublished or http://schema.org/lastReviewed are ideas. The need is to know how 'fresh' the data is. Since we crawl webpages, it makes sense to apply the date to the entire webpage. The schema:lastReviewed property implies a human check or review. This could be used to indicate that a human reviewed the structured data, but is probably more used for cases when the actual Webpage is reviewed without being updated. The act of crawling does not clearly imply that the data was also published, especially if the data is not modified.
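In Turtle, the two candidate modellings would mark the WebPage as follows (the page IRI and date are illustrative; in practice only one of the two properties would be chosen):

```turtle
@prefix schema: <http://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Option A: treat each crawl as (re)publishing the structured data.
<http://example.org/page/123>
    a schema:WebPage ;
    schema:sdDatePublished "2022-03-15"^^xsd:date .

# Option B: treat each crawl as a review of the page.
<http://example.org/page/123>
    a schema:WebPage ;
    schema:lastReviewed "2022-03-15"^^xsd:date .
```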