Survey 1 2013 Jun 22 methods
- Survey 1 2013 Jun 22 contains the survey questions
This page describes how we implemented a survey of authors for ~900 LOD datasets.
There are 337 lodcloud datasets and 902 datasets tagged 'lod'.
How much overlap?
The Prizms node at http://datafaqstest.tw.rpi.edu has two FAqT Brick datasets that survey the lod datasets:
- how-o-is-lod gathers the VoID metadata for the 337 lodcloud datasets, and describes the named graphs in any SPARQL endpoint that is mentioned (run by lodcloud@datafaqstest on 2013-06-04).
- lod-tag gathers the VoID metadata for the 902 lod tag datasets, and describes the named graphs in any SPARQL endpoint that is mentioned (run by lodcloud@datafaqstest on 2013-06-22).
The two datasets above do not preserve the contact information that is available in the original datahub.io entries (due to Unicode issues with RDF libraries), so we had to create a specific dataset (source/datahub-io/lod-tag-and-lodcloud-group-contacts) to access the contact information directly. Running the global retrieval script version/retrieve.sh will recreate the survey emails using contact information directly from datahub.io. This was done by lebot@datafaqstest to create versions 2013-Jul-01, 2013-Jul-02, and 2013-Jul-03.
The (not public) spreadsheet datahub.io lod tag contacts is a hand-made list of contact email addresses for each dataset. It is used when a dataset's contact information isn't in datahub.io. In the future, the order will be reversed: if a dataset is not in this list, we will fall back to datahub.io.
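The fallback order described above can be sketched as a small shell function. This is an illustration only: pick_contact, its arguments, and the order labels datahub-first and curated-first are hypothetical names, not part of the lodcloud scripts.

```shell
#!/bin/bash
# Hypothetical helper (not from the lodcloud repository).
# $1: contact email found in datahub.io (may be empty)
# $2: contact email from the hand-made list (may be empty)
# $3: "datahub-first" (current behavior, the default) or
#     "curated-first" (the planned future behavior)
pick_contact() {
  local datahub="$1" curated="$2" order="${3:-datahub-first}"
  local first="$datahub" second="$curated"
  if [ "$order" = "curated-first" ]; then
    first="$curated"
    second="$datahub"
  fi
  if [ -n "$first" ]; then
    echo "$first"
  elif [ -n "$second" ]; then
    echo "$second"
  else
    return 1   # no contact in either source: not contactable
  fi
}
```

Switching to the planned behavior would then only require passing curated-first as the third argument.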
The emails are organized in the source/ directory according to whether or not the dataset had contact information:
source/tagged-lod/is-contactable-by-recovery/is-in-lodcloud
source/tagged-lod/is-contactable-by-recovery/not-in-lodcloud
source/tagged-lod/is-contactable/is-in-lodcloud
source/tagged-lod/is-contactable/not-in-lodcloud
source/tagged-lod/not-contactable/is-in-lodcloud
source/tagged-lod/not-contactable/not-in-lodcloud
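As a rough sketch of how a dataset maps onto one of these six partitions, the function below builds the directory name from two inputs. The function name partition_dir and the status labels direct, recovery, and none are assumptions made for illustration; in particular, reading "by recovery" as "contact found outside datahub.io" is a guess based on the directory name.

```shell
#!/bin/bash
# Hypothetical helper (not from the lodcloud repository).
# $1: contact status: "direct", "recovery", or "none" (assumed labels)
# $2: "yes" if the dataset is in the lodcloud group, anything else otherwise
partition_dir() {
  local contact
  case "$1" in
    direct)   contact="is-contactable" ;;
    recovery) contact="is-contactable-by-recovery" ;;
    none)     contact="not-contactable" ;;
    *) echo "unknown contact status: $1" >&2; return 1 ;;
  esac
  local membership="not-in-lodcloud"
  if [ "$2" = "yes" ]; then
    membership="is-in-lodcloud"
  fi
  echo "source/tagged-lod/$contact/$membership"
}
```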
Because the emails were generated on the server but we wanted to store them on our laptop, the local retrieval script retrieve.sh creates a version of the dataset from the temporary dump file that is created on the server. This was necessary to bypass our traditional provenance- and archive-intensive publishing procedure and version control through GitHub, since we needed to avoid publishing the contact information. The counts.sh script, when run from the conversion cockpit, counts the number of surveys in each partition:
bash-3.2$ pwd
projects/lodcloud/github/lodcloud/data/source/datahub-io/lod-tag-and-lodcloud-group-contacts/version/2013-Jul-03
bash-3.2$ ../counts.sh
890 tagged lod
  756 originally contactable
    456 not in lodcloud
    300 in lodcloud
  75 contactable by recovery
    56 not in lodcloud
    19 in lodcloud
  59 not contactable
    45 not in lodcloud
    14 in lodcloud
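The counts.sh script itself is not reproduced on this page. A minimal sketch of the same idea, assuming that each survey email is a single regular file inside its partition directory, could look like the following. The function names count_files and count_partitions are hypothetical, and the labels printed are simply the directory names rather than the wording counts.sh uses.

```shell
#!/bin/bash
# Sketch only: this is NOT the real counts.sh.
# Assumes one regular file per survey email.

# Print the number of regular files under directory $1 (0 if it is missing).
count_files() {
  if [ -d "$1" ]; then
    find "$1" -type f | wc -l | tr -d ' '
  else
    echo 0
  fi
}

# $1: directory that contains source/tagged-lod/ (defaults to ".").
count_partitions() {
  local root="${1:-.}/source/tagged-lod"
  local status membership
  echo "$(count_files "$root") tagged lod"
  for status in is-contactable is-contactable-by-recovery not-contactable; do
    echo "  $(count_files "$root/$status") $status"
    for membership in not-in-lodcloud is-in-lodcloud; do
      echo "    $(count_files "$root/$status/$membership") $membership"
    done
  done
}
```

Calling count_partitions with no argument from a version directory would print one line per partition, with the total on the first line.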
The survey emails in manual/todo/ were copied from source/, then sent manually via email, and then moved into the manual/done/ directory. These files are available on the unpublished sending-survey-1 branch on our local laptop (see THERE-IS-MORE-DATA-HERE.readme for details). It took about two hours for one person to email the 758 surveys.
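The todo/done bookkeeping was done by hand. As an illustration of that step, the sketch below moves one survey file from manual/todo/ to the same relative path under manual/done/ once it has been sent. The function name mark_sent is hypothetical, and preserving the relative path between the two directories is an assumption, not something this page states.

```shell
#!/bin/bash
# Hypothetical helper (not from the lodcloud repository).
# $1: path of a sent survey, relative to the current directory,
#     which must start with manual/todo/
mark_sent() {
  local todo_file="$1"
  case "$todo_file" in
    manual/todo/*) ;;
    *) echo "not under manual/todo/: $todo_file" >&2; return 1 ;;
  esac
  # Keep the same relative path under manual/done/ (an assumption).
  local done_file="manual/done/${todo_file#manual/todo/}"
  mkdir -p "$(dirname "$done_file")" && mv "$todo_file" "$done_file"
}
```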
See the description below about the sending-survey-1 branch of data/source/us/survey-1-responses/version/2012-Jul-03.
See the description below about the partially-public dataset data/source/us/survey-1-results.
data/source/us/survey-1-results/version/2013-Jul-07/doc/publishing-permission.graffle illustrates the publication permission workflow. survey-methods-dataflow.graffle illustrates the dataset flow.
- (public) datahub.io lod and lodcloud survey 1 questions contains the question text sent to the LOD publisher via email.
- (not public - never will be) datahub.io lod tag contacts contains a manually curated list of "updated" contact emails for the datasets. This is the result of getting responses pointing us to other people.
- (not public) The email responses are archived on an unpublished branch (sending-survey-1) of the lodcloud prizms repository at data/source/us/survey-1-responses/version/2012-Jul-03/source/lodcloud-survey-1.mbox and data/source/us/survey-1-responses/version/2012-Jul-03/source/lodcloud-survey-1/. This file and directory are created by right-clicking on the lodcloud-survey-1 folder in Mail.app, selecting "Export Mailbox...", selecting "Export all subfolders", choosing the directory data/source/us/survey-1-responses/version/2012-Jul-03/source/, and pressing the "Choose" button. This should be done on the sending-survey-1 branch (git checkout sending-survey-1).
  - version/2013-Jul-07/manual/contact-paths.graffle illustrates the "dead ends" faced when attempting to email the authors.
- (not public - portions will be) datahub.io lod and lodcloud survey 1 results contains quotes from the email responses, with annotations to guide their parsing into Linked Data.
- (public) datahub.io lod tools describes the tools mentioned in the survey (homepage, author, etc.).
- (not public) datahub.io lod and lodcloud survey forward responses contains a list of names and email addresses of the participants that wish to see the survey results when they are complete.
- Dataset us survey 1 results outlines some queries against these survey results.