Update Fetched Label and Reindex

Ryan Wick edited this page Mar 29, 2021 · 3 revisions

To handle name changes or updates by other organizations (where the URI stays the same) we need to update the fetched label and reindex relevant items. Since our Blazegraph triplestore caches the originally fetched graph, we need to clear out the cached data and re-fetch. The edit forms rely on the local authority YML files for the labels used, so those need to be updated separately.

Recent real example, in Other Affiliation field: http://www.wikidata.org/entity/Q7060130

  • Old Name: Northwest National Marine Renewable Energy Center
  • New Name: Pacific Marine Energy Center

To update a single work, start a Rails Console on the Production Server.

Then initialize the triplestore:

@triplestore ||= TriplestoreAdapter::Triplestore.new(
  TriplestoreAdapter::Client.new(
    ENV['SCHOLARSARCHIVE_TRIPLESTORE_ADAPTER_TYPE'] || 'blazegraph',
    ENV['SCHOLARSARCHIVE_TRIPLESTORE_ADAPTER_URL'] || 'http://localhost:9999/blazegraph/namespace/development/sparql'
  )
)

Delete the cached graph for the URI from Blazegraph:

@triplestore.delete('http://www.wikidata.org/entity/Q7060130')

Create working objects:

tps = ScholarsArchive::TriplePoweredService.new
work = ActiveFedora::Base.find(pid) # pid is the ID of the work to update

Fetch all labels for the URI. This should force a new remote fetch, since the cached graph was just deleted from Blazegraph:

tps.fetch_all_labels(work.other_affiliation)

Update index of item:

work.update_index

The new label should now appear on the work show page. You can also check from the console with:

tps.fetch_top_label(work.other_affiliation)
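The single-work steps above can be sketched as one console helper. The method name `refresh_fetched_label` and its keyword arguments are hypothetical, not part of Scholars Archive; it only makes the calls already shown on this page, in the same order, and returns whatever `fetch_top_label` returns.

```ruby
# Hypothetical helper (not part of Scholars Archive): runs the single-work
# steps in order. The collaborators are passed in, so the method relies only
# on the calls already shown on this page.
def refresh_fetched_label(triplestore:, service:, work:, uri:)
  triplestore.delete(uri)                           # clear the cached graph
  service.fetch_all_labels(work.other_affiliation)  # re-fetch the labels
  work.update_index                                 # reindex the work
  service.fetch_top_label(work.other_affiliation)   # return value for a quick check
end
```

In the console this might be called as `refresh_fetched_label(triplestore: @triplestore, service: tps, work: work, uri: 'http://www.wikidata.org/entity/Q7060130')`.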

If multiple works need to be updated, find the relevant works with a Solr query and check number of results:

solr_query_str = "other_affiliation_label_ssim:\"Northwest National Marine Renewable Energy Center$http://www.wikidata.org/entity/Q7060130\""

response = ActiveFedora::SolrService.get(solr_query_str, 'fl' => 'id,title_tesim,other_affiliation_label_ssim', 'rows' => 100000, 'sort' => 'id asc')
# The response is a hash, so read the number of matches from numFound
response['response']['numFound']
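The query above matches the indexed value, which combines the old label and the URI with a `$` separator. As a minimal sketch, the string could be built by a small helper so quotes or backslashes in a label do not break the phrase query. The name `label_uri_query` is hypothetical and not part of the application.

```ruby
# Hypothetical helper: builds a Solr phrase query for a "label$uri" value.
def label_uri_query(field, label, uri)
  value = [label, uri].join('$')
  # Inside a quoted Solr phrase, double quotes and backslashes must be escaped.
  escaped = value.gsub(/["\\]/) { |char| "\\#{char}" }
  %(#{field}:"#{escaped}")
end

solr_query_str = label_uri_query(
  'other_affiliation_label_ssim',
  'Northwest National Marine Renewable Energy Center',
  'http://www.wikidata.org/entity/Q7060130'
)
```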

Then loop through the results and update the index of each one:

response['response']['docs'].map{|x| x["id"]}.each do |pid|
  work = ActiveFedora::Base.find(pid)
  work.update_index
end
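For a large result set, one failing work would stop the loop above. A possible variant, sketched here with a hypothetical `reindex_works` method, collects failures and keeps going. The `finder` argument is an assumption made so the method does not hard-code the lookup; in the console it would be `ActiveFedora::Base.method(:find)`.

```ruby
# Hypothetical helper: reindexes each work and records failures instead of
# aborting on the first error. Returns a hash of id => error message.
def reindex_works(ids, finder:)
  failed = {}
  ids.each do |id|
    begin
      finder.call(id).update_index
    rescue StandardError => e
      failed[id] = e.message
    end
  end
  failed
end
```

Usage might look like `failed = reindex_works(response['response']['docs'].map { |x| x['id'] }, finder: ActiveFedora::Base.method(:find))`, after which `failed` lists any works that need a second look.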

Fixing Problems with Fetched Labels

If the indexed label is not the desired one (for example, it is not in English), you can update Blazegraph directly and then reindex the affected works. For example, fetches for the Linus Pauling Institute Wikidata URI ( http://www.wikidata.org/entity/Q3151762 ) were returning Chinese or Azerbaijani labels. In addition, the default RDF/XML response from Wikidata is currently (March 2021) invalid and causes errors when parsed.

Load the graph for the Wikidata URI locally. The ?flavor=dump parameter is optional and removes 'extra' external entity statements (see Wikidata access info).

g = RDF::Graph.load('https://www.wikidata.org/wiki/Special:EntityData/Q3151762.ttl?flavor=dump')

Delete all existing name/label statements (prefLabel, label, name)

g.delete([RDF::URI('http://www.wikidata.org/entity/Q3151762'), RDF::URI('http://www.w3.org/2004/02/skos/core#prefLabel'), nil])
g.delete([RDF::URI('http://www.wikidata.org/entity/Q3151762'), RDF::URI('http://www.w3.org/2000/01/rdf-schema#label'), nil])
g.delete([RDF::URI('http://www.wikidata.org/entity/Q3151762'), RDF::URI('http://schema.org/name'), nil])

Set new label and language

new_label = RDF::Literal.new("Linus Pauling Institute", language:'en')

Add back name/label statements for English value

g << RDF::Statement(RDF::URI('http://www.wikidata.org/entity/Q3151762'), RDF::URI('http://www.w3.org/2004/02/skos/core#prefLabel'), new_label)
g << RDF::Statement(RDF::URI('http://www.wikidata.org/entity/Q3151762'), RDF::URI('http://www.w3.org/2000/01/rdf-schema#label'), new_label)
g << RDF::Statement(RDF::URI('http://www.wikidata.org/entity/Q3151762'), RDF::URI('http://schema.org/name'), new_label)

Check current statements on graph (several ways to query depending on what you want to see)

g.statements.each do |s|
  puts s
end
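Printing every statement can be noisy on a large graph. One way to narrow the check, sketched here under assumptions, is to list only the name/label statements that are not tagged English. The constant `LABEL_PREDICATES` and the method `labels_not_in` are hypothetical names; the method assumes each statement responds to `predicate` and `object`, and that literal objects respond to `language`.

```ruby
# The three name/label predicates edited above.
LABEL_PREDICATES = [
  'http://www.w3.org/2004/02/skos/core#prefLabel',
  'http://www.w3.org/2000/01/rdf-schema#label',
  'http://schema.org/name'
].freeze

# Hypothetical helper: returns label statements whose object is not tagged
# with the given language (untagged literals are returned as well).
def labels_not_in(statements, language = 'en')
  statements.select do |s|
    next false unless LABEL_PREDICATES.include?(s.predicate.to_s)
    lang = s.object.respond_to?(:language) ? s.object.language : nil
    lang.to_s != language
  end
end
```

After the edits above, `labels_not_in(g.statements)` would be expected to come back empty if only the English label statements remain.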

(If not done already) Clear existing statements cached in triplestore

@triplestore.delete('http://www.wikidata.org/entity/Q3151762')

Store new edited graph in triplestore

@triplestore.store(g)

Fetch label for work value. If correct, update work's index

tps.fetch_top_label(work.other_affiliation)
work.update_index

Relevant Source Code