
Procedures

Nuno Macedo edited this page Jun 30, 2017 · 40 revisions

Synchronization Procedures

This section specifies the synchronization procedures that allow a CRIS service to keep a user’s profile consistent with the one at ORCID.

Overview

The procedures can be used to synchronize the profiles according to two different modes: one for importing information from the remote ORCID profile and another for exporting information from the local profile. These can be used independently depending on the needs of the interested service.

Import

This mode aims to harvest new research outputs from ORCID, namely new publications (import works) and new external identifiers of known publications (import updates). The general principle is that every external identifier in an ORCID profile should be harvested. The synchronization procedure supporting this mode is semi-automatic and based on a notification system, which lets the user select which outputs or external identifiers to add to their PTCRIS profile. These procedures do not change the ORCID user profile; they manage only the notifications of the input CRIS profile. This semi-automatic approach provides the user with valuable information while leaving them in control of the updates that are actually applied to the local profile. The design decision stems from the fact that the ORCID user profile may contain erroneous information (for example, erroneous meta-data). Requiring confirmation avoids propagating such errors to the CRIS profile and gives the user the opportunity to clean up the ORCID profile beforehand (for example, by deleting incorrect works or creating new versions with corrected meta-data).
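The distinction between import works and import updates can be illustrated with a minimal Python sketch. It is not the PTCRISync implementation: the function name `import_notifications`, the representation of a production as a set of `(type, value)` identifier pairs, and the rule that an ORCID group is "known" when it shares at least one external identifier with a local production are all assumptions made for illustration.

```python
from typing import FrozenSet, List, Tuple

# Assumed representation: an external identifier is a (type, value) pair.
ExternalId = Tuple[str, str]
Ids = FrozenSet[ExternalId]


def import_notifications(orcid_groups: List[Ids], local_productions: List[Ids]):
    """Split merged ORCID groups into new works and identifier updates.

    Assumption: a group is "known" when it shares at least one external
    identifier with some local production.
    """
    new_works = []   # candidates for "import works" notifications
    updates = []     # (local production, identifiers it lacks) pairs
    for group_ids in orcid_groups:
        matches = [local for local in local_productions if local & group_ids]
        if not matches:
            new_works.append(group_ids)
            continue
        for local in matches:
            missing = group_ids - local
            if missing:
                updates.append((local, missing))
    return new_works, updates
```

Neither list is applied to any profile here; in the semi-automatic design described above, the entries would be surfaced as notifications for the user to accept or reject.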

Export

This mode is targeted at CRIS services that intend to be ORCID sources and export their productions to ORCID (export works), ensuring that other CRIS services can harvest them. The general principle is that every production selected to be exported in the CRIS profile should be inserted as a new work in the ORCID profile and then automatically kept up-to-date. These procedures do not change the input CRIS user profile and manage only the works on the remote ORCID profile whose source is the CRIS service (they cannot modify works sourced by other services, nor change which activity is selected as preferred in a group).
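One way to picture the export principle is as a plan computed over the works that the CRIS service itself sourced. The sketch below is illustrative only and rests on several assumptions that the text above does not state: productions are matched to remote works through shared external identifiers, remote works are keyed by their ORCID put-code, only identifiers are compared when deciding whether an update is needed, and own-sourced works that no longer match a selected production are removed. The name `plan_export` is hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

ExternalId = Tuple[str, str]
Ids = FrozenSet[ExternalId]


def plan_export(selected_local: List[Ids], own_remote: Dict[str, Ids]):
    """Compute which own-sourced ORCID works to insert, update or delete.

    own_remote maps the put-code of each work sourced by this service to
    its external identifiers; works from other sources are never passed in.
    """
    to_insert, to_update = [], []
    matched = set()
    for local_ids in selected_local:
        hits = [code for code, ids in own_remote.items() if ids & local_ids]
        if not hits:
            to_insert.append(local_ids)          # not yet exported
            continue
        matched.update(hits)
        # Assumption: only identifiers are compared to detect stale works.
        to_update.extend(
            (code, local_ids) for code in hits if own_remote[code] != local_ids
        )
    # Assumption: own-sourced works with no selected counterpart are removed.
    to_delete = [code for code in own_remote if code not in matched]
    return to_insert, to_update, to_delete
```

Because the plan only ever mentions put-codes taken from `own_remote`, works sourced by other services cannot be touched, which mirrors the restriction stated above.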

Sync

These modes are supported by separate synchronization procedures so that a CRIS service may choose to implement only one of the modes (e.g., RCAAP is only concerned with exporting outputs, while the SARIs are concerned with harvesting outputs) or the conjunction of both (e.g., the CV management system DeGóis). In the latter case, export procedures must be executed prior to import procedures, since running the former may change the grouping of works which affects the outcome of the latter (see the notes on scheduling below).

Group merging

The main conceptual difference between ORCID and typical CRIS services is that ORCID automatically groups productions that are considered the same into a single group. Two productions are considered the same if they share an external identifier, and this relation is transitive. In the ORCID web interface, the user is able to select which work of the group is preferred, which is the one that will be publicly displayed in the profile.
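The transitive "shares an external identifier" relation can be illustrated with a small union-find sketch in Python. This is not ORCID's actual grouping code, which runs on the ORCID side; the function name `group_works` and the input shape (a mapping from a work key to its set of `(type, value)` identifiers) are assumptions for illustration.

```python
from typing import Dict, List, Set, Tuple

ExternalId = Tuple[str, str]


def group_works(works: Dict[str, Set[ExternalId]]) -> List[Set[str]]:
    """Group work keys so that works sharing an identifier, directly or
    through a chain of other works, end up in the same group."""
    parent = {key: key for key in works}

    def find(key: str) -> str:
        # Follow parent links to the representative, halving paths on the way.
        while parent[key] != key:
            parent[key] = parent[parent[key]]
            key = parent[key]
        return key

    owner: Dict[ExternalId, str] = {}  # first work seen for each identifier
    for key, ids in works.items():
        for ext_id in ids:
            if ext_id in owner:
                parent[find(key)] = find(owner[ext_id])
            else:
                owner[ext_id] = key

    groups: Dict[str, Set[str]] = {}
    for key in works:
        groups.setdefault(find(key), set()).add(key)
    return list(groups.values())
```

For example, a work with only a DOI and a work with only an EID land in the same group as soon as a third work carries both identifiers.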

To synchronize CRIS profiles with ORCID profiles, ORCID work groups must be merged into single productions. Due to the central role of the external identifiers in ORCID and PTCRISync, the merging of a group as of PTCRISync v1.0 is performed as follows:

  • collect every external identifier from every work that comprises the group
  • collect the remaining meta-data from the work of the group selected as preferred by the user
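The two merging steps above can be sketched in Python as follows. The `WorkSummary` type, its field names and the `preferred_index` parameter are hypothetical; in particular, how the preferred work of a group is identified in an ORCID API response is not covered here and is left to the caller.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, List, Optional, Tuple

ExternalId = Tuple[str, str]


@dataclass(frozen=True)
class WorkSummary:
    # Hypothetical, simplified view of a work summary.
    title: Optional[str]
    work_type: Optional[str]
    year: Optional[int]
    external_ids: FrozenSet[ExternalId]


def merge_group(works: List[WorkSummary], preferred_index: int) -> WorkSummary:
    """Merge a group of works into a single production."""
    preferred = works[preferred_index]
    # Step 1: collect every external identifier from every work in the group.
    all_ids = frozenset().union(*(w.external_ids for w in works))
    # Step 2: take the remaining meta-data from the preferred work.
    return replace(preferred, external_ids=all_ids)
```

The result keeps the title, type and year that the user chose to display, while no identifier contributed by any source is lost.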

Quality criteria

Every PTCRISync procedure relies on quality criteria over the productions that are to be synchronized, both for the remote ORCID works that are to be imported and for the local CRIS productions that are to be exported. An alternative import procedure (import invalid) is provided to harvest productions considered invalid, warning the user about problematic entries in the ORCID profile.

To promote the performance of the procedures, these criteria are defined solely over the work summaries returned by the ORCID API, and not over the full works (which would require additional calls to the API).

As of PTCRISync v1.0, to pass the quality criteria, a work must have:

  • at least one external identifier assigned
  • the title
  • the work type
  • the publication year (unless the work is a data set or research technique)
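The v1.0 criteria listed above can be expressed as a simple check over the fields of a work summary. The following Python sketch is illustrative: the function name `quality_issues`, its parameters, the returned messages and the spellings of the two exempt work types are assumptions, not values taken from the PTCRISync code base.

```python
from typing import Collection, List, Optional, Tuple

ExternalId = Tuple[str, str]

# Assumed spellings of the work types exempt from the publication year rule.
YEAR_EXEMPT_TYPES = {"data-set", "research-technique"}


def quality_issues(
    title: Optional[str],
    work_type: Optional[str],
    year: Optional[int],
    external_ids: Collection[ExternalId],
) -> List[str]:
    """Return the list of unmet criteria; an empty list means the work passes."""
    issues = []
    if not external_ids:
        issues.append("missing external identifier")
    if not title or not title.strip():
        issues.append("missing title")
    if not work_type:
        issues.append("missing work type")
    if year is None and work_type not in YEAR_EXEMPT_TYPES:
        issues.append("missing publication year")
    return issues
```

Returning the reasons rather than a boolean fits the import invalid procedure described above, which needs to tell the user what is wrong with an entry.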

Scheduling

Each service is free to choose when to run the synchronization procedures, as long as inconsistencies in the profiles are eventually resolved within a reasonable delay. The export procedures should also be executed prior to the import procedures, in order to guarantee the consistency of the import results.

In general, the import procedures need only be run when the user is managing the list of synchronized works, since these notifications are volatile and need not be persisted. A lighter method for detecting possible pending import notifications is also provided (import counter), which should be run otherwise.

The export procedure, in contrast, needs to be run to keep the ORCID profile synchronized with the list of local productions selected to be actively synchronized. Ideally it should be run when this selection list is updated, when any work of that list is modified, or when the ORCID profile is updated (in particular, if works sourced by the local service were deleted).

One possible choice is to run the procedures periodically in the specified order (export, then import) in batch mode, thus avoiding possible delays that can negatively affect the user interface. Premium ORCID members could also trigger the synchronization based on Webhooks Change Notifications from ORCID, by registering to be notified when a user profile is changed.

Another reasonable choice would be to run the import procedures at the beginning of a user session and the export procedure at the end. This ensures that the visible parts of the profiles are consistent when the user is logged out, and that the correct notifications are shown whenever the user logs in again. We believe that invoking the synchronization procedures every time the user performs an edit within a session may be counterproductive, as new notifications might keep popping up and confuse the user. As in distributed systems, the goal of the synchronization framework is to ensure eventual consistency and not necessarily real-time strong consistency among all services.

The scheduling employed in the reference implementation is the following:

  • The import works procedure is run whenever the user opens the synchronization page or updates the list of works being synchronized. The notifications provided by this procedure are volatile, and need only be generated when the user is inspecting the synchronization pane. The same applies to the import invalid procedure. (The import counter is run beforehand, since it is used to warn the user on their homepage.)
  • The import updates procedure is run in the background on a periodic basis. This procedure does not require input from the user, and simply harvests meta-data from the ORCID profile and automatically updates the local productions.
  • The export works procedure must be run before any of the import procedures in order for them to produce consistent results. Thus, the procedure is run in the background periodically (prior to the import updates) and whenever the synchronization menu is accessed/updated (prior to the import works/invalid).
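The ordering constraints of this schedule can be captured in a small dispatch table. The sketch below is only an illustration of the ordering: the trigger names, the procedure names used as dictionary keys and the `run_trigger` function are hypothetical, and the procedures themselves are supplied by the caller as plain callables.

```python
from typing import Callable, Dict, List

Procedure = Callable[[], None]

# For each trigger, export runs first so that the imports see stable groups.
PLANS: Dict[str, List[str]] = {
    "periodic": ["export_works", "import_updates"],
    "sync_page": ["export_works", "import_works", "import_invalid"],
}


def run_trigger(trigger: str, procedures: Dict[str, Procedure]) -> List[str]:
    """Run the procedures planned for a trigger, in order, and report them."""
    executed = []
    for name in PLANS[trigger]:
        procedures[name]()
        executed.append(name)
    return executed
```

A service would call `run_trigger("periodic", ...)` from its background job and `run_trigger("sync_page", ...)` when the synchronization menu is opened or updated.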

Correctness

The consistency ensured by both modes is formally stated in the companion formal specification (with a precise set of constraints that instantiates the above general principles), and the synchronization procedures were designed to satisfy several "well-behavedness" properties concerning such consistency. The most important of those is correctness, namely ensuring that after running the synchronization procedures the user profiles in ORCID and in the CRIS service are consistent according to the specification. Another important "well-behavedness" law is stability, ensuring that if the synchronization procedures are run on already consistent profiles the result is the same (modulo differences in the internal identifiers).

Having stable synchronization procedures ensures that there is no need to explicitly check the consistency to determine whether they should be run, since running the procedures will not affect already consistent profiles. In fact, the checking procedures have approximately the same complexity as the synchronizers, so no significant performance gains would be achieved by running them beforehand. Checking first could even degrade performance when the user profiles happen to be inconsistent, since both the check and the synchronization would then have to run.
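The stability law lends itself to a direct test: run a synchronizer on a profile that is already consistent and compare the result with the input. The Python sketch below shows the shape of such a check under stated assumptions; `is_stable`, the `synchronize` callable and the `same` comparison (which would ignore internal identifiers in a real setting) are hypothetical names, not part of PTCRISync.

```python
from typing import Callable, TypeVar

P = TypeVar("P")


def is_stable(
    synchronize: Callable[[P], P],
    consistent_profile: P,
    same: Callable[[P, P], bool],
) -> bool:
    """Check that synchronizing an already consistent profile changes nothing.

    `same` decides equality of profiles; in practice it would compare them
    modulo internal identifiers.
    """
    return same(synchronize(consistent_profile), consistent_profile)
```

As a toy usage, a synchronizer that only adds a missing identifier to a set is stable on any set that already contains it.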