This repository has been archived by the owner on Jun 26, 2020. It is now read-only.

Form Generation and Data Dictionary Model

gregjan edited this page Apr 11, 2012 · 3 revisions

The Problem

Repository providers need purpose-built web forms or wizards to collect deposit streams from various audiences and of various data types. Different drop-boxes will require their own sets of metadata fields and pick lists. Yet each deposit needs to comply with repository standards in terms of metadata schema, data dictionary and encoding practices. Repository managers need to quickly create new web forms for short term projects with special metadata needs. Data flowing from these deposit forms must be ready for ingest.

Model-driven Approach

Using the same Eclipse Modeling Framework (EMF) as the workbench crosswalks, we can define shared data dictionaries and support rapid composition and deployment of new deposit forms.

Data Dictionary Model

A data dictionary maps user-recognizable fields and best practices to a particular metadata encoding. It maps and describes entities like "faculty author" as a certain set of elements within something like MODS XML, and it includes specific instructions for encoding elements along with general usage guidelines. A model for a data dictionary could define each entity as a block of metadata that is mapped to recognizable input fields.

An example block of metadata:

  • label: Faculty Author
  • usage: Use this block to record the name and affiliations of authors at the university.
  • inputs: first name, last name, researcher id, department
  • elements:

Each input is tied to some part of the block's element encoding, in the same way that columns of delimited data are tied to the elements of a crosswalk mapping. Controlled vocabularies also come into play for each input/element combination. Each dictionary entity is like a crosswalk in miniature, mapping a single semantic unit of metadata. The data dictionary can therefore share the same EMF model as the crosswalks. This gives crosswalk creators the option of plugging delimited data into predefined data dictionary blocks, rather than configuring their own granular MODS elements.
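The input-to-element mapping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the EMF model itself: the class names (`Input`, `MetadataBlock`), the choice of MODS elements for each input, the department vocabulary, and the sample values are all hypothetical and are not taken from the dictionary this page describes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional
import xml.etree.ElementTree as ET


@dataclass
class Input:
    """One user-facing input and the element it is encoded into."""
    name: str                                   # key used by a form field or a delimited column
    label: str                                  # user-recognizable label
    element: str                                # child element that receives the value
    attributes: Dict[str, str] = field(default_factory=dict)
    vocabulary: Optional[List[str]] = None      # controlled vocabulary, if any


@dataclass
class MetadataBlock:
    """A data dictionary entry: a crosswalk in miniature."""
    label: str
    usage: str
    root: str                                   # element that wraps the encoded inputs
    inputs: List[Input]

    def encode(self, values: Dict[str, str]) -> ET.Element:
        """Turn submitted input values into the block's element encoding."""
        root = ET.Element(self.root)
        for inp in self.inputs:
            value = values.get(inp.name)
            if value is None:
                continue                        # input left empty, nothing to encode
            if inp.vocabulary is not None and value not in inp.vocabulary:
                raise ValueError(f"{value!r} is not in the vocabulary for {inp.label}")
            child = ET.SubElement(root, inp.element, inp.attributes)
            child.text = value
        return root


# Assumed mapping onto MODS-style element names (namespace omitted for brevity).
faculty_author = MetadataBlock(
    label="Faculty Author",
    usage="Use this block to record the name and affiliations of authors at the university.",
    root="name",
    inputs=[
        Input("first_name", "First name", "namePart", {"type": "given"}),
        Input("last_name", "Last name", "namePart", {"type": "family"}),
        Input("researcher_id", "Researcher ID", "nameIdentifier"),
        Input("department", "Department", "affiliation",
              vocabulary=["Biology", "History"]),   # hypothetical vocabulary
    ],
)

# Sample values, as they might arrive from a form or a row of delimited data.
encoded = faculty_author.encode(
    {"first_name": "Ada", "last_name": "Lovelace", "department": "History"}
)
print(ET.tostring(encoded, encoding="unicode"))
```

Because `encode` only needs a dictionary of named values, the same block could be fed from a web form submission or from a crosswalk reading delimited data, which is the reuse the shared model is meant to allow.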

Terminology side note: What is the best word for the entries in a data dictionary? I need one that does not conflict with other terms in this space, which rules out many:

  • "element" b/c XML
  • "entity" b/c XML (and preposterous)
  • "field" b/c web form

The best I have so far is "block" or "metadata block". This gives the sense of building with blocks, which is what people can do when they make crosswalks and forms.

Deposit Form Model

A Deposit Form is perhaps a composition of data dictionary blocks, with some surrounding layout hints and descriptive text. Let's assume that the layout support is relatively minimal, say an ordered list of text blocks and metadata blocks. Within each metadata block (referencing the dictionary) we have a set of input fields. If we follow the crosswalk model, then inputs will require specific data types. However, even plain text inputs need rendering details, such as the width of the form field and, for multi-line inputs, its height. (Note: XForms may supply a ready model for this kind of form composition.)

Example: Faculty Poster Deposit Form

  • divs (an ordered list)
  1. "Welcome to the deposit form. Here is a link to policies. Please deposit your work and we will provide access to it forever."
  2. reference to "faculty author" in data dictionary
  3. reference to a file upload block
  4. reference to "type of scholarly work" in data dictionary
  5. reference to "conference entry" in data dictionary
  6. "Thank You"
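A form like this could be modelled as an ordered list whose items are either literal text or references into the data dictionary. The sketch below is one possible shape, in Python for brevity rather than EMF. The names `TextBlock`, `BlockRef`, and `resolve` are hypothetical, and the input lists for every block other than "faculty author" are placeholders invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class TextBlock:
    """Descriptive text shown as-is on the form."""
    text: str


@dataclass
class BlockRef:
    """Reference to a metadata block defined in the data dictionary."""
    block_id: str


FormItem = Union[TextBlock, BlockRef]

# Stand-in data dictionary: block id -> input names.
# Only the "faculty author" inputs come from the example block; the rest are placeholders.
DICTIONARY: Dict[str, List[str]] = {
    "faculty author": ["first name", "last name", "researcher id", "department"],
    "file upload": ["file"],
    "type of scholarly work": ["type"],
    "conference entry": ["conference"],
}

poster_form: List[FormItem] = [
    TextBlock("Welcome to the deposit form."),
    BlockRef("faculty author"),
    BlockRef("file upload"),
    BlockRef("type of scholarly work"),
    BlockRef("conference entry"),
    TextBlock("Thank You"),
]


def resolve(form: List[FormItem], dictionary: Dict[str, List[str]]) -> List[str]:
    """Flatten a form into the ordered lines a renderer would walk through."""
    lines: List[str] = []
    for item in form:
        if isinstance(item, TextBlock):
            lines.append(f"text: {item.text}")
        else:
            # A KeyError here means the form references a block the dictionary lacks.
            for name in dictionary[item.block_id]:
                lines.append(f"input: {item.block_id} / {name}")
    return lines


for line in resolve(poster_form, DICTIONARY):
    print(line)
```

Keeping the form as a list of references means that a change to a dictionary block, such as a new input or an updated vocabulary, would reach every form that uses the block without editing the forms themselves.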

There are so many different options you could put into a form definition that I won't even try to list them here. Simple is probably best. If we rely on the data dictionary for best practices for encoding, then we may want to rely on it for best practices for forms as well. Perhaps blocks in the dictionary can come with a default input form mapping. (Once again, consider XForms as a model here.)