Google Season of Docs 2021

About CDLI

If you are interested in working as a technical writer with us, please write to [email protected]

CDLI is an international digital library of ancient artifacts inscribed with cuneiform writing. The mission of CDLI is to collect, preserve, and make available images, text, and metadata of all artifacts inscribed with the cuneiform script. It is the only project with this mission, and we estimate that our 335,000 catalog entries cover about two-thirds of all sources held in collections around the world. Our data are publicly available at https://cdli.ucla.edu. Our audience consists primarily of scholars and students, with a growing number of informal learners.

At the heart of CDLI is a group of developers, language scientists, machine learning engineers, and cuneiform specialists who develop software infrastructure to process and analyze curated data. To this end, we are wrapping up two projects: CDLI Framework Update https://cdli.ucla.edu/?q=news/cdli-core-update and Machine Translation and Automated Analysis of Cuneiform Languages https://cdli-gh.github.io/mtaac/. As part of these projects, we have been building a natural language processing platform that helps specialists in ancient languages undertake automated annotation and translation of Sumerian texts, enabling the data-driven study of the languages, culture, history, economy, and politics of ancient Near Eastern civilizations. Within this platform, we are focusing on data standardization using Linked Open Data to foster best practices in data exchange and integration with other digital humanities and computational philology projects.

Our tools are available as standalone software, but at the core of our services to the community is a new web platform, which we are actively developing so that we can phase out our current web platform within the next 12 to 18 months.

About our documentation

CDLI has a large version-controlled codebase on GitHub and GitLab, with more than 100 contributors over time (about 10 currently active). CDLI's work spans several technical areas, including full-stack development, natural language processing, machine translation, machine learning, databases, and data science.

Programmers and scholars have documented their work on developing and enhancing the CDLI framework and its tools, and all of them have contributed at least some minimal documentation.

Overall, the CDLI documentation includes software documentation for developers in the form of READMEs and installation guides, as well as tool guides explaining why (more scholarly) and how (more practical) to use certain features, such as our morphological annotator. We also have user guides for our current platform, which contain information that can be adapted to the new framework.

https://cdli.ox.ac.uk/wiki/ Mostly a knowledge wiki about Mesopotamia, but it also contains some guides on the digitization of artifacts (scanning, using a turntable to photograph seals, etc.).

https://cdli-gh.github.io/ Our main documentation site. Although unfortunately incomplete, it contains extensive explanations of how Sumerian grammar was formalized so that we could annotate textual data, as well as some practical guides on annotation with tools such as Brat.

https://cdli-gh.github.io/annodoc/ Extensive documentation on the annotation of syntax and the manipulation of such data using various tools

https://cdli.ucla.edu/?q=cdli-user-information, https://cdli.ucla.edu/?q=support-cdli, https://cdli.ucla.edu/?q=cdli-search-information Documentation for the current web platform: how to use search, the meaning of the various data fields attached to artifacts, and other information such as our transliteration conventions.

http://oracc.museum.upenn.edu/doc/help/editinginatf/ Explanations of how to prepare transliterations in the C-ATF format.

Chosen project

For this year, in concert with the active developers, we have chosen to focus on restructuring the documentation and on consolidating and writing user guides. CDLI has put a lot of effort into making its new web interface as accessible as possible, but those efforts cannot be complete without proper documentation to accompany the interface.

All Projects

Audit and restructuring of our documentation

Our documentation is currently incomplete and poorly organized, and for the past 15 years it has been especially difficult for users and editors to follow. Our first goal is to make specific documentation easy to find, whatever audience the reader belongs to and whatever kind of information they are looking for.

Development of a documentation workflow

Workflows should be put in place so we can preserve and evolve the new documentation structure, while growing and improving our documentation in an organic and useful way.

Guides and Documentation

Our various existing guides need to be refactored and some need to be written from scratch.

Developer guides

Some framework features have well-organized and useful READMEs. These can probably be kept as is, but access to all of them should be centralized so that it is easy to find the appropriate documentation for each CDLI framework feature.
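One possible way to centralize access, sketched here under the assumption that the framework repositories are checked out under a single local directory, is a small script that collects every README and writes an index page. The function names (`find_readmes`, `build_index`) and the list of skipped folders are hypothetical and not existing CDLI tooling.

```python
from __future__ import annotations

import sys
from pathlib import Path

# Hypothetical: folders that usually hold third-party code rather than CDLI docs.
SKIP_DIRS = {".git", "node_modules", "vendor"}


def find_readmes(root: Path) -> list[Path]:
    """Return README files under root (any case), sorted by relative path."""
    found = []
    for path in root.rglob("*"):
        rel_parts = path.relative_to(root).parts
        if any(part in SKIP_DIRS for part in rel_parts):
            continue
        if path.is_file() and path.name.lower().startswith("readme"):
            found.append(path)
    return sorted(found, key=lambda p: p.relative_to(root).as_posix())


def build_index(root: Path) -> str:
    """Render a Markdown list linking each README, titled by its parent folder."""
    lines = ["# CDLI framework READMEs", ""]
    for readme in find_readmes(root):
        rel = readme.relative_to(root)
        # A top-level README is labelled with the root folder's own name.
        title = rel.parent.as_posix() if rel.parent != Path(".") else root.name
        lines.append(f"- [{title}]({rel.as_posix()})")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Usage: python build_readme_index.py [path/to/framework]
    root = Path(sys.argv[1] if len(sys.argv) > 1 else ".").resolve()
    print(build_index(root), end="")
```

Regenerating the index automatically (for example, in a CI job) would keep it in sync as READMEs are added or moved, which fits the documentation workflow goal above.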

Framework installation guide

Two main READMEs describe how to install the CDLI framework locally for development. Most developers run into at least one problem when installing the stack, which shows how important it is to update this documentation. Some features of the platform have installation and configuration instructions in separate documents; these should be linked together and checked for accuracy. For instance, it needs to be clear how to deploy the search indexes or the SPARQL server.

Framework core and features

Apart from the READMEs, there is currently no guide explaining where things are in the codebase and how they work. We need more detailed documentation to help developers get started; an FAQ might also be useful.

User guides

Consolidate the existing user guide documentation under the new documentation organization, and write the remaining user guides for the core and additional features of the CDLI framework (website):

Main web interface
Simple Search
Advanced Search
Search Filter
Expanded results
Compact results
Browse
Heatmap
Data entities and attributes
Data formats for download or to consume
Downloading data (single artifacts or based on search results)
How to report errors or give feedback
Using the multilayer annotations search tool
Using the commodity visualizer
Using the CTS data service

Editor guides

Write the editor guide for managing CDLI data on the framework (website). Editors are individuals with deep expertise in the material CDLI handles, but they are not necessarily technical, so they need very clear instructions on how to prepare and upload data and how to use the administrative features of the web platform. For linguistic annotations, we have extensive documentation on the "why" but not on the "how".

Capturing a digital surrogate of artifacts
Preparing catalogue data
Preparing transliterations, transcriptions and translations
Preparing linguistic annotations
Preparing a bibliography and links between references and artifacts or other entities
Uploading bulk data
Using the ATF editor
Adding or editing artifacts
Adding or editing other entities
Adding or editing references
MinIO server and fatcrossing feature
Preparing web images
Managing users
Managing the journals
Managing CDLI tablet app data

Metrics of evaluation

To track project metrics, we hope to integrate some form of automated feedback directly into the documentation. As part of this project, we will systematize the collection of feedback from users and editors during and after their reading of the guides. We will also ask our GSoC participants how well the guides match their understanding of the core and peripheral features of the site. We will evaluate the documents by their quantity and reach, and we expect the documentation as a whole to cover the project's features well.

After the documentation audit and before the writing begins, we expect the technical writer to plan their work thoroughly so that the scope is reasonable and achievable. We will also work out personalized measures of success with them.

Additional Information

CDLI has taken part in Google Summer of Code for four years, but this would be our first year in Google Season of Docs. About 30% of our GSoC participants have produced good-quality documentation, and about 10% have produced excellent documentation.

We hope to rely on an experienced technical writer to help us organize our documentation and its workflow, so that we can follow in their steps and write better documentation in the future.
