Google Season of Docs 2021
If you are interested in working as a technical writer with us, please write to [email protected]
CDLI is an international digital library of ancient artifacts inscribed with cuneiform writing. The mission of CDLI is to collect, preserve and make available images, text, and metadata of all artifacts inscribed with the cuneiform script. It is the sole project with this mission, and we estimate that our 335,000 catalog entries cover some two-thirds of all sources in collections around the world. Our data are publicly available at https://cdli.ucla.edu. Our audience consists primarily of scholars and students, with a growing number of informal learners.
At the heart of CDLI is a group of developers, language scientists, machine learning engineers, and cuneiform specialists who develop software infrastructure to process and analyze curated data. To this end, we are actively wrapping up two projects: CDLI Framework Update https://cdli.ucla.edu/?q=news/cdli-core-update and Machine Translation and Automated Analysis of Cuneiform Languages https://cdli-gh.github.io/mtaac/. As part of these projects, we have been building a natural language processing platform that empowers specialists of ancient languages to undertake automated annotation and translation of Sumerian language texts, thus enabling the data-driven study of the languages, culture, history, economy, and politics of ancient Near Eastern civilizations. As part of this platform, we are focusing on data standardization using Linked Open Data to foster best practices in data exchange and integration with other digital humanities and computational philology projects.
Our tools are available as standalone software, but at the core of our services to the community is a web platform. We are actively developing a new one and hope to phase out our current web platform in the next 12-18 months.
CDLI has a large version-controlled codebase on GitHub and GitLab, with more than 100 all-time contributors (about 10 active). CDLI operates in different technical areas such as full-stack development, natural language processing, machine translation, machine learning, databases, and data science.
Programmers and scholars have been documenting their work in developing and enhancing the CDLI framework and its tools, and all have contributed at least some minimal form of documentation.
Overall, the CDLI documentation covers software documentation for developers, in the form of readmes and install guides, and tool guides explaining why (more scholarly) and how (more practical) to use certain features, for example our morphological annotator. We also have user guides from our current platform which contain information that can be adapted to the new framework.
https://cdli.ox.ac.uk/wiki/ Mostly a knowledge wiki about Mesopotamia, but it contains some guides on the digitization of artifacts (scanning, turntable for photographing seals, etc.)
https://cdli-gh.github.io/ Our main documentation site, sadly incomplete, contains extensive explanations on how Sumerian grammar was formalized so we could annotate textual data but also some practical guides on annotation with tools like Brat
https://cdli-gh.github.io/annodoc/ Extensive documentation on the annotation of syntax and the manipulation of such data using various tools
https://cdli.ucla.edu/?q=cdli-user-information, https://cdli.ucla.edu/?q=support-cdli, https://cdli.ucla.edu/?q=cdli-search-information Current web platform documentation on how to use search, on the meaning of the various data fields attached to artifacts, and on the transliteration conventions.
http://oracc.museum.upenn.edu/doc/help/editinginatf/ Explanations on how to prepare transliterations in the C-ATF format.
For this year, in concert with the active developers, we have chosen to focus on restructuring the documentation and on consolidating and writing user guides. CDLI has been putting a lot of effort into making its new web interface as accessible as possible, but that effort cannot be complete without proper documentation to accompany the interface.
Our documentation is currently partial and not well organized, and for the past 15 years it has been especially difficult for users and editors to follow. We first need to make it easy to find specific documentation, whichever audience the person searching is part of, and whatever sort of information they are looking for.
Workflows should be put in place so we can preserve and evolve the new documentation structure, while growing and improving our documentation in an organic and useful way.
Our various existing guides need to be refactored and some need to be written from scratch.
Some of the framework features have very well organized and useful readmes. We may be able to keep those as is, but we should centralize access to all of them so it is easy to find the appropriate documentation for each CDLI framework feature.
We have two main readmes with information about installing the CDLI framework locally for development. Most developers cannot install the stack without facing one problem or another, which shows how important it is to update this documentation. Some features of the platform have installation and configuration instructions, but in separate documents; these should be linked together and checked for accuracy. For instance, it needs to be clear how to deploy the search indexes or the SPARQL server.
There is currently no guide, other than the readmes, on where things are in the codebase and how things work. We need more detailed documentation that can help developers get started. A FAQ might also be useful.
Consolidate existing user guide documentation under the new documentation organization and write up the remaining user guides for the core and additional features of the CDLI framework (website):
- Main web interface
- Simple Search
- Advanced Search
- Search Filter
- Expanded results
- Compact results
- Browse
- Heatmap
- Data entities and attributes
- Data formats for download or to consume
- Downloading data (single artifacts or based on search results)
- How to report errors or give feedback
- Using the multilayer annotations search tool
- Using the commodity visualizer
- Using the cts data service
Write up the editor guide for the management of CDLI data on the framework (website). Editors are individuals who have deep knowledge of the information CDLI handles, but they are not necessarily very technical people, so they need very clear instructions on how to prepare and upload data and how to use the administrative features of the web platform. In the case of preparing linguistic annotations, we have extensive documentation on the "why" but not on the "how".
- Capturing a digital surrogate of artifacts
- Preparing catalogue data
- Preparing transliterations, transcriptions and translations
- Preparing linguistic annotations
- Preparing a bibliography and links between references and artifacts or other entities
- Uploading bulk data
- Using the ATF editor
- Adding or editing artifacts
- Adding or editing other entities
- Adding or editing references
- Minio server and fatcrossing feature
- Preparing web images
- Managing users
- Managing the journals
- Managing cdli tablet app data
To track the project metrics, we hope to integrate a form of automated feedback directly in the documentation. As part of this project, we will systematize feedback collection from users and editors, both while and after they read the guides. We will also ask our GSoC participants to tell us how well the guides mirror their understanding of the core and peripheral features of the site. We will evaluate the documents by their quantity and reach, with the expectation that the documentation provides good coverage of project features.
Before the start of the writing process, after the audit of the documentation, we expect the technical writer to plan their work thoroughly to make sure the scope is reasonable and achievable. We will also work out personalized measures of success with them.
CDLI has been part of the Google Summer of Code for 4 years now but this would be our first year with Google Season of Docs. About 30% of our participants have been able to produce good quality documentation and about 10% excellent documentation.
We hope to be able to rely on an experienced technical writer to help us organize our documentation and its workflow so we can follow in their footsteps to write better documentation in the future.