Kielipankki-utilities/docs

This folder contains the pipeline instructions for preparing text corpora. They describe every step from converting the original data to publishing the corpus in Korp or in the download service.

In addition, the file corpus_publishing_tasklist.md contains checklists for the tasks in the corpus publishing pipeline. You can copy a checklist into the description of the Jira ticket for publishing a corpus (or a version of one) and use it to track the progress of the publication.

The instructions can be read in the GitHub browser interface or in a cloned Kielipankki-utilities Git repository (e.g., on Puhti). If you work from a clone, run git pull to update it and see the latest changes. A third option is the GitHub desktop client.
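Updating a clone and listing the instruction files can be sketched as a small shell helper. This is only an illustration: the function name update_docs and the default clone location under $HOME are assumptions, not part of the repository.

```shell
# Hypothetical helper (not part of Kielipankki-utilities): update a local
# clone of the repository and list the Markdown instruction files in docs.
# The default path below is an assumption; pass your own clone directory.
update_docs() {
    repo_dir="${1:-$HOME/Kielipankki-utilities}"
    # Fast-forward only, so the pull never creates a merge commit;
    # it stops with an error if your local history has diverged.
    git -C "$repo_dir" pull --ff-only || return 1
    # List the instruction files stored in the docs subfolder.
    ls "$repo_dir"/docs/*.md
}
```

For example, update_docs ~/Kielipankki-utilities pulls the latest changes and prints the paths of the instruction files. The --ff-only option is a cautious choice: if the pull fails because the local history has diverged, resolve that by hand before continuing.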

The instructions are organized in several files, all stored in this subfolder docs. The order of tasks can be seen from the checklists.

The instructions are written in Markdown format. (For more information, please see: https://help.github.com/en/articles/about-writing-and-formatting-on-github.) The GitHub browser interface renders Markdown, which makes the text easy to read and edit.

Please feel free to give feedback, to correct and edit the instructions where needed, and to add anything you think is missing. You can also add new files. The instructions contain placeholders for information that is still missing (e.g., a guideline for testing text corpora in Korp), and everybody is welcome to fill them in.