9. Deidentifying your data

Shelley Staples edited this page Nov 30, 2021 · 7 revisions

Deidentifying your data is usually the last step in preparing your files for a corpus that you want to share with others. We have developed a two-step deidentification process. In the first step, outlined in 9a. Automatic deidentification, we run a Python script that removes proper names and other identifying information outside the body of the students’ texts. Many proper names also occur in the texts themselves (especially in certain assignments, such as reflections), and these usually need to be deidentified manually. For the second step, outlined in 9b. Manual deidentification, we have developed a tool that helps with manual deidentification by highlighting capitalized words. If you are not comfortable running the Python script, you can use the manual deidentification tool on its own.
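To give a rough sense of what "highlighting capitalized words" means, here is a minimal Python sketch. It is not the CIABATTA tool itself: the function name `highlight_capitalized`, the `[[ ]]` markers, and the regular expression are all assumptions made for illustration.

```python
import re

# Assumed pattern: any word that begins with an uppercase ASCII letter.
CAPITALIZED = re.compile(r"\b[A-Z][A-Za-z]*")


def highlight_capitalized(text, left="[[", right="]]"):
    """Wrap every capitalized word in markers so a reviewer can spot it."""
    return CAPITALIZED.sub(lambda m: f"{left}{m.group(0)}{right}", text)


# Hypothetical sample sentence, not taken from any corpus.
sample = "My teacher Maria Lopez helped me in Tucson."
print(highlight_capitalized(sample))
# [[My]] teacher [[Maria]] [[Lopez]] helped me in [[Tucson]].
```

Note that the sentence-initial word "My" is flagged along with the actual names. Capitalization alone cannot tell a proper name from an ordinary word, which is why this step relies on a person reviewing each highlighted word.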

Navigating CIABATTA

Previous: 8b. Adding headers and changing filenames script

Next: 9a. Automatic deidentification