8b. Adding headers and changing filenames script
- Required installations
- Downloading the script and the files
- Running the script on Mac
- Running the script on Windows
Required installations
Before you try to run this script, make sure you have the following installed:
| If you have Anaconda installed | If you have installed Python another way |
|---|---|
| conda install pandas | pip install pandas |
| conda install xlrd | pip install xlrd |
| conda install pyyaml | pip install pyyaml |
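To confirm that the installations worked, a short check like the one below can be run with the same Python interpreter you will use for the script. It is a sketch, not part of ciabatta, and the helper name `missing_modules` is hypothetical. Note that the pyyaml package is imported under the name `yaml`.

```python
import importlib.util

# Import names for the packages in the table above.
# The pyyaml package is imported as "yaml".
REQUIRED_MODULES = ["pandas", "xlrd", "yaml"]


def missing_modules(names):
    """Return the module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

One thing worth knowing: xlrd 2.0 and later read only legacy .xls files. If pandas reports that it cannot open the .xlsx metadata spreadsheet, installing openpyxl (`conda install openpyxl` or `pip install openpyxl`) may also be needed.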
Downloading the script and the files
We have included a folder with some sample files for you to run the script on. Both the folder and the script are inside ciabatta > metadata. The folder is called standardized, and the Python script is called ciabatta_headers.py. Inside the standardized folder there are three course subfolders: 101, 102 and 10600. When you run the script, you need to specify a course subfolder.
Here's what the folder structure inside the ciabatta folder looks like:
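Going by the description above, the metadata folder should contain ciabatta_headers.py and a standardized folder with the course subfolders 101, 102 and 10600; the example commands later on also expect the spreadsheet at metadata_folder/master_student_data.xlsx. The sketch below checks for those paths after downloading. The helper `check_layout` is hypothetical, and the starting path `ciabatta/metadata` is an assumption: adjust it if your downloaded folder has a different name (for example, after unzipping).

```python
from pathlib import Path

# Paths expected inside ciabatta/metadata, based on this guide.
# The spreadsheet path comes from the example command used to run the script.
EXPECTED_PATHS = [
    "ciabatta_headers.py",
    "standardized/101",
    "standardized/102",
    "standardized/10600",
    "metadata_folder/master_student_data.xlsx",
]


def check_layout(metadata_dir):
    """Return the expected paths that do not exist under metadata_dir."""
    base = Path(metadata_dir)
    return [p for p in EXPECTED_PATHS if not (base / p).exists()]


if __name__ == "__main__":
    missing = check_layout("ciabatta/metadata")
    print("Layout looks complete." if not missing else f"Missing: {missing}")
```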
There are two ways (a and b below) to download the script and the sample files.
a) From the GitHub website: Navigate to the ciabatta repository, click the "Code" button in the upper right corner, and select "Download ZIP". This downloads a zip file to your computer. Unzip it, and you will have the script and the sample folder on your computer. (Windows users: make sure you actually extract the files rather than just opening the zip file.)
b) From the terminal: Navigate to the ciabatta repository, click the "Code" button in the upper right corner, and copy the link. Then open your terminal on a Mac (on Windows, use Command Prompt or PowerShell) and run this line (this requires Git to be installed):
git clone https://github.com/writecrow/ciabatta.git
This downloads the repository, including the script and the sample files, to your computer.
Running the script on Mac
Adding metadata to your files should come after your corpus files have been converted to .txt, encoded in UTF-8, and standardized to ASCII characters (the ASCII step applies to English only). Use the Corpus Text Processor for these steps.
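Before moving on, it can help to confirm that the Corpus Text Processor step actually took effect. The sketch below (a hypothetical helper, not part of ciabatta) walks a folder and reports .txt files that do not decode as UTF-8 or, for English corpora, still contain non-ASCII characters. A UTF-8 byte order mark also counts as non-ASCII here.

```python
from pathlib import Path


def find_problem_files(folder, require_ascii=True):
    """Return (path, reason) for .txt files under folder that are not valid
    UTF-8, or (if require_ascii is True) that contain non-ASCII characters."""
    problems = []
    for path in sorted(Path(folder).rglob("*.txt")):
        data = path.read_bytes()
        try:
            text = data.decode("utf-8")
        except UnicodeDecodeError:
            problems.append((path, "not UTF-8"))
            continue
        # str.isascii() requires Python 3.7 or later.
        if require_ascii and not text.isascii():
            problems.append((path, "non-ASCII characters"))
    return problems
```

For example, `for path, reason in find_problem_files("standardized/101"): print(path, reason)` prints nothing if every file passes. Pass `require_ascii=False` for non-English corpora, where only the UTF-8 check applies.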
After you’ve downloaded the ciabatta folder (see Downloading the script and the files), navigate from the ciabatta folder into its metadata subfolder with this command:
cd metadata
Before you run the script, check how many files are in the folder standardized/101, so that you can confirm afterward that the script produced the same number of files. To count the files, run the following command:
ls standardized/101/**/**/*.txt | wc -l
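If the shell count looks wrong (in bash, for instance, `**` matches only a single folder level unless the globstar option is enabled), a recursive count in Python is an alternative that behaves the same on Mac and Windows. This is a sketch run from inside the metadata folder; the helper name `count_txt_files` is hypothetical.

```python
from pathlib import Path


def count_txt_files(folder):
    """Count .txt files anywhere below folder, at any depth.
    Returns 0 if the folder does not exist."""
    return sum(1 for p in Path(folder).rglob("*.txt") if p.is_file())


if __name__ == "__main__":
    print(count_txt_files("standardized/101"))
```

After the script finishes, `count_txt_files("files_with_headers")` should return the same number as the count for standardized/101.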
When you’re inside the metadata subfolder, you’re ready to run the script. As a reminder, to run the script, you will need two components: a folder with your corpus in .txt files and a spreadsheet with metadata. Here is the command to run the script:
python ciabatta_headers.py --directory=standardized/101 --master_file=metadata_folder/master_student_data.xlsx
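The same command can also be launched from Python, which sidesteps the forward-slash versus backslash difference between Mac and Windows. The sketch below is an assumption-laden convenience, not part of ciabatta: the helpers `build_command` and `run_headers` are hypothetical, and only the `--directory` and `--master_file` options come from this guide. It should be run from inside the metadata folder.

```python
import subprocess
import sys
from pathlib import Path


def build_command(directory, master_file):
    """Build the ciabatta_headers.py command line as a list of arguments."""
    return [
        sys.executable,  # the interpreter running this code
        "ciabatta_headers.py",
        f"--directory={directory}",
        f"--master_file={master_file}",
    ]


def run_headers(directory, master_file):
    """Check both inputs exist, then run the script (raises on failure)."""
    if not Path(directory).is_dir():
        raise FileNotFoundError(f"Corpus folder not found: {directory}")
    if not Path(master_file).is_file():
        raise FileNotFoundError(f"Metadata spreadsheet not found: {master_file}")
    subprocess.run(build_command(directory, master_file), check=True)
```

Usage: `run_headers("standardized/101", "metadata_folder/master_student_data.xlsx")`. Using `sys.executable` means the script runs under the same Python that has pandas, xlrd and pyyaml installed, and passing the arguments as a list avoids shell quoting issues.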
Now check how many files are in the new folder files_with_headers; the number should match your earlier count. To count the files, run the following command:
ls files_with_headers/**/**/**/**/*.txt | wc -l
A video version of this content is available on the Crow YouTube channel.
Video: Running the script on Mac
Running the script on Windows
Adding metadata to your files should come after your corpus files have been converted to .txt, encoded in UTF-8, and standardized to ASCII characters (the ASCII step applies to English only). Use the Corpus Text Processor for these steps.
After you’ve downloaded the ciabatta folder (see Section … on how to do that), navigate to the metadata subfolder with this command:
cd metadata
When you’re inside the metadata subfolder, you’re ready to run the script. Before you run it, check how many files are in the folder standardized/101, so that you can confirm afterward that the script produced the same number of files. To count the files, run the following command:
ls standardized/101/**/**/*.txt | Measure-Object -Line
As a reminder, to run the script, you will need two components: a folder with your corpus in .txt files and a spreadsheet with metadata. Here is the command to run the script:
python ciabatta_headers.py --directory=standardized\101 --master_file=metadata_folder\master_student_data.xlsx
where --directory is the folder where you saved your corpus files, and --master_file is the path to your metadata spreadsheet.
Now your metadata folder should have a new folder called files_with_headers. First run the ls command to confirm that it is there, then inspect the folder to make sure that the files have new filenames and headers. Also, check how many files are in files_with_headers; the number should match your earlier count. To count the files, run the following command:
ls files_with_headers/**/**/**/**/*.txt | Measure-Object -Line
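For the visual check, it can be quicker to print the first few lines of a handful of files than to open them one by one. The sketch below (the helper `preview_files` is hypothetical) does that for any folder; the exact header format comes from ciabatta_headers.py and is not reproduced here.

```python
from itertools import islice
from pathlib import Path


def preview_files(folder, max_files=3, max_lines=10):
    """Return (filename, first lines) for up to max_files .txt files under folder."""
    previews = []
    for path in sorted(Path(folder).rglob("*.txt"))[:max_files]:
        with path.open(encoding="utf-8") as handle:
            lines = [line.rstrip("\n") for line in islice(handle, max_lines)]
        previews.append((path.name, lines))
    return previews


if __name__ == "__main__":
    for name, lines in preview_files("files_with_headers"):
        print(f"== {name}")
        print("\n".join(lines))
```

Each printed filename should follow the new naming pattern, and the first lines should show the added header.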
A video version of this content is available on the Crow YouTube channel.
Video: Running the script on PC
Previous: 8a. Why add headers and filenames?
CIABATTA: Corpus in a Box: Automated Tools, Tutorials, & Advising
See a problem in this wiki? Report an issue. Unsure how to report using GitHub? Get help reporting.