Supplementary repository of the manuscript "Using Think-Aloud Data to Understand Relations between Self-Regulation Cycle Characteristics and Student Performance in Intelligent Tutoring Systems", accepted as a full paper at LAK '24.
Borchers, C., Zhang, J., Baker, R. S., & Aleven, V. (2024). Using Think-Aloud Data to Understand Relations between Self-Regulation Cycle Characteristics and Student Performance in Intelligent Tutoring Systems. In Proceedings of the 14th International Learning Analytics and Knowledge Conference (LAK24). ACM.
@inproceedings{borchers2024thinkaloud,
title={Using Think-Aloud Data to Understand Relations between Self-Regulation Cycle Characteristics and Student Performance in Intelligent Tutoring Systems},
author={Borchers, Conrad and Zhang, Jiayi and Baker, Ryan S and Aleven, Vincent},
booktitle={LAK24: 14th International Learning Analytics and Knowledge Conference},
doi={10.1145/3636555.3636911},
url={https://doi.org/10.1145/3636555.3636911},
year={2024}
}
- process-transcripts.ipynb: Notebook to process think-aloud transcripts from Whisper AI based on native Zoom transcriptions and local transcriptions of audio files. Generates a file adding tutor log data context to transcripts for human labeling of utterances. The output of this process is available on CMU DataShop. The file itself cannot be executed as the raw dataset is not available due to PII.
- setup-lak24.R: All R setup code to create the analysis dataset from the human-labeled dataset from process-transcripts.ipynb, which is on CMU DataShop.
- setup.sh: A setup script which renames the datasets from CMU DataShop and runs the setup-lak24.R script if everything is present. If the output says that an error has occurred, do NOT run the below two files as they may fail to execute.
- main-lak24.R and lak24-functions.R: All R analysis code that takes the analysis dataset from setup-lak24.R to reproduce results for RQ1 and RQ2.
- rq3-analysis.ipynb: Analysis code to reproduce results for RQ3.
Data to reproduce all analyses conducted for this study can be requested via CMU DataShop.
There are two methods to run the analysis in this repository: Development Containers (Dev Containers) or running the codebase locally. Most of the setup process is handled for you using dev containers; however, it does require Docker to execute. The second method downloads the required software to your local machine; however, this is more prone to error as existing setups may cause the process to fail in one way or another. It is recommended to use the dev container route if you can get it working.
Clone this repository using one of the available options under the <> Code tab. If you are unsure how, you can clone with git:
# e.g., git clone https://github.com/exampleuser/exampleproject.git
git clone <URL TO PROJECT>
First, open the CMU DataShop project webpage in your browser. To access the dataset, you will need to request access with an account on DataShop. You can create an account using your Google or GitHub account, whichever is easiest.
Once you have created an account, navigate back to the project webpage and click the Request Access button. Provide your reasoning to access the dataset and click Confirm. You should receive an email once the PI approves the request; however, you can also check by seeing whether you can click the Export button on the project webpage.
Now that you have permission, you can get the first two CSVs needed for the project by clicking Files (X), where X is a number, and then clicking Files (X) again if no datasets are shown. Click the file names of the CSVs to download them: lak24-coded-utterances.csv and transcripts-with-logdata-reference-lak24.csv.
To get the final dataset, click the Export button. On the left-hand side, under Shared Samples, make sure the checkbox next to All Data is checked, clicking it if necessary. Then, click the Export Transactions button when it appears. Wait for the server to process your request, and then you should have ds5371_tx_All_Data_7671_<timestamp>.txt.
Put all three of these files in the root of the project folder on your machine:
srl-cycles-lak24
- .devcontainer
- renv
- lak24-coded-utterances.csv
- transcripts-with-logdata-reference-lak24.csv
- ds5371_tx_All_Data_7671_<timestamp>.txt
- main-lak24.R
- //...
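Before continuing, it can help to confirm that the three files landed in the project root. The following is a minimal sketch, not part of the repository: the helper name missing_data_files is hypothetical, and it assumes the transaction export keeps the ds5371_tx_All_Data_7671 prefix shown above (with or without the timestamp).

```python
from pathlib import Path

# File names taken from the download steps above. The transaction export
# carries a timestamp, so it is matched with a wildcard (assumption: the
# prefix is always ds5371_tx_All_Data_7671).
REQUIRED_FILES = [
    "lak24-coded-utterances.csv",
    "transcripts-with-logdata-reference-lak24.csv",
]
TRANSACTION_PATTERN = "ds5371_tx_All_Data_7671*.txt"


def missing_data_files(project_root):
    """Return the file names (or pattern) not found directly in project_root."""
    root = Path(project_root)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    if not any(root.glob(TRANSACTION_PATTERN)):
        missing.append(TRANSACTION_PATTERN)
    return missing


if __name__ == "__main__":
    missing = missing_data_files(".")
    if missing:
        print("Missing data files:", ", ".join(missing))
    else:
        print("All three data files are present.")
```

Run it from the project root; an empty result means the layout matches the listing above.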
To use dev containers, you will need the following installed:
- Docker
  - This provides a link to Docker Desktop for ease of use
- Visual Studio Code
  - As of the writing of this README, only Visual Studio Code and the JetBrains suite support dev containers natively.
  - If you would like to use a different IDE or rich text editor, you will need to download and set up the Dev Containers CLI in Node.js.
  - This method will use Visual Studio Code.
- Open the project folder in Visual Studio Code by clicking File -> Open Folder... and then selecting the project folder. You should see .devcontainer, renv, and .gitignore at the root level in the IDE.
- Open the Extensions tab (the button on the left with four squares, where the top-right square is detached from the others) and install ms-vscode-remote.vscode-remote-extensionpack and ms-azuretools.vscode-docker. If you need to reload VSCode, do so.
- Click the >< symbol in the bottom left-hand corner and select Reopen in Container.
- Wait for the setup process to finish. This may take anywhere from 15-45 minutes depending on the speed of your machine.
- Once the terminal says "Done. Press any key to close the terminal", look at the line directly before it. If it says "Setup successfully finished! You can now run...", then you can open and run main-lak24.R and rq3-analysis.ipynb.
- To run main-lak24.R, open the file and click the Play button in the upper right-hand corner. This will open an R terminal and run all the code. The results should be saved into .html files.
- To run rq3-analysis.ipynb, click the Run All button. If it asks you to select a kernel, click Python Environments... -> Python 3.9.5 /usr/local/python/current/bin/python. This option should have a star next to it along with the word Recommended. The results will be saved into ans_sorted.csv and anything significant will be reported in the output terminal.
To run the code locally, you will need the following:
- R 4.2: Windows, Mac, Linux
  - Any version of R 4.2 (e.g., 4.2.0-4.2.3) will work
- Python 3.9
  - Any version of Python 3.9 (e.g., 3.9.0-3.9.19) will work
- Jupyter Notebook Viewer
  - Visual Studio Code and other plugin-based software will also work
Note: If you are on Unix and would like to use the setup.sh script, you must make sure you have Rscript on the PATH.
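The two requirements that most often go wrong here, the Python minor version and Rscript being on the PATH, can be checked up front. This is a small sketch only: the function names are hypothetical and are not part of the repository.

```python
import shutil
import sys


def python_version_ok(version_info=None):
    """True when the interpreter is some Python 3.9.x release."""
    info = sys.version_info if version_info is None else version_info
    return tuple(info[:2]) == (3, 9)


def rscript_on_path():
    """True when an Rscript executable can be found on the PATH."""
    return shutil.which("Rscript") is not None


if __name__ == "__main__":
    print("Python 3.9:", "ok" if python_version_ok() else "not 3.9.x")
    print("Rscript on PATH:", "yes" if rscript_on_path() else "no")
```

Rscript is only needed on the PATH if you plan to use setup.sh; running setup-lak24.R from inside R does not require it.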
- Install the Python requirements from requirements.txt to your global Python instance or virtual environment (recommended) via:
# or py on Windows
python3 -m pip install -r requirements.txt
- Once all the dependencies have been downloaded, install the stopwords dataset from nltk via:
# or py on Windows
python3 -m nltk.downloader stopwords
- Use one of the below options to get the necessary packages:
To use renv with R, open the .properties file and set AUTOLOAD_RENV to TRUE:
AUTOLOAD_RENV=TRUE
Then, open R and wait for renv and the required dependencies to install themselves. It should do so using the .Rprofile. If not, run the following in the R terminal:
install.packages("renv")
Then, once renv is installed, install the dependencies via:
renv::restore(prompt = FALSE)
If you want to use the packages already installed on your machine, move on to the steps below. There is no guarantee your packages will work due to version differences, but it is generally more stable than running renv if you already have packages installed globally.
Here are the packages directly referenced in the codebase:
- tidyverse
- lme4
- languageserver
- janitor
- zoo
- sjPlot
- car
- Perform one of the following options based on your setup:
Run the setup.sh script in your shell terminal. The script is written in bash, so make sure your terminal environment supports it:
./setup.sh
If a permission denied error occurs, run the following command first:
chmod +x ./setup.sh
If the output says "Setup successfully finished! You can now run...", then you can open and run main-lak24.R and rq3-analysis.ipynb. Otherwise, make sure you have the three data files in the root directory.
Rename the transaction file ds5371_tx_All_Data_7671_<timestamp>.txt to ds5371_tx_All_Data_7671.txt.
Then, run the setup-lak24.R script either from a shell terminal (first command) or from within R (second command). If R is not on your PATH, use the second option.
Rscript ./setup-lak24.R
source("./setup-lak24.R")
- Run main-lak24.R using your R environment.
- Run rq3-analysis.ipynb using your Jupyter environment.
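For the manual option above, the rename of the timestamped transaction export can also be scripted. The following is a sketch under assumptions, not the logic of setup.sh: the helper name rename_transaction_file is hypothetical, and it expects exactly one timestamped export in the project root.

```python
from pathlib import Path

# Target name expected by the manual setup step described above.
TARGET_NAME = "ds5371_tx_All_Data_7671.txt"


def rename_transaction_file(project_root):
    """Rename the single timestamped export to TARGET_NAME and return its path."""
    root = Path(project_root)
    target = root / TARGET_NAME
    if target.exists():
        # Already renamed; nothing to do.
        return target
    candidates = sorted(root.glob("ds5371_tx_All_Data_7671_*.txt"))
    if len(candidates) != 1:
        raise FileNotFoundError(
            f"Expected exactly one timestamped export in {root}, "
            f"found {len(candidates)}"
        )
    candidates[0].rename(target)
    return target


if __name__ == "__main__":
    print("Transaction file:", rename_transaction_file("."))
```

Refusing to guess when several exports are present avoids silently picking a stale download.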
renv won't download the libraries with a weird error "cannot edit staging" / R won't load tidyverse due to a missing file link
If you have installed everything correctly, this means that libraries from a different R minor version are interfering. You can fix this by deleting ./renv/library, ./renv/staging, and the directory that $RENV_PATHS_CACHE points to. Then, run renv::restore() again. If that doesn't work, try uninstalling and reinstalling all packages.
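The cleanup described above can be sketched as a script. This is an illustration only: the function names are hypothetical, the script defaults to a dry run that merely lists what would be removed, and it assumes RENV_PATHS_CACHE, when set, points to a directory that is safe to delete.

```python
import os
import shutil
from pathlib import Path


def renv_cleanup_targets(project_root, env=None):
    """List the existing directories named in the troubleshooting step."""
    env = os.environ if env is None else env
    root = Path(project_root)
    targets = [root / "renv" / "library", root / "renv" / "staging"]
    cache = env.get("RENV_PATHS_CACHE")
    if cache:
        targets.append(Path(cache))
    return [t for t in targets if t.is_dir()]


def clean_renv(project_root, dry_run=True, env=None):
    """Delete the cleanup targets, or only report them when dry_run is True."""
    targets = renv_cleanup_targets(project_root, env)
    for target in targets:
        if dry_run:
            print(f"Would delete {target}")
        else:
            shutil.rmtree(target)
    return targets


if __name__ == "__main__":
    clean_renv(".", dry_run=True)
```

After reviewing the dry-run output, call clean_renv(".", dry_run=False) and then run renv::restore() again in R.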