Add DINA examples and docs. Update docs of IRT and KaNCD. Modify pytest-flake8 version requirement in setup.py
1 parent 550a3eb, commit 9aa8460
Showing 7 changed files with 90 additions and 158 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,17 +1,43 @@ | ||
# Deterministic Inputs, Noisy “And” gate model | ||
The implementation of the classical cognitive diagnosis model, i.e., DINA (Deterministic Inputs, Noisy “And” gate) model. The training process is adapted to use gradient descent methods. If the reader wants to know the details of DINA, please refer to the Appendix of the paper: *[DINA model and parameter estimation: A didactic](https://journals.sagepub.com/doi/10.3102/1076998607309474)*. | ||
|
||
If the reader wants to know the details of DINA, please refer to the Appendix of the paper: *[DINA model and parameter estimation: A didactic](https://journals.sagepub.com/doi/10.3102/1076998607309474)*. | ||
```bibtex | ||
@article{de2009dina, | ||
title={DINA model and parameter estimation: A didactic}, | ||
author={De La Torre, Jimmy}, | ||
journal={Journal of educational and behavioral statistics}, | ||
volume={34}, | ||
number={1}, | ||
pages={115--130}, | ||
year={2009}, | ||
publisher={Sage Publications Sage CA: Los Angeles, CA} | ||
If this repository is helpful for you, please cite our work | ||
|
||
``` | ||
@misc{bigdata2021educdm, | ||
title={EduCDM}, | ||
author={bigdata-ustc}, | ||
publisher = {GitHub}, | ||
journal = {GitHub repository}, | ||
year = {2021}, | ||
howpublished = {\url{https://github.com/bigdata-ustc/EduCDM}}, | ||
} | ||
``` | ||
|
||
![model](_static/DINA.png) | ||
## Model description | ||
The DINA model is a classical cognitive diagnosis model in which each learner $i$ is represented by a binary vector ($[\alpha_{ik}, k=1,2,...K]$ in the following figure) indicating the learner's knowledge mastery pattern. A Q-matrix $Q\in\{0, 1\}^{J\times K}$ indicates the skills (or knowledge components) relevant to each test item. For each test item $j$, a learner who has mastered all the required skills may still slip and answer incorrectly, and a learner who has not may still guess the correct answer; these two probabilities are characterized by the parameters $s_j$ and $g_j$ respectively. Overall, the probability that learner $i$ provides a correct response to item $j$ is calculated as follows: | ||
$$Pr(X_{ij}=1|\alpha_i,q_j, s_j, g_j) = (1-s_j)^{\eta_{ij}}g_j^{1-\eta_{ij}},$$ | ||
|
||
$$ | ||
\eta_{ij} = \prod_{k=1}^{K}\alpha_{ik}^{q_{jk}}. | ||
$$ | ||
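The two formulas above can be checked by hand on a tiny case. The following is a minimal sketch in plain Python, not the EduCDM implementation; the function names `dina_eta` and `dina_prob` and the example learner, item, slip and guess values are hypothetical.

```python
from math import prod


def dina_eta(alpha, q):
    """Ideal response eta_ij: 1 only if every skill required by the item is mastered."""
    # alpha_k ** q_k is 1 when the skill is not required (q_k = 0) or is mastered (alpha_k = 1)
    return prod(a ** qk for a, qk in zip(alpha, q))


def dina_prob(alpha, q, slip, guess):
    """P(X_ij = 1) = (1 - s_j)^eta_ij * g_j^(1 - eta_ij)."""
    eta = dina_eta(alpha, q)
    return (1 - slip) ** eta * guess ** (1 - eta)


# Hypothetical item requiring the first two of K = 3 skills
q_item = [1, 1, 0]
alpha_mastered = [1, 1, 0]  # masters both required skills
alpha_missing = [1, 0, 0]   # lacks one required skill

p_mastered = dina_prob(alpha_mastered, q_item, slip=0.1, guess=0.2)
p_missing = dina_prob(alpha_missing, q_item, slip=0.1, guess=0.2)
print(p_mastered, p_missing)
```

A learner who masters every required skill answers correctly with probability $1-s_j$, while any other learner answers correctly with probability $g_j$.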
|
||
<img src=_static/DINA.png width=90%> | ||
|
||
## Parameters description | ||
|
||
| Parameters | Type | Description | | ||
| ---------- | ---- | ---------------------------------------- | | ||
| meta_data | dict | a dictionary containing all the userIds, itemIds, and skills. | | ||
| max_slip | float | the maximum value of the slip probability. default: 0.4 | | ||
| max_guess | float | the maximum value of the guess probability. default: 0.4 | | ||
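The table only states that `max_slip` and `max_guess` are upper bounds. One common way to enforce such a bound during gradient-based training is to pass an unconstrained parameter through a sigmoid and scale it by the cap. The sketch below illustrates that idea under this assumption; `bounded_rate` is a hypothetical helper, and the source does not confirm that EduCDM applies the bound this way.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bounded_rate(raw, max_value=0.4):
    """Map an unconstrained real number into the range [0, max_value]."""
    # Assumption: the cap is applied as sigmoid(raw) * max_value
    return sigmoid(raw) * max_value


slip_mid = bounded_rate(0.0, max_value=0.4)     # sigmoid(0) is 0.5, so half the cap
guess_high = bounded_rate(10.0, max_value=0.4)  # approaches the cap without exceeding it
print(slip_mid, guess_high)
```

Keeping both caps below 0.5 means a learner who has mastered the required skills is always more likely to answer correctly than one who has not, since $1-s_j > g_j$.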
|
||
|
||
## Methods summary | ||
|
||
| Methods | Description | | ||
| ----------------- | ---------------------------------------- | | ||
| fit | Fits the model to the training data. | | ||
| fit_predict | Uses the model to predict the responses in the testing data and returns the results. The responses are either 1 (i.e., correct answer) or 0 (i.e., incorrect answer). | | ||
| fit_predict_proba | Uses the model to predict the responses in the testing data and returns the probabilities that correct answers will be provided. | | ||
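The relationship between predicted probabilities and 0/1 responses can be illustrated with a simple cutoff. This is a sketch only: the helper `to_binary_responses` and the threshold of 0.5 are assumptions, and the source does not state which cutoff the library uses.

```python
def to_binary_responses(probabilities, threshold=0.5):
    """Convert predicted probabilities of a correct answer into 0/1 responses."""
    # Assumption: a probability at or above the threshold counts as a correct answer
    return [1 if p >= threshold else 0 for p in probabilities]


predicted = to_binary_responses([0.9, 0.2, 0.5])
print(predicted)
```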
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,48 @@ | ||
# coding: utf-8 | ||
# @ WangFei | ||
import logging | ||
from EduCDM import DINA | ||
import torch | ||
from torch.utils.data import TensorDataset, DataLoader | ||
import pandas as pd | ||
import numpy as np | ||
from EduData import get_data | ||
|
||
|
||
# get_data("cdbd-a0910", "../../data") # Download dataset "cdbd-a0910" | ||
|
||
# load data and transform it to the required format | ||
train_data = pd.read_csv("../../data/a0910/train.csv") | ||
valid_data = pd.read_csv("../../data/a0910/valid.csv") | ||
test_data = pd.read_csv("../../data/a0910/test.csv") | ||
df_item = pd.read_csv("../../data/a0910/item.csv") | ||
knowledge_set, item_set = set(), set() | ||
for i, s in df_item.iterrows(): | ||
    item_id, knowledge_codes = s['item_id'], list(set(eval(s['knowledge_code']))) | ||
    knowledge_set.update(knowledge_codes) | ||
    item_set.add(item_id) | ||
userIds = train_data['user_id'].unique() | ||
meta_data = {'userId': list(userIds), 'itemId': list(item_set), 'skill': list(knowledge_set)} | ||
train_data = (pd.merge(train_data, df_item, how='left', on='item_id') | ||
.rename(columns={'user_id': 'userId', 'item_id': 'itemId', 'knowledge_code': 'skill', 'score': 'response'})) | ||
valid_data = pd.merge(valid_data, df_item, how='left', on='item_id').rename(columns={'user_id': 'userId', 'item_id': 'itemId', 'knowledge_code': 'skill', 'score': 'response'}) | ||
test_data = pd.merge(test_data, df_item, how='left', on='item_id').rename(columns={'user_id': 'userId', 'item_id': 'itemId', 'knowledge_code': 'skill', 'score': 'response'}) | ||
|
||
# model training | ||
batch_size = 32 | ||
logging.getLogger().setLevel(logging.INFO) | ||
cdm = DINA(meta_data) | ||
cdm.fit(train_data, epoch=1, val_data=valid_data, device="cuda") | ||
|
||
# predict using the trained model | ||
print(cdm.predict(test_data)) | ||
|
||
# save model | ||
cdm.save("dina.snapshot") | ||
|
||
# load model and evaluate it on the test set | ||
cdm.load("dina.snapshot") | ||
auc, accuracy = cdm.eval(test_data) | ||
print("auc: %.6f, accuracy: %.6f" % (auc, accuracy)) | ||
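The script above parses each `knowledge_code` string with `eval`, which executes arbitrary expressions. A possible alternative, sketched here on hypothetical in-memory rows rather than the real `item.csv`, is `ast.literal_eval`, which only accepts Python literals. The helper name `parse_knowledge_codes` and the sample rows are illustrative assumptions.

```python
import ast


def parse_knowledge_codes(raw):
    """Parse a string such as "[3, 1, 3]" into a sorted list of unique codes."""
    # literal_eval accepts only literals (lists, numbers, strings, ...), never function calls
    codes = ast.literal_eval(raw)
    return sorted(set(codes))


# Hypothetical rows standing in for the item table
rows = [
    {"item_id": 1, "knowledge_code": "[3, 1, 3]"},
    {"item_id": 2, "knowledge_code": "[2]"},
]

knowledge_set, item_set = set(), set()
for row in rows:
    knowledge_set.update(parse_knowledge_codes(row["knowledge_code"]))
    item_set.add(row["item_id"])

print(sorted(knowledge_set), sorted(item_set))
```

With pandas, the same helper could be applied to each row inside the `iterrows()` loop in place of the `eval` call.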
|
||
|
Binary file not shown.