diff --git a/education/.DS_Store b/education/.DS_Store
deleted file mode 100644
index 10a68da46..000000000
Binary files a/education/.DS_Store and /dev/null differ
diff --git a/education/dina_independent/dina_independent.Rmd b/education/dina_independent/dina_independent.Rmd
index 9356fd7ed..ad0fbb12e 100644
--- a/education/dina_independent/dina_independent.Rmd
+++ b/education/dina_independent/dina_independent.Rmd
@@ -41,9 +41,14 @@ In educational measurement, cognitive diagnosis models (CDMs) have been used to
 
 The *deterministic inputs, noisy "and"" gate* (DINA) model [@Junker2001] is a popular conjunctive CDM, which assumes that a respondent must have mastered all required attributes in order to correctly respond to an item on an assessment.
 
-To estimate respondents' knowledge of attributes, we need information about which attributes are required for each item. For this, we use a Q-matrix which is an $I \times K$ matrix where $q_{ik}$=1 if item $i$ requires attribute $k$ and 0 if not. $I$ is the number of items and $K$ is the number of attributes in the assessment.
+To estimate respondents' mastery of attributes, we need information about which attributes are required for each item. For this, we use a Q-matrix, which is an $I \times K$ matrix where $q_{ik}=1$ if item $i$ requires attribute $k$ and 0 if not. $I$ is the number of items and $K$ is the number of attributes in the assessment.
 
-A binary latent variable $\alpha_{jk}$ indicates respondent $j$'s knowledge of attribute $k$, where $\alpha_{jk}=1$ if respondent $j$ has mastered attribute $k$ and 0 if he or she has not. Then, an underlying attribute profile of respondent $j$, $\boldsymbol{\alpha_j}$, is a binary vector of length $K$ that indicates whether or not the respondent has mastered each of the $K$ attributes.
+A binary latent variable $\alpha_{jk}$ indicates respondent $j$'s mastery of
+attribute $k$, where $\alpha_{jk}=1$ if respondent $j$ has mastered attribute
+$k$ and 0 if he or she has not. Then, an underlying attribute profile of
+respondent $j$, $\boldsymbol{\alpha_j}$, is a binary vector of length $K$ that
+indicates whether or not the respondent has mastered each of the $K$
+attributes.
 
 The deterministic element of the DINA model is a latent variable $\xi_{ij}$ that indicates whether or not respondent $j$ has mastered all attributes required for item $i$:
 $$
@@ -110,9 +115,9 @@ $$
 \mathrm{Pr}(\alpha_{jk}=1 \, | \, \boldsymbol{y}_j)=\sum_{c=1}^{C}\mathrm{Pr}(\boldsymbol{\alpha_j}=\boldsymbol{\alpha_c} \, | \, \boldsymbol{y}_j)\times\alpha_{ck}.
 $$
 
-Instead of conditioning on the parameters $\nu_c,s_i,g_i$ to obtain $\mathrm{Pr}(\boldsymbol{\alpha_j}=\boldsymbol{\alpha_c}|\boldsymbol{Y}_j=\boldsymbol{y}_j)$, we want to derive the posterior probabilities, averaged over the posterior distribution of the parameters. This is achieved by evaluating the expressions above for posterior draws of the parameters and averaging these over the MCMC iterations. Let the vector of all parameters be denoted $\boldsymbol{\theta}$ and let the posterior draw in iteration $s$ be denoted $\boldsymbol{\theta}^{(s)}_{.}$ Then we estimate the posterior probability, not conditioning on the parameters, as
+Instead of conditioning on the parameters $\nu_c,s_i,g_i$ to obtain $\mathrm{Pr}(\boldsymbol{\alpha_j}=\boldsymbol{\alpha_c}|\boldsymbol{Y}_j=\boldsymbol{y}_j)$, we want to derive the posterior probabilities, averaged over the posterior distribution of the parameters. This is achieved by evaluating the expressions above for posterior draws of the parameters and averaging these over the MCMC iterations. Let the vector of all parameters be denoted $\boldsymbol{\theta}$ and let the posterior draw in iteration $t$ be denoted $\boldsymbol{\theta}^{(t)}$. Then we estimate the posterior probability, not conditioning on the parameters, as
 $$
-\frac{1}{S}\sum_{s=1}^{S}\mathrm{Pr}(\boldsymbol{\alpha_j}=\boldsymbol{\alpha_c} \, | \, \boldsymbol{y}_j,\boldsymbol{\theta}^{(s)}).
+\frac{1}{T}\sum_{t=1}^{T}\mathrm{Pr}(\boldsymbol{\alpha_j}=\boldsymbol{\alpha_c} \, | \, \boldsymbol{y}_j,\boldsymbol{\theta}^{(t)}).
 $$
 
 In [Section 1.4](#stan_nostructure), we introduce the **Stan** program with no structure for $\nu_c$. [Section 2](#stan_ind) describes modification of this **Stan** program to specify the independence model for $\nu_c$ and presents simulation results.
@@ -309,7 +314,6 @@ for (k in 1:K){
   wanted_pars <- c(paste0("prob_resp_attr[", 1:dina_data_ind$J, ",", k, "]"))
   # Get predicted posterior probabilities of each attribute mastery for all respondents
   posterior_prob_attr <- sim_summary[wanted_pars, c("mean")]
-  dim(posterior_prob_attr)
   # Calculate mean of the probabilities for respondents who have mastered the attributes and for those who do not
   table_mean[k,"Group 1"] <- mean(posterior_prob_attr[A[,k]==1])
   table_mean[k,"Group 2"] <- mean(posterior_prob_attr[A[,k]==0])
@@ -371,10 +375,10 @@ colnames(alpha_patt) <- paste0("A", 1:5)
 alpha_patt
 
 # Assemble data list for Stan
-I=ncol(y)
-J=nrow(y)
-K=ncol(Q)
-C=nrow(alpha_patt)
+I <- ncol(y)
+J <- nrow(y)
+K <- ncol(Q)
+C <- nrow(alpha_patt)
 
 xi <- matrix(0,I,C)
 for (i in 1:I){
@@ -449,4 +453,4 @@ ggplot(data=estimates, aes(x=mle, y=post.means, shape=pars)) + geom_point() + g
 
 # References
 
-
\ No newline at end of file
+
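Note on the `xi` construction: the hunk at `@@ -371,10 +375,10 @@` ends just as the loop that fills `xi` begins. For readers of this patch, the sketch below shows one minimal way the $I \times C$ matrix $\xi$ of the DINA model can be built, together with the slip/guess response probability from @Junker2001. The toy `Q`, `alpha_patt`, `s`, and `g` values are hypothetical stand-ins, not values from the case study.

```r
# Toy inputs (hypothetical): 2 items, 3 attributes, and all 2^3 = 8 profiles
Q <- rbind(c(1, 0, 1),
           c(0, 1, 0))                                   # I x K Q-matrix
alpha_patt <- as.matrix(expand.grid(rep(list(0:1), 3)))  # C x K attribute profiles
I <- nrow(Q); C <- nrow(alpha_patt)

# xi[i, c] = 1 iff profile c has mastered every attribute item i requires;
# prod(alpha^q) implements the conjunctive "and" gate (0^0 is 1 in R)
xi <- matrix(0, I, C)
for (i in 1:I) {
  for (c in 1:C) {
    xi[i, c] <- prod(alpha_patt[c, ]^Q[i, ])
  }
}

# DINA response probability with slip s_i and guess g_i (toy values):
# Pr(Y_ij = 1 | xi_ij) = (1 - s_i)^xi_ij * g_i^(1 - xi_ij)
s <- c(0.10, 0.20)
g <- c(0.20, 0.15)
p_correct <- (1 - s)^xi * g^(1 - xi)  # I x C matrix of correct-response probabilities
```

Because $q_{ik}$ enters as an exponent, attributes an item does not require (where $q_{ik}=0$) contribute a factor of 1 and drop out, so the product equals 1 exactly when all required attributes are mastered.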
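Note on the averaging formula in the `@@ -110,9 +115,9 @@` hunk: the sketch below makes the Monte Carlo average over $T$ posterior draws concrete. The array `prob_joint` and the name `prob_resp_class` are hypothetical stand-ins; in the case study the per-draw probabilities come from the fitted Stan model (compare the `prob_resp_attr[...]` summaries extracted in the `@@ -309,7 +314,6 @@` hunk).

```r
set.seed(1)
n_draws <- 10  # T in the formula above
J <- 4; C <- 8; K <- 3
alpha_patt <- as.matrix(expand.grid(rep(list(0:1), K)))  # C x K attribute profiles

# Hypothetical stand-in for Pr(alpha_j = alpha_c | y_j, theta^(t)):
# an n_draws x J x C array, normalized over c within each draw and respondent
raw <- array(runif(n_draws * J * C), dim = c(n_draws, J, C))
prob_joint <- raw / array(apply(raw, c(1, 2), sum), dim = c(n_draws, J, C))

# (1/T) * sum_t Pr(alpha_j = alpha_c | y_j, theta^(t)): a J x C matrix
prob_resp_class <- apply(prob_joint, c(2, 3), mean)

# Pr(alpha_jk = 1 | y_j) = sum_c Pr(alpha_j = alpha_c | y_j) * alpha_ck,
# the J x K analogue of the prob_resp_attr[j, k] summaries in the patch
prob_resp_attr <- prob_resp_class %*% alpha_patt
```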