Abstract Parsing in Meta Analysis
The process of parsing through abstracts for inclusion/exclusion can be made more automated with thoughtful researcher choices of abstract-parsing vocabulary and the qdap package's `termco.a` function.
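As a minimal sketch of the idea (the abstracts and term list below are made up for illustration), `termco.a` takes a text variable, a grouping variable, and a list of term dictionaries, and reports which records contain terms from each dictionary:

```r
library(qdap)

#two made-up abstracts with ids (hypothetical data)
abstracts <- c("A study of reading comprehension strategies in young readers.",
    "Neuroimaging of brain activity during mental rotation tasks.")
ids <- 1:2

#leading spaces in the terms act as crude word boundaries
termco.a(abstracts, ids, ignore.case = TRUE, match.list = list(
    reading = c(" read", " comprehen"),
    brain = c(" brain", " mental")))
```

Records that match an exclusion dictionary (or fail to match an inclusion dictionary) can then be filtered out, as the scripts below demonstrate.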
The steps for importing a reference library into R and parsing it with `termco.a` depend on the researcher's choices and on the citation program the researcher uses (EndNote, Zotero, and JabRef [BibTeX] are some of the key programs researchers use). The following series of videos and scripts is meant as a choose-your-own-destiny style tutorial rather than a linear sequence of steps to pass through. The list in the general outline below describes a typical pathway a researcher may take (the eventual goal is to get your reference file into a .csv format).
The following programs are open-source tools used in this tutorial: Zotero, JabRef, and Mendeley.
###General Outline of the Process
1. If you use EndNote, export your EndNote (.enl) library to Zotero (.ris) [go to step 2].
2. If you use Zotero, export your Zotero (.ris) library to BibTeX (.bib) format [go to step 3].
3. If you use BibTeX (.bib; I use JabRef), export to .csv and then import into R.
4. Make sure you also gathered from non-traditional databases (e.g., Google Scholar, ProQuest Dissertations and Theses), as these may be sources of unpublished work and dissertations. Because non-significant results often go unpublished and wind up in places like Google Scholar, using this type of work in your analysis can help reduce publication bias.
5. Remove duplicates. Mendeley is particularly good at removing duplicates (for a quick duplicate check in R, see the sketch after this list). For a series of videos on using Mendeley click here.
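If you want a quick duplicate check in R itself, here is a minimal sketch; it assumes your exported .csv has a Title column, and the file name is hypothetical:

```r
#flag references whose (lower-cased) title has already appeared
refs <- read.csv("refs.csv", stringsAsFactors = FALSE) #hypothetical file name
dups <- duplicated(tolower(refs$Title))
sum(dups)             #number of duplicate titles found
refs <- refs[!dups, ] #keep the first occurrence of each title
```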
###Gathering Reference Search Results from Various Databases
Exporting From Database to Zotero (.ris)
Exporting From Google Scholar to Zotero (.ris)
Exporting From Database to JabRef (.bib)
Exporting From ProQuest to Zotero (.ris)
###Preparing References for Importing Into R
Exporting EndNote Library (.enl) to Zotero (.ris)
Exporting Zotero Library (.ris) to JabRef Library (.bib)
Exporting JabRef Library (.bib) to .csv
###Importing References Into R and Cleaning
```r
library(qdap)
url_dl("ref_test.csv")
options(width=10000)
#header = FALSE allows all columns to read in
x <- read.csv("ref_test.csv", header=FALSE, row.names=NULL, stringsAsFactors = FALSE)
htruncdf(x, 20)
truncdf(x)
colnames(x)[1:26] <- as.character(unlist(x[1, 1:26]))
x <- x[-1, ]; rownames(x) <- NULL #remove first row (this was the header)
#remove any empty columns and rows
FUN <- function(x) !all(is.na(x)) #function to remove blank columns
x <- x[, sapply(x, FUN)] #remove blank columns
#function to strip stray quotation marks and trailing periods
metaclean <- function(x) gsub('\"', "", gsub("\\.(?=\\.*$)", "", x, perl=TRUE))
x <- rm_empty_row(x) #remove blank rows
htruncdf(x, 20) #use this to make decisions about which columns to keep/paste
#create an index of the columns containing the abstract and key terms pieces
index <- which(colnames(x) %in% qcv(V27, V31))
truncdf(x[, index])
z <- data.frame(id = 1:nrow(x), x[, 1:which(colnames(x) == "Year")],
    abstract = metaclean(scrubber(paste2(x[, index[1]:index[2]]))),
    stringsAsFactors = FALSE)
truncdf(z, 10) #view it
z$abstract #the abstract
#remove symbols etc
parse.symb <- c("{", "}", "(", ")", "/", "-") #vector of removal terms
z$abstract <- mgsub(parse.symb, " ", z$abstract)
v <- split(z, scrubber(z$abstract) %in% c("", " ")) #separate out blank abstracts (v[[1]] = non-blank, v[[2]] = blank)
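#optionally save the records with usable abstracts for the next script
#(an assumption; the tutorial itself supplies ref_test_clean.csv via url_dl):
#write.csv(v[[1]], "ref_test_clean.csv", row.names = FALSE)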
# delete("ref_test.csv") #delete the sample csv file
```
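After this script runs, `z` holds one row per reference (an `id`, the bibliographic columns through `Year`, and a cleaned `abstract`), while `v` separates the records with usable abstracts from those whose abstract came through blank. The next script picks up from a cleaned file in this form.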
###Using qdap to Analyze Abstracts
This is a continuation of the Importing References Into R and Cleaning script above.
Video to Accompany the Script Below
```r
library(qdap)
url_dl("ref_test_clean.csv")
v <- list(read.csv("ref_test_clean.csv",
    row.names = NULL, stringsAsFactors = FALSE), NA)
options(width=10000)
#generate word lists (dictionaries) to exclude/include terms
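#note: the leading/trailing spaces in the terms below act as crude word
#boundaries (e.g., " man " will not match "woman" or "many")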
matches <- list(
gender = c(" male", " female", " women", " man ", " men ", " boy", " girl"),
brain = c(" brain", " cogni", " process", " mental"),
reading = c(" read ", " reads ", " reading ", " comprehen", " strat", " skill"),
teach = c(" teach", " taught", " instruct", " pedagogy")
)
a <- with(v[[1]], termco.a(abstract, id, match.list = matches,
    short.term = TRUE, ignore.case = TRUE))
a
names(a)
head(a$raw, 20)
b <- with(v[[1]], termco.a(abstract, id, match.list = unlist(matches),
    short.term = TRUE, ignore.case = TRUE))
b
head(b$raw, 20)
termco2mat(b$raw) #convert the raw termco counts to a matrix
v[[2]] # <- find abstracts for these ones (they were missing)
htruncdf(a$raw)
All <- rowSums(a$raw[, 3:6]) > 0 #articles matching at least one category
brain <- a$raw[, 4] > 0 #articles matching the brain dictionary
reading <- a$raw[, 5] > 0 #articles matching the reading dictionary
#count, per article, how many of the four categories had at least one match
Reduce(`+`, lapply(a$raw[, 3:6], function(x) x > 0))
#notice that no article contained every category (n = 4)
p <- z[brain, ] #z is the cleaned data frame from the previous script
truncdf(p)
write.table(p, file = "foo.csv", sep = ",", col.names = TRUE,
    row.names = FALSE, qmethod = "double")
truncdf(z[All, ])
# delete("ref_test_clean.csv") #delete the sample csv file
```
###Using qdap to Classify an Article as Qualitative or Quantitative
Video to Accompany the Script Below
code to come