The required data (all contained in the unzipped `orig` directory) are:

- thresholded and unthresholded images for each team/hypothesis (teams excluded from the main analysis are included in the `rejected` directory)
- Metadata files:
  - `analysis_pipelines_for_analysis.xlsx` (information about analysis pipelines)
  - `narps_neurovault_images_details_responses_corrected.csv` (information about images)
  - `narps_results.xlsx` (information about team decisions)
To run the full analysis pipeline using Docker, you will need at least about 12 GB of free space on your hard drive. To run it:

- Install the Docker client.
- Install the git client if you don't already have it.
- If you are on a Mac, you will need to install the XCode Command Line Tools if they are not already installed, using the following command: `xcode-select --install`
- Open a shell (e.g. Terminal on Mac OS) and set the following environment variables:
  - `NARPS_BASEDIR`: location for the data and results - e.g. if you would like the data to be stored in /Users/poldrack/NARPS then you would use the command: `export NARPS_BASEDIR=/Users/poldrack/NARPS`
  - `DATA_URL`: URL for the data: `export DATA_URL=https://zenodo.org/record/3528329/files/narps_origdata_1.0.tgz`
- Clone the present repository using the following command in the shell: `git clone https://github.com/poldrack/narps.git`
- cd to `narps/ImageAnalyses`
- Use the following command to run the full pipeline (a consolidated sketch of the command sequence appears after the step listing below): `make run-all`
This performs the following steps:
- Preparation of the data for analysis (using PreprocessMaps.py)
- Preparation of the metadata (using PrepareMetadata.py)
- Analysis of the maps (using AnalyzeMaps.py)
- Analysis of the decisions (using AnalyzeDecisions.Rmd)
- Consensus analysis across teams (using ConsensusAnalysis.py)
- ALE meta-analysis across teams (using MetaAnalysis.py)
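For reference, here is a consolidated sketch of the setup and run commands described above (the base directory path is only an example; substitute your own):

```bash
# Example base directory - substitute your own path
export NARPS_BASEDIR=/Users/poldrack/NARPS
export DATA_URL=https://zenodo.org/record/3528329/files/narps_origdata_1.0.tgz

# Get the analysis code and move into the analysis directory
git clone https://github.com/poldrack/narps.git
cd narps/ImageAnalyses

# Run the full pipeline in the Docker container
make run-all
```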
The outputs can be found in the subdirectories of `NARPS_BASEDIR`:

- `outputs` - all of the intermediate maps generated by preprocessing:
  - `thresh_mask_orig` - binarized versions of thresholded maps
  - `resampled` - unthresholded and thresholded maps resampled into common MNI152 2mm space
  - `rectified` - unthresholded maps with sign flipped for reverse contrasts, to ensure that all maps are positive for each contrast
  - `zstat` - unthresholded and thresholded maps converted to z statistics
- `figures` - figures generated by the analysis code
- `metadata` - additional metadata/results files generated by the preparation code:
  - `all_metadata.csv` - all metadata used for analyses
  - `median_pattern_distance.csv` - median pattern distances for each hypothesis
  - `thresh_voxel_statistics.csv` - statistics on thresholded maps
  - `smoothness_est.csv` - estimated smoothness for each map
- `logs` - logs for the analyses, which include summary outputs for some analyses
- `cached` - the cached narps structure
This will use the latest version of the docker image from Dockerhub. If you wish to build the Docker image locally, you should change the `DOCKER_USERNAME` variable in the Makefile to your own username, and then run `make docker-build`.

NOTE: The docker container will continue to exist on your hard drive after you have run the analysis. To remove it and recover the space on your hard drive, use the command `docker system prune`.
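As a quick sketch, the local-build and cleanup commands look like this (assuming you have already edited `DOCKER_USERNAME` in the Makefile to your own username):

```bash
# Build the Docker image locally (requires DOCKER_USERNAME in the Makefile set to your own username)
make docker-build

# Remove unused Docker data to recover disk space once you are finished
docker system prune
```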
The data provided for download were obtained from Neurovault using `PrepareData.py`. The tarball includes files describing the provenance of the downloaded data (including MD5 hashes for identity checking).
We have attempted to maximize the reproducibility of the analyses in this project as follows:
- Python/UNIX: All software versions for UNIX and Python packages are pinned in the Dockerfile
- R: R is tricky because it doesn't provide a straightforward way to pin library versions. We use a package called `checkpoint`, which downloads the versions of all necessary packages as of a specific date (which we have set to 2019-07-16). The `checkpoint` package analyzes the R code in the project and downloads any libraries that are loaded by the code; unfortunately it doesn't read Rmd files, so we create a separate file called `R_libraries.R` that contains the library imports needed to run the Rmd file, which checkpoint will detect so that those libraries are loaded automatically. This file is generated automatically when the analysis is run, and can also be generated using `make get-R-packages`.
Execution via Docker is recommended, but the analysis can also be run locally using `make run-all-local`. This requires that you have all of the various requirements in place, which must be inferred from the Dockerfile.
In order to validate the analysis stream, we have included the ability to run the entire analysis stream on simulated data. This requires that the entire analysis (i.e. `make run-all`) has first been run in regular mode, because the simulated data are generated on the basis of the consensus analysis results. The data are generated so that most teams will be correlated, but four teams will be anticorrelated, four teams will be pure noise, and 12 teams will have higher variance than the others. See the code in `narps.py` for more details.
To run in simulation mode, you should use `python narps.py -s`. The specified basedir should be the one in which you have already run the full analysis. A new base directory, with "_simulated" appended to the original directory name, will be created, and the new data will be generated in that directory. The subsequent analysis methods can then be applied to that new simulated directory structure. This entire process can be executed using Docker via `make run-simulated`.
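A minimal sketch of the Docker route, assuming the full analysis has already completed in the same base directory:

```bash
# Base directory in which the full analysis has already been run (example path only)
export NARPS_BASEDIR=/Users/poldrack/NARPS

# Generate the simulated data and run the analyses on it;
# outputs go to a new directory with "_simulated" appended to the base directory name
make run-simulated
```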
The results of the simulated analysis are checked using `CheckSimulatedValues.py` to ensure that the simulated results closely match the intended results. In addition, we visually check the clustering to ensure that the heatmaps and clustering identify the noise/flipped/high-variance teams.