narps/ImageAnalyses at master · poldrack/narps

metadata_files
original_notebooks
ALEextract.py
AnalyzeMaps.py
CheckSimulatedValues.py
ClusterImageCorrelation.py
ConsensusAnalysis.py
CorrelatedMetaNotes.Rmd
CorrelatedMetaNotes.html
DecisionAnalysis.Rmd
GetMeanSimilarity.py
MakeCombinedClusterFigures.py
MakeNeurovaultCollection.py
MakeSuppFigures.py
MakeSuppTables.py
MakeSupplementaryFigure1.py
Makefile
MetaAnalysis.py
PrepareData.py
PrepareMapDiagnosticReports.py
PrepareMetadata.py
PreprocessMaps.py
README.md
README_outputs.txt
SimulateData.py
SuppFiguresInfo.tsv
SuppTablesInfo.tsv
ThreshVoxelStatistics.py
ThresholdingSim.py
ValueDiagnostics.py
get_3d_peaks.py
get_neurovault_collection.sh
narps.py
tests.py
tests_ale.py
tests_consensus.py
tests_maps.py
tests_minimal.py
tests_simulation.py
tests_threshold.py
utils.py

README.md

Code for NARPS data analysis

Setup

The required data (all contained in the unzipped orig directory) are:

thresholded and unthresholded images for each team/hypothesis (teams excluded from the main analysis are included in the rejected directory)
Metadata files:
- analysis_pipelines_for_analysis.xlsx (information about analysis pipelines)
- narps_neurovault_images_details_responses_corrected.csv (information about images)
- narps_results.xlsx (information about team decisions)

Dockerized analysis pipeline

To run the full analysis pipeline using Docker, you will need roughly 12 GB or more of free space on your hard drive. To run it:

Install the Docker client
Install the git client if you don't already have it.
If you are on a Mac, you will need to install the Xcode Command Line Tools if they are not already installed, using the following command:
- xcode-select --install
Open a shell (e.g. Terminal on macOS) and set the following environment variables:
- NARPS_BASEDIR: location for the data and results - e.g. if you would like the data to be stored to /Users/poldrack/NARPS then you would use the command:
  - export NARPS_BASEDIR=/Users/poldrack/NARPS
- DATA_URL: URL for the data:
  - export DATA_URL=https://zenodo.org/record/3528329/files/narps_origdata_1.0.tgz
Clone the present repository using the following command in the shell:
- git clone https://github.com/poldrack/narps.git
Change directory (cd) to narps/ImageAnalyses
Use the following command to run the full pipeline: make run-all
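
Before running make run-all, it can help to confirm that both environment variables are set in the current shell. The following is a minimal sketch, not part of the repository: the variable names come from the steps above, while the helper missing_narps_vars is a hypothetical name introduced here for illustration.

```python
import os

# Variable names are taken from the setup steps above;
# the helper itself is hypothetical and not part of the narps code.
REQUIRED_VARS = ("NARPS_BASEDIR", "DATA_URL")


def missing_narps_vars(environ=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_narps_vars()
    if missing:
        print("Set these before running make run-all:", ", ".join(missing))
    else:
        print("Environment looks ready for make run-all")
```

Passing a plain dictionary instead of relying on os.environ makes the check easy to try out without touching the real shell environment.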

This performs the following steps:

Preparation of the data for analysis (using PreprocessMaps.py)
Preparation of the metadata (using PrepareMetadata.py)
Analysis of the maps (using AnalyzeMaps.py)
Analysis of the decisions (using DecisionAnalysis.Rmd)
Consensus analysis across teams (using ConsensusAnalysis.py)
ALE meta-analysis across teams (using MetaAnalysis.py)

The outputs can be found in the subdirectories of NARPS_BASEDIR:

outputs - all of the intermediate maps generated by preprocessing
- thresh_mask_orig - binarized versions of thresholded maps
- resampled - unthresholded and thresholded maps resampled into common MNI152 2mm space
- rectified - unthresholded maps with sign flipped for reverse contrasts, to ensure that all maps are positive for each contrast
- zstat - unthresholded and thresholded maps converted to z statistics
figures - figures generated by the analysis code
metadata - additional metadata/results files generated by the preparation code
- all_metadata.csv - all metadata used for analyses
- median_pattern_distance.csv - median pattern distances for each hypothesis
- thresh_voxel_statistics.csv - statistics on thresholded maps
- smoothness_est.csv - estimated smoothness for each map
logs - logs for the analyses, which include summary outputs for some analyses
cached - the cached narps structure
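
After a run, a quick way to see which of the top-level output subdirectories listed above were created is to check for them under NARPS_BASEDIR. This is a minimal sketch under stated assumptions: the directory names are taken from the list above, summarize_outputs is a hypothetical helper, and the demo runs against a throwaway temporary directory rather than real results.

```python
import tempfile
from pathlib import Path

# Directory names come from the list above; the function is a hypothetical helper.
EXPECTED_SUBDIRS = ("outputs", "figures", "metadata", "logs", "cached")


def summarize_outputs(basedir):
    """Map each expected subdirectory name to whether it exists under basedir."""
    base = Path(basedir)
    return {name: (base / name).is_dir() for name in EXPECTED_SUBDIRS}


# Demo on a temporary directory that contains only two of the subdirectories.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "outputs").mkdir()
    (Path(tmp) / "logs").mkdir()
    demo = summarize_outputs(tmp)

print(demo)
```

For a real run, the same function could be called with the value of NARPS_BASEDIR, for example summarize_outputs(os.environ["NARPS_BASEDIR"]) after importing os.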

The make run-all command uses the latest version of the Docker image from Docker Hub. If you wish to build the Docker image locally, you should change the DOCKER_USERNAME variable in the Makefile to your own username, and then run make docker-build.

NOTE: The Docker container will continue to exist on your hard drive after you have run the analysis. To remove it and recover the space on your hard drive, use the command docker system prune

Data provenance

The data provided for download were obtained from Neurovault using PrepareData.py. The tarball includes files describing the provenance of the downloaded data (including MD5 hashes for identity checking).
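
An MD5 identity check of this kind can be sketched as follows. This is an illustration only: the layout of the provenance files in the tarball is not described here, so the sketch assumes you already have a file path and its recorded hash as a string. The helper names md5_of_file and matches_recorded_hash are hypothetical.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_hash(path, recorded_md5):
    """Compare a file's digest with a recorded one, ignoring case and whitespace."""
    return md5_of_file(path) == recorded_md5.strip().lower()


# Demo on a throwaway file (not NARPS data).
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"hello")
    demo_path = handle.name
try:
    demo_digest = md5_of_file(demo_path)
    demo_ok = matches_recorded_hash(demo_path, demo_digest.upper() + "\n")
finally:
    os.remove(demo_path)

print(demo_digest, demo_ok)
```

Reading in chunks keeps memory use flat, which matters for large image files.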

Reproducibility

We have attempted to maximize the reproducibility of the analyses in this project as follows:

Python/UNIX: All software versions for UNIX and Python packages are pinned in the Dockerfile
R: R is tricky because it doesn't provide a straightforward way to specify versions of libraries. We use a package called checkpoint, which downloads the versions of all necessary packages as of a specific date (which we have set to 2019-07-16). The checkpoint package analyzes the R code in the project and downloads any libraries that are loaded by the code. Unfortunately, it doesn't read Rmd files, so we create a separate file called R_libraries.R that contains the imports needed to run the Rmd file, and which will be detected by checkpoint such that those libraries are loaded automatically. This file is generated automatically when the analysis is run, and can also be generated using make get-R-packages.
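
The idea behind R_libraries.R can be illustrated with a short sketch that scans Rmd text for library() and require() calls and writes one library() line per package. This is an assumption about the general approach, not the repository's actual implementation, which is run via make get-R-packages. The function names and the package names in the sample are hypothetical.

```python
import re

# Matches library(pkg) or require(pkg), with optional quotes around the name.
LIBRARY_CALL = re.compile(
    r"""\b(?:library|require)\(\s*["']?([A-Za-z][A-Za-z0-9._]*)["']?\s*[,)]"""
)


def extract_r_libraries(rmd_text):
    """Return package names loaded in the text, in first-seen order."""
    seen = []
    for name in LIBRARY_CALL.findall(rmd_text):
        if name not in seen:
            seen.append(name)
    return seen


def render_r_libraries(names):
    """Render one library() call per package, as a plain R script body."""
    return "".join(f"library({name})\n" for name in names)


# Illustrative input only; these are not necessarily the packages the project uses.
sample_rmd = "\n".join([
    "library(tidyverse)",
    'library("lme4")',
    "require(knitr, quietly = TRUE)",
    "library(tidyverse)  # repeated on purpose",
])

packages = extract_r_libraries(sample_rmd)
print(render_r_libraries(packages), end="")
```

The rendered text is what would be written to a plain .R file so that a tool that only scans R scripts can see the same package list as the Rmd file.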

Local execution

Execution via Docker is recommended, but the analysis can also be run locally, using make run-all-local - this will require that you have all of the various requirements in place, which must be inferred from the Dockerfile.

Simulation mode

In order to validate the analysis stream, we have included the ability to run the entire analysis stream on simulated data. This requires that the entire analysis (e.g. make run-all) has first been run in regular mode, because the simulated data are generated on the basis of the consensus analysis results. The data are generated so that most teams will be correlated, but four teams will be anticorrelated, four teams will be pure noise, and 12 teams will have higher variance than the others. See the code in narps.py for more details.
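
The structure of the simulated data can be illustrated with a toy sketch. This is not the code in narps.py: the real simulation starts from the consensus analysis results, whereas here the shared signal is random. The total number of teams, the number of voxels, the noise levels, and the assumption that the three special groups do not overlap are all choices made for illustration only.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def simulate_teams(n_teams=40, n_voxels=2000, seed=0):
    """Build toy maps: 4 flipped, 4 noise, 12 high-variance, the rest standard."""
    if n_teams < 20:
        raise ValueError("need at least 20 teams for the three special groups")
    rng = random.Random(seed)
    # Stand-in for the consensus map (assumption: random here).
    signal = [rng.gauss(0, 1) for _ in range(n_voxels)]
    labels = ["flipped"] * 4 + ["noise"] * 4 + ["high_variance"] * 12
    labels += ["standard"] * (n_teams - len(labels))
    maps = []
    for label in labels:
        # Sign of the shared signal: reversed, absent, or intact.
        sign = {"flipped": -1.0, "noise": 0.0}.get(label, 1.0)
        # Noise levels are arbitrary illustrative values.
        noise_sd = {"high_variance": 2.0, "noise": 1.0}.get(label, 0.5)
        maps.append([sign * s + rng.gauss(0, noise_sd) for s in signal])
    return signal, labels, maps


signal, labels, maps = simulate_teams()
for label in ("standard", "flipped", "noise", "high_variance"):
    r = pearson(signal, maps[labels.index(label)])
    print(f"{label:>13}: r = {r:+.2f}")
```

In this toy setup, standard teams correlate strongly with the shared signal, flipped teams correlate strongly but negatively, noise teams are near zero, and high-variance teams are positive but weaker, which is the pattern a clustering heatmap would be expected to pick up.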

To run in simulation mode, you should use python narps.py -s. The specified basedir should be the one in which you have already run the full analysis. A new base directory, with "_simulated" appended to the original directory name, will be created and the new data will be generated in that directory. The subsequent analysis methods can then be applied to that new simulated directory structure. This entire process can be executed using Docker via make run-simulated.
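
The naming rule for the simulated base directory can be sketched as below. This mirrors only the description above ("_simulated" appended to the original directory name); the helper simulated_basedir is hypothetical and is not taken from narps.py.

```python
from pathlib import Path


def simulated_basedir(basedir):
    """Return the sibling directory whose name has "_simulated" appended."""
    base = Path(basedir)
    return base.with_name(base.name + "_simulated")


# Example using the base directory from the setup instructions.
example = simulated_basedir("/Users/poldrack/NARPS")
print(example.name)
```

Because the suffix is applied to the directory name, the simulated results sit next to the original base directory rather than inside it.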

The results of the simulated analysis are checked using CheckSimulatedValues.py to ensure that the simulated results closely match the intended results. In addition, we visually check the clustering to ensure that the heatmaps and clustering identify the noise/flipped/high variance teams.
