Msr dev #2

Merged (5 commits) on Aug 5, 2022
3 changes: 3 additions & 0 deletions .gitignore
@@ -0,0 +1,3 @@
*.log
settings.py
secret.py
50 changes: 24 additions & 26 deletions README.md
@@ -6,48 +6,46 @@ You need to....
* ...know what you want to download, look at the folder structure using an sftp browser before using this script

### Prerequisites
* Python 3 (tested with 3.7)
* Python 3 (tested with 3.7+)
* pysftp

### How-To
The downloader assumes the following folder structure:

#### Folder structure
The downloader assumes the following folder structure on the sftp server:

```
remote_path/{time_dirs}/{overall_dirs}
or
remote_path/{time_dirs}/intermed_dir/{subjs}/{dirs}/{subj_files}
```

The paths need to be specified in the script, e.g.

```
remote_path = "/data/imagen/2.7"
time_dirs = ["BL","FU1"]
# time_dirs = ["BL","FU1","FU2","FU3"]
intermed_dir1 = "imaging/spm_first_level"
subjs = ["000099616225","000085724167"]
dirs = ["EPI_stop_signal/","EPI_short_MID/"]
subj_files =["con_0006_stop_failure_-_stop_success.nii.gz",
"con_0005_stop_success_-_stop_failure.nii.gz"]
# overall mode:
# "overall": all directories, subdirectories and files within a folder remote_path/{time_dirs}/{overall_dirs}
overall_dirs = ["dawba/", "geolocation/","cantab/", "meta_data/", "psytools/"]
```
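
As an illustration of how the settings above map onto the second folder template, here is a minimal sketch that expands the lists into concrete remote paths. The function name `expand_remote_paths` is hypothetical and not taken from `get_data.py`; the actual script may build its paths differently.

```python
from itertools import product
from posixpath import join  # sftp paths use forward slashes on every platform


def expand_remote_paths(remote_path, time_dirs, intermed_dir, subjs, dirs, subj_files):
    """Return every remote file path implied by the template
    remote_path/{time_dirs}/intermed_dir/{subjs}/{dirs}/{subj_files}.

    Hypothetical helper for illustration only.
    """
    return [
        join(remote_path, t, intermed_dir, s, d, f)
        for t, s, d, f in product(time_dirs, subjs, dirs, subj_files)
    ]


paths = expand_remote_paths(
    remote_path="/data/imagen/2.7",
    time_dirs=["BL", "FU1"],
    intermed_dir="imaging/spm_first_level",
    subjs=["000099616225", "000085724167"],
    dirs=["EPI_stop_signal/", "EPI_short_MID/"],
    subj_files=["con_0006_stop_failure_-_stop_success.nii.gz"],
)

for p in paths:
    print(p)
```

With two time points, two subjects, two directories and one file, the sketch yields eight paths, which is a quick way to sanity-check a download definition before starting a long transfer.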


#### Download modes
There are four modes that help to do different things
```
mode = "dirs" # "files" or "dirs" or "subjects" or "overall"
```
1. "overall" mode: download a set of given folders recursively (including all subdirectories and files)
2. "subjects" mode: download given subject folders recursively (including all subdirectories and files)
3. "dirs" mode: download specific subdirectories within subject folders recursively
4. "files" mode: download specific files within specific subdirectories within subject folders


### ToDos
* simple switch to download all subjects in given folders that are found on the server
* better logging to check what might have gone wrong
4. "files" mode: download files which match specific patterns within specific subdirectories within subject folders
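
The README does not say which pattern syntax the "files" mode uses. The sketch below assumes shell-style glob patterns and shows how a remote directory listing could be filtered with the standard library. `select_files` and the `SPM.mat` entry in the listing are hypothetical and only serve the example.

```python
from fnmatch import fnmatchcase


def select_files(remote_names, patterns):
    """Keep the names that match at least one glob-style pattern.

    Assumption: patterns are shell-style globs; the real script may use
    a different matching rule (e.g. substring or regular expression).
    """
    return [
        name
        for name in remote_names
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    ]


# Hypothetical listing of one subject subdirectory on the server.
listing = [
    "con_0005_stop_success_-_stop_failure.nii.gz",
    "con_0006_stop_failure_-_stop_success.nii.gz",
    "SPM.mat",
]

selected = select_files(listing, ["con_0006_*.nii.gz"])
print(selected)
```

`fnmatchcase` is used instead of `fnmatch` so that matching stays case-sensitive regardless of the local operating system.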

#### Steps
* clone the repository
* use secret_template.py to create a new file secret.py
* enter your login information in secret.py
* use settings_template.py to create a new file settings.py
* enter your local path settings and your download definitions in settings.py
(some examples are given in settings_template.py)
* start the script:
```
python get_data.py
```
* logfiles are created for basic information (info_logger*) and debugging information (debug_logger*) to check if something did not work.
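
The split into an info log and a debug log can be reproduced with the standard `logging` module. The following is a minimal sketch, not the script's actual setup: the helper `make_file_logger`, the exact file names and the log format are assumptions, chosen so that the files match the `*.log` entry in `.gitignore`.

```python
import logging
import os
import tempfile


def make_file_logger(name, path, level):
    """Create a logger that writes records at `level` or above to `path`.

    Hypothetical helper for illustration only.
    """
    logger = logging.getLogger(name)
    logger.setLevel(level)
    handler = logging.FileHandler(path)
    handler.setLevel(level)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False  # keep records out of the root logger
    return logger


# A temporary directory stands in for the local download folder.
log_dir = tempfile.mkdtemp()
info_path = os.path.join(log_dir, "info_logger.log")
debug_path = os.path.join(log_dir, "debug_logger.log")

info_logger = make_file_logger("info_logger", info_path, logging.INFO)
debug_logger = make_file_logger("debug_logger", debug_path, logging.DEBUG)

info_logger.info("downloaded 3 files")
debug_logger.debug("remote listing: %s", ["a", "b"])
info_logger.debug("not written: below INFO level")
```

Keeping the two loggers separate means the info log stays short enough to skim after a run, while the debug log holds the detail needed to find out why a particular transfer failed.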


### Limitations
* pysftp may not work properly on Windows (e.g. recursive downloads may be buggy)
* downloads are slow (own experience: about 1.5 GB per hour)

### Caution
* not extensively tested for all use-cases, use at your own risk.