Tutorials (#165)

* Rename: sliding_window -> moving_average.
* Doc: typo.
* Refact: better naming and more validation.
* Fix: HSMM state time course should be int.
* Doc: updated tutorials.
OHBA-analysis · Jul 7, 2023 · 0b0976c
1 parent b7d6817

Showing 28 changed files with 691 additions and 1,192 deletions.
diff --git a/doc/documentation.rst b/doc/documentation.rst
@@ -46,8 +46,9 @@ The following tutorials illustrate basic usage and analysis that can be done wit
 - :doc:`tutorials_build/dynemo_mixing_coef_analysis`.
 - :doc:`tutorials_build/dynemo_plotting_networks`.
 
-**Other**:
+More example scripts can be found in the `examples directory <https://github.com/OHBA-analysis/osl-dynamics/tree/main/examples>`_ of the repo.
 
-- :doc:`tutorials_build/statistical_significance_testing`.
+Workshops
+---------
 
-More examples scripts can be found in the `examples directory <https://github.com/OHBA-analysis/osl-dynamics/tree/main/examples>`_ of the repo.
+- `2023 OHBA Software Library (OSL) workshop <https://osf.io/zxb6c/>`_.
diff --git a/doc/models/dynemo.rst b/doc/models/dynemo.rst
@@ -60,7 +60,7 @@ Similar to the `HMM <hmm.html>`_, we perform variational Bayes on the latent var
 Amortized Variational Inference
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-In DyNeMo, we use a new approach for variational Bayes (from variational auto-encoders [3]) known as **amortized variational inference**. Here, we train an 'inference network' (**inference RNN**) to predict the posterior distribution for the model parameters. This network learns a mapping from the observed data to the parameters of the posterior distributions. This allows us to allows us to efficiently scale to large datasets [3].
+In DyNeMo, we use a new approach for variational Bayes (from variational auto-encoders [3]) known as **amortized variational inference**. Here, we train an 'inference network' (**inference RNN**) to predict the posterior distribution for the model parameters. This network learns a mapping from the observed data to the parameters of the posterior distributions. This allows us to efficiently scale to large datasets [3].
 
 Cost Function
 ^^^^^^^^^^^^^

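The amortized inference paragraph edited above can be illustrated with a toy sketch. It is not DyNeMo's implementation: DyNeMo uses an inference RNN, whereas here a single hypothetical linear layer (`W_mu`, `b_mu`, `W_sigma`, `b_sigma`, all assumptions, as are the toy sizes) maps each observation to the mean and standard deviation of a Gaussian posterior. What it shows is the scaling argument the paragraph makes: the number of variational parameters is fixed by the network, not by the number of samples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumed for illustration).
n_channels, n_modes = 8, 3

# Shared inference-network parameters: one linear layer standing in for
# DyNeMo's inference RNN (hypothetical simplification).
W_mu = rng.normal(scale=0.1, size=(n_channels, n_modes))
b_mu = np.zeros(n_modes)
W_sigma = rng.normal(scale=0.1, size=(n_channels, n_modes))
b_sigma = np.zeros(n_modes)


def infer_posterior(x):
    """Map x of shape (n_samples, n_channels) to the mean and standard
    deviation of a Gaussian posterior over the latent variables at each sample."""
    mu = x @ W_mu + b_mu
    sigma = np.exp(x @ W_sigma + b_sigma)  # exp keeps the std positive
    return mu, sigma


def sample_posterior(mu, sigma):
    """Reparameterisation trick: theta = mu + sigma * eps with eps ~ N(0, I)."""
    return mu + sigma * rng.standard_normal(mu.shape)


# Variational parameters in the amortized scheme: the network weights only.
n_amortized = sum(p.size for p in (W_mu, b_mu, W_sigma, b_sigma))

for n_samples in (1_000, 100_000):
    x = rng.normal(size=(n_samples, n_channels))
    mu, sigma = infer_posterior(x)
    theta = sample_posterior(mu, sigma)
    # Without amortization, every sample would need its own mean and std.
    n_per_sample = 2 * n_samples * n_modes
    print(f"n_samples={n_samples}: theta {theta.shape}, "
          f"amortized params={n_amortized}, per-sample params={n_per_sample}")
```

With these toy sizes `n_amortized` stays at 54 for both dataset sizes, while the per-sample parameter count grows linearly with `n_samples`.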
diff --git a/doc/tutorials/data_loading.py b/doc/tutorials/data_loading.py
@@ -3,49 +3,45 @@
 ============
 
 In this tutorial we demonstrate the various options for loading data. This tutorial covers:
- 
+
 1. The Data Class
 2. Getting Example Data
 3. Loading Data in NumPy Format
 4. Loading Data in MATLAB Format
+5. Loading Data in fif Format
 
 Note, this webpage does not contain the output of running each cell. See `OSF <https://osf.io/9768c>`_ for the expected output.
 """
 
 #%%
 # The Data class
 # ^^^^^^^^^^^^^^
-# 
 # In osl-dynamics we typically load data using the `osl_dynamics.data.Data class <https://osl-dynamics.readthedocs.io/en/latest/autoapi/osl_dynamics/data/base/index.html#osl_dynamics.data.base.Data>`_. The Data class has a lot of useful methods that can be used to modify the data.
-# 
+#
 # Inputs
 # ******
-# 
 # There is one mandatory argument that needs to be passed to the Data class: `inputs`. This can be:
-# 
+#
 # - A path to a directory containing .npy files. Each .npy file should be a subject or session.
-# - A list of paths to .npy, .mat or .fif files. Each file should be a subject or session.
+# - A list of paths to .npy, .mat, or .fif files. Each file should be a subject or session.
 # - A numpy array. The array will be treated as continuous data from the same subject.
 # - A list of numpy arrays. Each numpy array should be the data for a subject or session.
-# 
+#
 # Data format
 # ***********
-# 
 # The data files or numpy arrays should be in the format `(n_samples, n_channels)`, i.e. time by channels. If your data is in `(n_channels, n_samples)` format, you should also pass `time_axis_first=False` to the Data class.
-# 
+#
 # The temporary store directory
 # *****************************
-# 
 # Note, when we load data using the Data class it loads the data as a `memory map <https://numpy.org/doc/stable/reference/generated/numpy.memmap.html>`_. This allows us to access the data without holding it in memory. If you prefer to load the data into memory pass `load_memmaps=False`. The Data class creates a directory called `tmp` which is used for storing temporary data (memory map files and prepared data). This directory can be safely deleted after you run your script. You can specify the name of the temporary directory by passing the `store_dir` argument.
-# 
+#
 # We will demonstrate how the Data class is used with example data below.
-# 
+#
 # Getting Example Data
 # ^^^^^^^^^^^^^^^^^^^^
-# 
+#
 # Download the dataset
 # ********************
-# 
 # We will download example data hosted on `OSF <https://osf.io/by2tc/>`_. Note, `osfclient` must be installed. This can be done in a Jupyter notebook by running::
 #
 #     !pip install osfclient
@@ -61,19 +57,24 @@ def get_data(name):
     os.remove(f"{name}.zip")
     return f"Data downloaded to: {name}"
 
-# Download the dataset (approximately 52 MB)
+# Download the dataset (approximately 88 MB)
 get_data("example_loading_data")
 
 # List the contents of the downloaded directory containing the dataset
 print("Contents of example_loading_data:")
 os.listdir("example_loading_data")
 
 #%%
-# We can see there's two directories in `example_loading_data`: `numpy_format`, which contains `.npy` files, and `matlab_format`, which contains `.mat` files. We'll show how to load data in each of these data types.
-# 
+# We can see there are three directories in `example_loading_data`:
+#
+# - `numpy_format`, which contains `.npy` files.
+# - `matlab_format`, which contains `.mat` files.
+# - `fif_format`, which contains directories with `.fif` files.
+#
+# We'll show how to load data in each of these data types.
+#
 # Loading Data in NumPy Format
 # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-# 
 # Let's first list the `example_loading_data/numpy_format` directory.
 
 os.listdir("example_loading_data/numpy_format")
@@ -90,7 +91,6 @@ def get_data(name):
 #%%
 # Importing a numpy array directly
 # ********************************
-# 
 # If we have already loaded a numpy array and just want to create an `osl_dynamics.data.Data` object, we can simply pass it to the class:
 
 from osl_dynamics.data import Data
@@ -115,7 +115,6 @@ def get_data(name):
 #%%
 # Loading from file
 # *****************
-# 
 # Rather than loading the data into memory then creating a Data object, we could load the data directly from the file.
 
 # Just load one of the files
@@ -152,13 +151,12 @@ def get_data(name):
 #%%
 # Loading Data in MATLAB Format
 # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-# 
 # We will discuss two methods for loading MATLAB files. First, we will load the MATLAB files using public python packages (`scipy` and `mat73`), then we'll show how to pass MATLAB files to the Data class.
-# 
-# ### Loading MATLAB files in Python
-# 
+#
+# Loading MATLAB files in Python
+# ******************************
 # The popular Python package SciPy has a function for loading MATLAB files: `scipy.io.loadmat <https://docs.scipy.org/doc/scipy/reference/generated/scipy.io.loadmat.html>`_. Note, this function cannot load files saved in the MATLAB `v7.3` format. If you saved your files using `v7.3`, you need to use `mat73.loadmat <https://github.com/skjerns/mat7.3>`_ to load them in Python. Both of these packages are automatically installed when you install osl-dynamics.
-# 
+#
 # Let's first see what files we have in the `example_loading_data/matlab_format` directory.
 
 os.listdir("example_loading_data/matlab_format")
@@ -179,10 +177,9 @@ def get_data(name):
 
 #%%
 # The important field is `X`, which contains the 2D time series data for this subject. Note, MATLAB files created using the `HMM-MAR <https://github.com/OHBA-analysis/HMM-MAR>`_ toolbox come in the above format, i.e. with an `X` and a `T` field. For us, only the `X` matters.
-# 
+#
 # Loading MATLAB data into the Data class
 # ***************************************
-# 
 # We can pass the numpy array contained in the `X` field of the dictionary directly to the Data class:
 
 data = Data(mat["X"])
@@ -196,16 +193,57 @@ def get_data(name):
 
 #%%
 # Note, the default value for the `data_field` argument is `X`, so the Data class would still be able to load the data without it being passed. The `data_field` argument is useful if the data is stored in a MATLAB file under a field with a different name.
-# 
+#
 # If we wanted to load multiple data files in MATLAB format we would need to pass a list of file paths.
 
 files = [f"example_loading_data/matlab_format/subject{i}.mat" for i in [0, 1]]
 data = Data(files)
 print(data)
 
+#%%
+# Loading fif files
+# *****************
+# Another data format that can be loaded with the Data class is fif files. This format is commonly used in `MNE-Python <https://mne.tools/stable/index.html>`_ and is the data format used in `OSL <https://github.com/OHBA-analysis/osl>`_. Here, we will load source reconstructed (parcellated) data created with OSL. In OSL, we often have a separate directory for each subject. The `fif_format` directory contains two directories for different subjects.
+
+os.listdir("example_loading_data/fif_format")
+
+#%%
+# Let's see what's inside `subj001_run01`.
+
+os.listdir("example_loading_data/fif_format/subj001_run01")
+
+#%%
+# We have a fif file which contains the data for this subject. We could load this with MNE.
+
+import mne
+
+raw = mne.io.read_raw_fif("example_loading_data/fif_format/subj001_run01/sflip_parc-raw.fif")
+print(raw.info)
+
+#%%
+# We can see this particular fif file contains 38 `misc` channels and 3 `stim` channels. We're interested in the `misc` channels. Let's load these into the Data class.
+
+data = Data(
+    "example_loading_data/fif_format/subj001_run01/sflip_parc-raw.fif",
+    picks="misc",
+    reject_by_annotation="omit",
+)
+print(data)
+
+#%%
+# The `reject_by_annotation="omit"` argument is used to make sure we don't include bad segments. This argument is passed to `Raw.get_data <https://mne.tools/stable/generated/mne.io.Raw.html#mne.io.Raw.get_data>`_ in MNE.
+#
+# To load multiple subjects we can do:
+
+files = [
+    f"example_loading_data/fif_format/subj{i:03d}_run01/sflip_parc-raw.fif"
+    for i in range(1, 3)
+]
+data = Data(files, picks="misc", reject_by_annotation="omit")
+print(data)
+
 #%%
 # Wrap Up
 # ^^^^^^^
-# 
 # - We've shown how to load data using the Data class in osl-dynamics.
 # - To see how we can prepare data for training a model, see the `Preparing Data tutorial <https://osl-dynamics.readthedocs.io/en/latest/tutorials_build/data_preparation.html>`_.
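The data loading tutorial in this commit describes three behaviours of the Data class: memory-mapped loading, the `time_axis_first=False` option for `(n_channels, n_samples)` data, and dropping bad segments via `reject_by_annotation="omit"`. The sketch below illustrates the same ideas with plain NumPy; it is not the osl-dynamics or MNE implementation. The helpers `load_subject` and `omit_bad_samples` are hypothetical, and the boolean `bad` mask is an assumption standing in for MNE's annotations, which record bad segments as onsets and durations rather than per-sample flags.

```python
import os
import tempfile

import numpy as np


def load_subject(path, time_axis_first=True):
    """Open a .npy file as a read-only memory map (hypothetical helper).

    Data are read from disk only when indexed. If the file is stored as
    (n_channels, n_samples), return the transposed view so the result is
    always (n_samples, n_channels).
    """
    x = np.load(path, mmap_mode="r")
    return x if time_axis_first else x.T


def omit_bad_samples(x, bad):
    """Drop time points flagged as bad (hypothetical helper).

    bad is a boolean array of length n_samples where True marks a sample
    inside a bad segment. Boolean indexing copies the kept rows into memory.
    """
    return x[~bad]


# Write a toy channels-first file: 4 channels by 100 samples.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "subject0.npy")
channels_first = np.random.default_rng(0).normal(size=(4, 100))
np.save(path, channels_first)

x = load_subject(path, time_axis_first=False)
print(x.shape)  # (100, 4)

# Pretend samples 20-29 fall inside an annotated bad segment.
bad = np.zeros(x.shape[0], dtype=bool)
bad[20:30] = True
clean = omit_bad_samples(x, bad)
print(clean.shape)  # (90, 4)
```

Transposing a memory map returns a view, so nothing is read into RAM until the data are indexed; omitting samples, by contrast, produces an in-memory copy of the retained rows.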