Encoding final code #77

Draft pull request. Wants to merge 60 commits into base: develop.

Commits (60)
aa788f0  Updates to plotting funcs (berkgercek, May 8, 2023)
0c807c6  subpanel plot script for only units used in paper (berkgercek, Jun 5, 2023)
6c37ad2  Compatibility for plots with pandas 2.0 (berkgercek, Jun 16, 2023)
051984e  Refactor subpanel_plots (berkgercek, Jun 16, 2023)
cf971d4  Changes to allow for direct plotting of fig panels (berkgercek, Jun 22, 2023)
44a07f4  Update utils.py (mschart, Jun 22, 2023)
e2a5747  add Berk's panels (mschart, Jun 22, 2023)
fe8a2ae  Rename bwm_figures.py to bwm_fig.py (mschart, Jun 22, 2023)
05be8a1  Include Berk's panels (mschart, Jun 23, 2023)
849bfa6  Update and rename bwm_fig.py to bwm_figs.py (mschart, Jun 23, 2023)
61cf501  works now for block, choice, stim, fback (mschart, Jun 26, 2023)
4153da8  Update bwm_figs.py (mschart, Jun 27, 2023)
37cec32  add some axis labels and table column names (mschart, Jul 3, 2023)
d6a96f2  add result pooling function (mschart, Jul 6, 2023)
342622a  harmonize file paths (mschart, Jul 6, 2023)
43eea02  save fit weights (berkgercek, Sep 22, 2023)
94e1a37  Update README.md (berkgercek, Oct 3, 2023)
6500aeb  save kernels after fitting (berkgercek, Oct 3, 2023)
76d116b  saving kernels and analyze using manifold methods (berkgercek, Oct 3, 2023)
7ae045f  Merge meta scripts into main (juhuntenburg, May 8, 2023)
f7c1f2f  Update bwm_figures.py (mschart, May 10, 2023)
4372ef4  add effect sizes to table (mschart, May 23, 2023)
62530cb  truncate values (mschart, May 24, 2023)
5772a54  mean and median for weight and age (GaelleChapuis, May 25, 2023)
73b2a30  remove pinned pandas version to match ibllib (oliche, Jun 1, 2023)
9cb5a24  qc adjustable in load_good_units (juhuntenburg, Jun 2, 2023)
4a16e57  small fix about qc value (juhuntenburg, Jun 3, 2023)
129b51b  add canonical info (mschart, Jun 5, 2023)
29d737b  Include wheel speed and velocity results (mschart, Jun 7, 2023)
bb4a8f4  fix Cosmos column (mschart, Jun 13, 2023)
34ef07a  fix Beryl Cosmos map for table (mschart, Jun 13, 2023)
98931db  add column for Cosmos names (mschart, Jun 19, 2023)
6b5b985  swap columns (mschart, Jun 19, 2023)
84b37a1  Add files via upload (mschart, Jun 19, 2023)
f6cb5f0  add prior paper columns (mschart, Jun 26, 2023)
f227e4f  again, less rounded (mschart, Jun 26, 2023)
061a4e1  correct typo (mschart, Jun 26, 2023)
a9dce24  Update region_info.csv (mschart, Jun 26, 2023)
bd27559  column name change (mschart, Jun 26, 2023)
c630b12  Atlas imports (k1o0, Aug 8, 2023)
5aa2b59  groom the repository (oliche, Aug 25, 2023)
f0b2a88  fix links ? (oliche, Aug 25, 2023)
d3dcb32  add a link to the Imbizo course (oliche, Aug 25, 2023)
a6edc87  fix ONE link (juhuntenburg, Dec 12, 2023)
d370747  new release (juhuntenburg, Dec 19, 2023)
f15bff6  Merge remote-tracking branch 'origin/main' into encoding-revisions (berkgercek, Jan 6, 2024)
5861b00  Edits to fix fns (berkgercek, Jan 29, 2024)
4b1b749  Simpler dockerfile (berkgercek, Jun 23, 2024)
5b86fa1  Merge branch 'main' into encoding-revisions (berkgercek, Jun 23, 2024)
b0d200f  Merge branch 'develop' into encoding-revisions (oliche, Jul 1, 2024)
2a9784e  Minor docker file changes (berkgercek, Jul 15, 2024)
ddf61dc  Merge branch 'encoding-revisions' of https://github.com/int-brain-lab… (berkgercek, Jul 15, 2024)
9396815  changes to load scripts (berkgercek, Jul 1, 2024)
7680718  Add option for early RT splits (berkgercek, Sep 9, 2024)
1283ba5  fixed bug in saving unit cluster IDs (berkgercek, Sep 9, 2024)
f4b220c  allow for early RT fitting (berkgercek, Sep 13, 2024)
6466122  Docker updates and early RT fitting (berkgercek, Sep 16, 2024)
7033d5c  bugfix (berkgercek, Sep 16, 2024)
fa92142  dockerfile fix (berkgercek, Sep 16, 2024)
9651085  Update docker spec and remove pytorch deps (berkgercek, Sep 16, 2024)
Files changed
brainwidemap/encoding/Dockerfile (25 changes: 9 additions & 16 deletions)

@@ -1,4 +1,4 @@
-FROM nvidia/cuda:11.7.1-devel-ubuntu22.04
+FROM ubuntu:latest
 # This can optionally be built with just ubuntu, rather than the nvidia cuda container.
 # If saving space is a concern, this is the way to go.
 LABEL description="Core container which has the basic necessities to run analyses in the\
@@ -15,24 +15,17 @@ COPY ./environment.yaml /data/environment.yaml
 SHELL ["/bin/bash", "-c"]
 # For some reason ibllib.io.video needs opencv which requires libgl1-mesa-dev ¯\_(ツ)_/¯
 RUN apt update && apt install -y wget git libgl1-mesa-dev
-RUN wget -O Mambaforge.sh "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh"
-RUN bash Mambaforge.sh -b -p /opt/conda && rm Mambaforge.sh
+RUN wget -O Miniforge3.sh "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
+RUN bash Miniforge3.sh -b -p /opt/conda && rm Miniforge3.sh
+RUN wget -O iblreq.txt "https://raw.githubusercontent.com/int-brain-lab/ibllib/master/requirements.txt"
+RUN head -n -1 iblreq.txt > requirements.txt
+RUN rm iblreq.txt
 RUN /bin/bash -c "source /opt/conda/etc/profile.d/conda.sh && \
     mamba install --yes conda-build &&\
     mamba env create -n iblenv --file=environment.yaml"
-RUN /bin/bash -c "source /opt/conda/etc/profile.d/conda.sh &&\
-    conda activate iblenv &&\
-    mamba install --yes pytorch pytorch-cuda=11.7 -c pytorch -c nvidia &&\
-    conda clean --all -f -y"
-RUN /bin/bash -c "source /opt/conda/etc/profile.d/conda.sh &&\
-    conda activate iblenv &&\
-    pip install globus-sdk iblutil ibl-neuropixel ONE-api phylib pynrrd slidingRP &&\
-    git clone https://github.com/int-brain-lab/iblapps.git &&\
-    conda develop ./iblapps &&\
-    git clone https://github.com/int-brain-lab/ibllib &&\
-    conda develop ./ibllib &&\
-    git clone https://github.com/berkgercek/neurencoding &&\
-    conda develop ./neurencoding"
+RUN /bin/bash -c "source /opt/conda/etc/profile.d/conda.sh && \
+    conda activate iblenv && pip install -r requirements.txt && pip install ibllib --no-deps"
+RUN rm requirements.txt
 # The below allows interactively running the container with the correct environment, but be warned
 # that this will not work with commands passed to the container in a non-interactive shell.
 # In the case of e.g. `docker run thiscontainer python myscript.py`, the environment will not
brainwidemap/encoding/README.md (2 changes: 1 addition & 1 deletion)

@@ -38,7 +38,7 @@ The `scripts/` folder contains small scripts that either run plotting or simple
 
 ### Cluster worker
 
-`cluster_worker.py` implements a mother script for cluster workers to process individual probe insertions. This relies on a cached dataset, produced using the `pipelines/01_cache_regressors.py` script, as well as several files specifying the identity and parameters of a cached dataset and the parameters of the current run of the model.
+`cluster_worker.py` implements a mother script for cluster workers to process individual probe insertions. This relies on a cached dataset, produced using the `pipelines/01_cache_regressors.py` script, as well as several files specifying the identity and parameters of a cached dataset and the parameters of the current run of the model. Note that the params.py file will need to point to the appropriate cache locations as well for the worker to function.
 
 ### Design matrix
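For context on the note added above: the cache locations live in brainwidemap/encoding/params.py, which this PR also edits (see the params.py diff below). As a minimal sketch, a local copy of that file might look like the following; the directory values are placeholders, not repository defaults:

```python
# Sketch of brainwidemap/encoding/params.py for a local setup.
# Both paths are placeholders: point them at your own directories.
GLM_CACHE = "/path/to/glm_cache/"        # regressor cache written by pipelines/01_cache_regressors.py
GLM_FIT_PATH = "/path/to/results/glms/"  # fit outputs written by cluster_worker.py
```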
brainwidemap/encoding/cluster_worker.py (62 changes: 52 additions & 10 deletions)

@@ -14,6 +14,7 @@
 
 # Third party libraries
 import numpy as np
+from pandas import read_pickle
 
 # Brainwidemap repo imports
 from brainwidemap.encoding.design import generate_design
@@ -23,7 +24,7 @@
 
 def get_cached_regressors(fpath):
     with open(fpath, "rb") as fo:
-        d = pickle.load(fo)
+        d = read_pickle(fo)
     return d["trialsdf"], d["spk_times"], d["spk_clu"], d["clu_regions"], d["clu_df"]
 
 
@@ -37,9 +38,9 @@ def _create_sub_sess_path(parent, subject, session):
     return sesspath
 
 
-def save_stepwise(subject, session_id, fitout, params, probes, input_fn, clu_reg, clu_df, fitdate):
+def save_stepwise(subject, session_id, fitout, params, probes, input_fn, clu_reg, clu_df, fitdate, splitstr=""):
     sesspath = _create_sub_sess_path(GLM_FIT_PATH, subject, session_id)
-    fn = sesspath.joinpath(f"{fitdate}_{probes}_stepwise_regression.pkl")
+    fn = sesspath.joinpath(f"{fitdate}_{probes}{splitstr}_stepwise_regression.pkl")
     outdict = {
         "params": params,
         "probes": probes,
@@ -81,14 +82,41 @@ def fit_save_inputs(
     t_before,
     fitdate,
     null=None,
+    earlyrts=False,
+    laterts=False,
 ):
     stdf, sspkt, sspkclu, sclureg, scluqc = get_cached_regressors(eidfn)
     sessprior = stdf["probabilityLeft"]
-    sessdesign = generate_design(stdf, sessprior, t_before, **params)
+    match (earlyrts, laterts):
+        case (False, False):
+            splitstr = ""
+        case (True, False):
+            splitstr = "_earlyrt"
+        case (False, True):
+            splitstr = "_latert"
+    if not earlyrts and not laterts:
+        sessdesign = generate_design(stdf, sessprior, t_before, **params)
+    else:
+        # Handle early and late RT flags, compute median for session if necessary
+        if "rt_thresh" not in params:
+            raise ValueError("Must specify rt_thresh if fitting early or late RTs")
+        if laterts and earlyrts:
+            raise ValueError(
+                "Cannot fit both early and late RTs. Disable both flags to fit all trials."
+            )
+        if params["rt_thresh"] == "session_median":
+            params["rt_thresh"] = np.median(stdf["firstMovement_times"] - stdf["trial_start"])
+
+        if earlyrts:
+            mask = (stdf["firstMovement_times"] - stdf["trial_start"]) < params["rt_thresh"]
+        elif laterts:
+            mask = (stdf["firstMovement_times"] - stdf["trial_start"]) >= params["rt_thresh"]
+        stdf = stdf[mask]
+        sessdesign = generate_design(stdf, sessprior, t_before, **params)
     if null is None:
         sessfit = fit_stepwise(sessdesign, sspkt, sspkclu, **params)
         outputfn = save_stepwise(
-            subject, eid, sessfit, params, probes, eidfn, sclureg, scluqc, fitdate
+            subject, eid, sessfit, params, probes, eidfn, sclureg, scluqc, fitdate, splitstr
         )
     elif null == "pseudosession_pleft_iti":
         sessfit, nullfits = fit_stepwise_with_pseudoblocks(
@@ -114,11 +142,13 @@ def fit_save_inputs(
 
 
 if __name__ == "__main__":
-    parser = argparse.ArgumentParser(description="Cluster GLM fitter. This script is called by"
-                                     "the batch script generated in "
-                                     "pipelines/02_fit_sessions.py and should in most "
-                                     "cases beyond debugging not be used in a "
-                                     "standalone fashion.")
+    parser = argparse.ArgumentParser(
+        description="Cluster GLM fitter. This script is called by"
+        "the batch script generated in "
+        "pipelines/02_fit_sessions.py and should in most "
+        "cases beyond debugging not be used in a "
+        "standalone fashion."
+    )
     parser.add_argument(
         "datafile",
         type=Path,
@@ -131,6 +161,16 @@ def fit_save_inputs(
     )
     parser.add_argument("fitdate", help="Date of fit for output file")
    parser.add_argument("--impostor_path", type=Path, help="Path to main impostor df file")
+    parser.add_argument(
+        "--earlyrt",
+        action="store_true",
+        help="Whether to fit separate movement kernels to early trials",
+    )
+    parser.add_argument(
+        "--latert",
+        action="store_true",
+        help="Whether to fit separate movement kernels to late trials",
+    )
     args = parser.parse_args()
 
     with open(args.datafile, "rb") as fo:
@@ -154,6 +194,8 @@ def fit_save_inputs(
         t_before,
         args.fitdate,
         null=params["null"],
+        earlyrts=args.earlyrt,
+        laterts=args.latert,
     )
     print("Fitting completed successfully!")
     print(outputfn)
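To make the new reaction-time split concrete: the block added to fit_save_inputs masks trials by the difference between firstMovement_times and trial_start, with rt_thresh optionally set to the session median. Below is a self-contained sketch of that masking logic on an invented trials table; the column names follow the diff, but the numbers are made up:

```python
import numpy as np
import pandas as pd

# Toy trials table with the two timing columns the diff uses.
stdf = pd.DataFrame({
    "trial_start": [0.0, 2.0, 4.0, 6.0],
    "firstMovement_times": [0.1, 2.5, 4.15, 6.8],
})

rt = stdf["firstMovement_times"] - stdf["trial_start"]  # reaction times: 0.1, 0.5, 0.15, 0.8
rt_thresh = np.median(rt)  # the "session_median" option in the diff, here 0.325

early = stdf[rt < rt_thresh]   # trials kept when fitting with --earlyrt
late = stdf[rt >= rt_thresh]   # trials kept when fitting with --latert
print(len(early), len(late))   # 2 2
```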
brainwidemap/encoding/design.py (7 changes: 3 additions & 4 deletions)

@@ -6,13 +6,12 @@
 # Standard library
 import logging
 
-# IBL libraries
-import neurencoding.design_matrix as dm
-
 # Third party libraries
 import numpy as np
 import pandas as pd
 from scipy.stats import norm
 
+# IBL libraries
+import neurencoding.design_matrix as dm
+
 _logger = logging.getLogger("brainwide")
brainwidemap/encoding/environment.yaml (21 changes: 2 additions & 19 deletions)

@@ -1,31 +1,14 @@
 name: iblenv
 dependencies:
-  - python=3.9
-  - apptools >= 4.5.0
-  - boto3
-  - click
-  - colorcet
-  - colorlog
-  - cython
-  - dataclasses
-  - flake8
-  - graphviz
-  - h5py
+  - python=3.10
   - ipython
   - matplotlib
-  - numba
-  - numpy
-  - pandas
-  - plotly
-  - pyarrow
-  - pyflakes >= 2.4.0
-  - pytest
-  - requests
   - scikit-learn
   - scipy >=1.4.1
   - seaborn
   - statsmodels
   - tqdm
   - pip
   - pip:
     - opencv-python
+    - pyqt<6
brainwidemap/encoding/fit.py (22 changes: 19 additions & 3 deletions)

@@ -13,12 +13,25 @@
 from brainwidemap.encoding.design import generate_design
 
 
-def fit(design, spk_t, spk_clu, binwidth, model, estimator, n_folds=5, contiguous=False, **kwargs):
+def fit(
+    design,
+    spk_t,
+    spk_clu,
+    binwidth,
+    model,
+    estimator,
+    n_folds=5,
+    contiguous=False,
+    mintrials=100,
+    **kwargs
+):
     """
     Function to fit a model using a cross-validated design matrix.
     """
     trials_idx = design.trialsdf.index
-    nglm = model(design, spk_t, spk_clu, binwidth=binwidth, estimator=estimator, mintrials=0)
+    nglm = model(
+        design, spk_t, spk_clu, binwidth=binwidth, estimator=estimator, mintrials=mintrials
+    )
     splitter = KFold(n_folds, shuffle=not contiguous)
     scores, weights, intercepts, alphas, splits = [], [], [], [], []
     for test, train in splitter.split(trials_idx):
@@ -52,6 +65,7 @@ def fit_stepwise(
     estimator,
     n_folds=5,
     contiguous=False,
+    mintrials=100,
     seqsel_kwargs={},
     seqselfit_kwargs={},
     **kwargs
@@ -107,7 +121,9 @@ def fit_stepwise(
     splits: list of dicts containing the test and train indices for each fold.
     """
     trials_idx = design.trialsdf.index
-    nglm = model(design, spk_t, spk_clu, binwidth=binwidth, estimator=estimator, mintrials=0)
+    nglm = model(
+        design, spk_t, spk_clu, binwidth=binwidth, estimator=estimator, mintrials=mintrials
+    )
     splitter = KFold(n_folds, shuffle=not contiguous)
     sequences, scores, deltas, splits = [], [], [], []
     for test, train in tqdm(splitter.split(trials_idx), desc="Fold", leave=False):
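The substantive change above is threading mintrials through to the model constructor instead of hard-coding 0; the surrounding fold logic is unchanged. For orientation, here is a minimal sketch of the KFold pattern both fitters use, including the repo's shuffle=not contiguous switch; the model fitting itself is elided:

```python
import numpy as np
from sklearn.model_selection import KFold

trials_idx = np.arange(100)  # stand-in for design.trialsdf.index
contiguous = False           # True would keep each fold a contiguous block of trials

splitter = KFold(n_splits=5, shuffle=not contiguous)
held_out = 0
for train, test in splitter.split(trials_idx):
    # fit on trials_idx[train], evaluate on trials_idx[test]
    held_out += len(test)
assert held_out == len(trials_idx)  # every trial is held out exactly once
```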
brainwidemap/encoding/glm_predict.py (60 changes: 43 additions & 17 deletions)

@@ -204,23 +204,7 @@ def psth_summary(self, align_time, unit, t_before=0.1, t_after=0.6, trials=None,
             ax=ax[0],
             smoothing=0.01,
         )
-        keytuple = (align_time, t_before, t_after, tuple(trials))
-        if keytuple not in self.full_psths:
-            self.full_psths[keytuple] = pred_psth(
-                self.nglm, align_time, t_before, t_after, trials=trials
-            )
-            self.cov_psths[keytuple] = {}
-            tmp = self.cov_psths[keytuple]
-            for cov in self.covar:
-                tmp[cov] = pred_psth(
-                    self.nglm,
-                    align_time,
-                    t_before,
-                    t_after,
-                    targ_regressors=[cov],
-                    trials=trials,
-                    incl_bias=False,
-                )
+        keytuple = self.compute_model_psth(align_time, t_before, t_after, trials)
         for cov in self.covar:
             ax[2].plot(self.combweights[cov].loc[unit])
             ax[2].set_title("Individual kernels (not PSTH contrib)")
@@ -244,3 +228,45 @@
         plt.suptitle(f"Unit {unit}")
         plt.tight_layout()
         return ax
+
+    def compute_model_psth(self, align_time, t_before, t_after, trials):
+        """
+        Generate and store internally model PSTHs for a given alignment time and trials.
+
+        Parameters
+        ----------
+        align_time : str
+            Column in the design matrix to align PSTH to
+        t_before : float
+            Time before the align time to compute PSTH for
+        t_after : float
+            Time after the align time to compute PSTH for
+        trials : array-like of int
+            List of trials on which to compute the PSTH
+
+        Returns
+        -------
+        tuple
+            4-tuple with the alignment time, time before, time after, and trials used to compute,
+            can be used as a key in the internal self.full_psths and self.cov_psths dictionaries,
+            which contain the full PSTH and the PSTH for each regressor, respectively.
+        """
+        keytuple = (align_time, t_before, t_after, tuple(trials))
+        if keytuple not in self.full_psths:
+            self.full_psths[keytuple] = pred_psth(
+                self.nglm, align_time, t_before, t_after, trials=trials
+            )
+            self.cov_psths[keytuple] = {}
+            tmp = self.cov_psths[keytuple]
+            for cov in self.covar:
+                tmp[cov] = pred_psth(
+                    self.nglm,
+                    align_time,
+                    t_before,
+                    t_after,
+                    targ_regressors=[cov],
+                    trials=trials,
+                    incl_bias=False,
+                )
+
+        return keytuple
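The new compute_model_psth factors the PSTH computation out of psth_summary and memoizes it. The underlying pattern is dictionary-keyed caching on the call arguments; a self-contained sketch follows, where the class and method names are invented and a list stands in for the real pred_psth output:

```python
class PsthCache:
    """Toy illustration of the keytuple-based memoization in compute_model_psth."""

    def __init__(self):
        self.full_psths = {}

    def compute(self, align_time, t_before, t_after, trials):
        # tuple(trials) makes the trial list hashable, so it can be part of a dict key
        keytuple = (align_time, t_before, t_after, tuple(trials))
        if keytuple not in self.full_psths:
            self.full_psths[keytuple] = self._expensive_psth(trials)
        return keytuple

    def _expensive_psth(self, trials):
        return [0.0] * len(trials)  # placeholder for pred_psth()


cache = PsthCache()
k1 = cache.compute("stimOn_times", 0.1, 0.6, [1, 2, 3])
k2 = cache.compute("stimOn_times", 0.1, 0.6, [1, 2, 3])
assert k1 == k2 and len(cache.full_psths) == 1  # second call reused the stored PSTH
```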
brainwidemap/encoding/params.py (4 changes: 2 additions & 2 deletions)

@@ -4,5 +4,5 @@
 work.
 """
 
-GLM_CACHE = "/mnt/Storage/glm_cache/"
-GLM_FIT_PATH = "/mnt/Storage/results/glms/"
+GLM_CACHE = "/home/gercek/Projects/glm_cache/"
+GLM_FIT_PATH = "/home/gercek/Projects/results/glms/"
brainwidemap/encoding/pipelines/01_cache_regressors.py (15 changes: 7 additions & 8 deletions)

@@ -7,14 +7,11 @@
 
 # Third party libraries
 import dask
-import numpy as np
-import pandas as pd
 from dask.distributed import Client
 from dask_jobqueue import SLURMCluster
 
 # IBL libraries
 import brainbox.io.one as bbone
-from iblutil.numerical import ismember
 from one.api import ONE
 from brainwidemap.encoding.params import GLM_CACHE
 from brainwidemap.encoding.utils import load_regressors
@@ -68,7 +65,7 @@ def delayed_loadsave(subject, session_id, pid, params):
 T_BEF = 0.6  # Time before stimulus onset to include in the definition of the trial
 T_AFT = 0.6  # Time after feedback to include in the definition of a trial
 BINWIDTH = 0.02  # Size of binwidth for wheel velocity traces, in seconds
-ABSWHEEL = False  # Whether to return wheel velocity (False) or speed (True)
+ABSWHEEL = True  # Whether to return wheel velocity (False) or speed (True)
 CLU_CRITERIA = "bwm"  # Criteria on cluster inclusion in cache
 # End parameters
 
@@ -79,13 +76,15 @@ def delayed_loadsave(subject, session_id, pid, params):
     "binwidth": BINWIDTH,
     "abswheel": ABSWHEEL,
     "clu_criteria": CLU_CRITERIA,
+    "one_url": "https://alyx.internationalbrainlab.org",
+    "one_pw": "international",
 }
 
-pw = 'international'
-one = ONE(base_url='https://openalyx.internationalbrainlab.org', password=pw, silent=True)
+one = ONE(base_url=params["one_url"], silent=True)
 dataset_futures = []
 
-sessdf = bwm_query().set_index("pid")
+freeze = "2023_12_bwm_release" if CLU_CRITERIA == "bwm" else None
+sessdf = bwm_query(freeze=freeze).set_index("pid")
 
 for pid, rec in sessdf.iterrows():
     subject = rec.subject
@@ -110,7 +109,7 @@ def delayed_loadsave(subject, session_id, pid, params):
         f"export OPENBLAS_NUM_THREADS={N_CORES}",
     ],
 )
-cluster.scale(40)
+cluster.scale(20)
 client = Client(cluster)
 
 tmp_futures = [client.compute(future[3]) for future in dataset_futures]
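For readers unfamiliar with the pipeline's scheduling: the script builds dask delayed tasks, submits them through a SLURM-backed cluster, and collects results via a Client, with cluster.scale(20) now requesting 20 workers rather than 40. A hedged sketch of that pattern follows; the resource values are illustrative rather than the script's actual settings, and it only runs on a host with SLURM available:

```python
import dask
from dask.distributed import Client
from dask_jobqueue import SLURMCluster

@dask.delayed
def load_and_save(pid):
    return f"cached {pid}"  # placeholder for the real per-probe regressor caching

futures = [load_and_save(pid) for pid in ["probe00", "probe01"]]

cluster = SLURMCluster(cores=1, memory="12GB", walltime="01:00:00")  # illustrative resources
cluster.scale(20)  # ask SLURM for 20 worker jobs, as in the updated script
client = Client(cluster)
results = client.gather(client.compute(futures))  # block until all probes are cached
print(results)
```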