Rachaelj #68

Open · narnij wants to merge 25 commits into master
Conversation

@narnij (Contributor) commented Feb 9, 2021

Upload .py versions of linear regression, Lasso, and met2gcp.

for j in range(Ytrain.shape[1]):
    x = linear_model.Lasso(alpha=a).fit(Xtrain, Ytrain[:, j])
    mdl.append(x)
ypred = mdl[i].predict(Xtest)
@ScottCampit (Member) commented Feb 9, 2021

I would move this code into the inner loop and predict with x instead. The reason is that if you evaluate with mdl[i], you're not evaluating all the models you just trained; if you're on the first alpha, you'll only evaluate one model, mdl[0].
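A minimal sketch of that fix, assuming the surrounding outer loop over alphas (not shown in the diff) provides a, and that mdl and error are lists initialized before the loops:

    from sklearn import linear_model
    from sklearn.metrics import mean_squared_error

    for j in range(Ytrain.shape[1]):
        x = linear_model.Lasso(alpha=a).fit(Xtrain, Ytrain[:, j])
        mdl.append(x)
        # Predict with the model just trained for target j, not mdl[i]
        ypred = x.predict(Xtest)
        error.append(mean_squared_error(Ytest[:, j], ypred))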

x = linear_model.Lasso(alpha=a).fit(Xtrain, Ytrain[:, j])
mdl.append(x)
ypred = mdl[i].predict(Xtest)
error.append(mean_squared_error(ypred, Ytest[:, j]))
@ScottCampit (Member):

Same thing here for the same exact reason.

@ScottCampit (Member):

Then, to save on memory, write some code to update your model list. The logic is the following (a rough sketch follows the list):

  1. Evaluate models with first alpha
  2. On the next iteration, save your models and errors in another object
  3. Write code that will see if the MSE in the current alpha level is less than the MSE in the previous alpha level
  4. If it is, replace the model object in your original list, and update the alpha associated with this model
  5. Otherwise, don't update and continue
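A rough sketch of that update logic; best_models, best_errors, and best_alphas are hypothetical names introduced here, and the alphas/Xtrain/Ytrain/Xtest/Ytest names are assumed from the code above:

    import numpy as np
    from sklearn import linear_model
    from sklearn.metrics import mean_squared_error

    n_targets = Ytrain.shape[1]
    best_models = [None] * n_targets    # best model per target variable
    best_errors = [np.inf] * n_targets  # lowest MSE seen so far per target
    best_alphas = [None] * n_targets    # alpha associated with each best model

    for a in alphas:
        for j in range(n_targets):
            x = linear_model.Lasso(alpha=a).fit(Xtrain, Ytrain[:, j])
            mse = mean_squared_error(Ytest[:, j], x.predict(Xtest))
            # Only keep the new model if this alpha improves on the previous best
            if mse < best_errors[j]:
                best_models[j] = x
                best_errors[j] = mse
                best_alphas[j] = a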

print(error)
minmse_index = error.index(min(error))
best_alpha = alphas[minmse_index]
print(best_alpha)
@ScottCampit (Member):

This is why all of your models have a single alpha value. But remember - each model corresponding to each target variable should have a unique "best" alpha. The comments above should resolve this.


GCP2MET_models = []
for i in range(Ytrain.shape[1]):
    mdl_G2M = linear_model.Lasso(alpha=best_alpha).fit(Xtrain, Ytrain[:, i])
@ScottCampit (Member):

In the case described above, you will need to run this code with the argument alpha=best_alpha[i] to get the correct values.
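For example, using the hypothetical per-target list from the earlier sketch (called best_alphas there, best_alpha[i] in the comment above):

    GCP2MET_models = []
    for i in range(Ytrain.shape[1]):
        # Use the alpha selected for target i rather than one global value
        mdl_G2M = linear_model.Lasso(alpha=best_alphas[i]).fit(Xtrain, Ytrain[:, i])
        GCP2MET_models.append(mdl_G2M)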

# In[7]:


kf = KFold(n_splits=3)
@ScottCampit (Member):

So I had you run hold-out previously just to get the concept down. In practice, you don't have to run hold-out; you can just report the results from k-fold CV.
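As one way to do that, a sketch using scikit-learn's cross_val_score with the kf object above (best_alphas is the hypothetical per-target list from the earlier sketch, and linear_model is assumed imported):

    from sklearn.model_selection import cross_val_score

    # Report cross-validated MSE per target instead of a separate hold-out split
    for i in range(Ytrain.shape[1]):
        scores = cross_val_score(
            linear_model.Lasso(alpha=best_alphas[i]),
            Xtrain, Ytrain[:, i],
            cv=kf, scoring="neg_mean_squared_error",
        )
        print(f"target {i}: mean CV MSE = {-scores.mean():.4f}")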


r, p_value = pearsonr(Ypred, Ytest[:, i])
# Compute P-value based on r-value
pval = norm.sf(abs(r))*2
@ScottCampit (Member):

Ah, this may be where things are going wrong. I would run this code once you get a vector of r values out of the loop. I think this will return different values because your distribution will be different. I may be wrong, but it's worth a shot.
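A sketch of that reading, with r_values as a hypothetical accumulator and GCP2MET_models as the per-target models from above; the p-value formula is applied to the whole vector after the loop:

    import numpy as np
    from scipy.stats import pearsonr, norm

    r_values = []
    for i in range(Ytest.shape[1]):
        Ypred = GCP2MET_models[i].predict(Xtest)
        r, _ = pearsonr(Ypred, Ytest[:, i])
        r_values.append(r)

    # Compute p-values from the full vector of r values outside the loop
    pvals = norm.sf(np.abs(np.array(r_values))) * 2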

@ScottCampit (Member) left a comment

I just made inline comments on your linear regression and LASSO code. Take a look and let me know if you have any questions; if you want to pitch additional ideas, I'm happy to hear those too.
