Initial commit

firmai · Apr 11, 2020 · commit f29895e
Showing 175 changed files with 79,921 additions and 0 deletions.
diff --git a/.gitattributes b/.gitattributes
@@ -0,0 +1,2 @@
+# Auto detect text files and perform LF normalization
+* text=auto
diff --git a/LICENSE.md b/LICENSE.md
@@ -0,0 +1,21 @@
+# License
+
+Copyright 2019, 2020, ML-AIM
+
+The ML-AIM software is released under the [3-Clause BSD license](https://opensource.org/licenses/BSD-3-Clause) unless mentioned otherwise by the respective algorithms.
+
+## BSD-3-Clause
+
+
+Copied from https://opensource.org/licenses/BSD-3-Clause at 2020-02-08:
+
+Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
+
+1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
+
+2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
+
+3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
diff --git a/README.md b/README.md
@@ -0,0 +1,110 @@
+# ML-AIM: Machine Learning and Artificial Intelligence for Medicine
+
+This repository contains the implementations of algorithms developed
+by the [ML-AIM](http://www.vanderschaar-lab.com) Laboratory.
+
+1. [AutoPrognosis](https://icml.cc/Conferences/2018/Schedule?showEvent=2050): Automated Clinical Prognostic Modeling, ICML 2018 [software](alg/autoprognosis)
+2. [GAIN](http://proceedings.mlr.press/v80/yoon18a.html): a GAN based missing data imputation algorithm, ICML 2018 [software](alg/gain)
+3. [INVASE](https://openreview.net/forum?id=BJg_roAcK7): an Actor-critic model based instance wise feature selection algorithm, ICLR 2019 [software](alg/invase)
+4. [GANITE](https://openreview.net/forum?id=ByKWUeWA-): a GAN based algorithm for estimating individualized treatment effects, ICLR 2018 [software](alg/ganite)
+5. [DeepHit](http://medianetlab.ee.ucla.edu/papers/AAAI_2018_DeepHit): a Deep Learning Approach to Survival Analysis with Competing Risks, AAAI 2018 [software](alg/deephit)
+6. [PATE-GAN](https://openreview.net/forum?id=S1zk9iRqF7): Generating Synthetic Data with Differential Privacy Guarantees, ICLR 2019 [software](alg/pategan)
+7. [KnockoffGAN](https://openreview.net/pdf?id=ByeZ5jC5YQ): generating knockoffs for feature selection using generative adversarial networks, ICLR 2019 [software](alg/knockoffgan)
+8. [Causal Multi-task Gaussian Processes](https://papers.nips.cc/paper/6934-bayesian-inference-of-individualized-treatment-effects-using-multi-task-gaussian-processes.pdf): Bayesian Inference of Individualized Treatment Effects using Multi-task Gaussian Processes, NIPS 2017 [software](alg/causal_multitask_gaussian_processes_ite)
+9. [Limits of Estimating Heterogeneous Treatment Effects: Guidelines for Practical Algorithm Design](http://proceedings.mlr.press/v80/alaa18a/alaa18a.pdf) [software](alg/causal_multitask_gaussian_processes_ite)
+10. [ASAC](https://arxiv.org/abs/1906.06796): Active Sensing using Actor-Critic Models, MLHC 2019 [software](alg/asac)
+11. [DGPSurvival](https://papers.nips.cc/paper/6827-deep-multi-task-gaussian-processes-for-survival-analysis-with-competing-risks.pdf): Deep Multi-task Gaussian Processes for Survival Analysis with Competing Risks, NIPS 2017 [software](alg/dgp_survival)
+12. [Symbolic Metamodeling](https://papers.nips.cc/paper/9308-demystifying-black-box-models-with-symbolic-metamodels): Demystifying Black-box Models with Symbolic Metamodels, NeurIPS 2019 [software](alg/symbolic_metamodeling)
+13. [DPBAG](https://papers.nips.cc/paper/8684-differentially-private-bagging-improved-utility-and-cheaper-privacy-than-subsample-and-aggregate): Differentially Private Bagging: Improved utility and cheaper privacy than subsample-and-aggregate, NeurIPS 2019 [software](alg/dpbag)
+14. [TimeGAN](https://papers.nips.cc/paper/8789-time-series-generative-adversarial-networks): Time-series Generative Adversarial Networks, NeurIPS 2019 [software](alg/timegan)
+15. [Attentiveness](https://papers.nips.cc/paper/9311-attentive-state-space-modeling-of-disease-progression): Attentive State-Space Modeling of Disease Progression, NeurIPS 2019 [software](alg/attentivess)
+16. [GCIT](https://arxiv.org/pdf/1907.04068.pdf): Conditional Independence Testing with Generative Adversarial Networks, NeurIPS 2019 [software](alg/gcit)
+17. [Counterfactual Recurrent Network](https://openreview.net/forum?id=BJg866NFvB): Estimating counterfactual treatment outcomes over time through adversarially balanced representations, ICLR 2020 [software](alg/counterfactual_recurrent_network)
+18. [C3T Budget](https://arxiv.org/abs/2001.02463): Contextual constrained learning for dose-finding clinical trials, AISTATS 2020 [software](alg/c3t_budgets)
+19. [DKLITE](https://arxiv.org/abs/2001.04754): Learning Overlapping Representations for the Estimation of Individualized Treatment Effects, AISTATS 2020 [software](alg/dklite)
+20. [Dynamic disease network ddp](https://arxiv.org/abs/2001.02585): Learning Dynamic and Personalized Comorbidity Networks from Event Data using Deep Diffusion Processes, AISTATS 2020 [software](alg/dynamic_disease_network_ddp)
+21. [SMS-DKL](https://arxiv.org/abs/2001.03898): Stepwise Model Selection for Sequence Prediction via Deep Kernel Learning, AISTATS 2020 [software](alg/smsdkl)
+
+Prepared for release and maintained by AvdSchaar
+
+Please send comments and suggestions to [email protected]
+
+## Citations
+
+Please cite the [ML-AIM repository](https://bitbucket.org/mvdschaar/mlforhealthlabpub) and/or the applicable papers if you use the software.
+
+## License
+
+Copyright 2019, 2020, ML-AIM
+
+The ML-AIM software is released under the [3-Clause BSD license](https://opensource.org/licenses/BSD-3-Clause) unless mentioned otherwise by the respective algorithms.
+
+## [Installation instructions](doc/install.md)
+
+See doc/install.md for installation instructions
+
+## Tutorials and/or examples
+
+* AutoPrognosis:
+  * alg/autoprognosis/tutorial_autoprognosis_api.ipynb
+  * alg/autoprognosis/tutorial_autoprognosis_cli.ipynb
+* GAIN: alg/gain/tutorial_gain.ipynb
+* INVASE: alg/invase/tutorial_invase.ipynb
+* GANITE: alg/ganite/tutorial_ganite.ipynb
+* PATE-GAN: alg/pategan/tutorial_pategan.ipynb
+* KnockoffGAN: alg/knockoffgan/tutorial_knockoffgan.ipynb
+* ASAC: alg/asac/tutorial_asac.ipynb
+* DGPSurvival: alg/dgp_survival/tutorial_dgp.ipynb
+* Symbolic Metamodeling:
+  * alg/symbolic_metamodeling/1-_Introduction_to_Meijer_G-functions.ipynb
+  * alg/symbolic_metamodeling/2-_Metamodeling_of_univariate_black-box_functions_using_Meijer_G-functions.ipynb
+  * alg/symbolic_metamodeling/3-_Building_Symbolic_Metamodels.ipynb
+* Differentially Private Bagging: alg/dpbag/DPBag_Tutorial.ipynb
+* Time-series Generative Adversarial Networks: alg/timegan/tutorial_timegan.ipynb
+* Attentive State-Space Modeling of Disease Progression: alg/attentivess/Tutorial_for_Attentive_State-space_Models.ipynb
+* Conditional Independence Testing with Generative Adversarial Networks: alg/gcit/tutorial_gcit.ipynb
+* DKLITE: alg/dklite/tutorial_dklite.ipynb
+* SMS-DKL: alg/smsdkl/test_smsdkl.py
+
+### [AutoPrognosis presentation](https://www.youtube.com/watch?v=d1uEATa0qIo)
+
+You can find a presentation by Prof. van der Schaar describing AutoPrognosis here: https://www.youtube.com/watch?v=d1uEATa0qIo
+
+## Version history
+
+- version 1.7: February 27, 2020: SMS-DKL
+- version 1.6: February 24, 2020: DKLITE and dynamic disease network ddp
+- version 1.5: February 23, 2020: C3T Budget
+- version 1.4: February 3, 2020: Counterfactual Recurrent Network
+- version 1.3: December 7, 2019: Conditional Independence Testing with Generative Adversarial Networks
+- version 1.1: November 30, 2019: Attentive State-Space Modeling
+- version 1.0: November 4, 2019: Differentially Private Bagging and Time-series Generative Adversarial Networks
+- version 0.9: October 25, 2019: Symbolic Metamodeling
+- version 0.8: September 29, 2019: DGP Survival
+- version 0.7: September 20, 2019: ASAC
+- version 0.6: August 5, 2019: Causal Multi-task Gaussian Processes
+- version 0.5: July 24, 2019: KnockoffGAN
+- version 0.4: June 18, 2019: DeepHit and PATE-GAN
+
+## References
+1. [AutoPrognosis: Automated Clinical Prognostic Modeling via Bayesian Optimization with Structured Kernel Learning](https://icml.cc/Conferences/2018/Schedule?showEvent=2050)
+2. [Prognostication and Risk Factors for Cystic Fibrosis via Automated Machine Learning](https://www.nature.com/articles/s41598-018-29523-2)
+3. [Cardiovascular Disease Risk Prediction using Automated Machine Learning: A Prospective Study of 423,604 UK Biobank Participants](https://www.ncbi.nlm.nih.gov/pubmed/31091238)
+4. [GAIN: Missing Data Imputation using Generative Adversarial Nets](http://proceedings.mlr.press/v80/yoon18a.html)
+5. [INVASE: Instance-wise Variable Selection using Neural Networks](https://openreview.net/forum?id=BJg_roAcK7)
+6. [GANITE: Estimation of Individualized Treatment Effects using Generative Adversarial Nets](https://openreview.net/forum?id=ByKWUeWA-)
+7. [KnockoffGAN](https://openreview.net/pdf?id=ByeZ5jC5YQ): generating knockoffs for feature selection using generative adversarial networks
+8. [Bayesian Inference of Individualized Treatment Effects using Multi-task Gaussian Processes](https://papers.nips.cc/paper/6934-bayesian-inference-of-individualized-treatment-effects-using-multi-task-gaussian-processes.pdf)
+9. [Limits of Estimating Heterogeneous Treatment Effects: Guidelines for Practical Algorithm Design](http://proceedings.mlr.press/v80/alaa18a/alaa18a.pdf)
+10. [ASAC](https://arxiv.org/abs/1906.06796): Active Sensing using Actor-Critic Models
+11. [DGPSurvival](https://papers.nips.cc/paper/6827-deep-multi-task-gaussian-processes-for-survival-analysis-with-competing-risks.pdf): Deep Multi-task Gaussian Processes for Survival Analysis with Competing Risks
+12. [GCIT](https://arxiv.org/pdf/1907.04068.pdf): Conditional Independence Testing with Generative Adversarial Networks
+13. [Counterfactual Recurrent Network](https://openreview.net/forum?id=BJg866NFvB): Estimating counterfactual treatment outcomes over time through adversarially balanced representations
+14. [C3T Budget](https://arxiv.org/abs/2001.02463): Contextual constrained learning for dose-finding clinical trials
+15. [SMS-DKL](https://arxiv.org/abs/2001.03898): Stepwise Model Selection for Sequence Prediction via Deep Kernel Learning
+16. Dua, D. and Graff, C. (2019). [UCI Machine Learning Repository](http://archive.ics.uci.edu/ml). Irvine, CA: University of California, School of Information and Computer Science.
+17. Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.
+18. [TensorFlow](https://www.tensorflow.org): Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.
+19. [GPyOpt](http://github.com/SheffieldML/GPyOpt): A Bayesian Optimization framework in python
+20. [scikit-survival](https://github.com/sebp/scikit-survival) survival analysis built on top of scikit-learn
+21. [3-Clause BSD license](https://opensource.org/licenses/BSD-3-Clause)
diff --git a/alg/asac/ASAC.py b/alg/asac/ASAC.py
@@ -0,0 +1,116 @@
+'''
+ASAC (Active Sensing using Actor-Critic Model) (12/18/2018)
+Active Sensing Function
+'''
+
+#%% Necessary Packages
+import tensorflow as tf
+import numpy as np
+
+#%% ASAC Function
+'''
+Inputs:
+  - trainX, trainY: training features and targets
+  - testX: testing features
+  - cost: measurement costs
+
+Outputs:
+  - Rounded (0/1) selection masks for the training samples
+  - Rounded (0/1) selection masks for the testing samples
+'''
+
+
+def ASAC(
+        trainX,
+        trainY,
+        testX,
+        cost,
+        iterations=5001,
+        learning_rate=0.01):
+
+    ##### Initialization on the Graph
+    tf.reset_default_graph()
+
+    # Network Parameters
+    seq_length = len(trainX[0][:,0])
+    data_dim = len(trainX[0][0,:])
+    hidden_dim1 = 5
+    hidden_dim2 = 5
+    output_dim1 = data_dim
+    output_dim2 = 1
+    # learning_rate = 0.01
+
+    #%% Preprocessing
+    New_trainX = list()
+    for i in range(len(trainX)):
+        Temp = trainX[i].copy()
+        Temp[1:,:] = Temp[:(seq_length-1),:] 
+        Temp[0,:] = np.zeros([data_dim])
+
+        New_trainX.append(Temp)
+
+    New_testX = list()
+    for i in range(len(testX)):
+        Temp = testX[i].copy()
+        Temp[1:,:] = Temp[:(seq_length-1),:] 
+        Temp[0,:] = np.zeros([data_dim])
+
+        New_testX.append(Temp)
+
+    #%% Network Building
+    # input place holders
+
+    New_X = tf.placeholder(tf.float32, [None, seq_length, data_dim])
+    X = tf.placeholder(tf.float32, [None, seq_length, data_dim])
+    Y = tf.placeholder(tf.float32, [None, seq_length])
+
+    # build a LSTM network
+    cell1 = tf.contrib.rnn.BasicLSTMCell(num_units=hidden_dim1, state_is_tuple=True, activation=None, name = 'cell1')
+    outputs1, _states1 = tf.nn.dynamic_rnn(cell1, New_X, dtype=tf.float32)
+    Mask = tf.contrib.layers.fully_connected(outputs1, output_dim1, activation_fn=tf.sigmoid)  # Applied to the LSTM output at every time step
+    New_Mask = tf.maximum(Mask-0.499,0)
+    New_Mask = New_Mask * 10000
+    New_Mask = tf.minimum(New_Mask,1)
+
+    cell2 = tf.contrib.rnn.BasicLSTMCell(num_units=hidden_dim2, state_is_tuple=True, activation=tf.tanh, name = 'cell2')
+    outputs2, _states2 = tf.nn.dynamic_rnn(cell2, X * New_Mask, dtype=tf.float32)
+    Y_pred = tf.contrib.layers.fully_connected(outputs2, output_dim2, activation_fn=None)  # Applied to the LSTM output at every time step
+
+    # cost/loss
+    loss1 = tf.reduce_sum(tf.square(tf.reshape(Y_pred, [-1,seq_length]) - Y))   # sum of the squares
+    loss2 = tf.reduce_sum(New_Mask * cost)
+    loss = loss1 + 0.0001 * loss2
+    # optimizer
+    optimizer = tf.train.AdamOptimizer(learning_rate)
+    train = optimizer.minimize(loss)
+
+    #%% Sessions    
+    sess = tf.Session()
+
+    # Initialization
+    sess.run(tf.global_variables_initializer())
+
+    #%% Training step
+    for i in range(iterations):
+        _, step_loss1, step_loss2 = sess.run([train, loss1, loss2], feed_dict={X: trainX, Y: trainY, New_X: New_trainX})
+
+        if i % 1000 == 0:
+            print('step: ' + str(i) + ', Loss1: ' + str(step_loss1) + ', Loss2: ' + str(step_loss2))
+
+    #%% Test step
+    train_mask = sess.run(Mask, feed_dict = {X: trainX, New_X: New_trainX})
+    test_mask = sess.run(Mask, feed_dict = {X: testX, New_X: New_testX})
+
+    #%% Output
+    # Selected Training / Testing Samples
+
+    Final_train_mask = list()
+    Final_test_mask = list()
+
+    for i in range(len(trainX)):
+        Final_train_mask.append(np.round(train_mask[i,:,:]))        
+
+    for i in range(len(testX)):
+        Final_test_mask.append(np.round(test_mask[i,:,:]))        
+
+    return Final_train_mask, Final_test_mask
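Note on `alg/asac/ASAC.py`: the two array manipulations in this file (the one-step lag applied to `trainX`/`testX` during preprocessing, and the `tf.maximum`/`tf.minimum` sequence that sharpens `Mask` into `New_Mask`) can be tried in isolation. The following is a minimal NumPy sketch, not part of the commit. The function names `shift_one_step` and `hard_threshold` are hypothetical; the constants 0.499 and 10000 are copied from the diff.

```python
import numpy as np


def shift_one_step(x):
    """Lag a (seq_length, data_dim) array by one time step.

    Row k of the result holds row k-1 of the input; row 0 is all zeros.
    (Hypothetical helper that mirrors the preprocessing loop in ASAC.py.)
    """
    lagged = np.zeros_like(x)
    lagged[1:, :] = x[:-1, :]
    return lagged


def hard_threshold(mask, threshold=0.499, scale=10000.0):
    """NumPy mirror of the New_Mask computation in ASAC.py.

    Subtract the threshold, clip below at 0, scale up, then clip above at 1.
    """
    return np.minimum(np.maximum(mask - threshold, 0.0) * scale, 1.0)


if __name__ == "__main__":
    x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    print(shift_one_step(x))
    print(hard_threshold(np.array([0.2, 0.499, 0.4995, 0.6])))
```

With these constants, mask values at or below 0.499 map to 0, values of roughly 0.4991 and above map to 1, and the narrow band in between maps linearly onto (0, 1).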
diff --git a/alg/asac/Data_Generation_X.py b/alg/asac/Data_Generation_X.py
@@ -0,0 +1,74 @@
+# Necessary Packages
+import numpy as np
+
+#%% AR(1) Generation. 
+'''
+X_0 = e_0
+X_t = phi[j] * X_t-1 + (1 - phi[j]) * e_t
+e_t ~ N(0, sigma^2)
+
+Inputs
+- n: Number of samples
+- d: Number of features
+- t: Number of time steps
+- phi: Autoregressive coefficient of each feature (length d)
+- sigma: Std of the normal noise
+
+Output
+- List of n arrays, each of shape [t, d]
+'''
+def AR_Gauss_X1 (n, d, t, phi, sigma):
+
+    # Initialization
+    Output_X = list()
+
+    # For each sample
+    for i in range(n):
+
+        Temp_Output_X = np.zeros([t,d])
+
+        # For each feature
+        for j in range(d):
+
+            for k in range(t):
+
+                # Starting feature
+                if (k == 0):            
+                    Temp_Output_X[k,j] = np.random.normal(0,sigma)
+
+                # AR(1) Generation
+                else:                
+                    Temp_Output_X[k,j] = phi[j] * Temp_Output_X[k-1,j] + (1-phi[j])*np.random.normal(0,sigma)
+
+        Output_X.append(Temp_Output_X)    
+
+    return Output_X
+
+#%% AR(1) Generation with noisy copies: columns d..2d-1 are columns 0..d-1 plus N(0, gamma^2) noise
+def AR_Gauss_X2 (n, d, t, phi, sigma, gamma):
+
+    # Initialization
+    Output_X = list()
+
+    # For each sample
+    for i in range(n):
+
+        Temp_Output_X = np.zeros([t,2*d])
+
+        # For each feature
+        for j in range(d):
+
+            for k in range(t):
+
+                # Starting feature
+                if (k == 0):            
+                    Temp_Output_X[k,j] = np.random.normal(0,sigma)
+
+                # AR(1) Generation
+                else:                
+                    Temp_Output_X[k,j] = phi[j] * Temp_Output_X[k-1,j] + (1-phi[j])*np.random.normal(0,sigma)
+
+                Temp_Output_X[k,d+j] = Temp_Output_X[k,j] + np.random.normal(0,gamma)       
+
+
+        Output_X.append(Temp_Output_X)    
+
+    return Output_X
+
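Note on `alg/asac/Data_Generation_X.py`: `AR_Gauss_X1` fills each sample with three nested Python loops. Under the recursion used in the code, X_t = phi[j] * X_{t-1} + (1 - phi[j]) * e_t with e_t ~ N(0, sigma^2), the stationary variance for |phi[j]| < 1 works out to sigma^2 * (1 - phi[j]) / (1 + phi[j]). The sketch below is a hypothetical vectorized variant, not part of the commit. `ar_gauss_x1_vectorized` and `stationary_std` are illustrative names, and because the sketch draws from a `numpy.random.Generator` it will not reproduce the random stream of the original.

```python
import numpy as np


def ar_gauss_x1_vectorized(n, d, t, phi, sigma, rng=None):
    """Generate n samples of shape (t, d) from the AR(1) recursion in the diff.

    X_0 = e_0 and X_k = phi * X_{k-1} + (1 - phi) * e_k, with e_k ~ N(0, sigma^2).
    Only the time loop remains; samples and features are handled by broadcasting.
    """
    rng = np.random.default_rng() if rng is None else rng
    phi = np.asarray(phi, dtype=float)              # shape (d,)
    noise = rng.normal(0.0, sigma, size=(n, t, d))  # all noise draws at once
    x = np.empty((n, t, d))
    x[:, 0, :] = noise[:, 0, :]
    for k in range(1, t):
        x[:, k, :] = phi * x[:, k - 1, :] + (1.0 - phi) * noise[:, k, :]
    return list(x)  # list of n arrays, matching the original return type


def stationary_std(phi, sigma):
    """Stationary standard deviation of the recursion above, for |phi| < 1."""
    phi = np.asarray(phi, dtype=float)
    return sigma * np.sqrt((1.0 - phi) / (1.0 + phi))


if __name__ == "__main__":
    samples = ar_gauss_x1_vectorized(4, 3, 6, [0.2, 0.5, 0.9], 1.0,
                                     rng=np.random.default_rng(0))
    print(len(samples), samples[0].shape)
    print(stationary_std([0.2, 0.5, 0.9], 1.0))
```

Passing an explicit `rng` keeps runs repeatable without touching NumPy's global random state, which the original functions rely on.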