
Commit

Initial commit
Finance-781 committed Apr 11, 2020
0 parents commit f29895e
Showing 175 changed files with 79,921 additions and 0 deletions.
2 changes: 2 additions & 0 deletions .gitattributes
@@ -0,0 +1,2 @@
# Auto detect text files and perform LF normalization
* text=auto
21 changes: 21 additions & 0 deletions LICENSE.md
@@ -0,0 +1,21 @@
# License

Copyright 2019, 2020, ML-AIM

The ML-AIM software is released under the [3-Clause BSD license](https://opensource.org/licenses/BSD-3-Clause) unless mentioned otherwise by the respective algorithms.

## BSD-3-Clause


Copied from https://opensource.org/licenses/BSD-3-Clause at 2020-02-08:

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

110 changes: 110 additions & 0 deletions README.md
@@ -0,0 +1,110 @@
# ML-AIM: Machine Learning and Artificial Intelligence for Medicine

This repository contains the implementations of algorithms developed
by the [ML-AIM](http://www.vanderschaar-lab.com) Laboratory.

1. [AutoPrognosis](https://icml.cc/Conferences/2018/Schedule?showEvent=2050): Automated Clinical Prognostic Modeling, ICML 2018 [software](alg/autoprognosis)
2. [GAIN](http://proceedings.mlr.press/v80/yoon18a.html): a GAN-based missing data imputation algorithm, ICML 2018 [software](alg/gain)
3. [INVASE](https://openreview.net/forum?id=BJg_roAcK7): an actor-critic-based instance-wise feature selection algorithm, ICLR 2019 [software](alg/invase)
4. [GANITE](https://openreview.net/forum?id=ByKWUeWA-): a GAN-based algorithm for estimating individualized treatment effects, ICLR 2018 [software](alg/ganite)
5. [DeepHit](http://medianetlab.ee.ucla.edu/papers/AAAI_2018_DeepHit): a Deep Learning Approach to Survival Analysis with Competing Risks, AAAI 2018 [software](alg/deephit)
6. [PATE-GAN](https://openreview.net/forum?id=S1zk9iRqF7): Generating Synthetic Data with Differential Privacy Guarantees, ICLR 2019 [software](alg/pategan)
7. [KnockoffGAN](https://openreview.net/pdf?id=ByeZ5jC5YQ): generating knockoffs for feature selection using generative adversarial networks, ICLR 2019 [software](alg/knockoffgan)
8. [Causal Multi-task Gaussian Processes](https://papers.nips.cc/paper/6934-bayesian-inference-of-individualized-treatment-effects-using-multi-task-gaussian-processes.pdf): Bayesian Inference of Individualized Treatment Effects using Multi-task Gaussian Processes, NIPS 2017 [software](alg/causal_multitask_gaussian_processes_ite)
9. [Limits of Estimating Heterogeneous Treatment Effects: Guidelines for Practical Algorithm Design](http://proceedings.mlr.press/v80/alaa18a/alaa18a.pdf), ICML 2018 [software](alg/causal_multitask_gaussian_processes_ite)
10. [ASAC](https://arxiv.org/abs/1906.06796): Active Sensing using Actor-Critic Models, MLHC 2019 [software](alg/asac)
11. [DGPSurvival](https://papers.nips.cc/paper/6827-deep-multi-task-gaussian-processes-for-survival-analysis-with-competing-risks.pdf): Deep Multi-task Gaussian Processes for Survival Analysis with Competing Risks, NIPS 2017 [software](alg/dgp_survival)
12. [Symbolic Metamodeling](https://papers.nips.cc/paper/9308-demystifying-black-box-models-with-symbolic-metamodels): Demystifying Black-box Models with Symbolic Metamodels, NeurIPS 2019 [software](alg/symbolic_metamodeling)
13. [DPBAG](https://papers.nips.cc/paper/8684-differentially-private-bagging-improved-utility-and-cheaper-privacy-than-subsample-and-aggregate): Differentially Private Bagging: Improved utility and cheaper privacy than subsample-and-aggregate, NeurIPS 2019 [software](alg/dpbag)
14. [TimeGAN](https://papers.nips.cc/paper/8789-time-series-generative-adversarial-networks): Time-series Generative Adversarial Networks, NeurIPS 2019 [software](alg/timegan)
15. [Attentiveness](https://papers.nips.cc/paper/9311-attentive-state-space-modeling-of-disease-progression): Attentive State-Space Modeling of Disease Progression, NeurIPS 2019 [software](alg/attentivess)
16. [GCIT](https://arxiv.org/pdf/1907.04068.pdf): Conditional Independence Testing with Generative Adversarial Networks, NeurIPS 2019 [software](alg/gcit)
17. [Counterfactual Recurrent Network](https://openreview.net/forum?id=BJg866NFvB): Estimating counterfactual treatment outcomes over time through adversarially balanced representations, ICLR 2020 [software](alg/counterfactual_recurrent_network)
18. [C3T Budget](https://arxiv.org/abs/2001.02463): Contextual constrained learning for dose-finding clinical trials, AISTATS 2020 [software](alg/c3t_budgets)
19. [DKLITE](https://arxiv.org/abs/2001.04754): Learning Overlapping Representations for the Estimation of Individualized Treatment Effects, AISTATS 2020 [software](alg/dklite)
20. [Dynamic disease network ddp](https://arxiv.org/abs/2001.02585): Learning Dynamic and Personalized Comorbidity Networks from Event Data using Deep Diffusion Processes, AISTATS 2020 [software](alg/dynamic_disease_network_ddp)
21. [SMS-DKL](https://arxiv.org/abs/2001.03898): Stepwise Model Selection for Sequence Prediction via Deep Kernel Learning, AISTATS 2020 [software](alg/smsdkl)

Prepared for release and maintained by AvdSchaar

Please send comments and suggestions to [email protected]

## Citations

Please cite the [ML-AIM repository](https://bitbucket.org/mvdschaar/mlforhealthlabpub) and/or the applicable papers if you use the software.

## License

Copyright 2019, 2020, ML-AIM

The ML-AIM software is released under the [3-Clause BSD license](https://opensource.org/licenses/BSD-3-Clause) unless mentioned otherwise by the respective algorithms.

## [Installation instructions](doc/install.md)

See doc/install.md for installation instructions.

## Tutorials and examples

* AutoPrognosis:
  * alg/autoprognosis/tutorial_autoprognosis_api.ipynb
  * alg/autoprognosis/tutorial_autoprognosis_cli.ipynb
* GAIN: alg/gain/tutorial_gain.ipynb
* INVASE: alg/invase/tutorial_invase.ipynb
* GANITE: alg/ganite/tutorial_ganite.ipynb
* PATE-GAN: alg/pategan/tutorial_pategan.ipynb
* KnockoffGAN: alg/knockoffgan/tutorial_knockoffgan.ipynb
* ASAC: alg/asac/tutorial_asac.ipynb
* DGPSurvival: alg/dgp_survival/tutorial_dgp.ipynb
* Symbolic Metamodeling:
  * alg/symbolic_metamodeling/1-_Introduction_to_Meijer_G-functions.ipynb
  * alg/symbolic_metamodeling/2-_Metamodeling_of_univariate_black-box_functions_using_Meijer_G-functions.ipynb
  * alg/symbolic_metamodeling/3-_Building_Symbolic_Metamodels.ipynb
* Differentially Private Bagging: alg/dpbag/DPBag_Tutorial.ipynb
* Time-series Generative Adversarial Networks: alg/timegan/tutorial_timegan.ipynb
* Attentive State-Space Modeling of Disease Progression: alg/attentivess/Tutorial_for_Attentive_State-space_Models.ipynb
* Conditional Independence Testing with Generative Adversarial Networks: alg/gcit/tutorial_gcit.ipynb
* DKLITE: alg/dklite/tutorial_dklite.ipynb
* SMS-DKL: alg/smsdkl/test_smsdkl.py

### [AutoPrognosis presentation](https://www.youtube.com/watch?v=d1uEATa0qIo)

You can find a presentation by Prof. van der Schaar describing AutoPrognosis here: https://www.youtube.com/watch?v=d1uEATa0qIo

## Version history

- version 1.7: February 27, 2020: SMS-DKL
- version 1.6: February 24, 2020: DKLITE and dynamic disease network ddp
- version 1.5: February 23, 2020: C3T Budget
- version 1.4: February 3, 2020: Counterfactual Recurrent Network
- version 1.3: December 7, 2019: Conditional Independence Testing with Generative Adversarial Networks
- version 1.1: November 30, 2019: Attentive State-Space Modeling
- version 1.0: November 4, 2019: Differentially Private Bagging and Time-series Generative Adversarial Networks
- version 0.9: October 25, 2019: Symbolic Metamodeling
- version 0.8: September 29, 2019: DGP Survival
- version 0.7: September 20, 2019: ASAC
- version 0.6: August 5, 2019: Causal Multi-task Gaussian Processes
- version 0.5: July 24, 2019: KnockoffGAN
- version 0.4: June 18, 2019: DeepHit and PATE-GAN

## References
1. [AutoPrognosis: Automated Clinical Prognostic Modeling via Bayesian Optimization with Structured Kernel Learning](https://icml.cc/Conferences/2018/Schedule?showEvent=2050)
2. [Prognostication and Risk Factors for Cystic Fibrosis via Automated Machine Learning](https://www.nature.com/articles/s41598-018-29523-2)
3. [Cardiovascular Disease Risk Prediction using Automated Machine Learning: A Prospective Study of 423,604 UK Biobank Participants](https://www.ncbi.nlm.nih.gov/pubmed/31091238)
4. [GAIN: Missing Data Imputation using Generative Adversarial Nets](http://proceedings.mlr.press/v80/yoon18a.html)
5. [INVASE: Instance-wise Variable Selection using Neural Networks](https://openreview.net/forum?id=BJg_roAcK7)
6. [GANITE: Estimation of Individualized Treatment Effects using Generative Adversarial Nets](https://openreview.net/forum?id=ByKWUeWA-)
7. [KnockoffGAN](https://openreview.net/pdf?id=ByeZ5jC5YQ): generating knockoffs for feature selection using generative adversarial networks
8. [Bayesian Inference of Individualized Treatment Effects using Multi-task Gaussian Processes](https://papers.nips.cc/paper/6934-bayesian-inference-of-individualized-treatment-effects-using-multi-task-gaussian-processes.pdf)
9. [Limits of Estimating Heterogeneous Treatment Effects: Guidelines for Practical Algorithm Design](http://proceedings.mlr.press/v80/alaa18a/alaa18a.pdf)
10. [ASAC](https://arxiv.org/abs/1906.06796): Active Sensing using Actor-Critic Models
11. [DGPSurvival](https://papers.nips.cc/paper/6827-deep-multi-task-gaussian-processes-for-survival-analysis-with-competing-risks.pdf): Deep Multi-task Gaussian Processes for Survival Analysis with Competing Risks
12. [GCIT](https://arxiv.org/pdf/1907.04068.pdf): Conditional Independence Testing with Generative Adversarial Networks
13. [Counterfactual Recurrent Network](https://openreview.net/forum?id=BJg866NFvB): Estimating counterfactual treatment outcomes over time through adversarially balanced representations
14. [C3T Budget](https://arxiv.org/abs/2001.02463): Contextual constrained learning for dose-finding clinical trials
15. [SMS-DKL](https://arxiv.org/abs/2001.03898): Stepwise Model Selection for Sequence Prediction via Deep Kernel Learning
16. Dua, D. and Graff, C. (2019). [UCI Machine Learning Repository](http://archive.ics.uci.edu/ml). Irvine, CA: University of California, School of Information and Computer Science.
17. Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.
18. [TensorFlow](https://www.tensorflow.org): Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.
19. [GPyOpt](http://github.com/SheffieldML/GPyOpt): A Bayesian Optimization framework in Python
20. [scikit-survival](https://github.com/sebp/scikit-survival): survival analysis built on top of scikit-learn
21. [3-Clause BSD license](https://opensource.org/licenses/BSD-3-Clause)
116 changes: 116 additions & 0 deletions alg/asac/ASAC.py
@@ -0,0 +1,116 @@
'''
ASAC (Active Sensing using Actor-Critic Model) (12/18/2018)
Active Sensing Function
'''

#%% Necessary Packages
import tensorflow as tf
import numpy as np

#%% ASAC Function
'''
Inputs:
- trainX, trainY: training features and labels
- testX: testing features
- cost: measurement cost of each feature
- iterations: number of training iterations
- learning_rate: Adam learning rate
Outputs:
- Final_train_mask: selected measurements (0/1 mask) for each training sample
- Final_test_mask: selected measurements (0/1 mask) for each testing sample
'''


def ASAC(
        trainX,
        trainY,
        testX,
        cost,
        iterations=5001,
        learning_rate=0.01):

    ##### Initialization on the Graph
    tf.reset_default_graph()

    # Network Parameters
    seq_length = len(trainX[0][:,0])
    data_dim = len(trainX[0][0,:])
    hidden_dim1 = 5
    hidden_dim2 = 5
    output_dim1 = data_dim
    output_dim2 = 1

    #%% Preprocessing
    # Shift each sequence one step forward in time (zeros at t = 0), so the
    # selection at step t depends only on measurements from earlier steps
    New_trainX = list()
    for i in range(len(trainX)):
        Temp = trainX[i].copy()
        Temp[1:,:] = Temp[:(seq_length-1),:]
        Temp[0,:] = np.zeros([data_dim])

        New_trainX.append(Temp)

    New_testX = list()
    for i in range(len(testX)):
        Temp = testX[i].copy()
        Temp[1:,:] = Temp[:(seq_length-1),:]
        Temp[0,:] = np.zeros([data_dim])

        New_testX.append(Temp)

    #%% Network Building
    # Input placeholders
    New_X = tf.placeholder(tf.float32, [None, seq_length, data_dim])
    X = tf.placeholder(tf.float32, [None, seq_length, data_dim])
    Y = tf.placeholder(tf.float32, [None, seq_length])

    # Selector LSTM: a selection probability for every feature at every time step
    cell1 = tf.contrib.rnn.BasicLSTMCell(num_units=hidden_dim1, state_is_tuple=True, activation=None, name = 'cell1')
    outputs1, _states1 = tf.nn.dynamic_rnn(cell1, New_X, dtype=tf.float32)
    Mask = tf.contrib.layers.fully_connected(outputs1, output_dim1, activation_fn=tf.sigmoid)
    # Near-hard threshold: 0 where Mask <= 0.499, 1 where Mask >= 0.4991
    New_Mask = tf.maximum(Mask-0.499,0)
    New_Mask = New_Mask * 10000
    New_Mask = tf.minimum(New_Mask,1)

    # Predictor LSTM: predicts Y at every time step from the selected (masked) measurements
    cell2 = tf.contrib.rnn.BasicLSTMCell(num_units=hidden_dim2, state_is_tuple=True, activation=tf.tanh, name = 'cell2')
    outputs2, _states2 = tf.nn.dynamic_rnn(cell2, X * New_Mask, dtype=tf.float32)
    Y_pred = tf.contrib.layers.fully_connected(outputs2, output_dim2, activation_fn=None)

    # cost/loss
    loss1 = tf.reduce_sum(tf.square(tf.reshape(Y_pred, [-1,seq_length]) - Y))  # sum of squared prediction errors
    loss2 = tf.reduce_sum(New_Mask * cost)  # total cost of the selected measurements
    loss = loss1 + 0.0001 * loss2
    # optimizer
    optimizer = tf.train.AdamOptimizer(learning_rate)
    train = optimizer.minimize(loss)

    #%% Sessions
    sess = tf.Session()

    # Initialization
    sess.run(tf.global_variables_initializer())

    #%% Training step
    for i in range(iterations):
        _, step_loss1, step_loss2 = sess.run([train, loss1, loss2], feed_dict={X: trainX, Y: trainY, New_X: New_trainX})

        if i % 1000 == 0:
            print('step: ' + str(i) + ', Loss1: ' + str(step_loss1) + ', Loss2: ' + str(step_loss2))

    #%% Test step
    train_mask = sess.run(Mask, feed_dict = {X: trainX, New_X: New_trainX})
    test_mask = sess.run(Mask, feed_dict = {X: testX, New_X: New_testX})

    #%% Output
    # Round the selection probabilities to a 0/1 mask for each sample
    Final_train_mask = list()
    Final_test_mask = list()

    for i in range(len(trainX)):
        Final_train_mask.append(np.round(train_mask[i,:,:]))

    for i in range(len(testX)):
        Final_test_mask.append(np.round(test_mask[i,:,:]))

    return Final_train_mask, Final_test_mask
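The preprocessing shift, the near-hard selection mask, and the cost-weighted loss in `ASAC` can be checked without building the TensorFlow graph. The NumPy sketch below is an illustration under stated assumptions, not part of the repository: the helper names `shift_inputs`, `hard_mask`, and `asac_loss` are hypothetical, `cost` is assumed to hold one cost per feature (broadcast over samples and time steps), and the constants 0.499, 10000, and 0.0001 are copied from the graph above.

```python
import numpy as np


def shift_inputs(x):
    """Delay a (seq_length, data_dim) sequence by one step, zero-filling step 0.

    Mirrors the ASAC preprocessing loop: the selector's input at step t
    holds the measurements from step t - 1.
    """
    shifted = np.zeros_like(x)
    shifted[1:, :] = x[:-1, :]
    return shifted


def hard_mask(probs, offset=0.499, scale=10000.0):
    """Map selection probabilities to a near-binary mask, as New_Mask does.

    Values <= offset give 0; values >= offset + 1/scale (0.4991) give 1.
    """
    return np.minimum(np.maximum(probs - offset, 0.0) * scale, 1.0)


def asac_loss(y_pred, y, mask, cost, cost_weight=0.0001):
    """Squared prediction error plus the weighted cost of selected measurements."""
    prediction_loss = np.sum((y_pred - y) ** 2)  # loss1 in ASAC
    measurement_cost = np.sum(mask * cost)       # loss2 in ASAC (cost broadcast per feature)
    return prediction_loss + cost_weight * measurement_cost


# Usage: a 3-step, 2-feature sequence and three selection probabilities
x = np.arange(6, dtype=float).reshape(3, 2)
print(shift_inputs(x))
print(hard_mask(np.array([0.2, 0.5, 0.9])))  # → [0. 1. 1.]
```

With `scale=10000` the mask is exactly 0 or 1 everywhere except the narrow band between 0.499 and 0.4991, so it selects almost the same measurements as the `np.round(Mask)` that `ASAC` returns.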
74 changes: 74 additions & 0 deletions alg/asac/Data_Generation_X.py
@@ -0,0 +1,74 @@
# Necessary Packages
import numpy as np

#%% AR(1) Generation.
'''
X_0 ~ N(0, sigma^2)
X_t = phi * X_{t-1} + (1 - phi) * e_t,  e_t ~ N(0, sigma^2)
(phi is set separately for each feature)
Inputs
- n: Number of samples
- d: Number of features
- t: Sequence length
- phi: Autoregressive coefficient of each feature (length d)
- sigma: Std of the Gaussian noise
Output
- List of n arrays of shape (t, d)
'''
def AR_Gauss_X1 (n, d, t, phi, sigma):

    # Initialization
    Output_X = list()

    # For each sample
    for i in range(n):

        Temp_Output_X = np.zeros([t,d])

        # For each feature
        for j in range(d):

            for k in range(t):

                # Starting value
                if (k == 0):
                    Temp_Output_X[k,j] = np.random.normal(0,sigma)

                # AR(1) Generation
                else:
                    Temp_Output_X[k,j] = phi[j] * Temp_Output_X[k-1,j] + (1-phi[j])*np.random.normal(0,sigma)

        Output_X.append(Temp_Output_X)

    return Output_X

#%% Same AR(1) features plus d noisy copies: X[:, d+j] = X[:, j] + N(0, gamma^2)
# Output: list of n arrays of shape (t, 2d)
def AR_Gauss_X2 (n, d, t, phi, sigma, gamma):

    # Initialization
    Output_X = list()

    # For each sample
    for i in range(n):

        Temp_Output_X = np.zeros([t,2*d])

        # For each feature
        for j in range(d):

            for k in range(t):

                # Starting value
                if (k == 0):
                    Temp_Output_X[k,j] = np.random.normal(0,sigma)

                # AR(1) Generation
                else:
                    Temp_Output_X[k,j] = phi[j] * Temp_Output_X[k-1,j] + (1-phi[j])*np.random.normal(0,sigma)

                # Noisy copy of feature j
                Temp_Output_X[k,d+j] = Temp_Output_X[k,j] + np.random.normal(0,gamma)

        Output_X.append(Temp_Output_X)

    return Output_X
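`AR_Gauss_X1` and `AR_Gauss_X2` draw every value inside nested Python loops. A vectorized NumPy sketch of the same recursion is shown below as an illustration, not as part of the repository. The name `ar_gauss_x`, the optional `gamma` switch that merges the two functions, and the use of `np.random.default_rng` (NumPy 1.17 or later) are assumptions made here; because random numbers are drawn in a different order, the output matches the originals in distribution but not value-for-value.

```python
import numpy as np


def ar_gauss_x(n, d, t, phi, sigma, gamma=None, seed=None):
    """Vectorized sketch of AR_Gauss_X1 (gamma=None) and AR_Gauss_X2 (gamma set).

    X[0] ~ N(0, sigma^2); X[k] = phi * X[k-1] + (1 - phi) * e, e ~ N(0, sigma^2).
    If gamma is given, d noisy copies X + N(0, gamma^2) are appended.
    Returns a list of n arrays of shape (t, d), or (t, 2d) with gamma.
    """
    rng = np.random.default_rng(seed)
    phi = np.asarray(phi, dtype=float)  # per-feature AR coefficients, length d
    x = np.zeros((n, t, d))
    x[:, 0, :] = rng.normal(0.0, sigma, size=(n, d))
    for k in range(1, t):
        # All samples and features advance together; only time is sequential.
        x[:, k, :] = phi * x[:, k - 1, :] + (1.0 - phi) * rng.normal(0.0, sigma, size=(n, d))
    if gamma is not None:
        noisy = x + rng.normal(0.0, gamma, size=x.shape)
        x = np.concatenate([x, noisy], axis=2)
    # Same output type as the originals: a list with one array per sample
    return [x[i] for i in range(n)]


# Usage: 100 samples, 3 features, 10 steps, plus 3 noisy copies
samples = ar_gauss_x(100, 3, 10, phi=[0.2, 0.5, 0.8], sigma=1.0, gamma=0.1, seed=0)
print(len(samples), samples[0].shape)  # → 100 (10, 6)
```

Only the time loop remains, because each step depends on the previous one; the per-sample and per-feature loops of the original become array operations.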
