I have re-trained the model on only the LINCS L1000 data, and I would now like to apply this model to a new dataset and use the predicted expression for another task. In this dataset, I believe I already have the information I need (i.e. SMILES, dose, cell line information, and baseline gene expression). However, I have not been able to find a function in the model that will simply output the predicted gene expression based on this information.
I know that in Issue #109 you mentioned that one could potentially use the evaluate_r2 method (and specifically the compute_prediction function contained within it) for this type of analysis. However, I have not been able to find a set of functions that will allow me to properly instantiate the model and pull only the predictions from this method.
Do you have any advice for those who simply want to run a pre-trained model on a dataset and get the predicted gene expression as output, rather than a set of evaluation metrics? Thank you!
Edit note: I am also having trouble with using new SMILES that were not in the original training set. I know some experiments in the paper used drugs outside of the training data, but I do not see any way to do this without retraining the model. Is this true, or is there a way for the model to process novel SMILES?
On the branch biroscak/predict-method-for-new-dataset there is now a chemCPA/predict.py file that shows how to use a pretrained checkpoint to get predictions for a dataset.
The predict function accepts the information that you have (drug_embeddings, covariate configuration, control gene expression data and checkpoint) and returns the predictions.
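To make the data flow concrete, here is a toy, stdlib-only sketch of what a CPA-style forward pass does with those inputs, greatly simplified: encode the control expression into a latent basal state, add a dose-scaled drug term derived from the drug embedding plus a cell-line covariate embedding, and decode back to gene space. Everything here is hypothetical: toy_predict, the weights dictionary standing in for a checkpoint, the linear layers, and the log1p dose scaling are illustrative assumptions, not the code in chemCPA/predict.py, whose model uses learned neural-network components loaded from the checkpoint.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]  # row-major: one row per output dimension


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def toy_predict(
    weights: Dict[str, Matrix],               # hypothetical stand-in for a checkpoint
    drug_embeddings: Dict[str, Vector],       # SMILES -> molecular embedding
    cell_line_embeddings: Dict[str, Vector],  # covariate value -> latent vector
    control_expression: Vector,               # baseline (unperturbed) expression
    smiles: str,
    dose: float,
    cell_line: str,
) -> Vector:
    """Toy CPA-style forward pass; NOT chemCPA's implementation."""
    # 1. Encode the control profile into a latent "basal" state.
    basal = matvec(weights["encoder"], control_expression)
    # 2. Project the drug embedding into the latent space and scale it by a
    #    simple dose-response (log1p is an assumption for illustration).
    drug = matvec(weights["drug_projection"], drug_embeddings[smiles])
    scale = math.log1p(dose)
    # 3. Compose: basal state + scaled drug effect + cell-line embedding.
    cov = cell_line_embeddings[cell_line]
    latent = [b + scale * d + c for b, d, c in zip(basal, drug, cov)]
    # 4. Decode back to gene-expression space.
    return matvec(weights["decoder"], latent)
```

The point of the sketch is that the drug only enters through its embedding vector, which is why a prediction can in principle be made for any molecule that can be embedded the same way as the training molecules.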
The prepare function shows one way to obtain these four inputs for the lincs_full_sciplex_genes dataset.
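For a new dataset like the one described in the question, preparation essentially reshapes a per-sample table (SMILES, dose, cell line, baseline expression) into aligned, index-based inputs. The sketch below is hypothetical: the Sample record, the prepare_inputs helper and the returned keys are assumptions for illustration, not what the prepare function in predict.py actually returns. One detail it tries to capture is that cell-line indices should follow the category order used at training time, since the trained model has one covariate embedding per training category.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Sample:
    """One row of a hypothetical user dataset."""
    smiles: str
    dose: float
    cell_line: str
    control_expression: List[float]


def prepare_inputs(
    samples: List[Sample],
    embedding_table: Dict[str, List[float]],
    training_cell_lines: List[str],
) -> Dict[str, list]:
    """Align per-sample fields into index-based inputs (hypothetical layout)."""
    if not samples:
        raise ValueError("no samples given")
    # Fail early if a SMILES has no embedding (relevant for novel molecules).
    missing = sorted({s.smiles for s in samples if s.smiles not in embedding_table})
    if missing:
        raise KeyError(f"no embedding for SMILES: {missing}")
    # Covariate indices must follow the training-time ordering; otherwise a
    # sample would be paired with the wrong cell-line embedding.
    cell_pos = {c: i for i, c in enumerate(training_cell_lines)}
    unknown = sorted({s.cell_line for s in samples if s.cell_line not in cell_pos})
    if unknown:
        raise ValueError(f"cell lines not seen during training: {unknown}")
    n_genes = len(samples[0].control_expression)
    if any(len(s.control_expression) != n_genes for s in samples):
        raise ValueError("all control profiles must cover the same genes")
    # Deduplicate drugs into a compact embedding matrix plus per-sample indices.
    drugs = sorted({s.smiles for s in samples})
    drug_pos = {smi: i for i, smi in enumerate(drugs)}
    return {
        "drug_embeddings": [list(embedding_table[smi]) for smi in drugs],
        "drug_idx": [drug_pos[s.smiles] for s in samples],
        "dosages": [s.dose for s in samples],
        "cell_line_idx": [cell_pos[s.cell_line] for s in samples],
        "controls": [list(s.control_expression) for s in samples],
    }
```

The gene order of the control profiles also has to match the genes the model was trained on; this sketch only checks that all profiles have the same length.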
Regarding new SMILES, this should also work: compute embeddings for your new SMILES and supply them via the aforementioned drug_embeddings. Note, however, that the model loads the embeddings used during training as part of its setup, so the code may crash if those are not available; the prediction itself can use any embeddings. This will be fixed.
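Following that suggestion, one way to handle novel SMILES without disturbing the training-time setup is to keep the original training embeddings intact and append rows for the new molecules. The sketch is hypothetical: extend_embeddings and the embed_fn callback are placeholders, and in practice embed_fn would have to wrap the same featurizer (and any scaling) that produced the training embeddings; the file name sciplex_complete_lincs_genes_v2_rdkit2D_embedding.parquet suggests RDKit 2D descriptors. New molecules featurized any other way would land in a space the model never saw.

```python
from typing import Callable, Dict, Iterable, List

Embedding = List[float]


def extend_embeddings(
    training_embeddings: Dict[str, Embedding],
    new_smiles: Iterable[str],
    embed_fn: Callable[[str], Embedding],
) -> Dict[str, Embedding]:
    """Return a new table: training rows first, novel SMILES appended.

    Training entries are never overwritten, so anything loaded at setup time
    is still present; only unseen SMILES are passed to embed_fn.
    """
    if not training_embeddings:
        raise ValueError("training embeddings are required to fix the dimension")
    dim = len(next(iter(training_embeddings.values())))
    combined = dict(training_embeddings)  # shallow copy, keeps insertion order
    for smi in new_smiles:
        if smi in combined:
            continue  # known from training, or a duplicate in new_smiles
        vec = list(embed_fn(smi))
        if len(vec) != dim:
            raise ValueError(f"embedding for {smi!r} has {len(vec)} dims, expected {dim}")
        combined[smi] = vec
    return combined
```

Keeping the training rows in the combined table sidesteps the setup-time crash mentioned above, while the appended rows are what the prediction actually looks up for the new molecules.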
Hi! Could you please add the embeddings needed to test predict.py from biroscak/predict-method-for-new-dataset (e.g., sciplex_complete_lincs_genes_v2_rdkit2D_embedding.parquet)?