Since the evaluation-of-OPE requires knowledge of the on-policy policy values, is OPS only relevant for synthetic data where the underlying behavior policy value is known? Or is it possible to estimate the on-policy policy value from real-world data, as well?
When I run this code block from the `basic_synthetic_continious_advanced.ipynb` notebook on my real-world dataset, I get the following error: `ValueError: one of the candidate policies, cql, does not contain on-policy policy value in input_dict.`
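For reference, here is roughly how I understand the on-policy policy value is obtained in the synthetic setting: by rolling the policy out in the environment and averaging discounted returns. This is a minimal sketch assuming a gymnasium-style environment and a hypothetical `policy.predict(obs)` interface, not the library's actual API:

```python
import numpy as np

def rollout_policy_value(env, policy, n_trajectories=100, gamma=0.99):
    """Monte Carlo estimate of a policy's on-policy value.

    Requires an interactive (typically simulated) environment: we
    repeatedly roll the policy out and average the discounted returns.
    With purely logged real-world data there is no environment to roll
    out in, so this quantity is unavailable.
    """
    returns = []
    for _ in range(n_trajectories):
        obs, _ = env.reset()              # gymnasium-style reset -> (obs, info)
        done, discount, total = False, 1.0, 0.0
        while not done:
            action = policy.predict(obs)  # hypothetical policy interface
            obs, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            total += discount * reward
            discount *= gamma
        returns.append(total)
    return float(np.mean(returns))
```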
Edit: After posting this issue, it occurred to me that "to estimate the on-policy policy value from real-world data" would just be equivalent to doing OPE, so evaluation-of-OPE would not be possible in that case. Please correct me if I am misunderstanding anything.
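To make that circularity concrete: any estimate of a policy's value built from logged data alone is itself an off-policy estimator, e.g. trajectory-wise importance sampling. A minimal sketch with a hypothetical data layout (not the library's input format):

```python
import numpy as np

def trajectory_is_estimate(trajectories, gamma=0.99):
    """Trajectory-wise importance sampling estimate of a policy's value
    from logged data alone.

    Each trajectory is a list of (pi_e_prob, pi_b_prob, reward) tuples,
    where pi_e_prob / pi_b_prob are the evaluation / behavior policy
    probabilities of the logged action. Averaging the importance-weighted
    returns yields an OPE estimate, not a ground-truth on-policy value,
    so it cannot serve as the reference for evaluation-of-OPE.
    """
    estimates = []
    for traj in trajectories:
        weight, discount, ret = 1.0, 1.0, 0.0
        for pi_e_prob, pi_b_prob, reward in traj:
            weight *= pi_e_prob / pi_b_prob  # cumulative importance weight
            ret += discount * reward
            discount *= gamma
        estimates.append(weight * ret)
    return float(np.mean(estimates))
```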