From a23690350f7ffdb72aa1ae55cc5953ec505cd7fd Mon Sep 17 00:00:00 2001
From: Nathan Zhao
Date: Thu, 8 Aug 2024 01:57:46 +0000
Subject: [PATCH] lint

---
 training.notes | 12 ++++++++++++
 1 file changed, 12 insertions(+)
 create mode 100644 training.notes

diff --git a/training.notes b/training.notes
new file mode 100644
index 0000000..66d9976
--- /dev/null
+++ b/training.notes
@@ -0,0 +1,12 @@
+
+# Currently testing:
+- Hidden layer size of 256 shows progress (loss is based on state.q[2])
+
+- Setting std to zero makes the rewards NaN. Why? I wonder if there NEEDS to be randomization in the environment.
+
+- Is the ctrl cost what's giving NaNs? Interesting.
+- It is unrelated to randomization of the environment. I think it's gradient related.
+
+- The first things to become NaN seem to be the actor loss and the scores; after that, everything becomes NaN.
+
+- Fixed the entropy epsilon. Hope this works now.
\ No newline at end of file
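
Note on the "entropy epsilon" fix mentioned in the last bullet: the actual policy and loss code are not part of this patch, so the sketch below is an assumption about what that kind of fix usually looks like. For a diagonal Gaussian policy, both the entropy and the log-probability contain log(std), so std -> 0 produces -inf/NaN that then propagates into the actor loss and, through the gradients, everything else. Clamping std with a small epsilon is the standard guard; the names and the epsilon value here are hypothetical.

import jax.numpy as jnp

ENTROPY_EPS = 1e-6  # hypothetical value; tune for the actual policy

def gaussian_entropy(std):
    # Differential entropy of a diagonal Gaussian: 0.5 * log(2*pi*e*std^2) per dim.
    # Without the clamp, std == 0 gives log(0) = -inf and NaN gradients.
    std = jnp.maximum(std, ENTROPY_EPS)
    return jnp.sum(0.5 * jnp.log(2.0 * jnp.pi * jnp.e * std**2), axis=-1)

def gaussian_log_prob(mean, std, action):
    # Log-density of a diagonal Gaussian, with the same epsilon clamp so that
    # a collapsed (zero-std) policy stays finite instead of poisoning the loss.
    std = jnp.maximum(std, ENTROPY_EPS)
    return jnp.sum(
        -0.5 * ((action - mean) / std) ** 2 - jnp.log(std) - 0.5 * jnp.log(2.0 * jnp.pi),
        axis=-1,
    )

With the clamp in place, gaussian_entropy(jnp.zeros(3)) returns a finite (large negative) value rather than -inf, which would explain why the actor loss was the first quantity to turn NaN before the fix.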