From a23690350f7ffdb72aa1ae55cc5953ec505cd7fd Mon Sep 17 00:00:00 2001
From: Nathan Zhao
Date: Thu, 8 Aug 2024 01:57:46 +0000
Subject: [PATCH] lint

---
 training.notes | 12 ++++++++++++
 1 file changed, 12 insertions(+)
 create mode 100644 training.notes

diff --git a/training.notes b/training.notes
new file mode 100644
index 0000000..66d9976
--- /dev/null
+++ b/training.notes
@@ -0,0 +1,12 @@
+
+# Currently testing:
+- Hidden layer size of 256 shows progress (loss is based on state.q[2])
+
+- Setting std to zero makes the rewards NaN. Why? I wonder if there NEEDS to be randomization in the environment.
+
+- Is the ctrl cost what's giving NaNs? Interesting.
+- It is unrelated to randomization of the environment. I think it's gradient related.
+
+- The first things to become NaN seem to be the actor loss and the scores; after that, everything becomes NaN.
+
+- Fixed the entropy epsilon. Hope this works now.
\ No newline at end of file
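
Note on the "entropy epsilon" fix mentioned in the last bullet: the actual policy and loss code are not part of this patch, so the sketch below is an assumption about what that kind of fix usually looks like. For a diagonal Gaussian policy, both the entropy and the log-probability contain log(std), so std -> 0 produces -inf/NaN that then propagates into the actor loss and, through the gradients, everything else. Clamping std with a small epsilon is the standard guard; the names and the epsilon value here are hypothetical.

import jax.numpy as jnp

ENTROPY_EPS = 1e-6  # hypothetical value; tune for the actual policy

def gaussian_entropy(std):
    # Differential entropy of a diagonal Gaussian: 0.5 * log(2*pi*e*std^2) per dim.
    # Without the clamp, std == 0 gives log(0) = -inf and NaN gradients.
    std = jnp.maximum(std, ENTROPY_EPS)
    return jnp.sum(0.5 * jnp.log(2.0 * jnp.pi * jnp.e * std**2), axis=-1)

def gaussian_log_prob(mean, std, action):
    # Log-density of a diagonal Gaussian, with the same epsilon clamp so that
    # a collapsed (zero-std) policy stays finite instead of poisoning the loss.
    std = jnp.maximum(std, ENTROPY_EPS)
    return jnp.sum(
        -0.5 * ((action - mean) / std) ** 2 - jnp.log(std) - 0.5 * jnp.log(2.0 * jnp.pi),
        axis=-1,
    )

With the clamp in place, gaussian_entropy(jnp.zeros(3)) returns a finite (large negative) value rather than -inf, which would explain why the actor loss was the first quantity to turn NaN before the fix.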