[ZH-CN] Translations #733

Merged · 5 commits · Dec 19, 2023
10 changes: 6 additions & 4 deletions site/zh-cn/agents/tutorials/0_intro_rl.ipynb
@@ -6,7 +6,7 @@
"id": "I1JiGtmRbLVp"
},
"source": [
"##### Copyright 2021 The TF-Agents Authors."
"##### Copyright 2023 The TF-Agents Authors."
]
},
{
@@ -100,7 +100,9 @@
"\n",
"Q-Learning 基于 Q 函数的概念。策略 $\\pi$, $Q^{\\pi}(s, a)$ 的 Q 函数(又称状态-操作值函数)用于衡量通过首先采取操作 $a$、随后采取策略 $\\pi$,从状态 $s$ 获得的预期回报或折扣奖励总和。我们将最优 Q 函数 $Q^*(s, a)$ 定义为从观测值 $s$ 开始,先采取操作 $a$,随后采取最优策略所能获得的最大回报。最优 Q 函数遵循以下*贝尔曼*最优性方程:\n",
"\n",
"```\n",
"$\\begin{equation}Q^\\ast(s, a) = \\mathbb{E}[ r + \\gamma \\max_{a'} Q^\\ast(s', a') ]\\end{equation}$\n",
"```\n",
"\n",
"这意味着,从状态 $s$ 和操作 $a$ 获得的最大回报等于即时奖励 $r$ 与通过遵循最优策略,随后直到片段结束所获得的回报(折扣因子为 $\\gamma$)的总和(即,来自下一个状态 $s'$ 的最高奖励)。期望是在即时奖励 $r$ 的分布以及可能的下一个状态 $s'$ 的基础上计算的。\n",
"\n",
@@ -110,17 +112,17 @@
"\n",
"对于大多数问题,将 $Q$ 函数表示为包含 $s$ 和 $a$ 每种组合的值的表是不切实际的。相反,我们训练一个函数逼近器(例如,带参数 $\\theta$ 的神经网络)来估算 Q 值,即 $Q(s, a; \\theta) \\approx Q^*(s, a)$。这可以通过在每个步骤 $i$ 使以下损失最小化来实现:\n",
"\n",
"$\\begin{equation}L_i(\\theta_i) = \\mathbb{E}*{s, a, r, s'\\sim \\rho(.)} \\left[ (y_i - Q(s, a; \\theta_i))^2 \\right]\\end{equation}$,其中 $y_i = r + \\gamma \\max*{a'} Q(s', a'; \\theta_{i-1})$\n",
"$\begin{equation}L_i(\theta_i) = \mathbb{E}_{s, a, r, s'\sim \rho(.)} \left[ (y_i - Q(s, a; \theta_i))^2 \right]\end{equation}$,其中 $y_i = r + \gamma \max_{a'} Q(s', a'; \theta_{i-1})$\n",
"\n",
"此处,$y_i$ 称为 TD(时间差分)目标,而 $y_i - Q$ 称为 TD 误差。$\\rho$ 表示行为分布,即从环境中收集的转换 ${s, a, r, s'}$ 的分布。\n",
"此处,$y_i$ 称为 TD(时间差分)目标,而 $y_i - Q$ 称为 TD 误差。$\rho$ 表示行为分布,即从环境中收集的转换 ${s, a, r, s'}$ 的分布。\n",
"\n",
"注意,先前迭代 $\\theta_{i-1}$ 中的参数是固定的,不会更新。实际上,我们使用前几次迭代而不是最后一次迭代的网络参数快照。此副本称为*目标网络*。\n",
"\n",
"Q-Learning 是一种*离策略*算法,可在学习贪心策略 $a = \\max_{a} Q(s, a; \\theta)$ 的同时使用不同的行为策略在环境/收集数据过程中执行操作。此行为策略通常是一种 $\\epsilon$ 贪心策略,可选择概率为 $1-\\epsilon$ 的贪心操作和概率为 $\\epsilon$ 的随机操作,以确保良好覆盖状态-操作空间。\n",
"\n",
"### 经验回放\n",
"\n",
"为了避免计算 DQN 损失的全期望,我们可以使用随机梯度下降算法将其最小化。如果仅使用最后一个转换 ${s, a, r, s'}$ 来计算损失,那么这会简化为标准 Q-Learning。\n",
"为了避免计算 DQN 损失的全期望,我们可以使用随机梯度下降算法将其最小化。如果仅使用最后的转换 ${s, a, r, s'}$ 计算损失,这将简化为标准 Q-Learning。\n",
"\n",
"Atari DQN 工作引入了一种称为“经验回放”的技术,可使网络更新更加稳定。在数据收集的每个时间步骤,转换都会添加到称为*回放缓冲区*的循环缓冲区中。然后,在训练过程中,我们不是仅仅使用最新的转换来计算损失及其梯度,而是使用从回放缓冲区中采样的转换的 mini-batch 来计算它们。这样做有两个优点:通过在许多更新中重用每个转换来提高数据效率,以及在批次中使用不相关的转换来提高稳定性。\n"
]
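As a reviewer's sanity check on the cell above, the ideas it describes (epsilon-greedy collection, a circular replay buffer, the TD target $y = r + \gamma \max_{a'} Q_{target}(s', a')$, and a periodically refreshed target snapshot) can be sketched in a tiny tabular form. Everything below is a hypothetical illustration: the 2-state MDP, the function name `train_q_learning`, and all hyperparameters are invented for this sketch, and the tutorial itself uses TF-Agents' DQN with a neural-network approximator, not this code.

```python
import random
from collections import deque

import numpy as np

def train_q_learning(n_states=2, n_actions=2, gamma=0.9, alpha=0.1,
                     epsilon=0.1, steps=5000, batch_size=32,
                     target_update_every=100, seed=0):
    """Tabular sketch: epsilon-greedy collection, a circular replay
    buffer, and a frozen target table standing in for the target network."""
    rng = random.Random(seed)
    Q = np.zeros((n_states, n_actions))
    Q_target = Q.copy()          # snapshot playing the role of the target network
    buffer = deque(maxlen=1000)  # circular replay buffer of (s, a, r, s') transitions
    s = 0
    for t in range(steps):
        # epsilon-greedy behavior policy: random with prob. epsilon, else greedy
        a = rng.randrange(n_actions) if rng.random() < epsilon else int(np.argmax(Q[s]))
        # toy deterministic MDP: action a moves to state a; only action 1 pays reward 1
        s_next, r = a, float(a == 1)
        buffer.append((s, a, r, s_next))
        s = s_next
        # minibatch step on the TD error, toward y = r + gamma * max_a' Q_target(s', a')
        if len(buffer) >= batch_size:
            for (bs, ba, br, bs2) in rng.sample(list(buffer), batch_size):
                y = br + gamma * Q_target[bs2].max()  # TD target, frozen parameters
                Q[bs, ba] += alpha * (y - Q[bs, ba])  # descend the squared TD error
        if t % target_update_every == 0:
            Q_target = Q.copy()  # periodic refresh of the frozen snapshot
    return Q
```

For this toy MDP the Bellman equation gives $Q^*(s, 1) = 1/(1-\gamma) = 10$ and $Q^*(s, 0) = \gamma \cdot 10 = 9$, which the learned table should approach.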
@@ -6,7 +6,7 @@
"id": "W7rEsKyWcxmu"
},
"source": [
"##### Copyright 2021 The TF-Agents Authors.\n"
"##### Copyright 2023 The TF-Agents Authors.\n"
]
},
{
11 changes: 4 additions & 7 deletions site/zh-cn/agents/tutorials/1_dqn_tutorial.ipynb
@@ -6,7 +6,7 @@
"id": "klGNgWREsvQv"
},
"source": [
"##### Copyright 2021 The TF-Agents Authors."
"##### Copyright 2023 The TF-Agents Authors."
]
},
{
@@ -40,12 +40,9 @@
"# 使用 TF-Agents 训练深度 Q 网络\n",
"\n",
"<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
" <td> <a target=\"_blank\" href=\"https://tensorflow.google.cn/agents/tutorials/1_dqn_tutorial\"><img src=\"https://tensorflow.google.cn/images/tf_logo_32px.png\">在 TensorFlow.org 上查看</a>\n",
"</td>\n",
" <td> <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/docs-l10n/blob/master/site/zh-cn/agents/tutorials/1_dqn_tutorial.ipynb\"> <img src=\"https://tensorflow.google.cn/images/colab_logo_32px.png\"> 在 Google Colab 中运行</a>\n",
"</td>\n",
" <td> <a target=\"_blank\" href=\"https://github.com/tensorflow/docs-l10n/blob/master/site/zh-cn/agents/tutorials/1_dqn_tutorial.ipynb\"><img src=\"https://tensorflow.google.cn/images/GitHub-Mark-32px.png\">在 Github 上查看源代码</a>\n",
"</td>\n",
" <td> <a target=\"_blank\" href=\"https://tensorflow.google.cn/agents/tutorials/1_dqn_tutorial\"><img src=\"https://tensorflow.google.cn/images/tf_logo_32px.png\">在 TensorFlow.org 上查看</a> </td>\n",
" <td> <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/docs-l10n/blob/master/site/zh-cn/agents/tutorials/1_dqn_tutorial.ipynb\"> <img src=\"https://tensorflow.google.cn/images/colab_logo_32px.png\"> 在 Google Colab 中运行</a> </td>\n",
" <td> <a target=\"_blank\" href=\"https://github.com/tensorflow/docs-l10n/blob/master/site/zh-cn/agents/tutorials/1_dqn_tutorial.ipynb\"><img src=\"https://tensorflow.google.cn/images/GitHub-Mark-32px.png\">在 Github 上查看源代码</a> </td>\n",
" <td> <a href=\"https://storage.googleapis.com/tensorflow_docs/docs-l10n/site/zh-cn/agents/tutorials/1_dqn_tutorial.ipynb\"><img src=\"https://tensorflow.google.cn/images/download_logo_32px.png\">下载笔记本</a> </td>\n",
"</table>"
]
3 changes: 1 addition & 2 deletions site/zh-cn/agents/tutorials/2_environments_tutorial.ipynb
@@ -6,7 +6,7 @@
"id": "Ma19Ks2CTDbZ"
},
"source": [
"##### Copyright 2021 The TF-Agents Authors."
"##### Copyright 2023 The TF-Agents Authors."
]
},
{
@@ -95,7 +95,6 @@
},
"outputs": [],
"source": [
"!pip install \"gym>=0.21.0\"\n",
"!pip install tf-agents[reverb]\n"
]
},
11 changes: 4 additions & 7 deletions site/zh-cn/agents/tutorials/3_policies_tutorial.ipynb
@@ -6,7 +6,7 @@
"id": "1Pi_B2cvdBiW"
},
"source": [
"##### Copyright 2021 The TF-Agents Authors."
"##### Copyright 2023 The TF-Agents Authors."
]
},
{
@@ -40,12 +40,9 @@
"# 策略\n",
"\n",
"<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
" <td> <a target=\"_blank\" href=\"https://tensorflow.google.cn/agents/tutorials/3_policies_tutorial\"><img src=\"https://tensorflow.google.cn/images/tf_logo_32px.png\">在 TensorFlow.org 上查看</a>\n",
"</td>\n",
" <td> <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/docs-l10n/blob/master/site/zh-cn/agents/tutorials/3_policies_tutorial.ipynb\"><img src=\"https://tensorflow.google.cn/images/colab_logo_32px.png\">在 Google Colab 运行</a>\n",
"</td>\n",
" <td> <a target=\"_blank\" href=\"https://github.com/tensorflow/docs-l10n/blob/master/site/zh-cn/agents/tutorials/3_policies_tutorial.ipynb\"><img src=\"https://tensorflow.google.cn/images/GitHub-Mark-32px.png\">在 Github 上查看源代码</a>\n",
"</td>\n",
" <td> <a target=\"_blank\" href=\"https://tensorflow.google.cn/agents/tutorials/3_policies_tutorial\"><img src=\"https://tensorflow.google.cn/images/tf_logo_32px.png\">在 TensorFlow.org 上查看</a> </td>\n",
" <td> <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/docs-l10n/blob/master/site/zh-cn/agents/tutorials/3_policies_tutorial.ipynb\"><img src=\"https://tensorflow.google.cn/images/colab_logo_32px.png\">在 Google Colab 运行</a> </td>\n",
" <td> <a target=\"_blank\" href=\"https://github.com/tensorflow/docs-l10n/blob/master/site/zh-cn/agents/tutorials/3_policies_tutorial.ipynb\"><img src=\"https://tensorflow.google.cn/images/GitHub-Mark-32px.png\">在 Github 上查看源代码</a> </td>\n",
" <td> <a href=\"https://storage.googleapis.com/tensorflow_docs/docs-l10n/site/zh-cn/agents/tutorials/3_policies_tutorial.ipynb\"><img src=\"https://tensorflow.google.cn/images/download_logo_32px.png\">下载笔记本</a> </td>\n",
"</table>"
]
2 changes: 1 addition & 1 deletion site/zh-cn/agents/tutorials/4_drivers_tutorial.ipynb
@@ -6,7 +6,7 @@
"id": "beObUOFyuRjT"
},
"source": [
"##### Copyright 2021 The TF-Agents Authors."
"##### Copyright 2023 The TF-Agents Authors."
]
},
{
@@ -6,7 +6,7 @@
"id": "beObUOFyuRjT"
},
"source": [
"##### Copyright 2021 The TF-Agents Authors."
"##### Copyright 2023 The TF-Agents Authors."
]
},
{
11 changes: 4 additions & 7 deletions site/zh-cn/agents/tutorials/6_reinforce_tutorial.ipynb
@@ -6,7 +6,7 @@
"id": "klGNgWREsvQv"
},
"source": [
"##### Copyright 2021 The TF-Agents Authors."
"##### Copyright 2023 The TF-Agents Authors."
]
},
{
@@ -40,12 +40,9 @@
"# REINFORCE 代理\n",
"\n",
"<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
" <td> <a target=\"_blank\" href=\"https://tensorflow.google.cn/agents/tutorials/6_reinforce_tutorial\"><img src=\"https://tensorflow.google.cn/images/tf_logo_32px.png\">在 TensorFlow.org 上查看</a>\n",
"</td>\n",
" <td> <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/docs-l10n/blob/master/site/zh-cn/agents/tutorials/6_reinforce_tutorial.ipynb\"><img src=\"https://tensorflow.google.cn/images/colab_logo_32px.png\">在 Google Colab 运行</a>\n",
"</td>\n",
" <td> <a target=\"_blank\" href=\"https://github.com/tensorflow/docs-l10n/blob/master/site/zh-cn/agents/tutorials/6_reinforce_tutorial.ipynb\"><img src=\"https://tensorflow.google.cn/images/GitHub-Mark-32px.png\">在 Github 上查看源代码</a>\n",
"</td>\n",
" <td> <a target=\"_blank\" href=\"https://tensorflow.google.cn/agents/tutorials/6_reinforce_tutorial\"><img src=\"https://tensorflow.google.cn/images/tf_logo_32px.png\">在 TensorFlow.org 上查看</a> </td>\n",
" <td> <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/docs-l10n/blob/master/site/zh-cn/agents/tutorials/6_reinforce_tutorial.ipynb\"><img src=\"https://tensorflow.google.cn/images/colab_logo_32px.png\">在 Google Colab 运行</a> </td>\n",
" <td> <a target=\"_blank\" href=\"https://github.com/tensorflow/docs-l10n/blob/master/site/zh-cn/agents/tutorials/6_reinforce_tutorial.ipynb\"><img src=\"https://tensorflow.google.cn/images/GitHub-Mark-32px.png\">在 Github 上查看源代码</a> </td>\n",
" <td> <a href=\"https://storage.googleapis.com/tensorflow_docs/docs-l10n/site/zh-cn/agents/tutorials/6_reinforce_tutorial.ipynb\"><img src=\"https://tensorflow.google.cn/images/download_logo_32px.png\">下载笔记本</a> </td>\n",
"</table>"
]
@@ -6,7 +6,7 @@
"id": "klGNgWREsvQv"
},
"source": [
"**Copyright 2021 The TF-Agents Authors.**"
"**Copyright 2023 The TF-Agents Authors.**"
]
},
{
2 changes: 1 addition & 1 deletion site/zh-cn/agents/tutorials/8_networks_tutorial.ipynb
@@ -6,7 +6,7 @@
"id": "1Pi_B2cvdBiW"
},
"source": [
"##### Copyright 2021 The TF-Agents Authors."
"##### Copyright 2023 The TF-Agents Authors."
]
},
{
11 changes: 4 additions & 7 deletions site/zh-cn/agents/tutorials/9_c51_tutorial.ipynb
@@ -6,7 +6,7 @@
"id": "klGNgWREsvQv"
},
"source": [
"##### Copyright 2021 The TF-Agents Authors."
"##### Copyright 2023 The TF-Agents Authors."
]
},
{
@@ -40,12 +40,9 @@
"# DQN C51/Rainbow\n",
"\n",
"<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
" <td> <a target=\"_blank\" href=\"https://tensorflow.google.cn/agents/tutorials/9_c51_tutorial\"><img src=\"https://tensorflow.google.cn/images/tf_logo_32px.png\">在 TensorFlow.org 上查看</a>\n",
"</td>\n",
" <td> <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/docs-l10n/blob/master/site/zh-cn/agents/tutorials/9_c51_tutorial.ipynb\"><img src=\"https://tensorflow.google.cn/images/colab_logo_32px.png\">在 Google Colab 运行</a>\n",
"</td>\n",
" <td> <a target=\"_blank\" href=\"https://github.com/tensorflow/docs-l10n/blob/master/site/zh-cn/agents/tutorials/9_c51_tutorial.ipynb\"><img src=\"https://tensorflow.google.cn/images/GitHub-Mark-32px.png\">在 Github 上查看源代码</a>\n",
"</td>\n",
" <td> <a target=\"_blank\" href=\"https://tensorflow.google.cn/agents/tutorials/9_c51_tutorial\"><img src=\"https://tensorflow.google.cn/images/tf_logo_32px.png\">在 TensorFlow.org 上查看</a> </td>\n",
" <td> <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/docs-l10n/blob/master/site/zh-cn/agents/tutorials/9_c51_tutorial.ipynb\"><img src=\"https://tensorflow.google.cn/images/colab_logo_32px.png\">在 Google Colab 运行</a> </td>\n",
" <td> <a target=\"_blank\" href=\"https://github.com/tensorflow/docs-l10n/blob/master/site/zh-cn/agents/tutorials/9_c51_tutorial.ipynb\"><img src=\"https://tensorflow.google.cn/images/GitHub-Mark-32px.png\">在 Github 上查看源代码</a> </td>\n",
" <td> <a href=\"https://storage.googleapis.com/tensorflow_docs/docs-l10n/site/zh-cn/agents/tutorials/9_c51_tutorial.ipynb\"><img src=\"https://tensorflow.google.cn/images/download_logo_32px.png\">下载笔记本</a> </td>\n",
"</table>"
]
2 changes: 1 addition & 1 deletion site/zh-cn/agents/tutorials/bandits_tutorial.ipynb
@@ -6,7 +6,7 @@
"id": "klGNgWREsvQv"
},
"source": [
"##### Copyright 2020 The TF-Agents Authors."
"##### Copyright 2023 The TF-Agents Authors."
]
},
{
2 changes: 1 addition & 1 deletion site/zh-cn/agents/tutorials/intro_bandit.ipynb
@@ -6,7 +6,7 @@
"id": "I1JiGtmRbLVp"
},
"source": [
"##### Copyright 2021 The TF-Agents Authors."
"##### Copyright 2023 The TF-Agents Authors."
]
},
{
@@ -6,7 +6,7 @@
"id": "nPjtEgqN4SjA"
},
"source": [
"##### Copyright 2021 The TF-Agents Authors."
"##### Copyright 2023 The TF-Agents Authors."
]
},
{
14 changes: 5 additions & 9 deletions site/zh-cn/agents/tutorials/ranking_tutorial.ipynb
@@ -6,7 +6,7 @@
"id": "6tzp2bPEiK_S"
},
"source": [
"##### Copyright 2022 The TF-Agents Authors."
"##### Copyright 2023 The TF-Agents Authors."
]
},
{
@@ -49,14 +49,10 @@
"### 开始\n",
"\n",
"<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
" <td> <a target=\"_blank\" href=\"https://tensorflow.google.cn/agents/tutorials/ranking_tutorial\"> <img src=\"https://tensorflow.google.cn/images/tf_logo_32px.png\"> 在 TensorFlow.org 上查看</a>\n",
"</td>\n",
" <td> <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/docs-l10n/blob/master/site/zh-cn/agents/tutorials/ranking_tutorial.ipynb\"> <img src=\"https://tensorflow.google.cn/images/colab_logo_32px.png\"> 在 Google Colab 中运行</a>\n",
"</td>\n",
" <td> <a target=\"_blank\" href=\"https://github.com/tensorflow/docs-l10n/blob/master/site/zh-cn/agents/tutorials/ranking_tutorial.ipynb\"> <img src=\"https://tensorflow.google.cn/images/GitHub-Mark-32px.png\"> 在 GitHub 上查看源代码</a>\n",
"</td>\n",
" <td> <a href=\"https://storage.googleapis.com/tensorflow_docs/docs-l10n/site/zh-cn/agents/tutorials/ranking_tutorial.ipynb\"><img src=\"https://tensorflow.google.cn/images/download_logo_32px.png\">下载笔记本</a>\n",
"</td>\n",
" <td> <a target=\"_blank\" href=\"https://tensorflow.google.cn/agents/tutorials/ranking_tutorial\"> <img src=\"https://tensorflow.google.cn/images/tf_logo_32px.png\"> 在 TensorFlow.org 上查看</a> </td>\n",
" <td> <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/docs-l10n/blob/master/site/zh-cn/agents/tutorials/ranking_tutorial.ipynb\"> <img src=\"https://tensorflow.google.cn/images/colab_logo_32px.png\"> 在 Google Colab 中运行</a> </td>\n",
" <td> <a target=\"_blank\" href=\"https://github.com/tensorflow/docs-l10n/blob/master/site/zh-cn/agents/tutorials/ranking_tutorial.ipynb\"> <img src=\"https://tensorflow.google.cn/images/GitHub-Mark-32px.png\"> 在 GitHub 上查看源代码</a> </td>\n",
" <td> <a href=\"https://storage.googleapis.com/tensorflow_docs/docs-l10n/site/zh-cn/agents/tutorials/ranking_tutorial.ipynb\"><img src=\"https://tensorflow.google.cn/images/download_logo_32px.png\">下载笔记本</a> </td>\n",
"</table>\n"
]
},