[ZH-CN] Translations #733

Merged · 5 commits · Dec 19, 2023
10 changes: 6 additions & 4 deletions site/zh-cn/agents/tutorials/0_intro_rl.ipynb
@@ -6,7 +6,7 @@
"id": "I1JiGtmRbLVp"
},
"source": [
"##### Copyright 2021 The TF-Agents Authors."
"##### Copyright 2023 The TF-Agents Authors."
]
},
{
@@ -100,7 +100,9 @@
"\n",
"Q-Learning 基于 Q 函数的概念。策略 $\\pi$, $Q^{\\pi}(s, a)$ 的 Q 函数(又称状态-操作值函数)用于衡量通过首先采取操作 $a$、随后采取策略 $\\pi$,从状态 $s$ 获得的预期回报或折扣奖励总和。我们将最优 Q 函数 $Q^*(s, a)$ 定义为从观测值 $s$ 开始,先采取操作 $a$,随后采取最优策略所能获得的最大回报。最优 Q 函数遵循以下*贝尔曼*最优性方程:\n",
"\n",
"```\n",
"$\\begin{equation}Q^\\ast(s, a) = \\mathbb{E}[ r + \\gamma \\max_{a'} Q^\\ast(s', a') ]\\end{equation}$\n",
"```\n",
"\n",
"这意味着,从状态 $s$ 和操作 $a$ 获得的最大回报等于即时奖励 $r$ 与通过遵循最优策略,随后直到片段结束所获得的回报(折扣因子为 $\\gamma$)的总和(即,来自下一个状态 $s'$ 的最高奖励)。期望是在即时奖励 $r$ 的分布以及可能的下一个状态 $s'$ 的基础上计算的。\n",
"\n",
@@ -110,17 +112,17 @@
"\n",
"对于大多数问题,将 $Q$ 函数表示为包含 $s$ 和 $a$ 每种组合的值的表是不切实际的。相反,我们训练一个函数逼近器(例如,带参数 $\\theta$ 的神经网络)来估算 Q 值,即 $Q(s, a; \\theta) \\approx Q^*(s, a)$。这可以通过在每个步骤 $i$ 使以下损失最小化来实现:\n",
"\n",
"$\\begin{equation}L_i(\\theta_i) = \\mathbb{E}*{s, a, r, s'\\sim \\rho(.)} \\left[ (y_i - Q(s, a; \\theta_i))^2 \\right]\\end{equation}$,其中 $y_i = r + \\gamma \\max*{a'} Q(s', a'; \\theta_{i-1})$\n",
"$\begin{equation}L_i(\theta_i) = \mathbb{E}_{s, a, r, s'\sim \rho(.)} \left[ (y_i - Q(s, a; \theta_i))^2 \right]\end{equation}$,其中 $y_i = r + \gamma \max_{a'} Q(s', a'; \theta_{i-1})$\n",
"\n",
"此处,$y_i$ 称为 TD(时间差分)目标,而 $y_i - Q$ 称为 TD 误差。$\\rho$ 表示行为分布,即从环境中收集的转换 ${s, a, r, s'}$ 的分布。\n",
"此处,$y_i$ 称为 TD(时间差分)目标,而 $y_i - Q$ 称为 TD 误差。$\rho$ 表示行为分布,即从环境中收集的转换 ${s, a, r, s'}$ 的分布。\n",
"\n",
"注意,先前迭代 $\\theta_{i-1}$ 中的参数是固定的,不会更新。实际上,我们使用前几次迭代而不是最后一次迭代的网络参数快照。此副本称为*目标网络*。\n",
"\n",
"Q-Learning 是一种*离策略*算法,可在学习贪心策略 $a = \\max_{a} Q(s, a; \\theta)$ 的同时使用不同的行为策略在环境/收集数据过程中执行操作。此行为策略通常是一种 $\\epsilon$ 贪心策略,可选择概率为 $1-\\epsilon$ 的贪心操作和概率为 $\\epsilon$ 的随机操作,以确保良好覆盖状态-操作空间。\n",
"\n",
"### 经验回放\n",
"\n",
"为了避免计算 DQN 损失的全期望,我们可以使用随机梯度下降算法将其最小化。如果仅使用最后一个转换 ${s, a, r, s'}$ 来计算损失,那么这会简化为标准 Q-Learning。\n",
"为了避免计算 DQN 损失的全期望,我们可以使用随机梯度下降算法将其最小化。如果仅使用最后的转换 ${s, a, r, s'}$ 计算损失,这将简化为标准 Q-Learning。\n",
"\n",
"Atari DQN 工作引入了一种称为“经验回放”的技术,可使网络更新更加稳定。在数据收集的每个时间步骤,转换都会添加到称为*回放缓冲区*的循环缓冲区中。然后,在训练过程中,我们不是仅仅使用最新的转换来计算损失及其梯度,而是使用从回放缓冲区中采样的转换的 mini-batch 来计算它们。这样做有两个优点:通过在许多更新中重用每个转换来提高数据效率,以及在批次中使用不相关的转换来提高稳定性。\n"
]
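As a reviewer's sanity check on the cell above, the ideas it describes (epsilon-greedy collection, a circular replay buffer, the TD target $y = r + \gamma \max_{a'} Q_{target}(s', a')$, and a periodically refreshed target snapshot) can be sketched in a tiny tabular form. Everything below is a hypothetical illustration: the 2-state MDP, the function name `train_q_learning`, and all hyperparameters are invented for this sketch, and the tutorial itself uses TF-Agents' DQN with a neural-network approximator, not this code.

```python
import random
from collections import deque

import numpy as np

def train_q_learning(n_states=2, n_actions=2, gamma=0.9, alpha=0.1,
                     epsilon=0.1, steps=5000, batch_size=32,
                     target_update_every=100, seed=0):
    """Tabular sketch: epsilon-greedy collection, a circular replay
    buffer, and a frozen target table standing in for the target network."""
    rng = random.Random(seed)
    Q = np.zeros((n_states, n_actions))
    Q_target = Q.copy()          # snapshot playing the role of the target network
    buffer = deque(maxlen=1000)  # circular replay buffer of (s, a, r, s') transitions
    s = 0
    for t in range(steps):
        # epsilon-greedy behavior policy: random with prob. epsilon, else greedy
        a = rng.randrange(n_actions) if rng.random() < epsilon else int(np.argmax(Q[s]))
        # toy deterministic MDP: action a moves to state a; only action 1 pays reward 1
        s_next, r = a, float(a == 1)
        buffer.append((s, a, r, s_next))
        s = s_next
        # minibatch step on the TD error, toward y = r + gamma * max_a' Q_target(s', a')
        if len(buffer) >= batch_size:
            for (bs, ba, br, bs2) in rng.sample(list(buffer), batch_size):
                y = br + gamma * Q_target[bs2].max()  # TD target, frozen parameters
                Q[bs, ba] += alpha * (y - Q[bs, ba])  # descend the squared TD error
        if t % target_update_every == 0:
            Q_target = Q.copy()  # periodic refresh of the frozen snapshot
    return Q
```

For this toy MDP the Bellman equation gives $Q^*(s, 1) = 1/(1-\gamma) = 10$ and $Q^*(s, 0) = \gamma \cdot 10 = 9$, which the learned table should approach.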
@@ -6,7 +6,7 @@
"id": "W7rEsKyWcxmu"
},
"source": [
"##### Copyright 2021 The TF-Agents Authors.\n"
"##### Copyright 2023 The TF-Agents Authors.\n"
]
},
{
11 changes: 4 additions & 7 deletions site/zh-cn/agents/tutorials/1_dqn_tutorial.ipynb
@@ -6,7 +6,7 @@
"id": "klGNgWREsvQv"
},
"source": [
"##### Copyright 2021 The TF-Agents Authors."
"##### Copyright 2023 The TF-Agents Authors."
]
},
{
@@ -40,12 +40,9 @@
"# 使用 TF-Agents 训练深度 Q 网络\n",
"\n",
"<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
" <td> <a target=\"_blank\" href=\"https://tensorflow.google.cn/agents/tutorials/1_dqn_tutorial\"><img src=\"https://tensorflow.google.cn/images/tf_logo_32px.png\">在 TensorFlow.org 上查看</a>\n",
"</td>\n",
" <td> <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/docs-l10n/blob/master/site/zh-cn/agents/tutorials/1_dqn_tutorial.ipynb\"> <img src=\"https://tensorflow.google.cn/images/colab_logo_32px.png\"> 在 Google Colab 中运行</a>\n",
"</td>\n",
" <td> <a target=\"_blank\" href=\"https://github.com/tensorflow/docs-l10n/blob/master/site/zh-cn/agents/tutorials/1_dqn_tutorial.ipynb\"><img src=\"https://tensorflow.google.cn/images/GitHub-Mark-32px.png\">在 Github 上查看源代码</a>\n",
"</td>\n",
" <td> <a target=\"_blank\" href=\"https://tensorflow.google.cn/agents/tutorials/1_dqn_tutorial\"><img src=\"https://tensorflow.google.cn/images/tf_logo_32px.png\">在 TensorFlow.org 上查看</a> </td>\n",
" <td> <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/docs-l10n/blob/master/site/zh-cn/agents/tutorials/1_dqn_tutorial.ipynb\"> <img src=\"https://tensorflow.google.cn/images/colab_logo_32px.png\"> 在 Google Colab 中运行</a> </td>\n",
" <td> <a target=\"_blank\" href=\"https://github.com/tensorflow/docs-l10n/blob/master/site/zh-cn/agents/tutorials/1_dqn_tutorial.ipynb\"><img src=\"https://tensorflow.google.cn/images/GitHub-Mark-32px.png\">在 Github 上查看源代码</a> </td>\n",
" <td> <a href=\"https://storage.googleapis.com/tensorflow_docs/docs-l10n/site/zh-cn/agents/tutorials/1_dqn_tutorial.ipynb\"><img src=\"https://tensorflow.google.cn/images/download_logo_32px.png\">下载笔记本</a> </td>\n",
"</table>"
]
3 changes: 1 addition & 2 deletions site/zh-cn/agents/tutorials/2_environments_tutorial.ipynb
@@ -6,7 +6,7 @@
"id": "Ma19Ks2CTDbZ"
},
"source": [
"##### Copyright 2021 The TF-Agents Authors."
"##### Copyright 2023 The TF-Agents Authors."
]
},
{
@@ -95,7 +95,6 @@
},
"outputs": [],
"source": [
"!pip install \"gym>=0.21.0\"\n",
"!pip install tf-agents[reverb]\n"
]
},
11 changes: 4 additions & 7 deletions site/zh-cn/agents/tutorials/3_policies_tutorial.ipynb
@@ -6,7 +6,7 @@
"id": "1Pi_B2cvdBiW"
},
"source": [
"##### Copyright 2021 The TF-Agents Authors."
"##### Copyright 2023 The TF-Agents Authors."
]
},
{
@@ -40,12 +40,9 @@
"# 策略\n",
"\n",
"<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
" <td> <a target=\"_blank\" href=\"https://tensorflow.google.cn/agents/tutorials/3_policies_tutorial\"><img src=\"https://tensorflow.google.cn/images/tf_logo_32px.png\">在 TensorFlow.org 上查看</a>\n",
"</td>\n",
" <td> <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/docs-l10n/blob/master/site/zh-cn/agents/tutorials/3_policies_tutorial.ipynb\"><img src=\"https://tensorflow.google.cn/images/colab_logo_32px.png\">在 Google Colab 运行</a>\n",
"</td>\n",
" <td> <a target=\"_blank\" href=\"https://github.com/tensorflow/docs-l10n/blob/master/site/zh-cn/agents/tutorials/3_policies_tutorial.ipynb\"><img src=\"https://tensorflow.google.cn/images/GitHub-Mark-32px.png\">在 Github 上查看源代码</a>\n",
"</td>\n",
" <td> <a target=\"_blank\" href=\"https://tensorflow.google.cn/agents/tutorials/3_policies_tutorial\"><img src=\"https://tensorflow.google.cn/images/tf_logo_32px.png\">在 TensorFlow.org 上查看</a> </td>\n",
" <td> <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/docs-l10n/blob/master/site/zh-cn/agents/tutorials/3_policies_tutorial.ipynb\"><img src=\"https://tensorflow.google.cn/images/colab_logo_32px.png\">在 Google Colab 运行</a> </td>\n",
" <td> <a target=\"_blank\" href=\"https://github.com/tensorflow/docs-l10n/blob/master/site/zh-cn/agents/tutorials/3_policies_tutorial.ipynb\"><img src=\"https://tensorflow.google.cn/images/GitHub-Mark-32px.png\">在 Github 上查看源代码</a> </td>\n",
" <td> <a href=\"https://storage.googleapis.com/tensorflow_docs/docs-l10n/site/zh-cn/agents/tutorials/3_policies_tutorial.ipynb\"><img src=\"https://tensorflow.google.cn/images/download_logo_32px.png\">下载笔记本</a> </td>\n",
"</table>"
]
2 changes: 1 addition & 1 deletion site/zh-cn/agents/tutorials/4_drivers_tutorial.ipynb
@@ -6,7 +6,7 @@
"id": "beObUOFyuRjT"
},
"source": [
"##### Copyright 2021 The TF-Agents Authors."
"##### Copyright 2023 The TF-Agents Authors."
]
},
{
@@ -6,7 +6,7 @@
"id": "beObUOFyuRjT"
},
"source": [
"##### Copyright 2021 The TF-Agents Authors."
"##### Copyright 2023 The TF-Agents Authors."
]
},
{
11 changes: 4 additions & 7 deletions site/zh-cn/agents/tutorials/6_reinforce_tutorial.ipynb
@@ -6,7 +6,7 @@
"id": "klGNgWREsvQv"
},
"source": [
"##### Copyright 2021 The TF-Agents Authors."
"##### Copyright 2023 The TF-Agents Authors."
]
},
{
@@ -40,12 +40,9 @@
"# REINFORCE 代理\n",
"\n",
"<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
" <td> <a target=\"_blank\" href=\"https://tensorflow.google.cn/agents/tutorials/6_reinforce_tutorial\"><img src=\"https://tensorflow.google.cn/images/tf_logo_32px.png\">在 TensorFlow.org 上查看</a>\n",
"</td>\n",
" <td> <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/docs-l10n/blob/master/site/zh-cn/agents/tutorials/6_reinforce_tutorial.ipynb\"><img src=\"https://tensorflow.google.cn/images/colab_logo_32px.png\">在 Google Colab 运行</a>\n",
"</td>\n",
" <td> <a target=\"_blank\" href=\"https://github.com/tensorflow/docs-l10n/blob/master/site/zh-cn/agents/tutorials/6_reinforce_tutorial.ipynb\"><img src=\"https://tensorflow.google.cn/images/GitHub-Mark-32px.png\">在 Github 上查看源代码</a>\n",
"</td>\n",
" <td> <a target=\"_blank\" href=\"https://tensorflow.google.cn/agents/tutorials/6_reinforce_tutorial\"><img src=\"https://tensorflow.google.cn/images/tf_logo_32px.png\">在 TensorFlow.org 上查看</a> </td>\n",
" <td> <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/docs-l10n/blob/master/site/zh-cn/agents/tutorials/6_reinforce_tutorial.ipynb\"><img src=\"https://tensorflow.google.cn/images/colab_logo_32px.png\">在 Google Colab 运行</a> </td>\n",
" <td> <a target=\"_blank\" href=\"https://github.com/tensorflow/docs-l10n/blob/master/site/zh-cn/agents/tutorials/6_reinforce_tutorial.ipynb\"><img src=\"https://tensorflow.google.cn/images/GitHub-Mark-32px.png\">在 Github 上查看源代码</a> </td>\n",
" <td> <a href=\"https://storage.googleapis.com/tensorflow_docs/docs-l10n/site/zh-cn/agents/tutorials/6_reinforce_tutorial.ipynb\"><img src=\"https://tensorflow.google.cn/images/download_logo_32px.png\">下载笔记本</a> </td>\n",
"</table>"
]
@@ -6,7 +6,7 @@
"id": "klGNgWREsvQv"
},
"source": [
"**Copyright 2021 The TF-Agents Authors.**"
"**Copyright 2023 The TF-Agents Authors.**"
]
},
{
2 changes: 1 addition & 1 deletion site/zh-cn/agents/tutorials/8_networks_tutorial.ipynb
@@ -6,7 +6,7 @@
"id": "1Pi_B2cvdBiW"
},
"source": [
"##### Copyright 2021 The TF-Agents Authors."
"##### Copyright 2023 The TF-Agents Authors."
]
},
{
11 changes: 4 additions & 7 deletions site/zh-cn/agents/tutorials/9_c51_tutorial.ipynb
@@ -6,7 +6,7 @@
"id": "klGNgWREsvQv"
},
"source": [
"##### Copyright 2021 The TF-Agents Authors."
"##### Copyright 2023 The TF-Agents Authors."
]
},
{
@@ -40,12 +40,9 @@
"# DQN C51/Rainbow\n",
"\n",
"<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
" <td> <a target=\"_blank\" href=\"https://tensorflow.google.cn/agents/tutorials/9_c51_tutorial\"><img src=\"https://tensorflow.google.cn/images/tf_logo_32px.png\">在 TensorFlow.org 上查看</a>\n",
"</td>\n",
" <td> <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/docs-l10n/blob/master/site/zh-cn/agents/tutorials/9_c51_tutorial.ipynb\"><img src=\"https://tensorflow.google.cn/images/colab_logo_32px.png\">在 Google Colab 运行</a>\n",
"</td>\n",
" <td> <a target=\"_blank\" href=\"https://github.com/tensorflow/docs-l10n/blob/master/site/zh-cn/agents/tutorials/9_c51_tutorial.ipynb\"><img src=\"https://tensorflow.google.cn/images/GitHub-Mark-32px.png\">在 Github 上查看源代码</a>\n",
"</td>\n",
" <td> <a target=\"_blank\" href=\"https://tensorflow.google.cn/agents/tutorials/9_c51_tutorial\"><img src=\"https://tensorflow.google.cn/images/tf_logo_32px.png\">在 TensorFlow.org 上查看</a> </td>\n",
" <td> <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/docs-l10n/blob/master/site/zh-cn/agents/tutorials/9_c51_tutorial.ipynb\"><img src=\"https://tensorflow.google.cn/images/colab_logo_32px.png\">在 Google Colab 运行</a> </td>\n",
" <td> <a target=\"_blank\" href=\"https://github.com/tensorflow/docs-l10n/blob/master/site/zh-cn/agents/tutorials/9_c51_tutorial.ipynb\"><img src=\"https://tensorflow.google.cn/images/GitHub-Mark-32px.png\">在 Github 上查看源代码</a> </td>\n",
" <td> <a href=\"https://storage.googleapis.com/tensorflow_docs/docs-l10n/site/zh-cn/agents/tutorials/9_c51_tutorial.ipynb\"><img src=\"https://tensorflow.google.cn/images/download_logo_32px.png\">下载笔记本</a> </td>\n",
"</table>"
]
2 changes: 1 addition & 1 deletion site/zh-cn/agents/tutorials/bandits_tutorial.ipynb
@@ -6,7 +6,7 @@
"id": "klGNgWREsvQv"
},
"source": [
"##### Copyright 2020 The TF-Agents Authors."
"##### Copyright 2023 The TF-Agents Authors."
]
},
{
2 changes: 1 addition & 1 deletion site/zh-cn/agents/tutorials/intro_bandit.ipynb
@@ -6,7 +6,7 @@
"id": "I1JiGtmRbLVp"
},
"source": [
"##### Copyright 2021 The TF-Agents Authors."
"##### Copyright 2023 The TF-Agents Authors."
]
},
{
@@ -6,7 +6,7 @@
"id": "nPjtEgqN4SjA"
},
"source": [
"##### Copyright 2021 The TF-Agents Authors."
"##### Copyright 2023 The TF-Agents Authors."
]
},
{
14 changes: 5 additions & 9 deletions site/zh-cn/agents/tutorials/ranking_tutorial.ipynb
@@ -6,7 +6,7 @@
"id": "6tzp2bPEiK_S"
},
"source": [
"##### Copyright 2022 The TF-Agents Authors."
"##### Copyright 2023 The TF-Agents Authors."
]
},
{
@@ -49,14 +49,10 @@
"### 开始\n",
"\n",
"<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
" <td> <a target=\"_blank\" href=\"https://tensorflow.google.cn/agents/tutorials/ranking_tutorial\"> <img src=\"https://tensorflow.google.cn/images/tf_logo_32px.png\"> 在 TensorFlow.org 上查看</a>\n",
"</td>\n",
" <td> <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/docs-l10n/blob/master/site/zh-cn/agents/tutorials/ranking_tutorial.ipynb\"> <img src=\"https://tensorflow.google.cn/images/colab_logo_32px.png\"> 在 Google Colab 中运行</a>\n",
"</td>\n",
" <td> <a target=\"_blank\" href=\"https://github.com/tensorflow/docs-l10n/blob/master/site/zh-cn/agents/tutorials/ranking_tutorial.ipynb\"> <img src=\"https://tensorflow.google.cn/images/GitHub-Mark-32px.png\"> 在 GitHub 上查看源代码</a>\n",
"</td>\n",
" <td> <a href=\"https://storage.googleapis.com/tensorflow_docs/docs-l10n/site/zh-cn/agents/tutorials/ranking_tutorial.ipynb\"><img src=\"https://tensorflow.google.cn/images/download_logo_32px.png\">下载笔记本</a>\n",
"</td>\n",
" <td> <a target=\"_blank\" href=\"https://tensorflow.google.cn/agents/tutorials/ranking_tutorial\"> <img src=\"https://tensorflow.google.cn/images/tf_logo_32px.png\"> 在 TensorFlow.org 上查看</a> </td>\n",
" <td> <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/docs-l10n/blob/master/site/zh-cn/agents/tutorials/ranking_tutorial.ipynb\"> <img src=\"https://tensorflow.google.cn/images/colab_logo_32px.png\"> 在 Google Colab 中运行</a> </td>\n",
" <td> <a target=\"_blank\" href=\"https://github.com/tensorflow/docs-l10n/blob/master/site/zh-cn/agents/tutorials/ranking_tutorial.ipynb\"> <img src=\"https://tensorflow.google.cn/images/GitHub-Mark-32px.png\"> 在 GitHub 上查看源代码</a> </td>\n",
" <td> <a href=\"https://storage.googleapis.com/tensorflow_docs/docs-l10n/site/zh-cn/agents/tutorials/ranking_tutorial.ipynb\"><img src=\"https://tensorflow.google.cn/images/download_logo_32px.png\">下载笔记本</a> </td>\n",
"</table>\n"
]
},