[cherry-pick2.4]docs fix (#47669)
* #46165

* #45752

* fix some doc bug test=document_fix (#45488)

* fix some doc bug test=document_fix

* fix some docs issues, test=document_fix

* beta -> \beta in softplus

* threshold -> \varepsilon in softplus

* parameter name

* delta -> \delta in smooth_l1_loss

* fix some docs test=document_fix

* fix docs test=document_fix

* fix docs && add blank lines test=document_fix

* Update python/paddle/nn/functional/activation.py, test=document_fix

* Update python/paddle/nn/layer/activation.py, test=document_fix

Co-authored-by: SigureMo <[email protected]>

* [docs] add ipustrategy Hyperlink (#46422)

* [docs] add ipustrategy Hyperlink

* fix ipu_shard_guard docs; test=document_fix

* [docs] add set_ipu_shard note

* [docs] fix hyperlink

* update framework.py

* fix mlu_places docs; test=document_fix

* fix put_along_axis docs; test=document_fix

* fix flake8 W293 error, test=document_fix

* fix typo in typing, test=document_fix

Co-authored-by: Ligoml <[email protected]>
Co-authored-by: Nyakku Shigure <[email protected]>

* #46659

* Update README_cn.md (#46927)

Fixed typos.

* #46738

* fix paddle.get_default_dtype (#47040)

The Chinese and English descriptions of the return value were inconsistent.

* fix bug

Co-authored-by: 张春乔 <[email protected]>
Co-authored-by: Infinity_lee <[email protected]>
Co-authored-by: mrcangye <[email protected]>
Co-authored-by: SigureMo <[email protected]>
Co-authored-by: gouzil <[email protected]>
Co-authored-by: Hamid Zare <[email protected]>
Co-authored-by: Sqhttwl <[email protected]>
Co-authored-by: OccupyMars2025 <[email protected]>
Co-authored-by: 超级码牛 <[email protected]>
Co-authored-by: jzhang533 <[email protected]>
11 people authored Nov 7, 2022
1 parent 3a01478 commit cf668ab
Showing 52 changed files with 1,193 additions and 1,301 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -89,8 +89,8 @@ We provide [English](https://www.paddlepaddle.org.cn/documentation/docs/en/guide

## Courses

- [Server Deployments](https://aistudio.baidu.com/aistudio/course/introduce/19084): Courses intorducing high performance server deployments via local and remote services.
- [Edge Deployments](https://aistudio.baidu.com/aistudio/course/introduce/22690): Courses intorducing edge deployments from mobile, IoT to web and applets.
- [Server Deployments](https://aistudio.baidu.com/aistudio/course/introduce/19084): Courses introducing high performance server deployments via local and remote services.
- [Edge Deployments](https://aistudio.baidu.com/aistudio/course/introduce/22690): Courses introducing edge deployments from mobile, IoT to web and applets.

## Copyright and License
PaddlePaddle is provided under the [Apache-2.0 license](LICENSE).
2 changes: 1 addition & 1 deletion README_cn.md
@@ -88,7 +88,7 @@ PaddlePaddle用户可领取**免费Tesla V100在线算力资源**,训练模型
## 课程

- [服务器部署](https://aistudio.baidu.com/aistudio/course/introduce/19084): 详细介绍高性能服务器端部署实操,包含本地端及服务化Serving部署等
- [端侧部署](https://aistudio.baidu.com/aistudio/course/introduce/22690): 详细介绍端侧多场景部署实操,从移端端设备、IoT、网页到小程序部署
- [端侧部署](https://aistudio.baidu.com/aistudio/course/introduce/22690): 详细介绍端侧多场景部署实操,从移动端设备、IoT、网页到小程序部署

## 版权和许可证
PaddlePaddle由[Apache-2.0 license](LICENSE)提供
4 changes: 2 additions & 2 deletions paddle/fluid/operators/activation_op.cc
@@ -172,9 +172,9 @@ class ActivationOpGrad : public framework::OperatorWithKernel {
};

UNUSED constexpr char SigmoidDoc[] = R"DOC(
Sigmoid Activation Operator
Sigmoid Activation
$$out = \\frac{1}{1 + e^{-x}}$$
$$out = \frac{1}{1 + e^{-x}}$$
)DOC";

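For readers checking the corrected formula, a quick numerical sanity check from Python (not part of this commit; it assumes only a standard Paddle install) could look like:

```python
import numpy as np
import paddle

# The documented formula: out = 1 / (1 + e^{-x})
x = paddle.to_tensor([-1.0, 0.0, 1.0])
out = paddle.nn.functional.sigmoid(x)

# Cross-check against the closed form from the operator docstring
expected = 1.0 / (1.0 + np.exp(-x.numpy()))
print(out.numpy())   # ≈ [0.2689, 0.5, 0.7311]
print(expected)
```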
4 changes: 2 additions & 2 deletions python/paddle/autograd/py_layer.py
@@ -55,7 +55,7 @@ def save_for_backward(self, *tensors):
"""
Saves the given tensors that the backward pass needs. Use ``saved_tensor`` in the `backward` to get the saved tensors.
.. note::
Note:
This API should be called at most once, and only inside `forward`.
Args:
@@ -341,7 +341,7 @@ def save_for_backward(self, *tensors):
"""
Saves the given tensors that the backward pass needs. Use ``saved_tensor`` in the `backward` to get the saved tensors.
.. note::
Note:
This API should be called at most once, and only inside `forward`.
Args:
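The ``save_for_backward`` docstrings touched above describe when the API may be called; a minimal sketch of a custom ``PyLayer`` that uses it (illustrative only, not part of this diff) might be:

```python
import paddle
from paddle.autograd import PyLayer

class Square(PyLayer):
    @staticmethod
    def forward(ctx, x):
        # Called at most once, and only inside forward()
        ctx.save_for_backward(x)
        return x * x

    @staticmethod
    def backward(ctx, grad_out):
        # Retrieve the tensors saved in forward()
        (x,) = ctx.saved_tensor()
        return 2.0 * x * grad_out

x = paddle.to_tensor([1.0, 2.0, 3.0], stop_gradient=False)
y = Square.apply(x)
y.sum().backward()
print(x.grad)  # gradient is 2 * x
```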
4 changes: 2 additions & 2 deletions python/paddle/device/cuda/__init__.py
@@ -203,7 +203,7 @@ def max_memory_allocated(device=None):
'''
Return the peak size of gpu memory that is allocated to tensor of the given device.
.. note::
Note:
The size of GPU memory allocated to a tensor is 256-byte aligned in Paddle, which may be larger than the memory size the tensor actually needs.
For instance, a float32 tensor with shape [1] on GPU will take up 256 bytes of memory, even though storing a float32 value requires only 4 bytes.
@@ -269,7 +269,7 @@ def memory_allocated(device=None):
'''
Return the current size of gpu memory that is allocated to tensor of the given device.
.. note::
Note:
The size of GPU memory allocated to a tensor is 256-byte aligned in Paddle, which may be larger than the memory size the tensor actually needs.
For instance, a float32 tensor with shape [1] on GPU will take up 256 bytes of memory, even though storing a float32 value requires only 4 bytes.
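As a rough illustration of the two memory-query APIs documented above (a sketch, not part of the commit; it assumes a CUDA build of Paddle and at least one visible GPU):

```python
import paddle

if paddle.device.is_compiled_with_cuda():
    paddle.device.set_device("gpu")
    x = paddle.zeros([1], dtype="float32")   # stores 4 bytes of data ...
    # ... but the allocation is 256-byte aligned, so at least 256 bytes are reported
    print(paddle.device.cuda.memory_allocated())      # current size allocated to tensors
    print(paddle.device.cuda.max_memory_allocated())  # peak size since program start
```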
2 changes: 1 addition & 1 deletion python/paddle/distributed/collective.py
@@ -1349,7 +1349,7 @@ def alltoall_single(
"""
Scatter a single input tensor to all participators and gather the received tensors in out_tensor.
.. note::
Note:
``alltoall_single`` is only supported in eager mode.
Args:
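Because ``alltoall_single`` is a collective that only works in eager mode, any example needs one process per rank; a hedged sketch under that assumption (the launch command, tensor shapes, and helper calls here are illustrative, not taken from the source) might be:

```python
# Launch one process per rank, e.g.:
#   python -m paddle.distributed.launch --nproc_per_node=2 demo.py
import paddle
import paddle.distributed as dist

dist.init_parallel_env()                  # eager mode, as required by the note above
rank = dist.get_rank()
world_size = dist.get_world_size()

# Each rank contributes `world_size` rows; conceptually, row i is scattered to
# rank i, and the rows received from every rank are gathered into `output`.
data = paddle.full([world_size, 4], float(rank), dtype="float32")
output = paddle.empty_like(data)
dist.alltoall_single(data, output)
print(rank, output)
```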
@@ -30,9 +30,9 @@ def wait_server_ready(endpoints):
["127.0.0.1:8080", "127.0.0.1:8081"]
Examples:
.. code-block:: python
.. code-block:: python
wait_server_ready(["127.0.0.1:8080", "127.0.0.1:8081"])
wait_server_ready(["127.0.0.1:8080", "127.0.0.1:8081"])
"""
assert not isinstance(endpoints, str)
while True:
2 changes: 1 addition & 1 deletion python/paddle/distributed/parallel.py
@@ -105,7 +105,7 @@ def init_parallel_env():
"""
Initialize parallel training environment in dynamic graph mode.
.. note::
Note:
Now initialize both `NCCL` and `GLOO` contexts for communication.
Args:
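For context, a typical entry point built around the API documented above might look roughly like the following sketch (the ``train`` function, the model, and the process count are illustrative assumptions, not part of the diff):

```python
import paddle
import paddle.distributed as dist

def train():
    # Initializes both the NCCL (GPU) and GLOO (CPU) communication contexts
    dist.init_parallel_env()
    layer = paddle.nn.Linear(4, 4)
    dp_layer = paddle.DataParallel(layer)
    # ... build an optimizer and run the usual training loop on dp_layer ...

if __name__ == "__main__":
    # Spawn two trainer processes (requires a suitable multi-GPU or GLOO setup)
    dist.spawn(train, nprocs=2)
```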
2 changes: 1 addition & 1 deletion python/paddle/distributed/sharding/group_sharded.py
@@ -209,7 +209,7 @@ def save_group_sharded_model(model, output, optimizer=None):
"""
Module for saving the state of a group-sharded encapsulated model and its optimizer.
.. note::
Note:
If save_group_sharded_model is used to save the model, then when loading again you need to set the model or optimizer state before using group_sharded_parallel.
Args:
2 changes: 1 addition & 1 deletion python/paddle/distribution/distribution.py
@@ -140,7 +140,7 @@ def log_prob(self, value):
def probs(self, value):
"""Probability density/mass function.
.. note::
Note:
This method will be deprecated in the future, please use `prob`
instead.
10 changes: 5 additions & 5 deletions python/paddle/distribution/kl.py
@@ -38,11 +38,11 @@ def kl_divergence(p, q):
KL(p||q) = \int p(x)log\frac{p(x)}{q(x)} \mathrm{d}x
Args:
p (Distribution): ``Distribution`` object.
q (Distribution): ``Distribution`` object.
p (Distribution): ``Distribution`` object. Inherits from the Distribution Base class.
q (Distribution): ``Distribution`` object. Inherits from the Distribution Base class.
Returns:
Tensor: Batchwise KL-divergence between distribution p and q.
Tensor, Batchwise KL-divergence between distribution p and q.
Examples:
@@ -71,8 +71,8 @@ def register_kl(cls_p, cls_q):
implementation function by the decorator.
Args:
cls_p(Distribution): Subclass derived from ``Distribution``.
cls_q(Distribution): Subclass derived from ``Distribution``.
cls_p (Distribution): The Distribution type of Instance p. Subclass derived from ``Distribution``.
cls_q (Distribution): The Distribution type of Instance q. Subclass derived from ``Distribution``.
Examples:
.. code-block:: python
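The two APIs above can be exercised together; a minimal sketch (not part of the diff) using the built-in ``Normal`` distribution, with ``MyDist`` as a hypothetical placeholder subclass:

```python
import paddle
from paddle.distribution import Normal, kl_divergence, register_kl

p = Normal(loc=0.0, scale=1.0)
q = Normal(loc=1.0, scale=2.0)

# Batchwise KL-divergence KL(p || q), returned as a Tensor
print(kl_divergence(p, q))

# A new (p, q) pairing is made available to kl_divergence by decorating an
# implementation with the two Distribution subclasses it handles, e.g.:
#
# @register_kl(MyDist, MyDist)
# def _kl_mydist_mydist(p, q):
#     return ...  # Tensor holding the batchwise divergence
```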
86 changes: 43 additions & 43 deletions python/paddle/distribution/normal.py
@@ -47,7 +47,7 @@ class Normal(distribution.Distribution):
.. math::
pdf(x; \mu, \sigma) = \\frac{1}{Z}e^{\\frac {-0.5 (x - \mu)^2} {\sigma^2} }
pdf(x; \mu, \sigma) = \frac{1}{Z}e^{\frac {-0.5 (x - \mu)^2} {\sigma^2} }
.. math::
@@ -60,43 +60,43 @@ class Normal(distribution.Distribution):
* :math:`Z`: is the normalization constant.
Args:
loc(int|float|list|tuple|numpy.ndarray|Tensor): The mean of normal distribution.The data type is int, float, list, numpy.ndarray or Tensor.
scale(int|float|list|tuple|numpy.ndarray|Tensor): The std of normal distribution.The data type is int, float, list, numpy.ndarray or Tensor.
loc(int|float|list|tuple|numpy.ndarray|Tensor): The mean of normal distribution.The data type is float32 and float64.
scale(int|float|list|tuple|numpy.ndarray|Tensor): The std of normal distribution.The data type is float32 and float64.
name(str, optional): Name for the operation (optional, default is None). For more information, please refer to :ref:`api_guide_Name`.
Examples:
.. code-block:: python
import paddle
from paddle.distribution import Normal
# Define a single scalar Normal distribution.
dist = Normal(loc=0., scale=3.)
# Define a batch of two scalar valued Normals.
# The first has mean 1 and standard deviation 11, the second 2 and 22.
dist = Normal(loc=[1., 2.], scale=[11., 22.])
# Get 3 samples, returning a 3 x 2 tensor.
dist.sample([3])
# Define a batch of two scalar valued Normals.
# Both have mean 1, but different standard deviations.
dist = Normal(loc=1., scale=[11., 22.])
# Complete example
value_tensor = paddle.to_tensor([0.8], dtype="float32")
normal_a = Normal([0.], [1.])
normal_b = Normal([0.5], [2.])
sample = normal_a.sample([2])
# a random tensor created by normal distribution with shape: [2, 1]
entropy = normal_a.entropy()
# [1.4189385] with shape: [1]
lp = normal_a.log_prob(value_tensor)
# [-1.2389386] with shape: [1]
p = normal_a.probs(value_tensor)
# [0.28969154] with shape: [1]
kl = normal_a.kl_divergence(normal_b)
# [0.34939718] with shape: [1]
import paddle
from paddle.distribution import Normal
# Define a single scalar Normal distribution.
dist = Normal(loc=0., scale=3.)
# Define a batch of two scalar valued Normals.
# The first has mean 1 and standard deviation 11, the second 2 and 22.
dist = Normal(loc=[1., 2.], scale=[11., 22.])
# Get 3 samples, returning a 3 x 2 tensor.
dist.sample([3])
# Define a batch of two scalar valued Normals.
# Both have mean 1, but different standard deviations.
dist = Normal(loc=1., scale=[11., 22.])
# Complete example
value_tensor = paddle.to_tensor([0.8], dtype="float32")
normal_a = Normal([0.], [1.])
normal_b = Normal([0.5], [2.])
sample = normal_a.sample([2])
# a random tensor created by normal distribution with shape: [2, 1]
entropy = normal_a.entropy()
# [1.4189385] with shape: [1]
lp = normal_a.log_prob(value_tensor)
# [-1.2389386] with shape: [1]
p = normal_a.probs(value_tensor)
# [0.28969154] with shape: [1]
kl = normal_a.kl_divergence(normal_b)
# [0.34939718] with shape: [1]
"""

def __init__(self, loc, scale, name=None):
@@ -153,11 +153,11 @@ def sample(self, shape, seed=0):
"""Generate samples of the specified shape.
Args:
shape (list): 1D `int32`. Shape of the generated samples.
seed (int): Python integer number.
shape (list): 1D `int32`. Shape of the generated samples.
seed (int): Python integer number.
Returns:
Tensor: A tensor with prepended dimensions shape.The data type is float32.
Tensor, A tensor with prepended dimensions shape.The data type is float32.
"""
if not _non_static_mode():
@@ -198,14 +198,14 @@ def entropy(self):
.. math::
entropy(\sigma) = 0.5 \\log (2 \pi e \sigma^2)
entropy(\sigma) = 0.5 \log (2 \pi e \sigma^2)
In the above equation:
* :math:`scale = \sigma`: is the std.
Returns:
Tensor: Shannon entropy of normal distribution.The data type is float32.
Tensor, Shannon entropy of normal distribution.The data type is float32.
"""
name = self.name + '_entropy'
@@ -244,10 +244,10 @@ def probs(self, value):
"""Probability density/mass function.
Args:
value (Tensor): The input tensor.
value (Tensor): The input tensor.
Returns:
Tensor: probability.The data type is same with value.
Tensor, probability. The data type is the same as that of value.
"""
name = self.name + '_probs'
@@ -269,11 +269,11 @@ def kl_divergence(self, other):
.. math::
KL\_divergence(\mu_0, \sigma_0; \mu_1, \sigma_1) = 0.5 (ratio^2 + (\\frac{diff}{\sigma_1})^2 - 1 - 2 \\ln {ratio})
KL\_divergence(\mu_0, \sigma_0; \mu_1, \sigma_1) = 0.5 (ratio^2 + (\frac{diff}{\sigma_1})^2 - 1 - 2 \ln {ratio})
.. math::
ratio = \\frac{\sigma_0}{\sigma_1}
ratio = \frac{\sigma_0}{\sigma_1}
.. math::
@@ -292,7 +292,7 @@ def kl_divergence(self, other):
other (Normal): instance of Normal.
Returns:
Tensor: kl-divergence between two normal distributions.The data type is float32.
Tensor, kl-divergence between two normal distributions.The data type is float32.
"""
if not _non_static_mode():
18 changes: 10 additions & 8 deletions python/paddle/distribution/transform.py
@@ -67,11 +67,11 @@ class Transform(object):
used for transforming a random sample generated by ``Distribution``
instance.
Suppose :math:`X` is a K-dimensional random variable with probability
density function :math:`p_X(x)`. A new random variable :math:`Y = f(X)` may
be defined by transforming :math:`X` with a suitably well-behaved funciton
:math:`f`. It suffices for what follows to note that if f is one-to-one and
its inverse :math:`f^{-1}` have a well-defined Jacobian, then the density of
Suppose :math:`X` is a K-dimensional random variable with probability
density function :math:`p_X(x)`. A new random variable :math:`Y = f(X)` may
be defined by transforming :math:`X` with a suitably well-behaved function
:math:`f`. It suffices for what follows to note that if `f` is one-to-one and
its inverse :math:`f^{-1}` has a well-defined Jacobian, then the density of
:math:`Y` is
.. math::
@@ -1049,16 +1049,16 @@ class StackTransform(Transform):
specific axis.
Args:
transforms(Sequence[Transform]): The sequence of transformations.
axis(int): The axis along which will be transformed.
transforms (Sequence[Transform]): The sequence of transformations.
axis (int, optional): The axis along which the input will be transformed. Default value is 0.
Examples:
.. code-block:: python
import paddle
x = paddle.stack(
(paddle.to_tensor([1., 2., 3.]), paddle.to_tensor([1, 2., 3.])), 1)
t = paddle.distribution.StackTransform(
@@ -1071,11 +1071,13 @@ class StackTransform(Transform):
# [[2.71828175 , 1. ],
# [7.38905621 , 4. ],
# [20.08553696, 9. ]])
print(t.inverse(t.forward(x)))
# Tensor(shape=[3, 2], dtype=float32, place=Place(gpu:0), stop_gradient=True,
# [[1., 1.],
# [2., 2.],
# [3., 3.]])
print(t.forward_log_det_jacobian(x))
# Tensor(shape=[3, 2], dtype=float32, place=Place(gpu:0), stop_gradient=True,
# [[1. , 0.69314718],
