
Commit

DOC Fixed documents that refer to Bunch object scikit-learn#16438 (scikit-learn#16447)

* Added links to utils.Bunch and fixed format of the docstring in datasets

* Added links to utils.Bunch in sklearn.compose

* Added links to utils.Bunch in sklearn.tree

* Added links to utils.Bunch in sklearn.ensemble

* Added links to utils.Bunch in sklearn.inspection

* Added links to utils.Bunch in sklearn.pipeline

* modified docstring of Bunch

* Added links to utils.Bunch to index.rst of sklearn.datasets

* Fixed some docstrings because the lines are too long

* Fixed some points as reviewed.

* Add links and delete 'for more information...'

* Fixed indent

* Fixed forgotten points.

* Fixed some points as reviewed.
CastaChick authored Feb 27, 2020
1 parent 54cbf42 commit ca78d75
Showing 18 changed files with 249 additions and 205 deletions.
60 changes: 32 additions & 28 deletions doc/datasets/index.rst
@@ -21,46 +21,50 @@ also possible to generate synthetic data.
General dataset API
===================

There are three main kinds of dataset interfaces that can be used to get
datasets, depending on the desired type of dataset.

**The dataset loaders.** They can be used to load small standard datasets,
described in the :ref:`toy_datasets` section.

**The dataset fetchers.** They can be used to download and load larger datasets,
described in the :ref:`real_world_datasets` section.

Both loader and fetcher functions return a :class:`sklearn.utils.Bunch`
object holding at least two items:
an array of shape ``n_samples`` * ``n_features`` with
key ``data`` (except for 20newsgroups) and a numpy array of
length ``n_samples``, containing the target values, with key ``target``.

The Bunch object is a dictionary that exposes its keys as attributes.
For more information about the Bunch object, see :class:`sklearn.utils.Bunch`.
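As a quick sketch of that behavior (assuming scikit-learn is available), one of the toy loaders shows the same value reachable by key or by attribute:

```python
from sklearn.datasets import load_iris

# Loaders return a Bunch: a dict whose keys double as attributes.
iris = load_iris()
assert iris["data"] is iris.data  # same array either way
print(iris.data.shape)            # (150, 4): n_samples x n_features
print(iris.target.shape)          # (150,): one target value per sample
```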

It's also possible for almost all of these functions to constrain the output
to be a tuple containing only the data and the target, by setting the
``return_X_y`` parameter to ``True``.
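For instance (a minimal sketch using one of the toy loaders):

```python
from sklearn.datasets import load_iris

# return_X_y=True skips the Bunch and yields a plain (data, target) tuple.
X, y = load_iris(return_X_y=True)
print(X.shape, y.shape)  # (150, 4) (150,)
```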

The datasets also contain a full description in their ``DESCR`` attribute and
some contain ``feature_names`` and ``target_names``. See the dataset
descriptions below for details.

**The dataset generation functions.** They can be used to generate controlled
synthetic datasets, described in the :ref:`sample_generators` section.

These functions return a tuple ``(X, y)`` consisting of a ``n_samples`` *
``n_features`` numpy array ``X`` and an array of length ``n_samples``
containing the targets ``y``.
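A hedged sketch with one of the generators (the parameter values here are illustrative):

```python
from sklearn.datasets import make_classification

# Generators return a plain (X, y) tuple, not a Bunch.
X, y = make_classification(n_samples=100, n_features=20, random_state=0)
print(X.shape, y.shape)  # (100, 20) (100,)
```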

In addition, there are also miscellaneous tools to load datasets of other
formats or from other locations, described in the :ref:`loading_other_datasets`
section.

.. _toy_datasets:

Toy datasets
============

scikit-learn comes with a few small standard datasets that do not require
downloading any file from an external website.

They can be loaded using the following functions:

@@ -484,17 +488,17 @@ Loading from external datasets
scikit-learn works on any numeric data stored as numpy arrays or scipy sparse
matrices. Other types that are convertible to numeric arrays such as pandas
DataFrame are also acceptable.

Here are some recommended ways to load standard columnar data into a
format usable by scikit-learn:

* `pandas.io <https://pandas.pydata.org/pandas-docs/stable/io.html>`_
provides tools to read data from common formats including CSV, Excel, JSON
and SQL. DataFrames may also be constructed from lists of tuples or dicts.
Pandas handles heterogeneous data smoothly and provides tools for
manipulation and conversion into a numeric array suitable for scikit-learn.
* `scipy.io <https://docs.scipy.org/doc/scipy/reference/io.html>`_
specializes in binary formats often used in scientific computing
  contexts, such as .mat and .arff
* `numpy/routines.io <https://docs.scipy.org/doc/numpy/reference/routines.io.html>`_
for standard loading of columnar data into numpy arrays
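As a minimal sketch of the pandas route (the inline CSV here is a stand-in for a real file path passed to ``read_csv``):

```python
import io

import pandas as pd

# A tiny inline CSV; in practice this would be a path such as "data.csv".
csv = io.StringIO("f1,f2,label\n1.0,2.0,0\n3.0,4.0,1\n")
df = pd.read_csv(csv)
X = df.drop(columns=["label"]).to_numpy()  # numeric feature matrix
y = df["label"].to_numpy()                 # target vector
print(X.shape, y.shape)  # (2, 2) (2,)
```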
@@ -508,18 +512,18 @@ For some miscellaneous data such as images, videos, and audio, you may wish to
refer to:

* `skimage.io <https://scikit-image.org/docs/dev/api/skimage.io.html>`_ or
`Imageio <https://imageio.readthedocs.io/en/latest/userapi.html>`_
for loading images and videos into numpy arrays
* `scipy.io.wavfile.read
<https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.io.wavfile.read.html>`_
for reading WAV files into a numpy array

Categorical (or nominal) features stored as strings (common in pandas DataFrames)
will need converting to numerical features using :class:`sklearn.preprocessing.OneHotEncoder`
or :class:`sklearn.preprocessing.OrdinalEncoder` or similar.
See :ref:`preprocessing`.
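A small sketch of that conversion (the column name is hypothetical):

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# A string-valued categorical column, as commonly found in a DataFrame.
df = pd.DataFrame({"color": ["red", "green", "red"]})
encoded = OneHotEncoder().fit_transform(df[["color"]])  # sparse by default
print(encoded.toarray())
# [[0. 1.]
#  [1. 0.]
#  [0. 1.]]
```

The columns follow the alphabetically sorted categories, here ``['green', 'red']``.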

Note: if you manage your own numerical data it is recommended to use an
optimized file format such as HDF5 to reduce data load times. Various libraries
such as H5Py, PyTables and pandas provide a Python interface for reading and
writing data in that format.
2 changes: 1 addition & 1 deletion sklearn/compose/_column_transformer.py
@@ -124,7 +124,7 @@ class ColumnTransformer(TransformerMixin, _BaseComposition):
``len(transformers_)==len(transformers)+1``, otherwise
``len(transformers_)==len(transformers)``.
named_transformers_ : :class:`~sklearn.utils.Bunch`
Read-only attribute to access any transformer by given name.
Keys are transformer names and values are the fitted transformer
objects.
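A minimal sketch of that access pattern (the transformer name and data are illustrative):

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

X = np.array([[0.0, 1.0], [2.0, 3.0]])
ct = ColumnTransformer([("scale", StandardScaler(), [0, 1])]).fit(X)

# named_transformers_ is a Bunch: key access and attribute access
# both return the fitted transformer registered under "scale".
print(ct.named_transformers_["scale"] is ct.named_transformers_.scale)  # True
```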
78 changes: 50 additions & 28 deletions sklearn/datasets/_base.py
@@ -163,12 +163,20 @@ def load_files(container_path, description=None, categories=None,
Returns
-------
data : :class:`~sklearn.utils.Bunch`
Dictionary-like object, with the following attributes.
data : list of str
Only present when `load_content=True`.
The raw text data to learn.
target : ndarray
The target labels (integer index).
target_names : list
The names of target classes.
DESCR : str
The full description of the dataset.
filenames : ndarray
The filenames holding the dataset.
"""
target = []
target_names = []
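To illustrate the container layout ``load_files`` expects (directory and file names here are hypothetical), one subdirectory per category:

```python
import tempfile
from pathlib import Path

from sklearn.datasets import load_files

# Build a minimal container: one subdirectory per category, one text file each.
root = Path(tempfile.mkdtemp())
for category, text in [("pos", "a good sample"), ("neg", "a bad sample")]:
    subdir = root / category
    subdir.mkdir()
    (subdir / "sample.txt").write_text(text)

bunch = load_files(str(root), encoding="utf-8")
print(bunch.target_names)  # folder names become the class names
print(len(bunch.data))     # 2
```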
@@ -295,8 +303,8 @@ def load_wine(return_X_y=False, as_frame=False):
Returns
-------
data : :class:`~sklearn.utils.Bunch`
Dictionary-like object, with the following attributes.
data : {ndarray, dataframe} of shape (178, 13)
The data matrix. If `as_frame=True`, `data` will be a pandas
@@ -409,8 +417,8 @@ def load_iris(return_X_y=False, as_frame=False):
Returns
-------
data : :class:`~sklearn.utils.Bunch`
Dictionary-like object, with the following attributes.
data : {ndarray, dataframe} of shape (150, 4)
The data matrix. If `as_frame=True`, `data` will be a pandas
@@ -521,8 +529,8 @@ def load_breast_cancer(return_X_y=False, as_frame=False):
Returns
-------
data : :class:`~sklearn.utils.Bunch`
Dictionary-like object, with the following attributes.
data : {ndarray, dataframe} of shape (569, 30)
The data matrix. If `as_frame=True`, `data` will be a pandas
@@ -645,8 +653,8 @@ def load_digits(n_class=10, return_X_y=False, as_frame=False):
Returns
-------
data : :class:`~sklearn.utils.Bunch`
Dictionary-like object, with the following attributes.
data : {ndarray, dataframe} of shape (1797, 64)
The flattened data matrix. If `as_frame=True`, `data` will be
@@ -759,8 +767,8 @@ def load_diabetes(return_X_y=False, as_frame=False):
Returns
-------
data : :class:`~sklearn.utils.Bunch`
Dictionary-like object, with the following attributes.
data : {ndarray, dataframe} of shape (442, 10)
The data matrix. If `as_frame=True`, `data` will be a pandas
@@ -853,8 +861,8 @@ def load_linnerud(return_X_y=False, as_frame=False):
Returns
-------
data : :class:`~sklearn.utils.Bunch`
Dictionary-like object, with the following attributes.
data : {ndarray, dataframe} of shape (20, 3)
The data matrix. If `as_frame=True`, `data` will be a pandas
@@ -943,12 +951,21 @@ def load_boston(return_X_y=False):
Returns
-------
data : :class:`~sklearn.utils.Bunch`
Dictionary-like object, with the following attributes.
data : ndarray of shape (506, 13)
The data matrix.
target : ndarray of shape (506,)
The regression target.
filename : str
The physical location of boston csv dataset.
.. versionadded:: 0.20
DESCR : str
The full description of the dataset.
feature_names : ndarray
The names of the features.
(data, target) : tuple if ``return_X_y`` is True
@@ -1007,10 +1024,15 @@ def load_sample_images():
Returns
-------
data : :class:`~sklearn.utils.Bunch`
Dictionary-like object, with the following attributes.
images : list of ndarray of shape (427, 640, 3)
The two sample images.
filenames : list
The filenames for the images.
DESCR : str
The full description of the dataset.
Examples
--------
29 changes: 14 additions & 15 deletions sklearn/datasets/_california_housing.py
@@ -87,21 +87,20 @@ def fetch_california_housing(data_home=None, download_if_missing=True,
Returns
-------
dataset : :class:`~sklearn.utils.Bunch`
Dictionary-like object, with the following attributes.
data : ndarray, shape (20640, 8)
Each row corresponding to the 8 feature values in order.
If ``as_frame`` is True, ``data`` is a pandas object.
target : numpy array of shape (20640,)
Each value corresponds to the average
house value in units of 100,000.
If ``as_frame`` is True, ``target`` is a pandas object.
feature_names : list of length 8
Array of ordered feature names used in the dataset.
DESCR : string
Description of the California housing dataset.
(data, target) : tuple if ``return_X_y`` is True
22 changes: 11 additions & 11 deletions sklearn/datasets/_covtype.py
@@ -81,17 +81,17 @@ def fetch_covtype(data_home=None, download_if_missing=True,
Returns
-------
dataset : :class:`~sklearn.utils.Bunch`
Dictionary-like object, with the following attributes.
data : numpy array of shape (581012, 54)
Each row corresponds to the 54 features in the dataset.
target : numpy array of shape (581012,)
Each value corresponds to one of
the 7 forest covertypes with values
ranging from 1 to 7.
DESCR : str
Description of the forest covertype dataset.
(data, target) : tuple if ``return_X_y`` is True
24 changes: 15 additions & 9 deletions sklearn/datasets/_kddcup99.py
@@ -96,11 +96,15 @@ def fetch_kddcup99(subset=None, data_home=None, shuffle=False,
Returns
-------
data : :class:`~sklearn.utils.Bunch`
Dictionary-like object, with the following attributes.
data : ndarray of shape (494021, 41)
The data matrix to learn.
target : ndarray of shape (494021,)
The regression target for each sample.
DESCR : str
The full description of the dataset.
(data, target) : tuple if ``return_X_y`` is True
@@ -190,13 +194,15 @@ def _fetch_brute_kddcup99(data_home=None,
Returns
-------
dataset : :class:`~sklearn.utils.Bunch`
Dictionary-like object, with the following attributes.
data : numpy array of shape (494021, 41)
Each row corresponds to the 41 features in the dataset.
target : numpy array of shape (494021,)
Each value corresponds to one of the 21 attack types or to the
label 'normal.'.
DESCR : string
Description of the kddcup99 dataset.
"""
