2.0 documentation update (#1828)

* Updating documentation.
ROCm · Jul 8, 2019 · c02bedc
1 parent 7a8f787

Showing 18 changed files with 430 additions and 249 deletions.
diff --git a/README.md b/README.md
@@ -18,8 +18,8 @@ MIOpen supports two programming models -
 * ROCm cmake modules can be installed from [here](https://github.com/RadeonOpenCompute/rocm-cmake)
 * [Half](http://half.sourceforge.net/) - IEEE 754-based half-precision floating point library
 * [Boost](http://www.boost.org/) at least version 1.58
-  * MIOpen uses `boost-system` and `boost-filesystem` packages to enable persistent [kernel cache](https://github.com/ROCmSoftwarePlatform/MIOpen/blob/master/doc/src/cache.md)
-* [rocBlas](https://github.com/ROCmSoftwarePlatform/rocBLAS) Minimum version 2.0.0 (recommended version 2.2.0)
+  * MIOpen uses `boost-system` and `boost-filesystem` packages to enable persistent [kernel cache](https://rocmsoftwareplatform.github.io/MIOpen/doc/html/cache.html)
+* [rocBlas](https://github.com/ROCmSoftwarePlatform/rocBLAS) Minimum version branch [master-rocm-2.6](https://github.com/ROCmSoftwarePlatform/rocBLAS/tree/master-rocm-2.6)
 
 
 ## Installing MIOpen with pre-built packages
@@ -44,12 +44,12 @@ cmake -P install_deps.cmake --prefix /some/local/dir
 ```
 This prefix can used to specify the dependency path during the configuration phase using the `CMAKE_PREFIX_PATH`.
 
-MIOpen's HIP backend uses [rocBlas](https://github.com/ROCmSoftwarePlatform/rocBLAS) by default. Users can intall rocBlas minimum release by using `apt-get install rocblas`. To disable using rocBlas set the configuration flag `-DMIOPEN_USE_ROCBLAS=Off`. rocBlas is *not* available for the OpenCL backend.
+MIOpen's HIP backend uses [rocBlas](https://github.com/ROCmSoftwarePlatform/rocBLAS) by default. Users can install rocBlas minimum release by using `apt-get install rocblas`. To disable using rocBlas set the configuration flag `-DMIOPEN_USE_ROCBLAS=Off`. rocBlas is *not* available for the OpenCL backend.
 
 
 ## Installing minimum dependencies in ROCm environment
 
-Users who are working in a fully installed and up to date ROCm environment may not wish to additionally install rocm-cmake, clang-ocl, MIOpenGEMM, or rocBLAS. This can be done by simpily inserting the command `--minimum` into the cmake command as shown below:
+Users who are working in a fully installed and up to date ROCm environment may not wish to additionally install rocm-cmake, clang-ocl, MIOpenGEMM, or rocBLAS. This can be done by simply inserting the command `--minimum` into the cmake command as shown below:
 
 ```
 cmake -P install_deps.cmake --minimum --prefix /some/local/dir
@@ -91,6 +91,7 @@ cmake -DMIOPEN_BACKEND=OpenCL -DCMAKE_PREFIX_PATH=/some/local/dir ..
 
 Set the C++ compiler to `hcc`.
 ```
+export CXX=<location-of-hcc-compiler>
 cmake -DMIOPEN_BACKEND=HIP -DCMAKE_PREFIX_PATH="<hip-installed-path>;<hcc-installed-path>;<clang-ocl-installed-path>" ..
 ```
 An example cmake step can be:
@@ -118,7 +119,7 @@ Database paths can be explicitly customized by means of `MIOPEN_SYSTEM_DB_PATH`
 
 If the user installs a new version of MIOpen, it is recommended that the user move, or delete their old user database file. The user can find the file with the suffix `*.updb.txt` in the user perf db path. 
 
-More information about the performance database can be found [here](https://github.com/ROCmSoftwarePlatform/MIOpen/blob/master/doc/src/perfdatabase.md).
+More information about the performance database can be found [here](https://rocmsoftwareplatform.github.io/MIOpen/doc/html/perfdatabase.html).
 
 
 ### Persistent Program Cache
@@ -127,7 +128,7 @@ MIOpen by default caches the device programs in the location `~/.cache/miopen/`.
 
 Users can also disable the cache during runtime using the environmental variable set as `MIOPEN_DISABLE_CACHE=1`. 
 
-If the compiler changes, or the user modifies the kernels then the cache must be deleted for the MIOpen version in use; e.g., `rm -rf ~/.cache/miopen/<miopen-version-number>`. More information about the cache can be found [here](https://github.com/ROCmSoftwarePlatform/MIOpen/blob/master/doc/src/cache.md).
+If the compiler changes, or the user modifies the kernels then the cache must be deleted for the MIOpen version in use; e.g., `rm -rf ~/.cache/miopen/<miopen-version-number>`. More information about the cache can be found [here](https://rocmsoftwareplatform.github.io/MIOpen/doc/html/cache.html).
 
 
 ### Changing the cmake configuration
@@ -158,7 +159,7 @@ The driver can be built using the `MIOpenDriver` target:
 
 ` cmake --build . --config Release --target MIOpenDriver ` **OR** ` make MIOpenDriver `
 
-Documentation on how to run the driver is [here](https://github.com/ROCmSoftwarePlatform/MIOpen/blob/master/driver/README.md). 
+Documentation on how to run the driver is [here](https://rocmsoftwareplatform.github.io/MIOpen/doc/html/driver.html). 
 
 ## Running the tests
 
@@ -187,7 +188,10 @@ HTML and PDFs are generated using [Sphinx](http://www.sphinx-doc.org/en/stable/i
 
 Requirements for both Sphinx, Breathe, and the ReadTheDocs theme can be filled for these in the MIOpen/doc folder:
 
-`pip install -r ./requirements.txt`
+```
+pip install -r ./requirements.txt
+```
+
 
 Depending on your setup `sudo` may be required for the pip install.
 
@@ -214,7 +218,7 @@ sudo apt-get install libboost-system-dev
 sudo apt-get install libboost-filesystem-dev
 ```
 
-*Note:* MIOpen by default will attempt to build with Boost staticially linked libraries. If it is needed, the user can build with dynamically linked Boost libraries by using this flag during the configruation stage:
+*Note:* MIOpen by default will attempt to build with Boost statically linked libraries. If it is needed, the user can build with dynamically linked Boost libraries by using this flag during the configruation stage:
 ```
 -DBoost_USE_STATIC_LIBS=Off
 ```
@@ -226,9 +230,13 @@ The `half` header needs to be installed from [here](http://half.sourceforge.net/
 ## Using docker
 
 The easiest way is to use docker. You can build the top-level docker file:
+```
+docker build -t miopen .
+```
 
-    docker build -t miopen .
+Then to enter the development environment use `docker run`:
+```
+docker run --device='/dev/kfd' --device='/dev/dri' -v=`pwd`:/data -w /data --group-add video -it miopen
+```
 
-Then to enter the developement environment use `docker run`:
 
-    docker run --device='/dev/kfd' --device='/dev/dri' -v=`pwd`:/data -w /data --group-add video -it miopen
diff --git a/doc/src/DebugAndLogging.md b/doc/src/DebugAndLogging.md
@@ -28,7 +28,8 @@ All logging messages output to standard error stream (`stderr`). The following e
 > **_NOTE:_ When asking for technical support, please include the console log obtained with the following settings:**
 > ```
 > export MIOPEN_ENABLE_LOGGING=1
-> export MIOPEN_LOG_LEVEL=5
+> export MIOPEN_ENABLE_LOGGING_CMD=1
+> export MIOPEN_LOG_LEVEL=6
 > ```
 
 * `MIOPEN_ENABLE_LOGGING_MPMT` - When enabled, each log line is prefixed with information which allows the user to identify records printed from different processes and/or threads. Useful for debugging multi-process/multi-threaded apps.
@@ -43,12 +44,13 @@ The following list of environment variables allow for enabling/disabling various
 > 0, no, false, disable, disabled - to disable kernels/algorithm
 > ```
 
-If a variable is not set, then MIOpen behaves as if it is set to `enabled`, unless otherwise specified. So all kinds of kernels/algorithms are enabled by default and variables can be used for disabling them.
+If a variable is not set, then MIOpen behaves as if it is set to `enabled`, unless otherwise specified. So all kinds of kernels/algorithms are enabled by default and the below variables can be used for disabling them. The exception to this rule is `MIOPEN_DEBUG_CONV_IMPLICIT_GEMM` which is disabled by default:
 
 * `MIOPEN_DEBUG_CONV_FFT` – FFT convolution algorithm. 
 * `MIOPEN_DEBUG_CONV_DIRECT` – Direct convolution algorithm.
 * `MIOPEN_DEBUG_CONV_GEMM` - GEMM convolution algorithm. These are implemented on top of miopengemm or rocBlas.
 * `MIOPEN_DEBUG_GCN_ASM_KERNELS` – Kernels written in assembly language. So far, the most of the assembly kernels are implementing the Direct convolution algorithm.
+* `MIOPEN_DEBUG_CONV_IMPLICIT_GEMM` – FP32 implicit GEMM convolution algorithm, disabled by default due to compatibility issue with older compiler. Set to 1 to turn on implicit GEMM algorithm.
 * `MIOPEN_DEBUG_AMD_ROCM_PRECOMPILED_BINARIES` - Binary kernels. Right now all the binary kernels are Winograd ones, however, not all Winograds are binaries. To disable all Winograd algorithms, the following two vars can be used:
 * `MIOPEN_DEBUG_AMD_WINOGRAD_3X3` - FP32 Winograd Fwd/Bwd, filter size fixed to 3x3.
 * `MIOPEN_DEBUG_AMD_WINOGRAD_RXS` - FP32 and FP16 Winograd Fwd/Bwd, variable filter size.

diff --git a/doc/src/Getting_Started_FusionAPI.md b/doc/src/Getting_Started_FusionAPI.md
@@ -190,231 +190,11 @@ Once the fusion plan object is destroyed, all the operations created are destroy
 The tables below outlines the supported fusions for fp32 and fp16 as well as any applicable constraints. **(C = convolution, B = bias, N = batch normalization, A = activation)**
 
 
-### Convolution based FP32 Fusion for Inference
-
-
-<table border=1 cellpadding=0 cellspacing=0 width=714 style='border-collapse:
- collapse;table-layout:fixed;width:535pt'>
- <col width=93 style='mso-width-source:userset;mso-width-alt:2986;width:70pt'>
- <col width=76 style='mso-width-source:userset;mso-width-alt:2432;width:57pt'>
- <col width=51 style='mso-width-source:userset;mso-width-alt:1621;width:38pt'>
- <col width=171 style='mso-width-source:userset;mso-width-alt:5461;width:128pt'>
- <col width=51 style='mso-width-source:userset;mso-width-alt:1621;width:38pt'>
- <col width=140 style='mso-width-source:userset;mso-width-alt:4480;width:105pt'>
- <col width=132 style='mso-width-source:userset;mso-width-alt:4224;width:99pt'>
- <tr height=21 style='height:16.0pt'>
-  <td colspan=7 height=21 class=xl66 width=714 style='height:16.0pt;width:535pt'>Single
-  Precision Floating Point</td>
- </tr>
- <tr height=45 style='height:34.0pt'>
-  <td height=45 class=xl66 style='height:34.0pt'><center><b>Combination</b></center></td>
-  <td class=xl66><center><b>Conv Algo</b></center></td>
-  <td class=xl66><center><b>Stride</b></center></td>
-  <td class=xl66><center><b>Filter Dims</b></center></td>
-  <td class=xl68 width=51 style='width:38pt'><center><b>N Mode*</b></center></td>
-  <td class=xl66><center><b>Activations</b></center></td>
-  <td class=xl68 width=132 style='width:99pt'><center><b>Other Constraints</b></center></td>
- </tr>
- <tr height=107 style='mso-height-source:userset;height:80.0pt'>
-  <td height=107 class=xl65 style='height:80.0pt'>CBNA</td>
-  <td class=xl65>Direct</td>
-  <td class=xl67 width=51 style='width:38pt'>1 and 2</td>
-  <td class=xl67 width=171 style='width:128pt'>3x3, 5x5, 7x7, 9x9, 11x11</td>
-  <td class=xl65>All</td>
-  <td class=xl65>All</td>
-  <td class=xl67 width=132 style='width:99pt'>stride and padding must be either
-  1 or 2</td>
- </tr>
- <tr height=63 style='mso-height-source:userset;height:47.0pt'>
-  <td rowspan=12 height=354 class=xl65 style='height:263.0pt'>CBA</td>
-  <td class=xl65>Direct</td>
-  <td class=xl65></td>
-  <td class=xl67 width=171 style='width:128pt'>1x1</td>
-  <td class=xl65></td>
-  <td class=xl65>All</td>
-  <td class=xl67 width=132 style='width:99pt'>stride/ padding not supported</td>
- </tr>
- <tr height=23 style='height:17.0pt'>
-  <td rowspan=11 height=291 class=xl65 style='height:216.0pt'>Winograd</td>
-  <td class=xl65>1</td>
-  <td class=xl67 width=171 style='width:128pt'>1x1, 2x2</td>
-  <td class=xl65>N/A</td>
-  <td class=xl65>Relu, Leaky Relu</td>
-  <td class=xl67 width=132 style='width:99pt'>c &gt;= 18</td>
- </tr>
- <tr height=39 style='mso-height-source:userset;height:29.0pt'>
-  <td height=39 class=xl65 style='height:29.0pt'>1</td>
-  <td class=xl67 width=171 style='width:128pt'>3x3</td>
-  <td class=xl65></td>
-  <td class=xl65>Relu, Leaky Relu</td>
-  <td class=xl67 width=132 style='width:99pt'>c &gt;= 18 and c is even</td>
- </tr>
- <tr height=23 style='height:17.0pt'>
-  <td height=23 class=xl65 style='height:17.0pt'>1</td>
-  <td class=xl67 width=171 style='width:128pt'>4x4, 5x5, 6x6</td>
-  <td class=xl65></td>
-  <td class=xl65>Relu, Leaky Relu</td>
-  <td class=xl67 width=132 style='width:99pt'>4 x c &gt;= 18</td>
- </tr>
- <tr height=23 style='height:17.0pt'>
-  <td height=23 class=xl65 style='height:17.0pt'>1</td>
-  <td class=xl67 width=171 style='width:128pt'>7x7, 8x8, 9x9</td>
-  <td class=xl65></td>
-  <td class=xl65>Relu, Leaky Relu</td>
-  <td class=xl67 width=132 style='width:99pt'>12 x c &gt;= 18</td>
- </tr>
- <tr height=23 style='height:17.0pt'>
-  <td height=23 class=xl65 style='height:17.0pt'>1</td>
-  <td class=xl67 width=171 style='width:128pt'>10x10, 11x11, 12x12</td>
-  <td class=xl65></td>
-  <td class=xl65>Relu, Leaky Relu</td>
-  <td class=xl67 width=132 style='width:99pt'>16 x c &gt;= 18</td>
- </tr>
- <tr height=23 style='height:17.0pt'>
-  <td height=23 class=xl65 style='height:17.0pt'>1</td>
-  <td class=xl67 width=171 style='width:128pt'>larger filter sizes</td>
-  <td class=xl65></td>
-  <td class=xl65>Relu, Leaky Relu</td>
-  <td class=xl67 width=132 style='width:99pt'>none</td>
- </tr>
- <tr height=23 style='height:17.0pt'>
-  <td height=23 class=xl65 style='height:17.0pt'>2</td>
-  <td class=xl67 width=171 style='width:128pt'>1x1</td>
-  <td class=xl65></td>
-  <td class=xl65>Relu, Leaky Relu</td>
-  <td class=xl67 width=132 style='width:99pt'>2 x c &gt;= 18</td>
- </tr>
- <tr height=23 style='height:17.0pt'>
-  <td height=23 class=xl65 style='height:17.0pt'>2</td>
-  <td class=xl67 width=171 style='width:128pt'>2x2, 3x3, 4x4, 5x5, 6x6</td>
-  <td class=xl65></td>
-  <td class=xl65>Relu, Leaky Relu</td>
-  <td class=xl67 width=132 style='width:99pt'>4 x c &gt;= 18</td>
- </tr>
- <tr height=23 style='height:17.0pt'>
-  <td height=23 class=xl65 style='height:17.0pt'>2</td>
-  <td class=xl67 width=171 style='width:128pt'>7x7</td>
-  <td class=xl65></td>
-  <td class=xl65>Relu, Leaky Relu</td>
-  <td class=xl67 width=132 style='width:99pt'>12 x c &gt;= 18</td>
- </tr>
- <tr height=45 style='height:34.0pt'>
-  <td height=45 class=xl65 style='height:34.0pt'>2</td>
-  <td class=xl67 width=171 style='width:128pt'>8x8, 9x9, 10x10, 11x11, 12x12</td>
-  <td class=xl65></td>
-  <td class=xl65>Relu, Leaky Relu</td>
-  <td class=xl67 width=132 style='width:99pt'>16 x c &gt;= 18</td>
- </tr>
- <tr height=23 style='height:17.0pt'>
-  <td height=23 class=xl65 style='height:17.0pt'>2</td>
-  <td class=xl67 width=171 style='width:128pt'>larger filter sizes</td>
-  <td class=xl65></td>
-  <td class=xl65>Relu, Leaky Relu</td>
-  <td class=xl67 width=132 style='width:99pt'>none</td>
- </tr>
- <tr height=45 style='height:34.0pt'>
-  <td height=45 class=xl65 style='height:34.0pt'>NA</td>
-  <td class=xl65>-</td>
-  <td class=xl65></td>
-  <td class=xl65>-</td>
-  <td class=xl65>All</td>
-  <td class=xl65>All</td>
-  <td class=xl67 width=132 style='width:99pt'>Padding not supported</td>
- </tr>
-</table>
-*N mode is either spatial, or per activation. For CBA other asymmetric kernels are supported as well, but are not enumerated here for brevity.
-
-<br><br>
-
-
-### Convolution based FP16 Fusion for Inference
-
-<table border=1 cellpadding=0 cellspacing=0 width=714 style='border-collapse:
- collapse;table-layout:fixed;width:535pt'>
- <col width=93 style='mso-width-source:userset;mso-width-alt:2986;width:70pt'>
- <col width=76 style='mso-width-source:userset;mso-width-alt:2432;width:57pt'>
- <col width=51 style='mso-width-source:userset;mso-width-alt:1621;width:38pt'>
- <col width=171 style='mso-width-source:userset;mso-width-alt:5461;width:128pt'>
- <col width=51 style='mso-width-source:userset;mso-width-alt:1621;width:38pt'>
- <col width=140 style='mso-width-source:userset;mso-width-alt:4480;width:105pt'>
- <col width=132 style='mso-width-source:userset;mso-width-alt:4224;width:99pt'>
- <tr height=21 style='height:16.0pt'>
-  <td colspan=7 height=21 class=xl67 width=714 style='height:16.0pt;width:535pt'><center><b>Half
-  Precision Floating Point</td></b></center>
- </tr>
- <tr height=45 style='height:34.0pt'>
-  <td height=45 class=xl66 style='height:34.0pt'><center><b>Combination</b></center></td>
-  <td class=xl66><center><b>Conv Algo</b></center></td>
-  <td class=xl66><center><b>Stride</b></center></td>
-  <td class=xl66><center><b>Filter Dims</b></center></td>
-  <td class=xl68 width=51 style='width:38pt'><center><b>N Mode*</b></center></td>
-  <td class=xl66><center><b>Activations</b></center></td>
-  <td class=xl68 width=132 style='width:99pt'><center><b>Other Constraints</b></center></td>
- </tr>
- <tr height=68 style='height:51.0pt'>
-  <td height=68 style='height:51.0pt'>CBNA</td>
-  <td>Direct</td>
-  <td class=xl69 width=51 style='width:38pt'>1 and 2</td>
-  <td>3x3, 5x5, 7x7, 9x9, 11x11</td>
-  <td>All</td>
-  <td>All</td>
-  <td class=xl68 width=132 style='width:99pt'>stride and padding must be either
-  1 or 2</td>
- </tr>
- <tr height=45 style='height:34.0pt'>
-  <td height=45 class=xl66 style='height:34.0pt'>CBA</td>
-  <td>Direct</td>
-  <td class=xl65></td>
-  <td>1x1</td>
-  <td></td>
-  <td>All</td>
-  <td class=xl68 width=132 style='width:99pt'>stride/ padding not supported</td>
- </tr>
-</table>
-
-*N mode is either spatial, or per activation. 
-<br><br>
-
-
-### Batch Normalization based fusion for FP32 and FP16 for Inference and Training
-<table border=1 cellpadding=0 cellspacing=0 width=713 style='border-collapse:
- collapse;table-layout:fixed;width:534pt'>
- <col width=108 style='mso-width-source:userset;mso-width-alt:3456;width:81pt'>
- <col width=87 style='width:65pt'>
- <col width=221 style='mso-width-source:userset;mso-width-alt:7082;width:166pt'>
- <col width=87 style='width:65pt'>
- <col width=123 style='mso-width-source:userset;mso-width-alt:3925;width:92pt'>
- <col width=87 style='width:65pt'>
- <tr height=45 style='height:34.0pt'>
-  <td height=45 class=xl65 width=108 style='height:34.0pt;width:81pt'><center><b>Combination</b></center></td>
-  <td class=xl65 width=87 style='width:65pt'><center><b>N mode*</b></center></td>
-  <td class=xl65 width=123 style='width:92pt'><center><b>Activations</b></center></td>
-  <td class=xl65 width=87 style='width:65pt'><center><b>Constraints</b></center></td>
- </tr>
- <tr height=45 style='height:34.0pt'>
-  <td height=45 class=xl66 width=108 style='height:34.0pt;width:81pt'>NA for inference</td>
-  <td class=xl66 width=87 style='width:65pt'><center>All</center></td>
-  <td class=xl66 width=123 style='width:92pt'><center>All</center></td>
-  <td class=xl66 width=87 style='width:65pt'>None </td>
- </tr>
- <tr height=45 style='height:34.0pt'>
-  <td height=46 class=xl67 width=108 style='height:34.0pt;width:81pt'>NA forward training</td>
-  <td class=xl66 width=87 style='width:65pt'><center>All</center></td>
-  <td class=xl66 width=123 style='width:92pt'><center>All</center></td>
-  <td class=xl66 width=87 style='width:65pt'>None </td>
- </tr>
-  <tr height=45 style='height:34.0pt'>
-  <td height=46 class=xl67 width=108 style='height:34.0pt;width:81pt'>NA backward training</td>
-  <td class=xl66 width=87 style='width:65pt'><center>All</center></td>
-  <td class=xl66 width=123 style='width:92pt'><center>All</center></td>
-  <td class=xl66 width=87 style='width:65pt'>None </td>
- </tr>
-
-</table>
-*N mode is either spatial, or per activation.
-<br><br>
+![Convolution based fp32 fusion](fp32fusions.png)
 
 
+![Convolution based fp16 fusion](fp16fusions.png)
+
 
 ## Performance Comparison to Non-Fused Kernels
 

diff --git a/doc/src/apireference.rst b/doc/src/apireference.rst
@@ -7,6 +7,7 @@ API Reference
     :maxdepth: 4
     :caption: Contents:
 
+    datatypes
     handle
     tensor
     activation
@@ -16,4 +17,5 @@ API Reference
     lrn
     pooling
     softmax
-    fusion
+    fusion
+    loss