SWDEV-389297, SWDEV-389588, SWDEV-389596, SWDEV-389599 - Update HIP documents for Windows SDK #3211

Open
wants to merge 1 commit into
base: develop
33 changes: 19 additions & 14 deletions README.md
@@ -14,24 +14,27 @@ New projects can be developed directly in the portable HIP C++ language and can

## DISCLAIMER

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions, and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard versionchanges, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. Any computer system has risks of security vulnerabilities that cannot be completely prevented or mitigated.AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.THIS INFORMATION IS PROVIDED ‘AS IS.” AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS, OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY RELIANCE, DIRECT, INDIRECT, SPECIAL, OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. AMD, the AMD Arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions, and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. Any computer system has risks of security vulnerabilities that cannot be completely prevented or mitigated. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes. THIS INFORMATION IS PROVIDED “AS IS.” AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS, OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY RELIANCE, DIRECT, INDIRECT, SPECIAL, OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. AMD, the AMD Arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.

© 2021 Advanced Micro Devices, Inc. All Rights Reserved.

## Repository branches:

The HIP repository maintains several branches. The branches that are of importance are:
On Linux, the HIP open source repository maintains several branches. The branches that are of importance are:

* develop branch: This is the default branch, on which new features are still under development and visible. While this may be of interest to many, note that this branch and the features under development might not be stable.
* main branch: This is the stable branch. It is up to date with the latest release branch; for example, if the latest HIP release is rocm-4.3, the main branch is based on that release.
* Release branches: These correspond to each ROCm release and are listed with release tags such as rocm-4.2, rocm-4.3, etc.

## Release tagging:
On Windows, however, the HIP source code is not open source.

HIP releases are typically naming convention for each ROCM release to help differentiate them.
## Release tagging:

On Linux, HIP releases typically follow a naming convention for each ROCm release to help differentiate them.
* rocm x.yy: These are the stable releases based on the ROCM release.
This type of release is typically made once a month.*
This type of release is typically made once a month.

On Windows, HIP is part of the HIP SDK package and aligns with each SDK software release.

## More Info:
- [Installation](INSTALL.md)
@@ -109,21 +112,23 @@ vector_square(T *C_d, const T *A_d, size_t N)
The HIP Runtime API code and compute kernel definition can exist in the same source file - HIP takes care of generating host and device code appropriately.

## HIP Portability and Compiler Technology
HIP C++ code can be compiled with either,
- On the NVIDIA CUDA platform, HIP provides header file which translate from the HIP runtime APIs to CUDA runtime APIs. The header file contains mostly inlined
HIP open source C++ code can be compiled with either,
- On the NVIDIA CUDA platform
HIP provides a header file that translates from the HIP runtime APIs to CUDA runtime APIs. The header file contains mostly inlined
functions and thus has very low overhead - developers coding in HIP should expect the same performance as coding in native CUDA. The code is then
compiled with nvcc, the standard C++ compiler provided with the CUDA SDK. Developers can use any tools supported by the CUDA SDK including the CUDA
profiler and debugger.
- On the AMD ROCm platform, HIP provides a header and runtime library built on top of HIP-Clang compiler. The HIP runtime implements HIP streams, events, and memory APIs,
and is a object library that is linked with the application. The source code for all headers and the library implementation is available on GitHub.
HIP developers on ROCm can use AMD's ROCgdb (https://github.com/ROCm-Developer-Tools/ROCgdb) for debugging and profiling.
compiled with nvcc, the standard C++ compiler provided with the CUDA SDK. Developers can use any tools supported by the CUDA SDK including the CUDA profiler and debugger.
- On the AMD platform
On Linux, HIP provides a header and runtime library built on top of the HIP-Clang compiler. The HIP runtime implements HIP streams, events, and memory APIs, and is an object library that is linked with the application.
On Linux, the source code for all headers and the library implementation is available on GitHub. HIP developers on ROCm Linux can use AMD's ROCgdb (https://github.com/ROCm-Developer-Tools/ROCgdb) for debugging and profiling.

On Windows, developers can install the HIP SDK and build their own applications by calling HIP APIs from any C++ development tool, such as Microsoft Visual Studio.

Thus HIP source code can be compiled to run on either platform. Platform-specific features can be isolated to a specific platform using conditional compilation. Thus HIP
provides source portability to either platform. HIP provides the _hipcc_ compiler driver which will call the appropriate toolchain depending on the desired platform.
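
The conditional compilation mentioned above can be sketched as follows. This is a minimal illustration, not code from the HIP repository: it assumes the platform macros `__HIP_PLATFORM_AMD__` and `__HIP_PLATFORM_NVIDIA__` are defined by the toolchain when building for the respective platform, and the function name `hip_platform_name` is hypothetical.

```cpp
#include <string>

// Hypothetical helper: reports which HIP platform this translation unit
// was compiled for. The two macros are assumed to be set by the HIP
// toolchain (for example through hipcc); with a plain host compiler
// neither is defined and the fallback branch is taken.
inline std::string hip_platform_name() {
#if defined(__HIP_PLATFORM_AMD__)
    // Code that relies on AMD-only features would go here.
    return "amd";
#elif defined(__HIP_PLATFORM_NVIDIA__)
    // Code that relies on CUDA-only features would go here.
    return "nvidia";
#else
    // No HIP platform selected at compile time.
    return "none";
#endif
}
```

Keeping the platform-specific parts behind a single guarded function like this leaves the rest of the source portable across both platforms.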


## Examples and Getting Started:

On Linux open source,
* A sample and [blog](https://github.com/ROCm-Developer-Tools/HIP/tree/main/samples/0_Intro/square) that uses any of [HIPIFY](https://github.com/ROCm-Developer-Tools/HIPIFY/blob/master/README.md) tools to convert a simple app from CUDA to HIP:


@@ -136,7 +141,7 @@ cd samples/01_Intro/square


## More Examples
The GitHub repository [HIP-Examples](https://github.com/ROCm-Developer-Tools/HIP-Examples.git) contains a hipified version of the popular Rodinia benchmark suite.
On Linux open source, the GitHub repository [HIP-Examples](https://github.com/ROCm-Developer-Tools/HIP-Examples.git) contains a hipified version of the popular Rodinia benchmark suite.
The README with the procedures and tips the team used during this porting effort is here: [Rodinia Porting Guide](https://github.com/ROCm-Developer-Tools/HIP-Examples/blob/master/rodinia_3.0/hip/README.hip_porting)

## Tour of the HIP Directories
43 changes: 23 additions & 20 deletions docs/markdown/hip_debugging.md
@@ -13,7 +13,7 @@ Table of Contents
* [Kernel Enqueue Serialization](#kernel-enqueue-serialization)
* [Making Device visible](#making-device-visible)
* [Dump code object](#dump-code-object)
* [HSA related environment variables](#HSA-related-environment-variables)
* [HSA related environment variables on Linux](#HSA-related-environment-variables-on-linux)
* [General Debugging Tips](#general-debugging-tips)

## Debugging tools
@@ -127,11 +127,11 @@ Breakpoint 1, main ()
```

### Other Debugging Tools
There are also other debugging tools available online developers can google and choose the one best suits the debugging requirements.
Other debugging tools are also available online; developers can search for them and choose the one that best suits their debugging requirements. For example, Microsoft Visual Studio and WinDbg are options on Windows.

## Debugging HIP Applications

Below is an example to show how to get useful information from the debugger while running a simple memory copy test, which caused an issue of segmentation fault.
The following example on Linux shows how to get useful information from the debugger while running a simple memory copy test that caused a segmentation fault.

```
test: simpleTest2<?> numElements=4194304 sizeElements=4194304 bytes
@@ -191,11 +191,14 @@ Thread 1 "hipMemcpy_simpl" received signal SIGSEGV, Segmentation fault.
...
```

On Windows, debugging HIP applications in an IDE such as Microsoft Visual Studio is more informative and visual: developers can debug code, inspect variables, watch multiple details, and examine call stacks.

## Useful Environment Variables
HIP provides some environment variables which allow HIP, hip-clang, or HSA driver to disable some feature or optimization.

HIP provides some environment variables that allow HIP, hip-clang, or the HSA driver on Linux to disable a feature or optimization.
These are not intended for production but can be useful to diagnose synchronization problems in the application (or driver).

Some of the most useful environment variables are described here. They are supported on the ROCm path.
Some of the most useful environment variables are described here. They are supported on the ROCm path on both Linux and Windows.

### Kernel Enqueue Serialization
Developers can control kernel command serialization from the host using the environment variable,
@@ -236,8 +239,8 @@ if (totalDeviceNum > 2) {
Developers can dump code object to analyze compiler related issues via setting environment variable,
GPU_DUMP_CODE_OBJECT

### HSA related environment variables
HSA provides some environment variables help to analyze issues in driver or hardware, for example,
### HSA related environment variables on Linux
On Linux with open source, HSA provides some environment variables that help analyze issues in the driver or hardware, for example,

HSA_ENABLE_SDMA=0
It causes host-to-device and device-to-host copies to use compute shader blit kernels rather than the dedicated DMA copy engines.
@@ -250,23 +253,23 @@ This environment variable can be useful to diagnose interrupt storm issues in th

### Summary of environment variables in HIP

The following is the summary of the most useful environment variables in HIP.
The following is a summary of the most useful environment variables in HIP that are supported on Linux and Windows.

| **Environment variable** | **Default value** | **Usage** |
| ---------------------------------------------------------------------------------------------------------------| ----------------- | --------- |
| AMD_LOG_LEVEL <br><sub> Enable HIP log on different Level. </sub> | 0 | 0: Disable log. <br> 1: Enable log on error level. <br> 2: Enable log on warning and below levels. <br> 0x3: Enable log on information and below levels. <br> 0x4: Decode and display AQL packets. |
| AMD_LOG_MASK <br><sub> Enable HIP log on different Level. </sub> | 0x7FFFFFFF | 0x1: Log API calls. <br> 0x02: Kernel and Copy Commands and Barriers. <br> 0x4: Synchronization and waiting for commands to finish. <br> 0x8: Enable log on information and below levels. <br> 0x20: Queue commands and queue contents. <br> 0x40:Signal creation, allocation, pool. <br> 0x80: Locks and thread-safety code. <br> 0x100: Copy debug. <br> 0x200: Detailed copy debug. <br> 0x400: Resource allocation, performance-impacting events. <br> 0x800: Initialization and shutdown. <br> 0x1000: Misc debug, not yet classified. <br> 0x2000: Show raw bytes of AQL packet. <br> 0x4000: Show code creation debug. <br> 0x8000: More detailed command info, including barrier commands. <br> 0x10000: Log message location. <br> 0xFFFFFFFF: Log always even mask flag is zero. |
| HIP_VISIBLE_DEVICES <br><sub> Only devices whose index is present in the sequence are visible to HIP. </sub> | | 0,1,2: Depending on the number of devices on the system. |
| GPU_DUMP_CODE_OBJECT <br><sub> Dump code object. </sub> | 0 | 0: Disable. <br> 1: Enable. |
| AMD_SERIALIZE_KERNEL <br><sub> Serialize kernel enqueue. </sub> | 0 | 1: Wait for completion before enqueue. <br> 2: Wait for completion after enqueue. <br> 3: Both. |
| AMD_SERIALIZE_COPY <br><sub> Serialize copies. </sub> | 0 | 1: Wait for completion before enqueue. <br> 2: Wait for completion after enqueue. <br> 3: Both. |
| HIP_HOST_COHERENT <br><sub> Coherent memory in hipHostMalloc. </sub> | 0 | 0: memory is not coherent between host and GPU. <br> 1: memory is coherent with host. |
| AMD_DIRECT_DISPATCH <br><sub> Enable direct kernel dispatch. </sub> | 1 | 0: Disable. <br> 1: Enable. |
| GPU_MAX_HW_QUEUES <br><sub> The maximum number of hardware queues allocated per device. </sub> | 4 | The variable controls how many independent hardware queues HIP runtime can create per process, per device. If application allocates more HIP streams than this number, then HIP runtime will reuse the same hardware queues for the new streams in round robin manner. Please note, this maximum number does not apply to either hardware queues that are created for CU masked HIP streams, or cooperative queue for HIP Cooperative Groups (there is only one single queue per device). |
| **Environment variable** | **Default value** | **Usage** |
| ---------------------------------------- | ----------------- | ---------------------------------------- |
| AMD_LOG_LEVEL <br><sub> Enable HIP log on different Level. </sub> | 0 | 0: Disable log. <br> 1: Enable log on error level. <br> 2: Enable log on warning and below levels. <br> 0x3: Enable log on information and below levels. <br> 0x4: Decode and display AQL packets. |
| AMD_LOG_MASK <br><sub> Enable HIP log on different Level. </sub> | 0x7FFFFFFF | 0x1: Log API calls. <br> 0x02: Kernel and Copy Commands and Barriers. <br> 0x4: Synchronization and waiting for commands to finish. <br> 0x8: Enable log on information and below levels. <br> 0x20: Queue commands and queue contents. <br> 0x40:Signal creation, allocation, pool. <br> 0x80: Locks and thread-safety code. <br> 0x100: Copy debug. <br> 0x200: Detailed copy debug. <br> 0x400: Resource allocation, performance-impacting events. <br> 0x800: Initialization and shutdown. <br> 0x1000: Misc debug, not yet classified. <br> 0x2000: Show raw bytes of AQL packet. <br> 0x4000: Show code creation debug. <br> 0x8000: More detailed command info, including barrier commands. <br> 0x10000: Log message location. <br> 0xFFFFFFFF: Log always even mask flag is zero. |
| HIP_VISIBLE_DEVICES <br><sub> Only devices whose index is present in the sequence are visible to HIP. </sub> | | 0,1,2: Depending on the number of devices on the system. |
| GPU_DUMP_CODE_OBJECT <br><sub> Dump code object. </sub> | 0 | 0: Disable. <br> 1: Enable. |
| AMD_SERIALIZE_KERNEL <br><sub> Serialize kernel enqueue. </sub> | 0 | 1: Wait for completion before enqueue. <br> 2: Wait for completion after enqueue. <br> 3: Both. |
| AMD_SERIALIZE_COPY <br><sub> Serialize copies. </sub> | 0 | 1: Wait for completion before enqueue. <br> 2: Wait for completion after enqueue. <br> 3: Both. |
| HIP_HOST_COHERENT <br><sub> Coherent memory in hipHostMalloc. </sub> | 0 | 0: memory is not coherent between host and GPU. <br> 1: memory is coherent with host. |
| AMD_DIRECT_DISPATCH <br><sub> Enable direct kernel dispatch (currently for Linux; under development on Windows). </sub> | 1 | 0: Disable. <br> 1: Enable. |
| GPU_MAX_HW_QUEUES <br><sub> The maximum number of hardware queues allocated per device. </sub> | 4 | The variable controls how many independent hardware queues HIP runtime can create per process, per device. If application allocates more HIP streams than this number, then HIP runtime will reuse the same hardware queues for the new streams in round robin manner. Please note, this maximum number does not apply to either hardware queues that are created for CU masked HIP streams, or cooperative queue for HIP Cooperative Groups (there is only one single queue per device). |
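
The AMD_LOG_MASK values in the table read as bit flags, so several categories can presumably be combined with a bitwise OR (for example, 0x3 for API calls plus kernel and copy commands). The sketch below decodes a mask into a few of the categories listed in the table. It is only an illustration: the names `LogMaskBit` and `describe_log_mask` are hypothetical and are not part of the HIP runtime, and only a subset of the table's bits is included.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical pairing of one AMD_LOG_MASK bit with its meaning,
// copied from a subset of the rows in the table above.
struct LogMaskBit {
    std::uint32_t bit;
    const char* meaning;
};

// Returns the meanings of all known bits that are set in `mask`,
// in ascending bit order. Bits not in the subset below are ignored.
inline std::vector<std::string> describe_log_mask(std::uint32_t mask) {
    static const LogMaskBit kBits[] = {
        {0x1, "API calls"},
        {0x2, "Kernel and copy commands and barriers"},
        {0x4, "Synchronization and waiting for commands to finish"},
        {0x20, "Queue commands and queue contents"},
        {0x100, "Copy debug"},
        {0x800, "Initialization and shutdown"},
        {0x10000, "Log message location"},
    };
    std::vector<std::string> enabled;
    for (const LogMaskBit& entry : kBits) {
        if ((mask & entry.bit) != 0) {
            enabled.push_back(entry.meaning);
        }
    }
    return enabled;
}
```

For instance, `describe_log_mask(0x3)` yields the API-call and kernel/copy-command categories, which can help when choosing a narrower mask than the default 0x7FFFFFFF.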

## General Debugging Tips
- 'gdb --args' can be used to conveniently pass the executable and arguments to gdb.
- From inside GDB, you can set environment variables "set env". Note the command does not use an '=' sign:
- From inside GDB on Linux, you can set environment variables with "set env". Note the command does not use an '=' sign:

```
(gdb) set env AMD_SERIALIZE_KERNEL 3