Commit
fix: fix bug that tanh ptx require cuda11
byshiue committed Aug 16, 2022
1 parent e244b7b commit d83204b
Showing 3 changed files with 6 additions and 7 deletions.
3 changes: 0 additions & 3 deletions docs/bert_guide.md
@@ -157,10 +157,7 @@ For those unable to use the NGC container, to set up the required environment or

You can choose the tensorflow version and python version you want. Here, we list some possible images:

- `nvcr.io/nvidia/tensorflow:19.07-py2` contains the TensorFlow 1.14 and python 2.7.
- `nvcr.io/nvidia/tensorflow:20.12-tf1-py3` contains the TensorFlow 1.15 and python 3.8.
- `nvcr.io/nvidia/pytorch:20.03-py3` contains the PyTorch 1.5.0 and python 3.6
- `nvcr.io/nvidia/pytorch:20.07-py3` contains the PyTorch 1.6.0 and python 3.6
- `nvcr.io/nvidia/pytorch:20.12-py3` contains the PyTorch 1.8.0 and python 3.8

To achieve best performance, we recommend to use the latest image. For example, running image `nvcr.io/nvidia/tensorflow:22.04-tf1-py3` by
3 changes: 0 additions & 3 deletions docs/decoder_guide.md
@@ -156,10 +156,7 @@ For those unable to use the NGC container, to set up the required environment or

You can choose the tensorflow version and python version you want. Here, we list some possible images:

- `nvcr.io/nvidia/tensorflow:19.07-py2` contains the TensorFlow 1.14 and python 2.7.
- `nvcr.io/nvidia/tensorflow:20.12-tf1-py3` contains the TensorFlow 1.15 and python 3.8.
- `nvcr.io/nvidia/pytorch:20.03-py3` contains the PyTorch 1.5.0 and python 3.6
- `nvcr.io/nvidia/pytorch:20.07-py3` contains the PyTorch 1.6.0 and python 3.6
- `nvcr.io/nvidia/pytorch:20.12-py3` contains the PyTorch 1.8.0 and python 3.8

To achieve best performance, we recommend to use the latest image. For example, running image `nvcr.io/nvidia/tensorflow:20.12-tf1-py3` by
7 changes: 6 additions & 1 deletion src/fastertransformer/kernels/activation_kernels.cu
@@ -17,6 +17,11 @@
#include "src/fastertransformer/kernels/activation_kernels.h"
#include "src/fastertransformer/kernels/bfloat16_fallback_kenrels.cuh"
#include "src/fastertransformer/utils/cuda_utils.h"

+#ifndef CUDART_VERSION
+#error CUDART_VERSION Undefined!
+#endif

namespace fastertransformer {

__forceinline__ __device__ float copysignf_pos(float a, float b)
@@ -28,7 +33,7 @@ __forceinline__ __device__ float copysignf_pos(float a, float b)

__inline__ __device__ float tanh_opt(float x)
{
-#if (__CUDA_ARCH__ >= 750)
+#if (__CUDA_ARCH__ >= 750 && CUDART_VERSION >= 11000)
float r;
asm("tanh.approx.f32 %0,%1; \n\t" : "=f"(r) : "f"(x));
return r;
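
The reason for the extra condition: `tanh.approx.f32` was introduced in PTX ISA 7.0, which ships with CUDA 11.0, and it needs `sm_75` or newer, so an older toolkit rejects the inline assembly even when the target architecture is new enough. A minimal sketch of the same guard pattern follows. The names `FT_HOST_DEVICE` and `tanh_guarded` are hypothetical, and the `tanhf` fallback is an assumption, since the commit's own fallback branch is not shown in this diff. Unlike the commit, which fails the build when `CUDART_VERSION` is undefined, the sketch tests `defined(CUDART_VERSION)` so that it also compiles as plain C++ on the host.

```cpp
#include <math.h>

// Hypothetical helper macro: expands to CUDA qualifiers only under nvcc,
// so the same source also builds with an ordinary C++ compiler.
#if defined(__CUDACC__)
#define FT_HOST_DEVICE __host__ __device__
#else
#define FT_HOST_DEVICE
#endif

// Hypothetical name; mirrors the guard used by tanh_opt in the diff above.
FT_HOST_DEVICE inline float tanh_guarded(float x)
{
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 750) \
    && defined(CUDART_VERSION) && (CUDART_VERSION >= 11000)
    // Fast path: hardware approximation, only emitted when both the target
    // architecture and the toolkit support the instruction.
    float r;
    asm("tanh.approx.f32 %0,%1; \n\t" : "=f"(r) : "f"(x));
    return r;
#else
    // Assumed fallback for this sketch: the standard single-precision tanh.
    return tanhf(x);
#endif
}
```

Without the `#error` check added by the commit, an undefined `CUDART_VERSION` would evaluate to 0 inside `#if` and silently disable the fast path, which is presumably why the commit turns that case into a hard build error.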