Merge pull request #65 from Bowen12992/code_format
[format] Add pre-commit and format all the code
Bowen12992 committed Jun 14, 2024
2 parents 57d4612 + 168f5ac commit d3b5121
Showing 87 changed files with 475 additions and 362 deletions.
17 changes: 17 additions & 0 deletions .github/workflows/pre-commit.yml
@@ -0,0 +1,17 @@
name: code-format-check

on:
  push:
    branches: [ "master" ]
  pull_request:
    branches: [ "master" ]

jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - uses: pre-commit/action@v3.0.1
4 changes: 2 additions & 2 deletions .github/workflows/python-test.yaml
@@ -13,7 +13,7 @@ on:
jobs:
  container-unit-test:
    runs-on: [self-hosted, docker]
-    timeout-minutes: 30
+    timeout-minutes: 50
    container:
      image: localhost:5000/flag-gems-ci:v1.0
      ports:
@@ -30,7 +30,7 @@ jobs:
          CUDA_VISIBLE_DEVICES=2 pytest -s tests/test_blas_ops.py &
          CUDA_VISIBLE_DEVICES=3 pytest -s tests/test_reduction_ops.py &
          CUDA_VISIBLE_DEVICES=4 pytest -s tests/test_special_ops.py && wait
  container-model-test:
    runs-on: [self-hosted, docker]
    timeout-minutes: 5
30 changes: 30 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,30 @@
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v2.3.0
    hooks:
      - id: check-yaml
      - id: end-of-file-fixer
      - id: trailing-whitespace
      - id: flake8
        language_version: python3.11
        args: ["--ignore=F405,E731,F403,W503,E722,E203", --max-line-length=120]
        # F405 : Name may be undefined, or defined from star imports: module
        # E731 : Do not assign a lambda expression, use a def
        # F403 : 'from module import *' used; unable to detect undefined names
        # W503 : Line break before binary operator
        # E722 : Do not use bare 'except'
        # E203 : Whitespace before ':'

  - repo: https://github.com/pycqa/isort
    rev: 5.12.0
    hooks:
      - id: isort
        language_version: python3.11
        args: ["--profile", "black"]

  - repo: https://github.com/psf/black.git
    rev: 23.7.0
    hooks:
      - id: black
        language_version: python3.11
      - id: black-jupyter
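With this configuration in place, the CI job above simply replays the same hooks. Contributors can reproduce the check locally with the standard pre-commit CLI (the commands below are ordinary pre-commit usage, not part of this diff):

```shell
pip install pre-commit
pre-commit install          # run the hooks automatically on each git commit
pre-commit run --all-files  # check the whole tree once, as CI does
```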
2 changes: 1 addition & 1 deletion LICENSE
@@ -175,4 +175,4 @@ Copyright © 2024 BAAI. All rights reserved.
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.

END OF TERMS AND CONDITIONS
56 changes: 28 additions & 28 deletions README.md
@@ -2,9 +2,9 @@

## Introduction

FlagGems is a high-performance general operator library implemented in [OpenAI Triton](https://github.com/openai/triton). It aims to provide a suite of kernel functions to accelerate LLM training and inference.

By registering with the ATen backend of PyTorch, FlagGems facilitates a seamless transition, allowing users to switch to the Triton function library without the need to modify their model code. Users can still utilize the ATen backend as usual while experiencing significant performance enhancement. The Triton language offers benefits in readability, user-friendliness and performance comparable to CUDA. This convenience allows developers to engage in the development of FlagGems with minimal learning investment.
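The ATen-level swap described above can be pictured with PyTorch's `torch.library` API. Below is a minimal illustrative sketch of the general registration mechanism, not FlagGems' actual code; the override function is a stand-in for a real Triton kernel launch.

```python
import torch

# Open the existing "aten" namespace to register implementation overrides.
_lib = torch.library.Library("aten", "IMPL")

def my_abs(x):
    # Stand-in for a Triton kernel: any callable matching the op's
    # signature can be registered for a dispatch key.
    return torch.where(x < 0, -x, x)

# aten::abs on CUDA tensors now dispatches to my_abs.
_lib.impl("abs", my_abs, "CUDA")

x = torch.randn(4, device="cuda")
print(torch.abs(x))  # routed through the override
```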


## Feature
@@ -49,50 +49,50 @@ def ge(x, y):
## Changelog

### v1.0
- support BLAS operators: addmm, bmm, mm
- support pointwise operators: abs, add, div, dropout, exp, gelu, mul, pow, reciprocal, relu, rsqrt, silu, sub, triu
- support reduction operators: cumsum, layernorm, mean, softmax

### v2.0
- support BLAS operator: mv, outer
- support pointwise operators: bitwise_and, bitwise_not, bitwise_or, cos, clamp, eq, ge, gt, isinf, isnan, le, lt, ne, neg, or, sin, tanh, sigmoid
- support reduction operators: all, any, amax, argmax, max, min, prod, sum, var_mean, vector_norm, cross_entropy_loss, group_norm, log_softmax, rms_norm
- support fused operators: skip_rms_norm, skip_layer_norm, gelu_and_mul, silu_and_mul, apply_rotary_position_embedding

## Quick Start

### Requirements

1. Triton >= 2.2.0
2. PyTorch >= 2.1.2
3. Transformers >= 4.40.2

### Installation

```shell
git clone https://github.com/FlagOpen/FlagGems.git
cd FlagGems
pip install .
```

## Usage

### Import

1. Enable permanently
```python
import flag_gems
flag_gems.enable()
```

2. Enable temporarily
```python
import flag_gems
with flag_gems.use_gems():
pass
```

3. Example
```python
import torch
import flag_gems
@@ -106,50 +106,50 @@ pip install .

### Execute

1. Test Operator Accuracy
- Run reference on cuda
```shell
cd tests
pytest test_xx_ops.py
```
- Run reference on cpu
```shell
cd tests
pytest test_xx_ops.py --device cpu
```

2. Test Model Accuracy
```shell
cd examples
pytest model_xx_test.py
```

3. Test Operator Performance
- Test CUDA performance
```shell
cd benchmark
pytest test_xx_perf.py -s
```
- Test end-to-end performance
```shell
cd benchmark
pytest test_xx_perf.py -s --mode cpu
```

4. Run tests with logging information
```shell
pytest program.py --log-cli-level debug
```
Not recommended in performance testing.

## Supported Operators

Operators will be implemented according to [OperatorList.md](https://github.com/FlagOpen/FlagGems/blob/master/OperatorList.md).

## Supported Models

- Bert-base-uncased
- Llama-2-7b

## Supported Platforms

56 changes: 28 additions & 28 deletions README_cn.md
@@ -2,9 +2,9 @@

## Introduction

FlagGems is a high-performance general operator library implemented in the [Triton programming language](https://github.com/openai/triton) released by OpenAI. It aims to provide large language models with a suite of operators usable from the PyTorch framework, accelerating model inference and training.

FlagGems overrides and rewrites PyTorch's backend aten operators to achieve a seamless, drop-in replacement of the operator library, allowing users to switch smoothly to the triton operator library without modifying their model code. FlagGems does not affect normal use of the aten backend and delivers a solid performance improvement. The Triton language gives the operator library better readability and ease of use while maintaining operator performance on par with CUDA, so developers can take part in developing and building FlagGems operators at a low learning cost.


## Feature
@@ -49,50 +49,50 @@ def ge(x, y):
## Changelog

### v1.0
- support BLAS operators: addmm, bmm, mm
- support pointwise operators: abs, add, div, dropout, exp, gelu, mul, pow, reciprocal, relu, rsqrt, silu, sub, triu
- support reduction operators: cumsum, layernorm, mean, softmax

### v2.0
- support BLAS operator: mv, outer
- support pointwise operators: bitwise_and, bitwise_not, bitwise_or, cos, clamp, eq, ge, gt, isinf, isnan, le, lt, ne, neg, or, sin, tanh, sigmoid
- support reduction operators: all, any, amax, argmax, max, min, prod, sum, var_mean, vector_norm, cross_entropy_loss, group_norm, log_softmax, rms_norm
- support fused operators: skip_rms_norm, skip_layer_norm, gelu_and_mul, silu_and_mul, apply_rotary_position_embedding

## Quick Start

### Requirements

1. Triton >= 2.2.0
2. PyTorch >= 2.1.2
3. Transformers >= 4.40.2

### Installation

```shell
git clone https://github.com/FlagOpen/FlagGems.git
cd FlagGems
pip install .
```

## Usage

### Import

1. Enable permanently (process-wide)
```python
import flag_gems
flag_gems.enable()
```

2. Enable temporarily
```python
import flag_gems
with flag_gems.use_gems():
pass
```

3. Example
```python
import torch
import flag_gems
@@ -106,49 +106,49 @@ pip install .

### Execute

1. Test operator accuracy
- Run the reference implementation on CUDA
```shell
cd tests/flag_gems
pytest op_accu_test.py
```
-CPU上运行参考实现
-CPU上运行参考实现
```shell
cd tests
pytest test_xx_ops.py --device cpu
```
2. Test model accuracy
```shell
cd examples
pytest model_xx_test.py
```

3. Test operator performance
- Test CUDA performance
```shell
cd benchmark
pytest test_xx_perf.py -s
```
- Test end-to-end performance
```shell
cd benchmark
pytest test_xx_perf.py -s --mode cpu
```

4. Run tests with logging information
```shell
pytest program.py --log-cli-level debug
```
Not recommended in performance testing.

## Supported Operators

Operators will be implemented step by step, following the order given in [OperatorList.md](https://github.com/FlagOpen/FlagGems/blob/master/OperatorList.md).

## Supported Models

- Bert-base-uncased
- Llama-2-7b

## Supported Platforms

10 changes: 6 additions & 4 deletions benchmark/performance_utils.py
@@ -1,9 +1,11 @@
+import time
+
import torch
import triton
-import time

import flag_gems
-from .conftest import CPU_MODE
+
+from .conftest import CPU_MODE

WARMUP = 10
REPETITION = 1000
@@ -42,8 +44,8 @@ def profile(self, op, *args):

    def run(self):
        print(f"Operator {self.op_name} Performance Test ({self.dtype})")
-        print(f"Size Torch Latency (ms) Gems Latency (ms)")
-        print(f"--------------------------------------------------")
+        print("Size Torch Latency (ms) Gems Latency (ms)")
+        print("--------------------------------------------------")
        for size in self.sizes:
            args = self.arg_func(self.dtype, self.batch, size)
            torch_perf = self.profile(self.torch_op, *args)
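The body of `profile()` is collapsed in this view. For orientation, a CUDA latency measurement consistent with the `WARMUP` and `REPETITION` constants above could look like the sketch below; the helper name and event-based timing are assumptions for illustration, not the file's actual implementation.

```python
import torch

WARMUP = 10
REPETITION = 1000

def cuda_latency_ms(op, *args):
    # Warm-up runs: trigger Triton JIT compilation and autotuning
    # outside the timed region.
    for _ in range(WARMUP):
        op(*args)
    torch.cuda.synchronize()

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(REPETITION):
        op(*args)
    end.record()
    torch.cuda.synchronize()

    # elapsed_time() reports milliseconds for the whole loop; return the mean.
    return start.elapsed_time(end) / REPETITION
```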