Develop Embedding[SiliconFlow] #72
Conversation
Why not register it in src/flag_gems/__init__.py?
BLOCK_SIZE = triton.next_power_of_2(N)
indices = indices.contiguous()
weight = weight.contiguous()
Is it necessary to make these tensors contiguous here first? It may cause memory-copy overhead.
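For what it's worth, `Tensor.contiguous()` only copies when the input is actually non-contiguous; for already-contiguous tensors it returns the same tensor, so the overhead is paid only on strided inputs. A quick check (not from the PR):

```python
import torch

# Contiguous input: .contiguous() returns the same tensor, no copy.
indices = torch.randint(0, 4096, (32,))
assert indices.contiguous() is indices

# Non-contiguous input (e.g. a strided view): a copy is made.
strided = torch.randint(0, 4096, (64,))[::2]
assert not strided.is_contiguous()
copied = strided.contiguous()
assert copied is not strided and copied.is_contiguous()
```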
Great job on such a complicated op! Just a few comments here. I think we can register a replacement for torch.nn.functional.embedding
in src/flag_gems/__init__.py
so we can use it with flag_gems.enable()
or with flag_gems.use_gems():
(a sketch of the registration follows below). I also wonder about the performance of this op and whether we should do some related tuning work, so would you add a benchmark for this op?
Also, there are some accuracy issues in the unit tests; pulling the latest master code may resolve them. It would also be better to add a PR description.
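A minimal sketch of what that registration might look like, assuming flag_gems patches aten ops through torch.library and that the Triton implementation is exposed as `embedding` (both the import path and the function name are assumptions, not taken from this PR):

```python
# Hypothetical sketch for src/flag_gems/__init__.py; adapt names to the
# actual codebase. Assumes flag_gems overrides aten ops via torch.library.
import torch
from .ops import embedding  # assumed import path for the Triton op

aten_lib = torch.library.Library("aten", "IMPL")

def enable(lib=aten_lib):
    # Route aten::embedding on CUDA to the Triton implementation.
    lib.impl("embedding", embedding, "CUDA")
```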
- I suggest implementing a performance test as well (see the sketch after this list).
- Accuracy tests failed with a relative difference equal to 1.0. What about rebasing on the latest master branch and trying again?
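One possible shape for such a performance test, sketched with triton.testing.do_bench and the sizes reported later in this thread (the actual flag_gems benchmark harness may look different):

```python
# Hypothetical standalone benchmark for the embedding forward pass.
# Sizes match the 4096 x 4096 table / 32-index case reported below.
import torch
import triton
import flag_gems

weight = torch.randn(4096, 4096, device="cuda")
indices = torch.randint(0, 4096, (32,), device="cuda")

torch_ms = triton.testing.do_bench(
    lambda: torch.nn.functional.embedding(indices, weight)
)
with flag_gems.use_gems():
    gems_ms = triton.testing.do_bench(
        lambda: torch.nn.functional.embedding(indices, weight)
    )
print(f"torch: {torch_ms:.4f} ms, flag_gems: {gems_ms:.4f} ms")
```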
We have verified locally that the unit-test accuracy problem can be solved by pulling the latest code.
LGTM
Test environment: RTX 4090
Test case: embedding table of 4096 × 4096, input index size 32
Forward:
Triton takes 2.82 µs, torch takes 3.78 µs
Backward:
Triton takes 14.78 µs, torch takes 3.78 µs
torch's backward pass uses a warp-ballot mechanism: one block reads all the indices into shared memory, and when duplicate indices are found the redundant warps are evicted so that only a single warp performs the accumulation for that index. Triton does not seem to offer such a mechanism.
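Since Triton exposes no warp-ballot primitive, a backward kernel typically resolves duplicate indices with atomic adds instead, which is one plausible source of the gap above. An illustrative sketch, not the PR's actual kernel (all names are made up for the example):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def embedding_backward_kernel(
    grad_out_ptr, indices_ptr, grad_weight_ptr,
    N,  # embedding dimension
    BLOCK_SIZE: tl.constexpr,
):
    # One program per input index. Duplicate indices are serialized by
    # hardware atomics rather than deduplicated via warp ballot, so rows
    # that repeat an index pay for contended atomic adds.
    row = tl.program_id(0)
    idx = tl.load(indices_ptr + row)
    cols = tl.arange(0, BLOCK_SIZE)
    mask = cols < N
    grad = tl.load(grad_out_ptr + row * N + cols, mask=mask, other=0.0)
    tl.atomic_add(grad_weight_ptr + idx * N + cols, grad, mask=mask)

def embedding_backward(grad_out, indices, num_embeddings):
    # grad_out: (num_indices, N) upstream gradient, assumed contiguous fp32.
    M, N = grad_out.shape
    grad_weight = torch.zeros(
        num_embeddings, N, device=grad_out.device, dtype=grad_out.dtype
    )
    BLOCK_SIZE = triton.next_power_of_2(N)
    embedding_backward_kernel[(M,)](
        grad_out, indices, grad_weight, N, BLOCK_SIZE=BLOCK_SIZE
    )
    return grad_weight
```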