
Multinomial #141

Merged: 56 commits into master from the multinomial branch on Sep 2, 2024
Conversation

@tongxin (Contributor) commented on Jul 30, 2024:

This PR adds a multinomial operator as a Triton conversion of the PyTorch counterpart.

Performance figures are updated below.

benchmark/test_special_perf.py
Operator multinomial Performance Test (torch.float16)
Size        Torch Latency (ms)   Gems Latency (ms)
--------------------------------------------------
1024                  0.411968            0.244064
6144                  0.690048            0.449952
11264                 0.931296             0.53248
16384                  1.13818            0.664704
21504                  1.23549            0.677472
26624                  1.39875            0.745216
31744                  1.54067             0.80096
36864                  1.74266            0.934592
41984                   1.8904            0.965376
47104                  2.07027             1.03923
52224                  2.23818             1.08048
57344                  2.45094             1.19658
62464                  2.59354             1.19482
67584                  2.71466             1.29584
72704                  2.92397              1.3409
77824                  3.11814             1.42378
Operator multinomial Performance Test (torch.float32)
Size        Torch Latency (ms)   Gems Latency (ms)
--------------------------------------------------
1024                  0.451392            0.267328
6144                   0.80832            0.466176
11264                  1.08074            0.544896
16384                  1.33142            0.743872
21504                  1.45101            0.778528
26624                  1.65642            0.899328
31744                  1.85642            0.982752
36864                  2.11466             1.14086
41984                  2.31005             1.20314
47104                  2.52592             1.30464
52224                  2.71603             1.37322
57344                  2.94445             1.50336
62464                  3.13027             1.54714
67584                  3.29763             1.67389
72704                  3.50435             1.76499
77824                  3.69504             1.85344
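For context, here is a minimal usage sketch. It assumes the flag_gems.enable() entry point documented in the project README routes supported aten ops (including torch.multinomial after this PR) to the Triton kernels; the sizes and parameters are illustrative, not those used by benchmark/test_special_perf.py.

import torch
import flag_gems

# Assumption: enable() globally patches supported torch ops to the Gems Triton kernels.
flag_gems.enable()

# Unnormalized category weights; multinomial normalizes them internally.
probs = torch.rand(1024, device="cuda", dtype=torch.float16)
samples = torch.multinomial(probs, num_samples=1024, replacement=True)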

@tongxin marked this pull request as draft on July 30, 2024, and as ready for review on August 1, 2024.
Bowen12992 and others added 20 commits August 1, 2024 14:30
* exponential added.
* Added K-S tests to exponential_, fp64 corrected.
* Aligned with aten prototype.
* Exponential_ uses uint64 offsets in the Triton kernel.
* Update pyproject config for new test dependencies.
* Fix amax, argmax and triu: use int64 indexing when the largest tensor's size_in_bytes exceeds int32's max.
* Change the tiling scheme for argmax to loop over the reduction dimension instead of using a data-size-dependent tile size.
* libentry is now lock protected.
* Add multithreading tests for libentry.
* Polish code.
@StrongSpoon (Collaborator) left a comment:
I don't understand why chi-square proves the accuracy.

Review threads on tests/test_special_ops.py, benchmark/test_special_perf.py, src/flag_gems/ops/multinomial.py, and tests/test_distribution_ops.py were resolved.
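On the chi-square question above: the usual idea is to tally how often each category is drawn and compare the observed counts against the counts expected from the normalized input weights; a small p-value would flag a mismatch. Below is a rough sketch of such a check using scipy.stats.chisquare. It is illustrative only, with a hypothetical helper name, and not necessarily the test that ended up in tests/test_special_ops.py.

import torch
from scipy import stats

def check_multinomial_chisquare(weights, n_samples=200_000, alpha=0.01):
    # Hypothetical helper: sample with replacement and tally category counts.
    idx = torch.multinomial(weights, n_samples, replacement=True)
    observed = torch.bincount(idx, minlength=weights.numel()).double()
    # Expected counts under the normalized input distribution.
    expected = weights.double() / weights.sum() * n_samples
    # A large p-value means no evidence that the sampler deviates from the target.
    _, p_value = stats.chisquare(observed.cpu().numpy(), expected.cpu().numpy())
    return p_value > alpha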
@StrongSpoon self-assigned this on Aug 19, 2024.
@tongxin (Contributor, Author) commented on Aug 19, 2024:

Also added fused_norm_cumsum for better perf.
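As a rough picture of what the fused op computes (an assumption based on the name: renormalize along the sampling dimension, then take the cumulative sum to form the CDF that an inverse-CDF lookup consumes), here is an unfused PyTorch sketch with hypothetical helper names. The PR fuses the two steps into a single Triton kernel for the perf gain mentioned above.

import torch

def renorm_cumsum_reference(inp, dim=-1):
    # Unfused reference: renormalize along `dim`, then inclusive cumulative sum (the CDF).
    normed = inp / inp.sum(dim=dim, keepdim=True)
    return normed.cumsum(dim=dim)

def sample_reference(weights, n_samples, generator=None):
    # Inverse-CDF multinomial sampling with replacement, for orientation only.
    cdf = renorm_cumsum_reference(weights.float())
    u = torch.rand(*weights.shape[:-1], n_samples, generator=generator, device=weights.device)
    return torch.searchsorted(cdf, u).clamp_(max=weights.size(-1) - 1)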

@StrongSpoon (Collaborator) left a comment:
finished

Review threads on benchmark/test_special_perf.py and tests/test_special_ops.py were resolved.

def fused_renorm_cumsum(inp, dim=-1):
    logging.debug("GEMS RENORM_CUMSUM")
    assert inp.dtype in (torch.float16, torch.float32, torch.float64)
Collaborator comment:

allow inp.dtype to be torch.bfloat16

Author reply:

Sure.
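For concreteness, the requested change would presumably just widen the dtype check, for example:

# allow bfloat16 alongside the existing float dtypes (per the review comment above)
assert inp.dtype in (torch.float16, torch.bfloat16, torch.float32, torch.float64)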

@StrongSpoon (Collaborator) left a comment:
lgtm

@tongxin merged commit 2f191fe into master on Sep 2, 2024 (4 checks passed) and deleted the multinomial branch on September 2, 2024.