
Add tests for reduction with sum of squares (RSS), compute of RNSNorm, RNSNorm fused with Matmul, FindMaxAttention and Softmax #593

Open · wants to merge 29 commits into develop

Conversation

@mikepapadim (Member) commented Nov 24, 2024

Description

This PR adds tests to check the code generation and the validity of the results for some common operations required in the LLM architecture.

  1. Reduction with Sum of Squares (RSS): Ensures the correct functionality of reduction operations that compute the sum of squares, which is vital for various numerical algorithms.​

  2. Compute of RNSNorm: Validates the implementation of the RNSNorm function, a normalization technique that can enhance the stability and performance of neural networks (a sequential reference sketch of this computation is shown after this list).

  3. RNSNorm Fused with Matmul: Tests the integration of RNSNorm directly within matrix multiplication operations, aiming to optimize performance by reducing the number of separate computational steps.​

  4. FindMaxAttention: Assesses the functionality of the FindMaxAttention operation, which is essential in attention mechanisms commonly used in transformer models for tasks like natural language processing.​

  5. Softmax: Verifies the correctness of the Softmax function implementation, a critical component in various machine learning models, particularly in classification tasks.

  6. MultiHeadedAttention: Verifies the correctness of the multi-head attention; it breaks down the individual tasks/kernels and uses a reference Java implementation very close to Llama3.java.

  7. FFN: Verifies the correctness of the feed-forward network (FFN) layer.

  8. Fused with RoPE: Verifies the correctness of the fused layer with rotary position embeddings (RoPE).
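
For reference, the computation behind items 1–2 is essentially a sum-of-squares reduction followed by an RMSNorm-style normalization. A minimal sequential Java sketch is shown below; the method names, the use of plain float[] and the epsilon constant are illustrative assumptions and not taken from this PR, which uses TornadoVM's own data types and kernels.

// Sequential reference sketch (assumed names; not the PR's code).
static float sumOfSquares(float[] x) {
    float ss = 0.0f;
    for (float v : x) {
        ss += v * v;                         // reduction with sum of squares (RSS)
    }
    return ss;
}

static void rnsNorm(float[] out, float[] x, float[] weight, float eps) {
    float ss = sumOfSquares(x) / x.length;   // mean of squares
    float scale = 1.0f / (float) Math.sqrt(ss + eps);
    for (int i = 0; i < x.length; i++) {
        out[i] = weight[i] * (x[i] * scale); // normalize and apply the learned weight
    }
}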

Backend/s tested

Mark the backends affected by this PR.

  • OpenCL
  • PTX
  • SPIRV

OS tested

Mark the OS where this PR is tested.

  • Linux
  • OSx
  • Windows

Did you check on FPGAs?

If it is applicable, check your changes on FPGAs.

  • Yes
  • No

How to test the new patch?

tornado-test -V uk.ac.manchester.tornado.unittests.reductions.TestReductionsFloats#testReduceSumSquares

tornado-test -V uk.ac.manchester.tornado.unittests.llm.TestRNSNormLayer

tornado-test -V uk.ac.manchester.tornado.unittests.llm.TestSoftMaxLayer

tornado-test -V uk.ac.manchester.tornado.unittests.llm.TestFindMaxAttention

tornado-test -V uk.ac.manchester.tornado.unittests.llm.TestParallelFFNLayer

tornado-test -V uk.ac.manchester.tornado.unittests.llm.TestMultiHeadAttention

tornado-test -V uk.ac.manchester.tornado.unittests.llm.TestFusedLayer

@jjfumero (Member)

Include the new test in the test-suite:
tornado-assembly/src/bin/tornado-test

@jjfumero (Member)

The new test enters an infinite loop when running with the SPIR-V backend:

tornado-test -V uk.ac.manchester.tornado.unittests.compute.LLMFusedKernelsTest
/home/juan/tornadovm/TornadoVM/bin/sdk/bin/tornado --jvm "-Xmx6g -Dtornado.recover.bailout=False -Dtornado.unittests.verbose=True "  -m  tornado.unittests/uk.ac.manchester.tornado.unittests.tools.TornadoTestRunner  --params "uk.ac.manchester.tornado.unittests.compute.LLMFusedKernelsTest"

The PTX and OpenCL backends run fine.

@stratika (Collaborator) left a comment


I think we should add the new LLMFusedKernelsTest class to the tornado-test script so that it runs when Jenkins executes the unit-tests. In my setup, the tests pass for PTX, but the tests in the LLMFusedKernelsTest class do not finish when running with SPIR-V. I guess they are not supported for SPIR-V?


@Test
public void testRNSNormFusedWithMatMul() throws TornadoExecutionPlanException {
final int size = 2048;
Collaborator


I think we should add the following, unless it is supported:

assertNotBackend(TornadoVMBackendType.SPIRV);
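
As a sketch, the guard would sit at the top of the test method quoted above (assuming the class extends the unit-test base class that provides assertNotBackend):

@Test
public void testRNSNormFusedWithMatMul() throws TornadoExecutionPlanException {
    assertNotBackend(TornadoVMBackendType.SPIRV); // report the test as unsupported on SPIR-V
    final int size = 2048;
    // ... rest of the test unchanged
}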

@jjfumero (Member)

@mikepapadim, is this ready?

@mikepapadim mikepapadim changed the title Add tests for reduction with sum of squares (RSS), compute of RNSNorm, and RNSNorm fused with Matmul Add tests for reduction with sum of squares (RSS), compute of RNSNorm, RNSNorm fused with Matmul, and Softmax Mar 12, 2025
@mikepapadim (Member, Author)

@jjfumero @stratika @mairooni this is ready and now also includes the softmax implementation.

@jjfumero (Member)

Thanks @mikepapadim. Did you check with SPIR-V? Last time it got stuck in an infinite loop.

@mikepapadim (Member, Author)

Thanks @mikepapadim. Did you check with SPIR-V. Last time it got stuck in an infinite loop.

No, I need to switch machines for that

@stratika (Collaborator)

I confirm there seems to be something odd with SPIR-V. Let's wait, and if it is not supported for any reason we can mark it as unsupported. Currently it seems to be stuck. Note that we should test with both the OpenCL and Level Zero runtimes.

make BACKEND=spirv
tornado-test --quickPass --verbose --jvm="-Dtornado.unittests.device=0:0"
tornado-test --quickPass --verbose --jvm="-Dtornado.unittests.device=0:1"

@mikepapadim mikepapadim changed the title Add tests for reduction with sum of squares (RSS), compute of RNSNorm, RNSNorm fused with Matmul, and Softmax Add tests for reduction with sum of squares (RSS), compute of RNSNorm, RNSNorm fused with Matmul, FindMaxAttention and Softmax Mar 12, 2025
@stratika (Collaborator)

I can help with the issue in SPIR-V. I will have a look next week.

@mikepapadim (Member, Author)

TODO: add all to tornado-test script

@stratika (Collaborator) left a comment


Thanks, that's a great addition to the unit-tests. I have added some comments. Basically, the only thing is that the tests currently report many logged messages. I am not sure those messages are necessary when we run the tests. I think the most important thing is to see how many tests are failing, so I would focus on using assertEquals. Let me know of any thoughts; I will continue testing the new tests on my machine.

assertEquals("Mismatch at head " + h, expected, actual, 1e-5f);
}

System.out.println("All results match! Test passed.");
Collaborator


I would suggest removing the printed message. If the results pass validation, the test will pass.

System.out.println("Executing fused layer task graph...");
try (TornadoExecutionPlan executionPlan = new TornadoExecutionPlan(immutableTaskGraph)) {
executionPlan.withGridScheduler(gridScheduler).execute();
System.out.println("Execution completed successfully");
Collaborator


Shouldn't we assert the validation of the results at this stage in order to trigger PASS or FAIL?
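
As a sketch, something like the following could replace the prints; outputReference and output are placeholder names, and the 1e-5f tolerance mirrors the one used elsewhere in this PR:

try (TornadoExecutionPlan executionPlan = new TornadoExecutionPlan(immutableTaskGraph)) {
    executionPlan.withGridScheduler(gridScheduler).execute();
}
// Compare against the sequential reference instead of printing a success message.
for (int i = 0; i < output.getSize(); i++) {
    assertEquals("Mismatch at index " + i, outputReference.get(i), output.get(i), 1e-5f);
}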


for (int i = 0; i < outputSeqLogits.getSize(); i++) {
float expected = outputSeqLogits.get(i); // Expected value from the sequential output
float actual = outputLogits.get(i); // Actual value from the RNS output
Collaborator


We are missing an assertEquals call for the expected and actual values.
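
Something along these lines, as a sketch (the message text and the 1e-5f tolerance are suggestions):

for (int i = 0; i < outputSeqLogits.getSize(); i++) {
    float expected = outputSeqLogits.get(i); // expected value from the sequential output
    float actual = outputLogits.get(i);      // actual value from the parallel output
    assertEquals("Mismatch at logit " + i, expected, actual, 1e-5f);
}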

Random random = new Random(42);

// Print head 0, dimension 0 values for clarity
System.out.println("Initializing test data:");
Collaborator


I would suggest removing the printing statement.

Comment on lines +158 to +168
System.out.println("Attention weights for head 0:");
for (int t = 0; t <= pos; t++) {
System.out.printf("Position %d: %.6f%n", t, attScores.get(0 * seqLen + t));
}

// Print the first few value vectors for head 0
System.out.println("First few values for head 0, dimension 0:");
for (int t = 0; t <= Math.min(pos, 3); t++) {
int valueOffset = loff + t * kvDim + (0 / kvMul) * headSize + 0;
System.out.printf("Position %d: %.6f%n", t, valueCache.get(valueOffset));
}
Collaborator


Are these print statements valuable when we run the unit-tests, or are they added for documentation purposes? If they are for documentation, perhaps we can add a flag that is false by default, in order to reduce the logged output when we run all unit-tests.

In my opinion, when we run the tests we should focus on PASS/FAIL results. What do you think?
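
For example, a sketch of how the quoted prints could be guarded; the flag name VERBOSE and its placement are assumptions:

private static final boolean VERBOSE = false; // debugging output off by default

// inside the test, wrap the existing prints:
if (VERBOSE) {
    System.out.println("Attention weights for head 0:");
    for (int t = 0; t <= pos; t++) {
        System.out.printf("Position %d: %.6f%n", t, attScores.get(0 * seqLen + t));
    }
}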

@stratika (Collaborator)

Also, one interesting note that I discussed with @mikepapadim: when we run the unit-tests with make fast-tests, this uses the tornado-test --ea --verbose --quickPass command, and the --ea flag, which enables assertions at the compiler level, adds a delay when we run the following test:

tornado-test  --ea -V uk.ac.manchester.tornado.unittests.llm.TestParallelFFNLayer

@mikepapadim (Member, Author)

I think with @stratika's changes this is good to go now.

@jjfumero (Member)

For the SPIR-V backend I get the following failures:

/home/juan/repos/tornadovm/bin/sdk/bin/tornado --jvm "-Xmx6g -Dtornado.recover.bailout=False -Dtornado.unittests.verbose=True "  -m  tornado.unittests/uk.ac.manchester.tornado.unittests.tools.TornadoTestRunner  --params "uk.ac.manchester.tornado.unittests.llm.TestFindMaxAttention"
WARNING: Using incubator modules: jdk.incubator.vector

Test: class uk.ac.manchester.tornado.unittests.llm.TestFindMaxAttention
	Running test: testFindMaxAttentionScores ................  [FAILED] 
		\_[REASON] Mismatch at head 0 expected:<4.9773192> but was:<0.0>
Test ran: 1, Failed: 1, Unsupported: 0

And:

/home/juan/repos/tornadovm/bin/sdk/bin/tornado --jvm "-Xmx6g -Dtornado.recover.bailout=False -Dtornado.unittests.verbose=True "  -m  tornado.unittests/uk.ac.manchester.tornado.unittests.tools.TornadoTestRunner  --params "uk.ac.manchester.tornado.unittests.llm.TestMultiHeadAttention"
WARNING: Using incubator modules: jdk.incubator.vector

Test: class uk.ac.manchester.tornado.unittests.llm.TestMultiHeadAttention
	Running test: testMultiHeadAttentionLlamaEquivalence ................  [FAILED] 
		\_[REASON] expected:<-0.052507114> but was:<NaN>
	Running test: testAttentionScoresCalculation ................  [PASS] 
	Running test: testFindMaxAttentionScores ................  [FAILED] 
		\_[REASON] Max value mismatch expected:<4.981786> but was:<0.0>
	Running test: testCalculateExpAndSum     ................  [FAILED] 
		\_[REASON] Sum value mismatch expected:<78781.76> but was:<0.0>
	Running test: testNormalizeSoftmax       ................  [PASS] 
	Running test: testComputeWeightedSum     ................  [PASS] 
	Running test: testSingleHeadFullPipeline ................  [FAILED] 
		\_[REASON] Output mismatch at dimension 0 expected:<0.010636929422616959> but was:<NaN>
	Running test: testMultiHeadAttentionFixed ................  [FAILED] 
		\_[REASON] Output mismatch at head 0, dimension 0 expected:<-0.052507114> but was:<NaN>
Test ran: 8, Failed: 5, Unsupported: 0

Both OpenCL and PTX are correct.

@jjfumero (Member)

My setup for the SPIR-V:

$ tornado --devices

Number of Tornado drivers: 2
Driver: SPIR-V
  Total number of SPIR-V devices  : 2
  Tornado device=0:0  (DEFAULT)
	SPIRV -- SPIRV OCL - Intel(R) UHD Graphics 770
		Global Memory Size: 28.8 GB
		Local Memory Size: 64.0 KB
		Workgroup Dimensions: 3
		Total Number of Block Threads: [512]
		Max WorkGroup Configuration: [512, 512, 512]
		Device OpenCL C version: OpenCL C 1.2

  Tornado device=0:1
	SPIRV -- SPIRV LevelZero - Intel(R) UHD Graphics 770
		Global Memory Size: 28.8 GB
		Local Memory Size: 64.0 KB
		Workgroup Dimensions: 3
		Total Number of Block Threads: [512]
		Max WorkGroup Configuration: [512, 512, 512]
		Device OpenCL C version:  (LEVEL ZERO) 1.5
