Skip to content

Long vector test plan #421

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

alsepkow
Copy link
Collaborator

@alsepkow alsepkow commented Mar 7, 2025

.

Copy link
Member

@damyanp damyanp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far.

We try and format these specs to 80-columns to make them easier to review in source form.

Are the tables of operations a list of all the things that'll need tests added or is that still being determined?

- For tests where we only add new test cases we do not need to update HLK test GUIDs.
- If we want to add a new HLK requirement, then we will probably want to create new tests which consume that requirement. Individual test cases can not have individual HLK requirements.
- The existing HLK reqs we have look pretty dated. "Device.Graphics.WDDM27.AdapterRender.D3D12.DXILCore.ShaderModel65.CoreRequirement". Add a requirement for SM 6.9? We're requiring Long Vectort support for SM 6.9
3. Do the work in two 'phases'? Add the tests. Then do the HLK side.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should build up a plan that involves multiple deliveries to the IHVs. If we can identify a set of high-priority tests (eg those needed by a particular demo) we could try and get those out first to support IHVs building drivers for the preview release.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about this and if IHVs are OK with it I think the easiest way is to just share stand-alone test binaries/collateral. Can you help me sync up with the right people to figure out what works for IHVs? What have we done in the past?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, we can sort that out, but I'm more interested in what and when we'll deliver these rather than the mechanism used to do it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, given that the DXC repo is public we should be able to just share build/run instructions for the tests once checked in?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss this further offline. I'll make a note to ask you.

@alsepkow
Copy link
Collaborator Author

Looks good so far.

We try and format these specs to 80-columns to make them easier to review in source form.

Are the tables of operations a list of all the things that'll need tests added or is that still being determined?

Will update the formatting. The intrinsics in the table are all of the explicitly listed intrinsics in the hlsl long vector. I think some operators are missing from this table. I'll look to get everything added so we have an exhaustive list.

@alsepkow alsepkow marked this pull request as ready for review March 12, 2025 17:49
'ExecTests' source code in the DXC repo. There is a script in the WinTools repo
which generates and annotates the HLK tests.

There are three test categories we are concerned with:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need special tests for groupshared?

Are there any other types of storage that might be interesting?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, I was thinking about this and I need to learn/explore more. I suspect I'm actually missing details on test case granularity. Does testing across data types matter? Do we care to verify the output of operations for correctness?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably be testing for correctness, yes. Sizes, especially around interesting alignment cases, are probably interesting. It might be interesting to build cases that could conceivably generate overruns and check for them. For example, most GPUs will operate on at least 32-bits at once, so if you have 16-bit values then what happens if you have an odd number of elements - could it accidentally overwrite the next variable if it is assuming alignment?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And so the size of a type, and the element count, gets interesting. I expect you could come up with some interesting boundary cases and reuse that across all the types without having to think hard about each combination individually.

Copy link
Member

@pow2clk pow2clk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just trying to add detail and answer questions where I can.

Copy link
Member

@damyanp damyanp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good - there's still some open issues.

I'm also keen to see a prioritization of which tests will be implemented first.

These operations are good candidates for high-priority tests I think:

  • Initializing a vector with another.
  • Multiply all components of a vector with a scalar value
  • Add all components of a vector with a scalar value
  • Clamp all components of a vector to the range [c, t]
  • Component wise minimum between 2 vectors
  • Component wise maximum between 2 vectors
  • Component wise multiply between 2 vectors
  • Component wise add between 2 vectors
  • Subscript access, vec[i] = c

Comment on lines 25 to 26
vectors across buffer types. __TODO: Add details about which types of
buffers to test and why.__
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: Add details about which types of buffers to test and why

This comment here to remind us that there's a TODO. If we want to complete the PR without doing the TODO then we should get a follow-up issue filed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this include vector load/stores as well?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to fill in the details before completing this PR.

Comment on lines 133 to 134
| atan2 | Atan | CreateFDiv, CreateFAdd, CreateFSub, CreateFCmpOLT,
||| CreateFCpmOEQ, CreateFCmpOGE, CreateFCmpOLT, CreateAnd, CreateSelect | |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing that these "CreateXXX" function names came from the C++ code. The LLVM instructions are things like fdiv, fadd, fcmp, and, select etc.

I also note that if I try atan2 in godbolt it doesn't actually generate fsub. https://godbolt.org/z/7xqz6448P

Copy link
Collaborator Author

@alsepkow alsepkow Mar 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they're the C++ helper functions. I was intending to distill down further on a subsequent iteration. Wanted to sanity check that they do what I would expect (always return the llvm instruction in the name).

I'll have to peel a little more in TranslateAtan2, for this 'inventory' I went through the various Translate* functions in HLSLOperationLower and noted the possible operations and instructions. For TranslateAtan2 it looks like FSub is always used, so it must be optimized or simplified out along the way?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Asked for some help. The FSub is optimized out to normalize to FAdd. If you pass '-O0' instead you'll see that the FSub is there.

But this makes me wonder about the right testing approach to ensure coverage. Given that the goal is to test that DXIL Ops and LLVM Instructions are functional with vectors, maybe we want to disable optimizations for the test cases?

@alsepkow alsepkow force-pushed the user/alsepkow/Long-Vector-TestPlan2 branch from 4568301 to 518753e Compare March 25, 2025 03:03
@damyanp damyanp moved this to Active in HLSL Support Apr 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

6 participants