Limit pocl to <6 (test revert numpy type promotion) #1055

Merged
merged 6 commits into from
Sep 9, 2024


matthiasdiener
Member

@matthiasdiener matthiasdiener commented Aug 20, 2024

Related: inducer/pytato#538, inducer/pytato#510

Questions for the review:

  • Is the scope and purpose of the PR clear?
    • The PR should have a description.
    • The PR should have a guide if needed (e.g., an ordering).
  • Is every top-level method and class documented? Are things that should be documented actually so?
  • Is the interface understandable? (I.e. can someone figure out what stuff does?) Is it well-defined?
  • Does the implementation do what the docstring claims?
  • Is everything that is implemented covered by tests?
  • Do you see any immediate risks or performance disadvantages with the design? Example: what do interface normals attach to?

@matthiasdiener matthiasdiener self-assigned this Aug 20, 2024
@matthiasdiener matthiasdiener changed the title from "Limit pocl to <6" to "Limit pocl to <6, revert numpy type promotion" Aug 21, 2024
@inducer
Contributor

inducer commented Aug 23, 2024

What's wrong with pocl 6? This is something I've seen, but it should not affect mirgecom.

@matthiasdiener
Member Author

matthiasdiener commented Aug 23, 2024

> What's wrong with pocl 6? This is something I've seen, but it should not affect mirgecom.

We have seen substantial slowdowns in our prediction cases, both with the new numpy type promotion code and with pocl-6 compared to pocl-5 (1 rank, wall time per step [s]):

| smoke_test_ks_3d | Porter/GPU | Lassen/GPU | M1/CPU | CI-Linux/CPU |
|---|---|---|---|---|
| pocl5, old numpy type promo code | 0.17 | 0.18 | 0.6 | 1.05 |
| pocl6, old numpy type promo code | 0.58 | 0.47 | 0.65 | 1.41 |
| pocl5, new numpy type promo code | 0.21 | 0.19 | 6.5 | |
| pocl6, new numpy type promo code | 0.58 | 0.5 | 6.5 | 16.1 |

For some reason, pocl-6 seems to affect mostly the CUDA devices, while the type promotion code mostly affects the CPU runs.
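To put numbers on that observation, here is a minimal sketch that turns the timings quoted above into slowdown factors relative to the "pocl5, old numpy type promo code" row. It assumes the three values in the "pocl5, new" row belong to Porter/GPU, Lassen/GPU, and M1/CPU, so that the missing value is CI-Linux/CPU; the names `platforms`, `timings`, and `slowdown` are made up for this illustration.

```python
# Wall time per step [s], copied from the timings above.
# None marks the cell that is missing in the original comment
# (assumed here to be the CI-Linux/CPU column).
platforms = ["Porter/GPU", "Lassen/GPU", "M1/CPU", "CI-Linux/CPU"]
timings = {
    ("pocl5", "old"): [0.17, 0.18, 0.6, 1.05],
    ("pocl6", "old"): [0.58, 0.47, 0.65, 1.41],
    ("pocl5", "new"): [0.21, 0.19, 6.5, None],
    ("pocl6", "new"): [0.58, 0.5, 6.5, 16.1],
}


def slowdown(config, baseline=("pocl5", "old")):
    """Return {platform: time(config) / time(baseline)}, skipping missing cells."""
    return {
        p: round(t / b, 2)
        for p, t, b in zip(platforms, timings[config], timings[baseline])
        if t is not None and b is not None
    }


for config in timings:
    print(config, slowdown(config))
```

Under that assumption, the factors for ("pocl6", "old") come out around 3.4x and 2.6x on the two GPU systems but only about 1.1x to 1.3x on the CPU systems, while ("pocl5", "new") is close to 1x on the GPUs and roughly 11x on M1/CPU.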

@inducer
Contributor

inducer commented Aug 23, 2024

Yikes. Could you hunt for the simplest example that exhibits the issue and compare the PTX for both? This tool may be able to help. Also, could you file an issue for this? (Since I'm not sure this PR is the best place for the discussion.)
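One possible way to do the PTX comparison, sketched with Python's standard `difflib` rather than the tool linked above. It assumes the PTX for the same kernel has already been dumped to text under each pocl version (how to obtain it is not covered here); `normalize_ptx`, `diff_ptx`, and the two tiny PTX snippets are hypothetical and only illustrate the idea.

```python
import difflib


def normalize_ptx(text):
    """Drop blank lines and trailing // comments so only code is compared."""
    lines = []
    for line in text.splitlines():
        code = line.split("//", 1)[0].rstrip()
        if code.strip():
            lines.append(code)
    return lines


def diff_ptx(ptx_a, ptx_b, label_a="pocl5", label_b="pocl6"):
    """Return a unified diff (list of lines) between two PTX dumps."""
    return list(
        difflib.unified_diff(
            normalize_ptx(ptx_a),
            normalize_ptx(ptx_b),
            fromfile=label_a,
            tofile=label_b,
            lineterm="",
            n=1,  # one line of context around each change
        )
    )


# Hypothetical snippets standing in for real dumps.
ptx_old = """
    ld.global.f64 %fd1, [%rd1];   // load operand
    fma.rn.f64 %fd3, %fd1, %fd2, %fd0;
    st.global.f64 [%rd2], %fd3;
"""
ptx_new = """
    ld.global.f64 %fd1, [%rd1];
    mul.rn.f64 %fd4, %fd1, %fd2;
    add.rn.f64 %fd3, %fd4, %fd0;
    st.global.f64 [%rd2], %fd3;
"""

for line in diff_ptx(ptx_old, ptx_new):
    print(line)
```

Stripping comments first keeps the diff focused on instructions, which matters when the two compiler versions annotate their output differently.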

@matthiasdiener matthiasdiener marked this pull request as ready for review September 9, 2024 18:49
Member

@MTCam MTCam left a comment

👎

@matthiasdiener matthiasdiener enabled auto-merge (squash) September 9, 2024 19:03
@matthiasdiener matthiasdiener changed the title from "Limit pocl to <6, revert numpy type promotion" to "Limit pocl to <6 (test revert numpy type promotion)" Sep 9, 2024
@matthiasdiener matthiasdiener merged commit eac58ef into main Sep 9, 2024
13 checks passed
@matthiasdiener matthiasdiener deleted the limit-pocl branch September 9, 2024 21:03
3 participants