chore: change default compiler to clang #2887
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@            Coverage Diff             @@
##             main    #2887      +/-   ##
==========================================
- Coverage   77.98%   77.83%    -0.15%
==========================================
  Files         231      231
  Lines       70643    70082     -561
  Branches    70643    70082     -561
==========================================
- Hits        55090    54550     -540
- Misses      12424    12649     +225
+ Partials     3129     2883     -246

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
This PR seems to drop support for gcc when the fp16kernels feature is enabled. I'm 👎 on that.
You seem to be changing the testing jobs and the source-only release jobs. If we want to ship fast binaries, we should be changing the Python / Java build jobs, right?
rust/lance-linalg/build.rs (Outdated)

// We use clang #pragma to yield better vectorization
// See https://github.com/lancedb/lance/pull/2885
.compiler("clang")
We can recommend clang, but it seems wrong to force users to use it. Newer versions of GCC should work okay (even if they aren't the fastest), and I think we should let users do that if they want.
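A rough sketch of that middle ground (hypothetical; the real build.rs, flags, and library name differ): build.rs could prefer clang when it is actually installed and otherwise fall back to whatever C compiler the `cc` crate would pick, emitting a warning instead of failing the build.

```rust
// build.rs sketch: recommend clang, but don't require it.
// File names, flags, and the library name here are illustrative.
use std::process::Command;

fn main() {
    let mut build = cc::Build::new();
    build.file("src/simd/f16.c").flag_if_supported("-O3");

    // Use clang only if it can actually be invoked; otherwise keep the
    // `cc` crate's platform default (gcc, MSVC, ...) so builds still succeed.
    let clang_available = Command::new("clang")
        .arg("--version")
        .output()
        .map(|out| out.status.success())
        .unwrap_or(false);

    if clang_available {
        build.compiler("clang");
    } else {
        println!(
            "cargo:warning=clang not found; building fp16 kernels with the \
             default C compiler, which may be noticeably slower (see #2885)"
        );
    }

    build.compile("lance_fp16_kernels");
}
```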
Well, this is a 3-5X performance difference across the board (see #2885), so I don't think this is a decision we should leave to whichever compiler users happen to pick.
I have not finished this PR yet
Manylinux 2014 is [EOL in June 2024](https://github.com/pypa/manylinux).
lance/rust/lance-linalg/src/simd/f16.c, line 65 (at commit 78d90a9):
GCC ignores it, so auto-vectorization does not happen on many platforms, as seen in #2885 (and a lot of compile warnings are generated if you try it locally). We can definitely write gcc-friendly vectorized C code given a few extra days. Before that, using GCC with …
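A hedged follow-on sketch (an assumption, not something verified against the actual f16.c kernels): one GCC-side avenue would be to allow floating-point reassociation when the build falls back to a GCC-family compiler, since GCC generally refuses to auto-vectorize float reductions without it, whereas the clang-only pragma grants that per loop. The flag names below are real GCC options, but whether they close the gap measured in #2885 for these kernels is untested.

```rust
// build.rs fragment (sketch): when we end up on a GCC-family compiler, opt the
// fp16 kernels into reassociation so GCC is at least permitted to vectorize
// the sum/dot reductions. Untested against the kernels discussed in #2885.
fn main() {
    let mut build = cc::Build::new();
    build.file("src/simd/f16.c");

    // In the `cc` crate, is_like_gnu() is true for gcc-family compilers (not clang).
    if build.get_compiler().is_like_gnu() {
        // GCC will not vectorize float reductions unless reassociation is allowed;
        // without it, it keeps the scalar per-element evaluation order.
        // flag_if_supported() keeps older compilers building if a flag is missing.
        build.flag_if_supported("-fassociative-math");
        build.flag_if_supported("-fno-signed-zeros");
        build.flag_if_supported("-fno-trapping-math");
        build.flag_if_supported("-ftree-vectorize");
    }

    build.compile("lance_fp16_kernels");
}
```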
Ready for review @wjones127. PyPI build passed: https://github.com/lancedb/lance/actions/runs/10907625461
It's pretty crazy, the performance difference we see here. For others' reference, here is the assembly for the core loop body in clang-18:
movzx ecx, word ptr [rdi + 2*rdx]
vmovd xmm1, ecx
vcvtph2ps xmm1, xmm1
vfmadd213ss xmm1, xmm1, xmm0
movzx ecx, word ptr [rdi + 2*rdx + 2]
vmovd xmm0, ecx
vcvtph2ps xmm0, xmm0
vfmadd213ss xmm0, xmm0, xmm1
movzx ecx, word ptr [rdi + 2*rdx + 4]
vmovd xmm1, ecx
vcvtph2ps xmm1, xmm1
movzx ecx, word ptr [rdi + 2*rdx + 6]
vmovd xmm2, ecx
vcvtph2ps xmm2, xmm2
vfmadd213ss xmm1, xmm1, xmm0
vfmadd213ss xmm2, xmm2, xmm1
movzx ecx, word ptr [rdi + 2*rdx + 8]
vmovd xmm0, ecx
vcvtph2ps xmm0, xmm0
vfmadd213ss xmm0, xmm0, xmm2
movzx ecx, word ptr [rdi + 2*rdx + 10]
vmovd xmm1, ecx
vcvtph2ps xmm1, xmm1
vfmadd213ss xmm1, xmm1, xmm0
movzx ecx, word ptr [rdi + 2*rdx + 12]
vmovd xmm0, ecx
vcvtph2ps xmm2, xmm0
movzx ecx, word ptr [rdi + 2*rdx + 14]
vmovd xmm0, ecx
vcvtph2ps xmm0, xmm0
vfmadd213ss xmm2, xmm2, xmm1
vfmadd213ss xmm0, xmm0, xmm2
add rdx, 8
cmp rax, rdx
jne .LBB0_10
versus in GCC 13:
vpxor xmm5, xmm5, xmm5
vpinsrw xmm8, xmm5, WORD PTR [rdi], 0
add rdi, 16
vpinsrw xmm11, xmm5, WORD PTR [rdi-12], 0
vpinsrw xmm13, xmm5, WORD PTR [rdi-10], 0
vcvtph2ps xmm9, xmm8
vfmadd132ss xmm9, xmm0, xmm9
vpinsrw xmm0, xmm5, WORD PTR [rdi-14], 0
vpinsrw xmm15, xmm5, WORD PTR [rdi-8], 0
vcvtph2ps xmm12, xmm11
vcvtph2ps xmm14, xmm13
vpinsrw xmm1, xmm5, WORD PTR [rdi-6], 0
vpinsrw xmm7, xmm5, WORD PTR [rdi-4], 0
vcvtph2ps xmm10, xmm0
vcvtph2ps xmm6, xmm15
vpinsrw xmm4, xmm5, WORD PTR [rdi-2], 0
vcvtph2ps xmm2, xmm1
vcvtph2ps xmm3, xmm7
vcvtph2ps xmm0, xmm4
vfmadd231ss xmm9, xmm10, xmm10
vfmadd231ss xmm9, xmm12, xmm12
vfmadd231ss xmm9, xmm14, xmm14
vfmadd231ss xmm9, xmm6, xmm6
vfmadd231ss xmm9, xmm2, xmm2
vfmadd231ss xmm9, xmm3, xmm3
vfmadd132ss xmm0, xmm9, xmm0
cmp rdx, rdi
jne .L3
I feel like we should look into why there is such a performance gap one day.
see #2885