[RISC-V] Use auipc for all code addresses #116780
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
RISC-V Release-CLR-VF2: 9084 / 9114 (99.67%)
report.xml, report.md, failures.xml, testclr_details.tar.zst
RISC-V Release-CLR-QEMU: 9083 / 9113 (99.67%)
report.xml, report.md, failures.xml, testclr_details.tar.zst
RISC-V Release-FX-VF2: 306856 / 308579 (99.44%)
report.xml, report.md, failures.xml, testclr_details.tar.zst
RISC-V Release-FX-QEMU: 274559 / 275631 (99.61%)
report.xml, report.md, failures.xml, testclr_details.tar.zst
No regressions. Diffs on a bigger corpus are coming soon.
Diffs are based on 12,289 contexts (10,113 MinOpts, 2,176 FullOpts).
Overall (-9,576 bytes)
MinOpts (-6,616 bytes)
FullOpts (-2,960 bytes)
Example diffs
test.mch
-12 (-9.68%) : 9004.dasm - Microsoft.CodeAnalysis.CSharp.Syntax.InternalSyntax.BinaryExpressionSyntax:GetSlot(int):Microsoft.CodeAnalysis.GreenNode:this (Tier1)
@@ -28,19 +28,16 @@ G_M16219_IG02: ; bbWeight=1, gcrefRegs=0400 {a0}, byrefRegs=0000 {}, byre
addi ra, zero, 0xD1FFAB1E
bltu ra, t6, G_M16219_IG07
zext.w a1, a1
- auipc t6, 0xD1FFAB1E
- addi a2, t6, 0xD1FFAB1E
+ auipc a2, 0xD1FFAB1E
+ addi a2, a2, 0xD1FFAB1E
slli a3, a1, 2
add a2, a2, a3
lw a2, 0xD1FFAB1E(a2)
- lui t6, 0xD1FFAB1E
- addi t6, t6, 0xD1FFAB1E
- lui a3, 0xD1FFAB1E
- slli a3, a3, 20
- add a3, a3, t6
+ auipc a3, 0xD1FFAB1E
+ addi a3, a3, 0xD1FFAB1E
add a2, a2, a3
jr a2
- ;; size=64 bbWeight=1 PerfScore 19.00
+ ;; size=52 bbWeight=1 PerfScore 14.00
G_M16219_IG03: ; bbWeight=0.50, gcrefRegs=0400 {a0}, byrefRegs=0000 {}, byref
ld a0, 0xD1FFAB1E(a0)
;; size=4 bbWeight=0.50 PerfScore 1.00
@@ -68,7 +65,7 @@ RWD00 dd G_M16219_IG03 - G_M16219_IG02
dd G_M16219_IG05 - G_M16219_IG02
-; Total bytes of code 124, prolog size 16, PerfScore 38.25, instruction count 26, allocated bytes for code 124 (MethodHash=9308c0a4) for method Microsoft.CodeAnalysis.CSharp.Syntax.InternalSyntax.BinaryExpressionSyntax:GetSlot(int):Microsoft.CodeAnalysis.GreenNode:this (Tier1)
+; Total bytes of code 112, prolog size 16, PerfScore 33.25, instruction count 26, allocated bytes for code 112 (MethodHash=9308c0a4) for method Microsoft.CodeAnalysis.CSharp.Syntax.InternalSyntax.BinaryExpressionSyntax:GetSlot(int):Microsoft.CodeAnalysis.GreenNode:this (Tier1)
; ============================================================
Unwind Info:
@@ -79,7 +76,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 31 (0x0001f) Actual length = 124 (0x00007c)
+ Function Length : 28 (0x0001c) Actual length = 112 (0x000070)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-12 (-9.09%) : 9477.dasm - Microsoft.CodeAnalysis.CSharp.Syntax.InternalSyntax.BlockSyntax:GetSlot(int):Microsoft.CodeAnalysis.GreenNode:this (Tier1)
@@ -28,19 +28,16 @@ G_M31535_IG02: ; bbWeight=1, gcrefRegs=0400 {a0}, byrefRegs=0000 {}, byre
addi ra, zero, 0xD1FFAB1E
bltu ra, t6, G_M31535_IG08
zext.w a1, a1
- auipc t6, 0xD1FFAB1E
- addi a2, t6, 0xD1FFAB1E
+ auipc a2, 0xD1FFAB1E
+ addi a2, a2, 0xD1FFAB1E
slli a3, a1, 2
add a2, a2, a3
lw a2, 0xD1FFAB1E(a2)
- lui t6, 0xD1FFAB1E
- addi t6, t6, 0xD1FFAB1E
- lui a3, 0xD1FFAB1E
- slli a3, a3, 20
- add a3, a3, t6
+ auipc a3, 0xD1FFAB1E
+ addi a3, a3, 0xD1FFAB1E
add a2, a2, a3
jr a2
- ;; size=64 bbWeight=1 PerfScore 19.00
+ ;; size=52 bbWeight=1 PerfScore 14.00
G_M31535_IG03: ; bbWeight=0.56, gcrefRegs=0400 {a0}, byrefRegs=0000 {}, byref
ld a0, 0xD1FFAB1E(a0)
;; size=4 bbWeight=0.56 PerfScore 1.11
@@ -73,7 +70,7 @@ RWD00 dd G_M31535_IG03 - G_M31535_IG02
dd G_M31535_IG05 - G_M31535_IG02
-; Total bytes of code 132, prolog size 16, PerfScore 38.17, instruction count 28, allocated bytes for code 132 (MethodHash=959684d0) for method Microsoft.CodeAnalysis.CSharp.Syntax.InternalSyntax.BlockSyntax:GetSlot(int):Microsoft.CodeAnalysis.GreenNode:this (Tier1)
+; Total bytes of code 120, prolog size 16, PerfScore 33.17, instruction count 28, allocated bytes for code 120 (MethodHash=959684d0) for method Microsoft.CodeAnalysis.CSharp.Syntax.InternalSyntax.BlockSyntax:GetSlot(int):Microsoft.CodeAnalysis.GreenNode:this (Tier1)
; ============================================================
Unwind Info:
@@ -84,7 +81,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 33 (0x00021) Actual length = 132 (0x000084)
+ Function Length : 30 (0x0001e) Actual length = 120 (0x000078)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-12 (-8.57%) : 1988.dasm - Microsoft.CodeAnalysis.CSharp.Syntax.InternalSyntax.CompilationUnitSyntax:GetSlot(int):Microsoft.CodeAnalysis.GreenNode:this (Tier1)
@@ -28,19 +28,16 @@ G_M20649_IG02: ; bbWeight=1, gcrefRegs=0400 {a0}, byrefRegs=0000 {}, byre
addi ra, zero, 0xD1FFAB1E
bltu ra, t6, G_M20649_IG09
zext.w a1, a1
- auipc t6, 0xD1FFAB1E
- addi a2, t6, 0xD1FFAB1E
+ auipc a2, 0xD1FFAB1E
+ addi a2, a2, 0xD1FFAB1E
slli a3, a1, 2
add a2, a2, a3
lw a2, 0xD1FFAB1E(a2)
- lui t6, 0xD1FFAB1E
- addi t6, t6, 0xD1FFAB1E
- lui a3, 0xD1FFAB1E
- slli a3, a3, 20
- add a3, a3, t6
+ auipc a3, 0xD1FFAB1E
+ addi a3, a3, 0xD1FFAB1E
add a2, a2, a3
jr a2
- ;; size=64 bbWeight=1 PerfScore 19.00
+ ;; size=52 bbWeight=1 PerfScore 14.00
G_M20649_IG03: ; bbWeight=0.36, gcrefRegs=0400 {a0}, byrefRegs=0000 {}, byref
ld a0, 0xD1FFAB1E(a0)
;; size=4 bbWeight=0.36 PerfScore 0.73
@@ -78,7 +75,7 @@ RWD00 dd G_M20649_IG05 - G_M20649_IG02
dd G_M20649_IG06 - G_M20649_IG02
-; Total bytes of code 140, prolog size 16, PerfScore 38.45, instruction count 30, allocated bytes for code 140 (MethodHash=a2b0af56) for method Microsoft.CodeAnalysis.CSharp.Syntax.InternalSyntax.CompilationUnitSyntax:GetSlot(int):Microsoft.CodeAnalysis.GreenNode:this (Tier1)
+; Total bytes of code 128, prolog size 16, PerfScore 33.45, instruction count 30, allocated bytes for code 128 (MethodHash=a2b0af56) for method Microsoft.CodeAnalysis.CSharp.Syntax.InternalSyntax.CompilationUnitSyntax:GetSlot(int):Microsoft.CodeAnalysis.GreenNode:this (Tier1)
; ============================================================
Unwind Info:
@@ -89,7 +86,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 35 (0x00023) Actual length = 140 (0x00008c)
+ Function Length : 32 (0x00020) Actual length = 128 (0x000080)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
+0 (0.00%) : 12080.dasm - IDEAEncryption:de_key_idea(char[],char[]) (Instrumented Tier0)
@@ -778,8 +778,8 @@ G_M48916_IG08: ; bbWeight=1, extend
;; size=32 bbWeight=1 PerfScore 20.00
G_M48916_IG09: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
; byrRegs -[a0]
- auipc t6, 0xD1FFAB1E
- ld a0, 0xD1FFAB1E(t6)
+ auipc a0, 0xD1FFAB1E
+ ld a0, 0xD1FFAB1E(a0)
lui a1, 0xD1FFAB1E
addiw a1, a1, 0xD1FFAB1E
slli a1, a1, 32
+0 (0.00%) : 12048.dasm - AssignJagged:Run():double:this (Tier1)
@@ -322,8 +322,8 @@ G_M27632_IG17: ; bbWeight=15.53, gcVars=0000200001000000 {V00 V01}, gcref
bnez t6, G_M27632_IG40
;; size=84 bbWeight=15.53 PerfScore 450.49
G_M27632_IG18: ; bbWeight=15.53, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
- auipc t6, 0xD1FFAB1E
- ld a0, 0xD1FFAB1E(t6)
+ auipc a0, 0xD1FFAB1E
+ ld a0, 0xD1FFAB1E(a0)
mulh a0, a0, s3
srli a1, a0, 63
srai a0, a0, 7
+0 (0.00%) : 12032.dasm - AssignJagged:second_assignments(int[][],short[][]) (Instrumented Tier0)
@@ -132,8 +132,8 @@ G_M6376_IG06: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
j G_M6376_IG08
;; size=24 bbWeight=1 PerfScore 12.00
G_M6376_IG07: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
- auipc t6, 0xD1FFAB1E
- ld a0, 0xD1FFAB1E(t6)
+ auipc a0, 0xD1FFAB1E
+ ld a0, 0xD1FFAB1E(a0)
lui a1, 0xD1FFAB1E
addiw a1, a1, 0xD1FFAB1E
slli a1, a1, 32
@@ -230,8 +230,8 @@ G_M6376_IG12: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
j G_M6376_IG17
;; size=124 bbWeight=1 PerfScore 49.00
G_M6376_IG13: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
- auipc t6, 0xD1FFAB1E
- ld a0, 0xD1FFAB1E(t6)
+ auipc a0, 0xD1FFAB1E
+ ld a0, 0xD1FFAB1E(a0)
lui a1, 0xD1FFAB1E
addiw a1, a1, 0xD1FFAB1E
slli a1, a1, 32
@@ -296,8 +296,8 @@ G_M6376_IG17: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
;; size=84 bbWeight=1 PerfScore 33.00
G_M6376_IG18: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
; byrRegs -[a0]
- auipc t6, 0xD1FFAB1E
- ld a0, 0xD1FFAB1E(t6)
+ auipc a0, 0xD1FFAB1E
+ ld a0, 0xD1FFAB1E(a0)
lui a1, 0xD1FFAB1E
addiw a1, a1, 0xD1FFAB1E
slli a1, a1, 32
@@ -424,8 +424,8 @@ G_M6376_IG26: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
;; size=132 bbWeight=1 PerfScore 54.00
G_M6376_IG27: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
; byrRegs -[a0]
- auipc t6, 0xD1FFAB1E
- ld a0, 0xD1FFAB1E(t6)
+ auipc a0, 0xD1FFAB1E
+ ld a0, 0xD1FFAB1E(a0)
lui a1, 0xD1FFAB1E
addiw a1, a1, 0xD1FFAB1E
slli a1, a1, 32
@@ -475,8 +475,8 @@ G_M6376_IG31: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byre
; gcr arg pop 0
;; size=40 bbWeight=0.50 PerfScore 6.00
G_M6376_IG32: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
- auipc t6, 0xD1FFAB1E
- ld a0, 0xD1FFAB1E(t6)
+ auipc a0, 0xD1FFAB1E
+ ld a0, 0xD1FFAB1E(a0)
lui a1, 0xD1FFAB1E
addiw a1, a1, 0xD1FFAB1E
slli a1, a1, 32
@@ -600,8 +600,8 @@ G_M6376_IG37: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
sw a0, -52(fp)
;; size=192 bbWeight=1 PerfScore 80.50
G_M6376_IG38: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
- auipc t6, 0xD1FFAB1E
- ld a0, 0xD1FFAB1E(t6)
+ auipc a0, 0xD1FFAB1E
+ ld a0, 0xD1FFAB1E(a0)
lui a1, 0xD1FFAB1E
addiw a1, a1, 0xD1FFAB1E
slli a1, a1, 32
@@ -651,8 +651,8 @@ G_M6376_IG42: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byre
; gcr arg pop 0
;; size=40 bbWeight=0.50 PerfScore 6.00
G_M6376_IG43: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
- auipc t6, 0xD1FFAB1E
- ld a0, 0xD1FFAB1E(t6)
+ auipc a0, 0xD1FFAB1E
+ ld a0, 0xD1FFAB1E(a0)
lui a1, 0xD1FFAB1E
addiw a1, a1, 0xD1FFAB1E
slli a1, a1, 32
@@ -711,8 +711,8 @@ G_M6376_IG47: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byre
j G_M6376_IG22
;; size=44 bbWeight=0.50 PerfScore 6.75
G_M6376_IG48: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
- auipc t6, 0xD1FFAB1E
- ld a0, 0xD1FFAB1E(t6)
+ auipc a0, 0xD1FFAB1E
+ ld a0, 0xD1FFAB1E(a0)
lui a1, 0xD1FFAB1E
addiw a1, a1, 0xD1FFAB1E
slli a1, a1, 32
@@ -813,8 +813,8 @@ G_M6376_IG50: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
sw a0, -44(fp)
;; size=204 bbWeight=1 PerfScore 86.50
G_M6376_IG51: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
- auipc t6, 0xD1FFAB1E
- ld a0, 0xD1FFAB1E(t6)
+ auipc a0, 0xD1FFAB1E
+ ld a0, 0xD1FFAB1E(a0)
lui a1, 0xD1FFAB1E
addiw a1, a1, 0xD1FFAB1E
slli a1, a1, 32
@@ -864,8 +864,8 @@ G_M6376_IG55: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byre
; gcr arg pop 0
;; size=40 bbWeight=0.50 PerfScore 6.00
G_M6376_IG56: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
- auipc t6, 0xD1FFAB1E
- ld a0, 0xD1FFAB1E(t6)
+ auipc a0, 0xD1FFAB1E
+ ld a0, 0xD1FFAB1E(a0)
lui a1, 0xD1FFAB1E
addiw a1, a1, 0xD1FFAB1E
slli a1, a1, 32
@@ -949,8 +949,8 @@ G_M6376_IG62: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
sext.w t6, a0
addi ra, zero, 0xD1FFAB1E
beq t6, ra, G_M6376_IG63
- auipc t6, 0xD1FFAB1E
- ld a0, 0xD1FFAB1E(t6)
+ auipc a0, 0xD1FFAB1E
+ ld a0, 0xD1FFAB1E(a0)
lui a1, 0xD1FFAB1E
addiw a1, a1, 0xD1FFAB1E
slli a1, a1, 32
@@ -1029,8 +1029,8 @@ G_M6376_IG66: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
blt t6, ra, G_M6376_IG62
;; size=16 bbWeight=1 PerfScore 6.50
G_M6376_IG67: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
- auipc t6, 0xD1FFAB1E
- ld a0, 0xD1FFAB1E(t6)
+ auipc a0, 0xD1FFAB1E
+ ld a0, 0xD1FFAB1E(a0)
lui a1, 0xD1FFAB1E
addiw a1, a1, 0xD1FFAB1E
slli a1, a1, 32
@@ -1113,8 +1113,8 @@ G_M6376_IG73: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
sext.w t6, a0
addi ra, zero, 0xD1FFAB1E
bne t6, ra, G_M6376_IG74
- auipc t6, 0xD1FFAB1E
- ld a0, 0xD1FFAB1E(t6)
+ auipc a0, 0xD1FFAB1E
+ ld a0, 0xD1FFAB1E(a0)
lui a1, 0xD1FFAB1E
addiw a1, a1, 0xD1FFAB1E
slli a1, a1, 32
@@ -1193,8 +1193,8 @@ G_M6376_IG77: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
blt t6, ra, G_M6376_IG73
;; size=16 bbWeight=1 PerfScore 6.50
G_M6376_IG78: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
- auipc t6, 0xD1FFAB1E
- ld a0, 0xD1FFAB1E(t6)
+ auipc a0, 0xD1FFAB1E
+ ld a0, 0xD1FFAB1E(a0)
lui a1, 0xD1FFAB1E
addiw a1, a1, 0xD1FFAB1E
slli a1, a1, 32
@@ -1242,8 +1242,8 @@ G_M6376_IG82: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
sext.w t6, a0
addi ra, zero, 0xD1FFAB1E
blt t6, ra, G_M6376_IG72
- auipc t6, 0xD1FFAB1E
- ld a0, 0xD1FFAB1E(t6)
+ auipc a0, 0xD1FFAB1E
+ ld a0, 0xD1FFAB1E(a0)
lui a1, 0xD1FFAB1E
addiw a1, a1, 0xD1FFAB1E
slli a1, a1, 32
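The per-method percentages in the example diffs can be checked by hand. The sketch below assumes every instruction is 4 bytes (no compressed encodings, which matches the listings above), and the helper names are made up for illustration. In the three GetSlot diffs, the five-instruction lui/addi/lui/slli/add sequence is replaced by an auipc/addi pair, while the first auipc/addi pair only changes registers.

```python
INSTRUCTION_BYTES = 4  # assumption: no compressed (RVC) instructions, as in the listings


def bytes_saved(old_instr_count: int, new_instr_count: int) -> int:
    """Bytes saved when one instruction sequence replaces another."""
    return (old_instr_count - new_instr_count) * INSTRUCTION_BYTES


def percent_change(old_size: int, new_size: int) -> float:
    """Relative size change in percent, rounded to two decimals."""
    return round((new_size - old_size) / old_size * 100, 2)


# lui, addi, lui, slli, add (5 instructions) -> auipc, addi (2 instructions)
saved = bytes_saved(5, 2)

# Total code sizes quoted in the three GetSlot diffs (old, new)
for old, new in [(124, 112), (132, 120), (140, 128)]:
    print(f"{new - old:+d} ({percent_change(old, new):.2f}%), saved per site: {saved}")
```

The +0 (0.00%) diffs are the cases where only the scratch register changes (t6 replaced by the destination register), so the instruction count stays the same.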
What would it take to synthetically reproduce a scenario like that?
Just an artificially large test, I guess. The more interesting question is whether supporting >2 GB of code has any added value for the end user (if no such test has been developed yet, I would guess not).
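To make the failure scenario concrete: on RV64, an auipc followed by an I-type instruction can address offsets from -2^31 - 2^11 up to 2^31 - 2^11 - 1 bytes relative to the auipc. A minimal sketch of that reach check follows; the function name is hypothetical and this is not the JIT's actual code.

```python
def fits_auipc_pair(pc: int, target: int) -> bool:
    """True if target is reachable from pc with auipc + a 12-bit I-type immediate (RV64)."""
    offset = target - pc
    # auipc contributes a sign-extended multiple of 4096 in [-2**31, 2**31 - 4096];
    # the I-type immediate contributes [-2048, 2047].
    return -(2**31) - 2**11 <= offset <= 2**31 - 2**11 - 1


pc = 0x7F0000000000
print(fits_auipc_pair(pc, pc + 2**31 - 2**11 - 1))  # farthest forward target that still fits
print(fits_auipc_pair(pc, pc + 2**31 - 2**11))      # one byte further does not fit
```

A test would need two code addresses further apart than this window to hit the problem, which is why it would have to be artificially large.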
Sizes throughout the code manager and the JIT are limited to 32-bit. For example: runtime/src/coreclr/inc/corjit.h Lines 82 to 86 in f1a4b89

For example, runtime/src/coreclr/jit/emit.cpp Lines 4474 to 4479 in 5951ad2
Make all code segment addresses (for jumps, branches, data constants, labels, etc.) PC-relative.
Since the code size is limited to about 2 GB [1], a canonical auipc + I-type instruction combo with 32-bit range should reach anywhere. Treating code addresses as absolute, even if they are known at compile time, is counterproductive because RISC-V then needs to emit "myriad sequences" (more instructions) to synthesize long constants. Making the addresses PC-relative also simplifies the emitter.

Part of #84834, cc @dotnet/samsung
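For readers less familiar with the auipc + I-type combo, here is a minimal sketch of how a PC-relative offset is conventionally split into the 20-bit auipc immediate and the 12-bit I-type immediate. The function names and addresses are made up for illustration and this is not the emitter's code. The 0x800 rounding compensates for the low 12 bits being sign-extended.

```python
from typing import Tuple


def split_pcrel(offset: int) -> Tuple[int, int]:
    """Split a signed PC-relative byte offset into (hi20, lo12)."""
    hi20 = (offset + 0x800) >> 12  # round to the nearest 4 KiB boundary
    lo12 = offset - (hi20 << 12)   # remainder, always in [-2048, 2047]
    return hi20, lo12


def resolve(pc: int, hi20: int, lo12: int) -> int:
    """Model the pair: auipc adds hi20 << 12 to pc, the I-type instruction adds lo12."""
    return pc + (hi20 << 12) + lo12


# Hypothetical addresses, chosen so the low 12 bits of the offset exceed 0x7FF
pc, target = 0x7F0000001000, 0x7F0012346FFF
hi20, lo12 = split_pcrel(target - pc)
print(hex(hi20), lo12)
print(resolve(pc, hi20, lo12) == target)
```

Because the second instruction can take the auipc result as its own base register, the pair needs no scratch register, which is the t6-to-destination-register change visible in the diffs.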
Footnotes
1. See this discussion. If we ever hit such failures, either the absolute addressing paths can be brought back or some other solution, like branch islands, can be worked out. Either way, code sizes >2 GB aren't likely, and assuming them by default complicates the output asm.