Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

By-length slicing with sentinel in first slice doesn't do a by-length slice #22371

Open
dweiller opened this issue Dec 31, 2024 · 2 comments
Open
Labels
bug Observed behavior contradicts documented or intended behavior

Comments

@dweiller
Copy link
Contributor

dweiller commented Dec 31, 2024

Zig Version

0.14.0-dev.2571+01081cc8e

Steps to Reproduce and Observed Behavior

Put this snippet in by-length.zig

test {
    var foo: [2:0] = .{ 1, 2 };
    _ = foo[0.. :0][0..2];
}

and (with a debug build of the compiler) run

zig ast-check -t by-length.zig` | grep "slice_"

to see that the above zig code is producing slice_sentinel and slice_end ZIR nodes rather than a slice_length ZIR node.

Expected Behavior

I believe the above pattern should produce a slice_length ZIR node. This behaviour was changed in #22344 which disabled the by-length slice optimisation when performing a slice such as a[0.. :sentinel][0..len]. From the description of the PR it's unclear whether this was intended or not as the mentioned motivation for the change was to prevent allowing nonsense like

test {
    var foo: [2:0] = .{ 1, 2 };
    _ = foo[0.. :42][0..2];
}

Unfortunately I didn't document in the original by-length slice PR the reason why a sentinel in the first slice above was previously ignored, but if I remember correctly it was because it doesn't have any bearing on the result or type of the whole slicing expression, and I suppose I didn't consider that the above snippet would have needed an extra sentinel check.

If we want the allow

test {
    var foo: [2:0] = .{ 1, 2 };
    _ = foo[0.. :0][0..2];
}

To benefit from by-length slicing then we will need to rework the AstGen fix from #22344 to produce ZIR like it did previously and rather add extra sentinel checks in Sema when either when by_length is true in analyzeSlice or in zirSliceLength. As a side note it may be worth seeing if there is way to to simplify analyzeSlice as I think it's getting a bit hard to follow (especially with the mentioned sentinel checks added).

If the plan is to disable by-length slicing in the above example then it may be worth mentioning this edge case in the language reference and 0.14 release notes, but I believe this would be an unnecessary edge case in the language.

cc @Techatrix

@dweiller dweiller added the bug Observed behavior contradicts documented or intended behavior label Dec 31, 2024
@Techatrix
Copy link
Contributor

I was aware that it would no longer emit .slice_length for a[0.. :sentinel][0..len]. I did not think much of it since I doubt that code like this gets written outside of compiler tests.

To verify my previous statement, I modified AstGen to panic when encountering a[0.. :sentinel][0..len] and found exactly zero occurrences of this pattern in the source code of Zig, Bun, Mach, ZLS, Tigerbeetle and Ghostty. The exception being test/behavior/slice.zig. @dweiller are you aware of any usage of this pattern?

Patch to detect a[0.. :sentinel][0..len]
diff --git a/lib/std/zig/AstGen.zig b/lib/std/zig/AstGen.zig
index 6e424b597f..f396efcde3 100644
--- a/lib/std/zig/AstGen.zig
+++ b/lib/std/zig/AstGen.zig
@@ -882,10 +882,17 @@ fn expr(gz: *GenZir, scope: *Scope, ri: ResultInfo, node: Ast.Node.Index) InnerE
         => {
             const full = tree.fullSlice(node).?;
             if (full.ast.end != 0 and
-                node_tags[full.ast.sliced] == .slice_open and
+                (node_tags[full.ast.sliced] == .slice_open or node_tags[full.ast.sliced] == .slice_sentinel) and
                 nodeIsTriviallyZero(tree, full.ast.start))
-            {
-                const lhs_extra = tree.sliceOpen(full.ast.sliced).ast;
+            blk: {
+                const lhs_extra = tree.fullSlice(full.ast.sliced).?.ast;
+                if (node_tags[full.ast.sliced] == .slice_sentinel) {
+                    if (tree.sliceSentinel(full.ast.sliced).ast.end == 0) {
+                        @panic("bad");
+                    } else {
+                        break :blk; // a[0..end][0..len] or a[0..end :sentinel][0..len]
+                    }
+                }
 
                 const lhs = try expr(gz, scope, .{ .rl = .ref }, lhs_extra.sliced);
                 const start = try expr(gz, scope, .{ .rl = .{ .coerced_ty = .usize_type } }, lhs_extra.start);
build/stage4-fmt/bin/zig fmt --check --ast-check ~/repos/zig/src ~/repos/zig/lib ~/repos/bun ~/repos/mach ~/repos/zls ~/repos/tigerbeetle/ ~/repos/ghostty/ 2> /dev/null

AFAICT, the .slice_length case is only meant as an code optimization that causes no semantic changes. I don't see the benefit of putting in effort to optimize for non existent code.

@dweiller
Copy link
Contributor Author

dweiller commented Dec 31, 2024

I'm not particularly aware of real-world usecases - the place noticed it was in the behaviour tests for slicing. I would also expect that this pattern is rare in real code, the reason being that a sentinel in the first slicing operation can't affect the result/type of the expression and provides no useful information to the compiler (other than a clue that the author may have made an error). I would imagine the way this pattern would emerge in a code base is a pre-existing, open sentinel-terminated slice gets reworked to be done by-length by adding the second slice operation.

AFAICT, the .slice_length case is only meant as an code optimization that causes no semantic changes. I don't see the benefit of putting in effort to optimize for non existent code.

Yes - it shouldn't have semantic effects, the semantic effects should be in moving from s[a.. a + len] to s[a..][0..len]. It just seemed odd to me to disable the optimization in this circumstance, but I haven't looked into how much complication would be needed in Sema to re-enable it. If it is only an optimization for the compiler I would agree that it's probably not worth changing. If it's going to affect generated code I would rather re-enable the optimization (or simply making s[a.. :b][0..len] a compile error) to remove an unnecessary performance edge case. I'm less familiar with the backends though, so I'd have to read through the code to figure out how/if it affects generated code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Observed behavior contradicts documented or intended behavior
Projects
None yet
Development

No branches or pull requests

2 participants