How many trailing semicolons for a multi-dim array literal? #26858

Open
arifthpe opened this issue Mar 6, 2025 · 8 comments

@arifthpe
Contributor

arifthpe commented Mar 6, 2025

#26782 introduces multi-dimensional rectangular array literals using repeated semicolons to separate entries in different dimensions. It only allows the use of 0 or 1 trailing semicolons after all elements, even if the number of semicolons being used to separate the highest dimension is greater; e.g., for an N-dim array, that number would be N-1. This makes the last hyper-row in an array a special case syntactically, as @bradcray points out.

Some potential options for number of trailing semicolons:

(a) 0
(b) 1
(c) 0 or 1
(d) 0 or N-1
(e) 0 _to_ N-1 inclusive
(f) N-1

I lean towards (d) to start with, because I view it as the N-dim analogue of allowing 0 or 1 trailing commas for a 1-dim array, since a chunk of semicolons is a "heavy comma".
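As a rough illustration of how options (a) through (f) differ, the sketch below tabulates the trailing-semicolon counts each one would accept for an N-dim literal. The function name `allowed_trailing` is hypothetical and is not part of the Chapel compiler; it also assumes N >= 2, since the list above does not spell out how the options read when N-1 is 0.

```python
def allowed_trailing(option: str, n: int) -> set[int]:
    """Trailing-semicolon counts accepted for an n-dim literal (assumes n >= 2)."""
    top = n - 1  # semicolons used to separate the highest dimension
    table = {
        "a": {0},
        "b": {1},
        "c": {0, 1},
        "d": {0, top},
        "e": set(range(top + 1)),  # 0 to N-1 inclusive
        "f": {top},
    }
    return table[option]


# Show what each option would accept for a 4-dim literal.
for opt in "abcdef":
    print(opt, sorted(allowed_trailing(opt, 4)))
```

For a 4-dim literal, (d) accepts only 0 or 3 trailing semicolons, while (e) also accepts 1 and 2.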

@arifthpe
Contributor Author

arifthpe commented Mar 6, 2025

cc @jabraham17

@bradcray
Member

bradcray commented Mar 6, 2025

Thanks for filing this, Anna!

(d) also feels like a natural starting point to me. At least, I haven't yet come up with a case where a different number of trailing semicolons feels justifiable, and using one would seem like unfortunate style at the very least.

@jabraham17
Member

I think my concern was with supporting e, where the examples from #8864 use either no trailing semicolon, 1 trailing semicolon, or N-1 trailing semicolons (and nothing in between). I am concerned about the semantics of a "ragged" number of trailing semicolons. For example:

  var arr4d: [{11..12, -1..1, 1..3, 0..3}] int = [
    1, 2, 3, 4 ;
    5, 6, 7, 8 ;
    9, 10, 11, 12 ;
    ;
    13, 14, 15, 16 ;
    17, 18, 19, 20 ;
    21, 22, 23, 24 ;
    ;
    25, 26, 27, 28 ;
    29, 30, 31, 32 ;
    33, 34, 35, 36 ;
    ;;
    37, 38, 39, 40 ;
    41, 42, 43, 44 ;
    45, 46, 47, 48 ;
    ;
    49, 50, 51, 52 ;
    53, 54, 55, 56 ;
    57, 58, 59, 60 ;
    ;
    61, 62, 63, 64 ;
    65, 66, 67, 68 ;
    69, 70, 71, 72 ;
    ;
  ];

This is option e and to me just looks wrong. Which maybe implies option d.

But I would actually argue for an option g, which is c plus f. That is 0, 1, or N-1 trailing semicolons.

I am sympathetic to the argument that 0 or N-1 trailing semicolons for an N-D array is analogous to 0 or 1 trailing commas for a 1-D array. But most of the code samples from #8864 only had 1 trailing semicolon, even for higher dimensions.

@bradcray
Member

bradcray commented Mar 6, 2025

@jabraham17 : Since #8864's a massive issue, are there specific comments/examples you're referring to here? I'm curious to refer back to them (and wondering if I wrote them and still believe in them).

@jabraham17
Member

Here are ones I pulled out that use only a single trailing heavyweight comma, where I am also considering examples that use other variants of the heavyweight comma like \\

Note that while going back through the full issue, I could not find a single case that uses N-1 trailing semicolons. Granted, there aren't that many examples (lots of discussion and other syntaxes) and I was going fast, so I could be wrong.

I also looked back at my slides, and there are no examples of N-1 semicolons there, only 0 or 1 (although few arrays with more than 3 dimensions are represented). I'd have to double-check the notes, but I don't recall us ever discussing anything other than 0 or 1 trailing semicolons.

@bradcray
Member

bradcray commented Mar 6, 2025

> Here are ones I pulled out that use only a single trailing heavyweight comma

Hmm, that's interesting. Those are more compelling than I expected them to be, in the sense that I could imagine myself typing them, would feel bad requiring a user to add or remove a semicolon, and would feel irritated if the compiler made me do the same.

> I don't recall us ever discussing anything other than 0 or 1 trailing semicolons

I agree, and am not trying to suggest that you've diverged from the plan. Mostly, I think we didn't really talk much about trailing semicolons in general or what the plan was. IIRC, you and I exchanged some 1:1 messages about them when I was trying to get the lay of the land and what your PR supported, and I think the slide they came up on was added to the second meeting's deck towards the end of the edit process(?) and didn't get much attention or discussion. Basically, I think we mostly ignored the details rather than deciding on anything intentionally.

Again, my main argument for permitting N-1 is the same as my rationale for supporting trailing commas at all: That it's much easier to write a [nested] loop that prints out myElem [comma] or post-loop [semicolon] without having an "am I the last element?" check to squash the punctuation (which may not even be possible in all cases, say you're streaming data or something like that). On top of that, I think that:

var A = [1, 2;
         3, 4;
         ;
         5, 6;
         7, 8;
         ;
        ];       

var B = [1, 2;
         3, 4;;
         5, 6;
         7, 8;;
        ];

are not at all unreasonable array literals.
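The generation argument above can be sketched in a few lines. This is only an illustration under stated assumptions: `emit_literal` is a hypothetical helper, it is written in Python rather than Chapel, and it joins the elements of a row with commas because the thread does not say whether a trailing comma directly before a semicolon is accepted. Each outer level appends one semicolon after every child, with no "am I the last element?" check, so an N-dim input ends in N-1 semicolons.

```python
def emit_literal(data):
    """Render nested lists as a semicolon-separated array literal.

    No level above the innermost row checks whether it is emitting its
    last child, so the text ends with one semicolon per enclosing level.
    """
    def emit(node):
        # Innermost level: a row of scalars, separated by commas.
        if not isinstance(node[0], list):
            return ", ".join(str(x) for x in node)
        # Outer levels: every child is followed by one semicolon, unconditionally.
        return "".join(emit(child) + ";" for child in node)

    return "[" + emit(data) + "]"


print(emit_literal([[[1, 2], [3, 4]], [[5, 6], [7, 8]]]))
# [1, 2;3, 4;;5, 6;7, 8;;]
```

Apart from whitespace, the output has the same shape as `B` above: two semicolons between the 2D slices and two trailing semicolons.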

My main hesitancy with (g) is that it feels very arbitrary, in the "one, two, many" sense. Like (e) seems more principled even though, stylistically, I'd definitely balk if someone used 2 semicolons to terminate a 4D array (though at the same time, I think "heaven help whoever's trying to type out a 4D array anyway, we should give that person a break."). I'd also like to think that (e) would result in a more consistent / less special-casey grammar/checks than (g), just thinking in terms of recursive grammars.

It also seems notable that for the common cases of 1D-3D arrays, (e) and (g) are the same (which I think is a big part of why we haven't had examples of it before… we've written virtually no 4D arrays, thank goodness).
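A quick way to check where (e) and (g) diverge is to compare the sets of trailing-semicolon counts each would accept. The helpers `option_e` and `option_g` below are hypothetical names for this sketch, and the 1D case is left out because the thread does not spell out how the options read when N-1 is 0.

```python
def option_e(n: int) -> set[int]:
    """(e): any count from 0 to N-1 inclusive."""
    return set(range(n))


def option_g(n: int) -> set[int]:
    """(g): 0, 1, or N-1."""
    return {0, 1, n - 1}


# Counts that (e) accepts but (g) rejects, for 2D through 5D literals.
for n in range(2, 6):
    print(n, sorted(option_e(n) - option_g(n)))
```

Under these assumptions the two options agree for 2D and 3D literals and first differ at 4D, where only (e) accepts exactly 2 trailing semicolons.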

Based on this, I think that, left on a desert island as the only Chapel developer, I'd implement (e) but then have the parser, or better, the linter, give a style warning if using anything outside of (g).

@lydia-duncan
Member

I agree with the direction here, though I think we should be sure to link to this issue from #8864 in case someone there isn't following all issues opened on the repo.

@arifthpe
Contributor Author

arifthpe commented Mar 6, 2025

@lydia-duncan good idea, done
