Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Named wildcards (extending valid identifier syntax) #54

Open
Skyb0rg007 opened this issue Oct 28, 2024 · 22 comments
Open

Named wildcards (extending valid identifier syntax) #54

Skyb0rg007 opened this issue Oct 28, 2024 · 22 comments

Comments

@Skyb0rg007
Copy link

Skyb0rg007 commented Oct 28, 2024

Giving names to variables is a good way of documenting your code for future readers.
For example, you may have a function MyObj.fold : (int * int * 'a -> 'a) -> 'a -> MyObj.t -> 'a that passes the function a triple of (index, value, accumulator).

fun mySum obj = MyObj.fold (fn (idx, value, acc) => value + acc) 0 obj

If you want to document the fact that idx is not used, it is sometimes good practice to not bind the variable name.
This way a linter can warn for unused variables.

fun mySum obj = MyObj.fold (fn (_, value, acc) => value + acc) 0 obj

But now there is no documentation on what that missing parameter actually meant, since we no longer give it a name.

One option is to add a comment. This is somewhat popular in C++:

fun mySum obj = MyObj.fold (fn (_ (* idx *), value, acc) => value + acc) 0 obj

But a better option is what Haskell, OCaml, Erlang, Prolog, JavaScript and other languages allow: prefix the identifier with an underscore, and make the "unused variable" linter ignore variables with leading underscores.

fun mySum obj = MyObj.fold (fn (_idx, value, acc) => value + acc) 0 obj

This way you maintain the benefits of naming for documentation purposes, but also improves the readability of your code because the reader can immediately know that the variable is not used.

Changes

The simplest method to support this change is to change the wildcard syntax to _[a-zA-Z0-9']*, where _foo is considered a named wildcard no different from _.

Potential Incompatibilities

Currently MLton uses this syntax for language extensions: _address, _build_const, _command_line_const, _const, _export, _import, _overload, _prim, _symbol. These uses seem possible to work around, because each of these extensions only make sense in expression or definition context and these special tokens can fallback to wildcards in pattern syntax.

@MatthewFluet
Copy link
Contributor

As you say, the leading underscore is only a convention to be used by the "unused variable" linter. One less invasive option would be to make the SML convention be a trailing underscore.

But, it seems that rather than expanding the tokens that serve as the wildcard pattern, it might be better to expand the regular expression for identifiers to allow for a leading underscore.

@dmacqueen
Copy link

What is the "unused variable" linter? I've never heard of it.

This seems like a mild change (expanding the definition of the wildcard pattern symbol, though it doesn't do much (at least for me). So I would not object. My focus on language changes has shifted to my NewFrontEnd project (repository dmacqueen/NewFrontEnd) where I have proposed some micro-design changes for my own version of Successor ML (MsML, or MacQueen's successor ML). See the files language.txt and dbm-on-rwh1.txt. I could include this change with respect to the wildcard pattern, though my motivation would not be strong.

@MatthewFluet
Copy link
Contributor

What is the "unused variable" linter? I've never heard of it.

I don't think that it is any specific tool, just the general expectation that a number of tools (e.g., MLton's "unusedVariable warn" MLB annotation, an SML LSP server) provide a warning on unused variables.

@ratmice
Copy link
Contributor

ratmice commented Nov 24, 2024

My preference here is to keep dissallowing _ as a variable prefix, but allow _foo to be matched the same as _ is currently, at least in languages like rust _foo has the behavior of disabling the lint which complains when a variable is not used, but it is still a bound variable named _foo and can still be used.

I fail to see the point of binding it if it is unused, nor the point of declaring it okay to go unused when it is in fact used.
As such, I would rather _foo just being a more descriptive alias for _

@Skyb0rg007
Copy link
Author

Skyb0rg007 commented Nov 24, 2024

It's important to avoid leading underscore identifiers to have maximal compatibility with systems that use underscore-prefixed identifiers for language extensions.

For example, _import is used by MLton for FFI.
If _import was a valid identifier, then code such as:

open M
val x = _import "g" : unit -> unit;

would be ambiguous, as the interpretation would depend on M's signature (M._import : string -> unit -> unit would make this a valid SML snippet under that proposed rule).

@MatthewFluet
Copy link
Contributor

Well, I would just consider _import another reserved word, no different than if or structure. But, admittedly, that's a MLton-specific interpretation.

@ratmice
Copy link
Contributor

ratmice commented Nov 25, 2024

While it doesn't really help for these existing incompatibilities, perhaps there could be a reserved prefix like __ double underscores which could be carved out for implementation specific usage like __import with double underscores.

Could MLton consider deprecating these existing reserved words in favor of e.g. double underscore prefixed ones, if the definition could guarantee that double underscore was carved out for such purposes?

@dmacqueen
Copy link

Given that the proposal appears to be incompatibile with MLton, and given its limited utility (in my opinion), I think that this issue should be dropped (i.e. closed with no action recommended).

@Skyb0rg007
Copy link
Author

This proposal is not incompatible with MLton as it exists currently.
It is easy to interpret _import as a wildcard in patterns, much like how * is treated differently in type and expression contexts.
The incompatibility was suggested by @ratmice, which is not what I proposed.

@ratmice
Copy link
Contributor

ratmice commented Nov 26, 2024

I don't see how what I proposed is different from what you proposed in the following text:

Changes

The simplest method to support this change is to change the wildcard syntax to _[a-zA-Z0-9']*, where _foo is considered a named wildcard no different from _.

What I was responding to was Matthew Fluet's idea

But, it seems that rather than expanding the tokens that serve as the wildcard pattern, it might be better to expand the regular expression for identifiers to allow for a leading underscore.

Essentially saying that I prefer your original proposal over expanding the valid list of variable names to allow a leading _.
Because using other languages with this lint (where I find it useful and helpful in making me aware of unused code), I never see the point of allowing an allowed-to-be-unused variable to be used.

So in this regard, I am confused by the idea that I suggested an incompatibility. Hopefully this clears it up.

@ratmice
Copy link
Contributor

ratmice commented Nov 26, 2024

Another reason I like this proposal is the potential for a future destructuring expansion on records:
Consider the following code:

val {a, b = _} = {a = 0, b=false};

It would be nice if val {a, _b} = {a = 0, b = false}; were available as a shorthand notation for the above.
The proposal to expand _ to _foo allows this sort of pattern matching shorthand.
While allowing leading _ on variable bindings wouldn't allow it since _b could now actually be a member of a record, such as {a = 0, _b = false}. This is probably worth considering as a separate proposal on top of this one.

Edit: Tried to clarify original comment...

@Skyb0rg007
Copy link
Author

Apologies, I tagged the wrong commenter - @MatthewFluet was the one who suggested

it might be better to expand the regular expression for identifiers to allow for a leading underscore

which would cause an incompatibility with MLton, as it would have a keyword that conflicted with valid Standard ML programs.

@rossberg
Copy link
Member

FWIW,

fun f _import = import

is valid SML today, so technically, MLton is already in violation of the standard, and officially allowing leading underscores would only make it reject more valid programs. On the other hand, the above also implies that, technically, this would a breaking change, because the same program would start to lex differently and become invalid. Although I highly doubt that this breakage matters in practice.

Warning about unused identifiers is super useful in my experience (e.g. in OCaml), as it regularly catches subtle mistakes quickly, like accidentally consuming x instead of x'. So I would strongly support such a change.

@MatthewFluet
Copy link
Contributor

@rossberg good point about _import being lexically valid as _ followed by import. Maybe for a different discussion item, but it isn't clear to me what would be a good notation for a language extension that doesn't overlap with existing tokens/syntax.

@dmacqueen
Copy link

dmacqueen commented Nov 26, 2024

Good point, Andreas.

This discussion sounds to me like a typical "language lawyer" discussion. My personal preference is not to introduce minor changes in syntax (or semantics) unless there is a fairly obvious and significant payoff for the programmer. The proposal of introducing new wildcard notation doesn't seem to have much of a payoff, and as a programmer I would not be likely to use it (but I may be too set in my ways and may be missing the point).

But I am curious about how some "lint" process or tool would make use of this change? If you have a bound but unused variable "foo", it is clearly useful to report this to the programmer, and it seems like a straightforward feature to implement as part of the compiler static analysis (e.g. the elaboration phase in SML/NJ). I don't promise to implement this in SML/NJ, but I will certainly include it in my NewFrontEnd project. But I don't see any connection with the suggested named wildcard notation. Could someone explain it to me?

@rossberg
Copy link
Member

@dmacqueen, the way this works in other languages is that unused variables are warned about, unless they start with an underscore. Thereby the programmer makes clear the intention that it's essentially a wildcard, while still writing out a name for documentation purposes. Personally, I have come to find this quite useful for readability, especially when you'd otherwise have lots of non-descriptive wildcards, like with larger tuples.

I actually like @ratmice's suggestion to make these proper wildcards, although I also have to admit that I occasionally take benefit of the fact that such variables can still be used, e.g., when quickly inserting some temporary printf for debugging.

@dmacqueen
Copy link

@rossberg thanks for the explanation. I think I see the point. So if one binds a variable like _x, it is implied that _x should have no applied occurrences (i.e occurrences within its scope), and a checker for bound variables without applied occurrences would ignore it. Yet, if _x should actually have an applied occurrence, no harm done. Right?

@ratmice
Copy link
Contributor

ratmice commented Nov 28, 2024

@dmacqueen That sounds right to me, but I find the "should have no applied occurrences, but should it no harm done" description awkward, the way I would compare the two different proposals in play is this:

in other languages that currently implement this, from the perspective of the linter, variable bindings are separated into one of two domains. Those that start with a leading _ like _x which may be USED or UNUSED (or in other words ignored entirely by the linter), and all other variables which must be USED or the linter will complain.

In the Named Wildcards proposal the link such lints is admittedly less clear, for there is no USED or UNUSED domain,
since wildcards do not actually bind variables...

One question I have which I don't immediately see syntax conflicts with is choosing a different character than leading _ would work, such as leading . or .x but I haven't done a careful examination of that choice of character.

Edit: perhaps it isn't worth thinking about leading . which is probably hopelessly confusing because of values in structures...

@rossberg
Copy link
Member

@dmacqueen, yes, exactly.

@ratmice, I think the status of such "named wildcards" would remain different from regular wildcards, since you'd still want to disallow multiple occurrences of the same one within a single pattern. Technically, it seems easier to still treat them as regular variables, but the linter would simply complain about any use site of a variable that starts with an underscore. That is sort of the dual to the binding site check for variables that do not start with an underscore.

@ratmice
Copy link
Contributor

ratmice commented Nov 28, 2024

@rossberg Sure, that isn't an interpretation I likely would have considered, but it seems semantically equivalent,
however I do wonder why "you'd still want to disallow multiple occurrences of the same one within a single pattern."

Unless I am misunderstanding something, that would seem to preclude the following extension,
and I don't quite see the benefit of disallowing it.

val ({a = a_left, _b}, {a = a_right, _b}) = ({...}, {...});

@rossberg
Copy link
Member

rossberg commented Nov 28, 2024

It would seem confusing and misleading, and inconsistent to me if fun f _x _x was allowed.

In your example, is _b supposed to be short for b = _b? If so, that would require an additional (and I would argue, questionable) decomposition of identifiers/labels themselves, and consequently, you could just as well desugar it into b = _, which doesn't depend on allowing multiple occurrences.

@ratmice
Copy link
Contributor

ratmice commented Nov 28, 2024

Yeah _b was intended as syntax sugar for b = _ as in an extension to named wildcards mentioned in one of my earlier comments, with exactly that kind of decomposition of identifiers into wildcards.

I guess to me it doesn't seem inconsistent because fun f _ _ is allowed, but these interpretations are pretty mutually exclusive, and I guess the crux of whether one considers them variables, or wildcards.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants