RFC: If statement initializers #23

Open
wants to merge 12 commits into
base: master
Changes from 4 commits
94 changes: 94 additions & 0 deletions docs/syntax-if-statements-initializers.md
@@ -0,0 +1,94 @@
# Initializers in if statements

# Summary

Introduce an initializer expression that declares and initializes variables in if statements.

# Motivation

Initializers can improve code clarity and reduce scope pollution by allowing developers to declare variables in the conditions of if statements. Declaring a variable at its point of use, scoped to the if statement's block, simplifies code and makes program logic easier to read and follow. Limiting a variable's scope to the block of the condition that declares it reduces the risk of unintended variable reuse and naming conflicts.

Reduced scope pollution also eases register pressure in extreme cases (or in auto-generated code) where many variables are defined and developers have to work around register limits by reducing how many registers their code uses. In some scenarios, especially on Roblox, initializing variables within if statements can improve performance and reduce complexity by avoiding unnecessary repeated calls. A common pattern among Roblox developers is to call `Instance:FindFirstChild` in the condition and then access the child again inside the block, rather than declaring a variable beforehand that would pollute the enclosing scope.
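
The trade-off described above can be sketched in syntax that is valid today. The `folder` table and its `FindFirstChild` method are a hypothetical stand-in for a Roblox `Instance`, used only so the snippet runs outside Roblox; the `calls` counter and the `patternA`/`patternB` names are likewise illustrative assumptions.

```lua
-- Hypothetical stand-in for a Roblox Instance (assumption: not the real API object).
local calls = 0
local folder = {
    children = { Door = { Name = "Door" } },
}

function folder:FindFirstChild(name)
    calls = calls + 1
    return self.children[name]
end

-- Pattern A: look the child up in the condition, then again in the body.
-- Nothing leaks into the enclosing scope, but the lookup runs twice.
local function patternA()
    if folder:FindFirstChild("Door") then
        return folder:FindFirstChild("Door").Name
    end
    return nil
end

-- Pattern B: a single lookup, but `door` stays visible after the if statement.
local function patternB()
    local door = folder:FindFirstChild("Door")
    if door then
        return door.Name
    end
    -- `door` is still in scope here.
    return nil
end

calls = 0
local nameA = patternA()
local callsA = calls

calls = 0
local nameB = patternB()
local callsB = calls

print(nameA, callsA)
print(nameB, callsB)
```

An initializer in the condition is meant to combine the single lookup of Pattern B with the tight scoping of Pattern A.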

# Design

If statements with initializers must match (following the Luau grammar) either `'if' 'local' bindinglist ['=' explist] 'then'` or `'if' 'local' bindinglist ['=' explist] 'where' exp 'then'`. Parentheses are also allowed around the initializer for consistency with other expressions. The variables declared by the initializer are only available in the block of that condition; they are undefined in any `elseif` conditions and blocks, in the `else` block, and in any code after the if statement.

In the former case, the value of the first declared variable will be checked.

Example:

```lua
local function foo()
    return true
end

if local b = foo() then
    print(b)
else
    print("false")
end
```

`Output: true`
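
For comparison, a rough present-day equivalent of the example above wraps the declaration in a `do` block. This is a sketch under that assumption, not the lowering the RFC specifies: unlike the proposal as written, `b` here remains visible in the `else` branch. The `result` variable is a hypothetical addition so the chosen branch can be inspected.

```lua
local function foo()
    return true
end

local result -- hypothetical: records which branch ran

do
    local b = foo()
    if b then
        result = tostring(b)
    else
        -- Under the RFC text, `b` would not be visible in this branch.
        result = "false"
    end
end
-- `b` is out of scope from here on, as it would be after `if local`.

print(result)
```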

In the latter case, the `exp` condition is checked rather than the initializer.

Example:

```lua
local function foo()
    return true
end

if local b = foo() where b == false then
    print(b)
else
    print("true")
end

This actually gave me a motivating reason why it might be useful for b to be visible in the else branch. Say you have some animal:

type Dog = { tag: "dog", pet: true, name: string }
type Cat = { tag: "cat", pet: true, name: string }
type Sheep = { tag: "sheep", pet: false }
type Animal = Dog | Cat | Sheep

if local animal = getAnimal() where animal.pet then
  print(`Hello {animal.name}!`)
else
  print(`{animal.tag} is not a pet`) -- generalizes to all current and future `animal.tag`s
end


Another case:

if local success, msg = pcall(f) where success then
    print(`all went well`)
else
    print(`f failed with: {msg}`)
end

Seeing it now allowing b to be visible in the else branch (and most likely elseif branches) would be useful.


IMO pcall is the most compelling reason to scope the locals this way, but this scoping behavior is a consistent footgun in C++ where bindings in conditions are scoped to the whole if statement and you can trivially end up dereferencing null pointers accidentally using them in the wrong branch. I think, outside of pcall, scoping the bindings to the whole block means making more bugs with accidentally using nil possible, and I see little benefit since you can already write the following to have this scoping behavior.

do local success, msg = pcall(f)
    if success then
        print(`all went well`)
    else
        print(`f failed with: {msg}`)
    end
end

```

`Output: true`
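
The reviewer examples above rely on the binding being visible in the `else` branch. A minimal runnable sketch of the animal case in current syntax follows, using the `do`-block scoping mentioned in the thread. `getAnimal` is not defined in the thread, so the table-backed version taking a `kind` argument is an assumption, as are the sample data and the `greet` wrapper; the type aliases are omitted so the snippet also runs as plain Lua.

```lua
-- Assumed sample data; the thread does not define getAnimal.
local animals = {
    dog = { tag = "dog", pet = true, name = "Rex" },
    sheep = { tag = "sheep", pet = false },
}

local function getAnimal(kind)
    return animals[kind]
end

-- Hypothetical wrapper that returns the message instead of printing it.
local function greet(kind)
    do
        local animal = getAnimal(kind)
        if animal.pet then
            return "Hello " .. animal.name .. "!"
        else
            -- Needs `animal` in the else branch, which the RFC text disallows.
            return animal.tag .. " is not a pet"
        end
    end
end

print(greet("dog"))
print(greet("sheep"))
```

With bindings scoped only to the first block, the `else` branch would have to call `getAnimal` again or fall back to the `do` form shown here.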

When an initializer declares multiple variables, all of them are evaluated, and the condition passes only if every one of them is truthy.

Just for clarification, how would the variables be combined when being tested? I misread the example at first, so I assumed all variables are combined with or, but it actually implies using and.

if local a, b = true, false then
  print("pass")
else
  print("fail")
end

-- same as this?
if local a, b = true, false where a and b then
  print("pass")
else
  print("fail")
end

I want to argue that this evaluation is ambiguous, since there is no explicit operation between the variables.


They're both the same, it isn't ambiguous at all. Requiring that only one value is truthy removes a lot of the uses it brings and requires you to include checks to make sure your variables aren't false (or true).


I've come to agree with @vegorov-rbx on making the `where` clause required if more than two locals are involved in the initializer. A good example is found here: https://github.com/luau-lang/rfcs/pull/23/files#r1492770178.


I think where clauses should just always be a requirement 😰


I commented this elsewhere, but I realized this is pretty much a good point to say it in. The desirable behavior IMO is that any variable initialized in an if local statement will be tested to be non-nil. The direct argument would be that putting the initialization in a condition is an indication that you want it to be bound to some sort of non-nil value in the block following it. The forward-looking advantage of this is that it's consistent with how destructuring should behave, e.g. if we allow destructuring tables in local definitions (#24), you could write something like:

if local {.x, .y, .z} = point then
    -- x, y, z are defined and non-`nil`
end

The main downside is that it means the case where you specifically want to test a result for truthiness, rather than non-nilness will require you to write the condition clause, i.e.

if local cond = bool() then
    -- cond is not `nil`, but could be `true` or `false`
end

if local cond = bool() in cond then
    -- cond is not `nil`, and has been tested `true` by the condition clause
end

if local cond = bool() in not cond then
    -- cond is not `nil`, and has been tested `false` by the condition clause
end

I think this is a reasonable decision to make. Requiring in is a more conservative choice that is forwards-compatible with this evolution though.


Example:

```lua
local function foo()
    return true, false
end

if local a, b = foo() then
else
    print'Hello World, from Luau!'
end
```

`Output: Hello World, from Luau!`
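
The review thread above weighs two readings of a multi-variable initializer that has no condition clause: testing every variable for truthiness (the behavior this example shows), or testing every variable for non-`nil`. As a sketch only, the helpers below (`allTruthy` and `allNonNil`, both hypothetical names) make the difference concrete in current Lua/Luau.

```lua
-- Reading 1: every declared variable must be truthy (a and b and ...).
local function allTruthy(...)
    for i = 1, select("#", ...) do
        local v = select(i, ...) -- first value from position i
        if not v then
            return false
        end
    end
    return true
end

-- Reading 2: every declared variable must be non-nil; `false` is accepted.
local function allNonNil(...)
    for i = 1, select("#", ...) do
        local v = select(i, ...)
        if v == nil then
            return false
        end
    end
    return true
end

local function foo()
    return true, false
end

local a, b = foo()
local truthyResult = allTruthy(a, b)
local nonNilResult = allNonNil(a, b)

print(truthyResult) -- false: `b` is false, so the else branch runs
print(nonNilResult) -- true: neither value is nil, so the then branch would run
```

The same initializer therefore selects different branches under the two readings, which is the ambiguity an explicit condition clause would remove.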

# Drawbacks

Parser recovery may be more fragile due to the `local` keyword.

Initializers increase the complexity of the language syntax and may obscure control flow in complex conditions.

# Alternatives

A different keyword or token can be used in place of `where`.

Introducing a new contextual keyword can introduce complexity in the parser, and a different existing keyword like `in` or `do` could be used instead.

```lua
if local a, b = foo() in b > a then
    print'Hello World, from Luau!'
end
```

While Luau is a verbose language that uses keywords for the majority of its syntax, another approach is using semicolons as a separator. This can work well because statements can use semicolons as a separator, which will retain consistency with the language. The same can be said for the comma, which would be consistent with for loop syntax.

```lua
if local a, b = foo(); b > a then
    print'Hello World, from Luau!'
end
```