Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[compiler] Stay in SSA form through entire pipeline #30573

Merged
merged 28 commits into from
Aug 6, 2024

Conversation

josephsavona
Copy link
Contributor

@josephsavona josephsavona commented Aug 1, 2024

Stack from ghstack (oldest at bottom):

This PR updates to use SSA form through the entire compilation pipeline. This means that in both HIR form and ReactiveFunction form, Identifier instances map 1:1 to IdentifierId values. If two identifiers have the same IdentifierId, they are the same instance. What this means is that all our passes can use this more precise information to determine if two particular identifiers are not just the same variable, but the same SSA "version" of that variable.

However, some parts of our analysis really care about program variables as opposed to SSA versions, and were relying on LeaveSSA to reset identifiers such that all Identifier instances for a particular program variable would have the same IdentifierId (though not necessarily the same Identifier instance). With LeaveSSA removed, those analysis passes can now use DeclarationId instead to uniquely identify a program variable.

Note that this PR surfaces some opportunties to improve edge-cases around reassigned values being declared/reassigned/depended-upon across multiple scopes. Several passes could/should use IdentifierId to more precisely identify exactly which values are accessed - for example, a scope that reassigns x but doesn't use x prior to reassignment doesn't have to take a dependency on x. But today we take a dependnecy.

My approach for these cases was to add a "TODO LeaveSSA" comment with notes and the name of the fixture demonstrating the difference, but to intentionally preserve the existing behavior (generally, switching to use DeclarationId when IdentifierId would have been more precise).

Beyond updating passes to use DeclarationId instead of Identifier/IdentifierId, the other change here is to extract out the remaining necessary bits of LeaveSSA into a new pass that rewrites InstructionKind (const/let/reassign/etc) based on whether a value is actually const or has reassignments and should be let.

Copy link

vercel bot commented Aug 1, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
react-compiler-playground ✅ Ready (Inspect) Visit Preview 💬 Add feedback Aug 6, 2024 6:29pm

josephsavona added a commit that referenced this pull request Aug 1, 2024
ghstack-source-id: b3c84364193cc58d405edda235252336d0aefab5
Pull Request resolved: #30573
@facebook-github-bot facebook-github-bot added CLA Signed React Core Team Opened by a member of the React Core Team labels Aug 1, 2024
josephsavona added a commit that referenced this pull request Aug 1, 2024
ghstack-source-id: d15a711d164cac86e5c80148c759b259ce45a810
Pull Request resolved: #30573
josephsavona added a commit that referenced this pull request Aug 1, 2024
ghstack-source-id: b6ec82a461d95283dea3b040e81f100b29c7cc4e
Pull Request resolved: #30573
josephsavona added a commit that referenced this pull request Aug 1, 2024
ghstack-source-id: 9962038ebeeb650d5a07146308c9ee1cfb26853b
Pull Request resolved: #30573
josephsavona added a commit that referenced this pull request Aug 1, 2024
ghstack-source-id: 7197375fff0345a5b4d2520ef863488bb1dc1d15
Pull Request resolved: #30573
josephsavona added a commit that referenced this pull request Aug 1, 2024
ghstack-source-id: 1b653587414b4103f33f415ab6574e8dbedd90b1
Pull Request resolved: #30573
josephsavona added a commit that referenced this pull request Aug 1, 2024
ghstack-source-id: 27d08cc1ee0cd4459078e25415da0c54518759ee
Pull Request resolved: #30573
josephsavona added a commit that referenced this pull request Aug 1, 2024
ghstack-source-id: e54a9d44a89a11cd9d65939da17222d85be05712
Pull Request resolved: #30573
josephsavona added a commit that referenced this pull request Aug 1, 2024
ghstack-source-id: 1b4da810d5086a6b90d43effcfedb2ecbb15e573
Pull Request resolved: #30573
@josephsavona
Copy link
Contributor Author

Nice find, i'll add a fixture for that and fix

@mofeiZ
Copy link
Contributor

mofeiZ commented Aug 6, 2024

Compare the memoization in master and this pr.

I was curious so looked into this (I'm sure someone else probably has already found the bug). I think the issue is that we're not checking that operands are mutable when extending scope ranges (fbtScope.range.start = Math.min(...))

This PR updates to use SSA form through the entire compilation pipeline. This means that in both HIR form and ReactiveFunction form, `Identifier` instances map 1:1 to `IdentifierId` values. If two identifiers have the same IdentifierId, they are the same instance. What this means is that all our passes can use this more precise information to determine if two particular identifiers are not just the same variable, but the same SSA "version" of that variable.

However, some parts of our analysis really care about program variables as opposed to SSA versions, and were relying on LeaveSSA to reset identifiers such that all Identifier instances for a particular program variable would have the same IdentifierId (though not necessarily the same Identifier instance). With LeaveSSA removed, those analysis passes can now use DeclarationId instead to uniquely identify a program variable.

Note that this PR surfaces some opportunties to improve edge-cases around reassigned values being declared/reassigned/depended-upon across multiple scopes. Several passes could/should use IdentifierId to more precisely identify exactly which values are accessed - for example, a scope that reassigns `x` but doesn't use `x` prior to reassignment doesn't have to take a dependency on `x`. But today we take a dependnecy.

My approach for these cases was to add a "TODO LeaveSSA" comment with notes and the name of the fixture demonstrating the difference, but to intentionally preserve the existing behavior (generally, switching to use DeclarationId when IdentifierId would have been more precise).

Beyond updating passes to use DeclarationId instead of Identifier/IdentifierId, the other change here is to extract out the remaining necessary bits of LeaveSSA into a new pass that rewrites InstructionKind (const/let/reassign/etc) based on whether a value is actually const or has reassignments and should be let.

[ghstack-poisoned]
josephsavona added a commit that referenced this pull request Aug 6, 2024
This PR updates to use SSA form through the entire compilation pipeline. This means that in both HIR form and ReactiveFunction form, `Identifier` instances map 1:1 to `IdentifierId` values. If two identifiers have the same IdentifierId, they are the same instance. What this means is that all our passes can use this more precise information to determine if two particular identifiers are not just the same variable, but the same SSA "version" of that variable.

However, some parts of our analysis really care about program variables as opposed to SSA versions, and were relying on LeaveSSA to reset identifiers such that all Identifier instances for a particular program variable would have the same IdentifierId (though not necessarily the same Identifier instance). With LeaveSSA removed, those analysis passes can now use DeclarationId instead to uniquely identify a program variable.

Note that this PR surfaces some opportunties to improve edge-cases around reassigned values being declared/reassigned/depended-upon across multiple scopes. Several passes could/should use IdentifierId to more precisely identify exactly which values are accessed - for example, a scope that reassigns `x` but doesn't use `x` prior to reassignment doesn't have to take a dependency on `x`. But today we take a dependnecy.

My approach for these cases was to add a "TODO LeaveSSA" comment with notes and the name of the fixture demonstrating the difference, but to intentionally preserve the existing behavior (generally, switching to use DeclarationId when IdentifierId would have been more precise).

Beyond updating passes to use DeclarationId instead of Identifier/IdentifierId, the other change here is to extract out the remaining necessary bits of LeaveSSA into a new pass that rewrites InstructionKind (const/let/reassign/etc) based on whether a value is actually const or has reassignments and should be let.

ghstack-source-id: 4b9b1d5f83b4421f6a49ed3ccf5c46da01cfa5f6
Pull Request resolved: #30573
@josephsavona
Copy link
Contributor Author

I think the issue is that we're not checking that operands are mutable when extending scope ranges

Thanks @mofeiZ for saving me some time debugging, that was it! I updated that pass to use a new helper which only extends the range if the new range has a non-zero start

This PR updates to use SSA form through the entire compilation pipeline. This means that in both HIR form and ReactiveFunction form, `Identifier` instances map 1:1 to `IdentifierId` values. If two identifiers have the same IdentifierId, they are the same instance. What this means is that all our passes can use this more precise information to determine if two particular identifiers are not just the same variable, but the same SSA "version" of that variable.

However, some parts of our analysis really care about program variables as opposed to SSA versions, and were relying on LeaveSSA to reset identifiers such that all Identifier instances for a particular program variable would have the same IdentifierId (though not necessarily the same Identifier instance). With LeaveSSA removed, those analysis passes can now use DeclarationId instead to uniquely identify a program variable.

Note that this PR surfaces some opportunties to improve edge-cases around reassigned values being declared/reassigned/depended-upon across multiple scopes. Several passes could/should use IdentifierId to more precisely identify exactly which values are accessed - for example, a scope that reassigns `x` but doesn't use `x` prior to reassignment doesn't have to take a dependency on `x`. But today we take a dependnecy.

My approach for these cases was to add a "TODO LeaveSSA" comment with notes and the name of the fixture demonstrating the difference, but to intentionally preserve the existing behavior (generally, switching to use DeclarationId when IdentifierId would have been more precise).

Beyond updating passes to use DeclarationId instead of Identifier/IdentifierId, the other change here is to extract out the remaining necessary bits of LeaveSSA into a new pass that rewrites InstructionKind (const/let/reassign/etc) based on whether a value is actually const or has reassignments and should be let.

[ghstack-poisoned]
josephsavona added a commit that referenced this pull request Aug 6, 2024
This PR updates to use SSA form through the entire compilation pipeline. This means that in both HIR form and ReactiveFunction form, `Identifier` instances map 1:1 to `IdentifierId` values. If two identifiers have the same IdentifierId, they are the same instance. What this means is that all our passes can use this more precise information to determine if two particular identifiers are not just the same variable, but the same SSA "version" of that variable.

However, some parts of our analysis really care about program variables as opposed to SSA versions, and were relying on LeaveSSA to reset identifiers such that all Identifier instances for a particular program variable would have the same IdentifierId (though not necessarily the same Identifier instance). With LeaveSSA removed, those analysis passes can now use DeclarationId instead to uniquely identify a program variable.

Note that this PR surfaces some opportunties to improve edge-cases around reassigned values being declared/reassigned/depended-upon across multiple scopes. Several passes could/should use IdentifierId to more precisely identify exactly which values are accessed - for example, a scope that reassigns `x` but doesn't use `x` prior to reassignment doesn't have to take a dependency on `x`. But today we take a dependnecy.

My approach for these cases was to add a "TODO LeaveSSA" comment with notes and the name of the fixture demonstrating the difference, but to intentionally preserve the existing behavior (generally, switching to use DeclarationId when IdentifierId would have been more precise).

Beyond updating passes to use DeclarationId instead of Identifier/IdentifierId, the other change here is to extract out the remaining necessary bits of LeaveSSA into a new pass that rewrites InstructionKind (const/let/reassign/etc) based on whether a value is actually const or has reassignments and should be let.

ghstack-source-id: 69afdaee5fadf3fdc98ce97549da805f288218b4
Pull Request resolved: #30573
@josephsavona
Copy link
Contributor Author

Latest diff is P1515916566, looks like the fbt issues are fixed, i went through the majority of the remaining differences manually. They are nearly all due to ternaries or optionals. There were a couple where deps changed and i debugged them and confirmed that the previous results were unnecessarily including a dependency that wasn't actually reactive.

@josephsavona josephsavona merged commit 512d6f4 into gh/josephsavona/34/base Aug 6, 2024
19 checks passed
josephsavona added a commit that referenced this pull request Aug 6, 2024
This PR updates to use SSA form through the entire compilation pipeline. This means that in both HIR form and ReactiveFunction form, `Identifier` instances map 1:1 to `IdentifierId` values. If two identifiers have the same IdentifierId, they are the same instance. What this means is that all our passes can use this more precise information to determine if two particular identifiers are not just the same variable, but the same SSA "version" of that variable.

However, some parts of our analysis really care about program variables as opposed to SSA versions, and were relying on LeaveSSA to reset identifiers such that all Identifier instances for a particular program variable would have the same IdentifierId (though not necessarily the same Identifier instance). With LeaveSSA removed, those analysis passes can now use DeclarationId instead to uniquely identify a program variable.

Note that this PR surfaces some opportunties to improve edge-cases around reassigned values being declared/reassigned/depended-upon across multiple scopes. Several passes could/should use IdentifierId to more precisely identify exactly which values are accessed - for example, a scope that reassigns `x` but doesn't use `x` prior to reassignment doesn't have to take a dependency on `x`. But today we take a dependnecy.

My approach for these cases was to add a "TODO LeaveSSA" comment with notes and the name of the fixture demonstrating the difference, but to intentionally preserve the existing behavior (generally, switching to use DeclarationId when IdentifierId would have been more precise).

Beyond updating passes to use DeclarationId instead of Identifier/IdentifierId, the other change here is to extract out the remaining necessary bits of LeaveSSA into a new pass that rewrites InstructionKind (const/let/reassign/etc) based on whether a value is actually const or has reassignments and should be let.

ghstack-source-id: 69afdaee5fadf3fdc98ce97549da805f288218b4
Pull Request resolved: #30573
@josephsavona josephsavona deleted the gh/josephsavona/34/head branch August 6, 2024 19:03
josephsavona added a commit that referenced this pull request Aug 8, 2024
I forgot to clean this up before landing #30573.

[ghstack-poisoned]
josephsavona added a commit that referenced this pull request Aug 8, 2024
I forgot to clean this up before landing #30573.

ghstack-source-id: 2141471912e410aa12545dcf7989f45447007ba9
Pull Request resolved: #30632
josephsavona added a commit that referenced this pull request Aug 9, 2024
I forgot to clean this up before landing #30573.

ghstack-source-id: 2141471912e410aa12545dcf7989f45447007ba9
Pull Request resolved: #30632
mofeiZ added a commit that referenced this pull request Sep 6, 2024
Resubmission of #30079 -- core logic unchanged, but needed to rebase past #30573

### Quick background
#### Temporaries

The compiler currently treats temporaries and named variables (e.g. `x`) differently in this pass.
- named variables may be reassigned (in fact, since we're running after LeaveSSA, a single named identifier's IdentifierId may map to multiple `Identifier` instances -- each with its own scope and mutable range)
- temporaries are replaced with their represented expressions during codegen. This is correct (mostly correct, see #29878) as we're careful to always lower the correct evaluation semantics. However, since we rewrite reactive scopes entirely (to if/else blocks), we need to track temporaries that a scope produces in `ReactiveScope.declarations` and later promote them to named variables.
In the same example, $4, $5, and $6 need to be promoted: $2 ->`t0`,  $5 ->`t1`, and $6 ->`t2`.
```js
[1] $2 = LoadGlobal(global) foo
[2] $3 = LoadLocal bar$1
scope 0:
  [3] $4 = Call $2(<unknown> $3)
scope 1:
  [4] $5 = Object {  }
scope 2:
  [5] $6 = Object { a: $4, b: $5 }
[6] $8 = StoreLocal Const x$7 = $6
```

#### Dependencies
`ReactiveScope.dependencies` records the set of (read-only) values that a reactive scope is dependent on. This is currently limited to just variables (named variables from source and promoted temporaries) and property-loads.
All dependencies we record need to be hoistable -- i.e. reordered to just before the ReactiveScope begins. Not all PropertyLoads are hoistable.

In this example, we should not evaluate `obj.a.b` without before creating x and checking `objIsNull`.
```js
// reduce-reactive-deps/no-uncond.js
function useFoo({ obj, objIsNull }) {
  const x = [];
  if (isFalse(objIsNull)) {
    x.push(obj.a.b);
  }
  return x;
}
```

While other memoization strategies with different constraints exist, the current compiler requires that `ReactiveScope.dependencies` be re-orderable to the beginning of the reactive scope. But.. `PropertyLoad`s from null values will throw `TypeError`. This means that evaluating hoisted dependencies should throw if and only if the source program throws. (It is also a bug if source throws and compiler output does not throw. See facebook/react-forget#2709)

---
### Rough high level overview
1. Pass 1
Walk over instructions to gather every temporary used outside of its defining scope (same as ReactiveFunction version). These determine the sidemaps we produce, as temporaries used outside of their declaring scopes get promoted to named variables later (and are not considered hoistable rvals).
2. Pass 2 (collectHoistablePropertyLoads)
Walk over instructions to generate sidemaps:
  a. temporary identifier -> named variable and property path (e.g. `$3 -> {obj: props, path: ["a", "b"]}`)
  b. block -> accessed variables and properties (e.g. `bb0 -> [ {obj: props, path: ["a", "b"]} ]`)
  c. Walk over control flow graph to understand the set of object and property paths that can be read by each basic block. This analysis:
    - relies on post-dominator trees
    - traverses the CFG from entry (producing the set of variables/paths unconditionally evaluated *before* a block).
    - traverses the CFG from exit (producing the set of variables/paths unconditionally evaluated *after* a block).
4. Pass 3: (collectDependencies)
Walks over instructions again to record dependencies and declarations, using the previously produced sidemaps. We do not record any control-flow here
5. Merge every scope's recorded dependencies with the set of hoistable PropertyLoads

Tested by syncing internally and (1) checking compilation output differences ([internal link](https://www.internalfb.com/intern/everpaste/?handle=GPCfUBt_HCoy_S4EAJDVFJyJJMR0bsIXAAAB)), running internally e2e tests ([internal link](https://fburl.com/sandcastle/cs5mlkxq))

---
### Followups:
1. Rewrite function expression deps
This change produces much more optimal output as the compiler now uses the function CFG to understand which variables / paths are assumed to be non-null. However, it may exacerbate [this function-expr hoisting bug](https://github.com/facebook/react/blob/main/compiler/packages/babel-plugin-react-compiler/src/__tests__/fixtures/compiler/bug-invalid-hoisting-functionexpr.ts). A short term fix here is to simply call some form of `collectNonNullObjects` on every function expression to find hoistable variable / paths. In the longer term, we should refactor out `FunctionExpression.deps`.

2. Enable optional paths
(a) don't count optional load temporaries as dependencies (e.g. `collectOptionalLoadRValues(...)`).
(b) record optional paths in both collectHoistablePropertyLoads and dependency collection

ghstack-source-id: 51e0759c68c931c19fee02665bd37a7274a45572
Pull Request resolved: #30894
mofeiZ added a commit that referenced this pull request Sep 11, 2024
Resubmission of #30079 -- core logic unchanged, but needed to rebase past #30573

### Quick background
#### Temporaries

The compiler currently treats temporaries and named variables (e.g. `x`) differently in this pass.
- named variables may be reassigned (in fact, since we're running after LeaveSSA, a single named identifier's IdentifierId may map to multiple `Identifier` instances -- each with its own scope and mutable range)
- temporaries are replaced with their represented expressions during codegen. This is correct (mostly correct, see #29878) as we're careful to always lower the correct evaluation semantics. However, since we rewrite reactive scopes entirely (to if/else blocks), we need to track temporaries that a scope produces in `ReactiveScope.declarations` and later promote them to named variables.
In the same example, $4, $5, and $6 need to be promoted: $2 ->`t0`,  $5 ->`t1`, and $6 ->`t2`.
```js
[1] $2 = LoadGlobal(global) foo
[2] $3 = LoadLocal bar$1
scope 0:
  [3] $4 = Call $2(<unknown> $3)
scope 1:
  [4] $5 = Object {  }
scope 2:
  [5] $6 = Object { a: $4, b: $5 }
[6] $8 = StoreLocal Const x$7 = $6
```

#### Dependencies
`ReactiveScope.dependencies` records the set of (read-only) values that a reactive scope is dependent on. This is currently limited to just variables (named variables from source and promoted temporaries) and property-loads.
All dependencies we record need to be hoistable -- i.e. reordered to just before the ReactiveScope begins. Not all PropertyLoads are hoistable.

In this example, we should not evaluate `obj.a.b` without before creating x and checking `objIsNull`.
```js
// reduce-reactive-deps/no-uncond.js
function useFoo({ obj, objIsNull }) {
  const x = [];
  if (isFalse(objIsNull)) {
    x.push(obj.a.b);
  }
  return x;
}
```

While other memoization strategies with different constraints exist, the current compiler requires that `ReactiveScope.dependencies` be re-orderable to the beginning of the reactive scope. But.. `PropertyLoad`s from null values will throw `TypeError`. This means that evaluating hoisted dependencies should throw if and only if the source program throws. (It is also a bug if source throws and compiler output does not throw. See facebook/react-forget#2709)

---
### Rough high level overview
1. Pass 1
Walk over instructions to gather every temporary used outside of its defining scope (same as ReactiveFunction version). These determine the sidemaps we produce, as temporaries used outside of their declaring scopes get promoted to named variables later (and are not considered hoistable rvals).
2. Pass 2 (collectTemporariesSidemap)
Walk over instructions to generate a sidemap of temporary identifier -> named variable and property path (e.g. `$3 -> {obj: props, path: ["a", "b"]}`)
2. Pass 2 (collectHoistablePropertyLoads)
  a. Build a sidemap of block -> accessed variables and properties (e.g. `bb0 -> [ {obj: props, path: ["a", "b"]} ]`)
  b. Propagate "non-nullness" i.e. variables and properties for which we can safely evaluate `PropertyLoad`.
  A basic block can unconditionally read from identifier X if any of the following applies:
    - the block itself reads from identifier X
    - all predecessors of the block read from identifier X
    - all successors of the block read from identifier X
4. Pass 3: (collectDependencies)
Walks over instructions again to record dependencies and declarations, using the previously produced sidemaps. We do not record any control-flow here
5. Merge every scope's recorded dependencies with the set of hoistable PropertyLoads

Tested by syncing internally and (1) checking compilation output differences ([internal link](https://www.internalfb.com/intern/everpaste/?handle=GPCfUBt_HCoy_S4EAJDVFJyJJMR0bsIXAAAB)), running internally e2e tests ([internal link](https://fburl.com/sandcastle/cs5mlkxq))

---
### Followups:
1. Rewrite function expression deps
This change produces much more optimal output as the compiler now uses the function CFG to understand which variables / paths are assumed to be non-null. However, it may exacerbate [this function-expr hoisting bug](https://github.com/facebook/react/blob/main/compiler/packages/babel-plugin-react-compiler/src/__tests__/fixtures/compiler/bug-invalid-hoisting-functionexpr.tsx). A short term fix here is to simply call some form of `collectNonNullObjects` on every function expression to find hoistable variable / paths. In the longer term, we should refactor out `FunctionExpression.deps`.

2. Enable optional paths
(a) don't count optional load temporaries as dependencies (e.g. `collectOptionalLoadRValues(...)`).
(b) record optional paths in both collectHoistablePropertyLoads and dependency collection

ghstack-source-id: 470ab175a901772567139bd0bb93b3e3c71c9dc7
Pull Request resolved: #30894
mofeiZ added a commit that referenced this pull request Sep 11, 2024
Resubmission of #30079 -- core logic unchanged, but needed to rebase past #30573

### Quick background
#### Temporaries

The compiler currently treats temporaries and named variables (e.g. `x`) differently in this pass.
- named variables may be reassigned (in fact, since we're running after LeaveSSA, a single named identifier's IdentifierId may map to multiple `Identifier` instances -- each with its own scope and mutable range)
- temporaries are replaced with their represented expressions during codegen. This is correct (mostly correct, see #29878) as we're careful to always lower the correct evaluation semantics. However, since we rewrite reactive scopes entirely (to if/else blocks), we need to track temporaries that a scope produces in `ReactiveScope.declarations` and later promote them to named variables.
In the same example, $4, $5, and $6 need to be promoted: $2 ->`t0`,  $5 ->`t1`, and $6 ->`t2`.
```js
[1] $2 = LoadGlobal(global) foo
[2] $3 = LoadLocal bar$1
scope 0:
  [3] $4 = Call $2(<unknown> $3)
scope 1:
  [4] $5 = Object {  }
scope 2:
  [5] $6 = Object { a: $4, b: $5 }
[6] $8 = StoreLocal Const x$7 = $6
```

#### Dependencies
`ReactiveScope.dependencies` records the set of (read-only) values that a reactive scope is dependent on. This is currently limited to just variables (named variables from source and promoted temporaries) and property-loads.
All dependencies we record need to be hoistable -- i.e. reordered to just before the ReactiveScope begins. Not all PropertyLoads are hoistable.

In this example, we should not evaluate `obj.a.b` without before creating x and checking `objIsNull`.
```js
// reduce-reactive-deps/no-uncond.js
function useFoo({ obj, objIsNull }) {
  const x = [];
  if (isFalse(objIsNull)) {
    x.push(obj.a.b);
  }
  return x;
}
```

While other memoization strategies with different constraints exist, the current compiler requires that `ReactiveScope.dependencies` be re-orderable to the beginning of the reactive scope. But.. `PropertyLoad`s from null values will throw `TypeError`. This means that evaluating hoisted dependencies should throw if and only if the source program throws. (It is also a bug if source throws and compiler output does not throw. See facebook/react-forget#2709)

---
### Rough high level overview
1. Pass 1
Walk over instructions to gather every temporary used outside of its defining scope (same as ReactiveFunction version). These determine the sidemaps we produce, as temporaries used outside of their declaring scopes get promoted to named variables later (and are not considered hoistable rvals).
2. Pass 2 (collectTemporariesSidemap)
Walk over instructions to generate a sidemap of temporary identifier -> named variable and property path (e.g. `$3 -> {obj: props, path: ["a", "b"]}`)
2. Pass 2 (collectHoistablePropertyLoads)
  a. Build a sidemap of block -> accessed variables and properties (e.g. `bb0 -> [ {obj: props, path: ["a", "b"]} ]`)
  b. Propagate "non-nullness" i.e. variables and properties for which we can safely evaluate `PropertyLoad`.
  A basic block can unconditionally read from identifier X if any of the following applies:
    - the block itself reads from identifier X
    - all predecessors of the block read from identifier X
    - all successors of the block read from identifier X
4. Pass 3: (collectDependencies)
Walks over instructions again to record dependencies and declarations, using the previously produced sidemaps. We do not record any control-flow here
5. Merge every scope's recorded dependencies with the set of hoistable PropertyLoads

Tested by syncing internally and (1) checking compilation output differences ([internal link](https://www.internalfb.com/intern/everpaste/?handle=GPCfUBt_HCoy_S4EAJDVFJyJJMR0bsIXAAAB)), running internally e2e tests ([internal link](https://fburl.com/sandcastle/cs5mlkxq))

---
### Followups:
1. Rewrite function expression deps
This change produces much more optimal output as the compiler now uses the function CFG to understand which variables / paths are assumed to be non-null. However, it may exacerbate [this function-expr hoisting bug](https://github.com/facebook/react/blob/main/compiler/packages/babel-plugin-react-compiler/src/__tests__/fixtures/compiler/bug-invalid-hoisting-functionexpr.tsx). A short term fix here is to simply call some form of `collectNonNullObjects` on every function expression to find hoistable variable / paths. In the longer term, we should refactor out `FunctionExpression.deps`.

2. Enable optional paths
(a) don't count optional load temporaries as dependencies (e.g. `collectOptionalLoadRValues(...)`).
(b) record optional paths in both collectHoistablePropertyLoads and dependency collection

ghstack-source-id: e1fe5d80076df15c5277c6761138f45f267e11dd
Pull Request resolved: #30894
mofeiZ added a commit that referenced this pull request Sep 11, 2024
Resubmission of #30079 -- core logic unchanged, but needed to rebase past #30573

### Quick background
#### Temporaries

The compiler currently treats temporaries and named variables (e.g. `x`) differently in this pass.
- named variables may be reassigned (in fact, since we're running after LeaveSSA, a single named identifier's IdentifierId may map to multiple `Identifier` instances -- each with its own scope and mutable range)
- temporaries are replaced with their represented expressions during codegen. This is correct (mostly correct, see #29878) as we're careful to always lower the correct evaluation semantics. However, since we rewrite reactive scopes entirely (to if/else blocks), we need to track temporaries that a scope produces in `ReactiveScope.declarations` and later promote them to named variables.
In the same example, $4, $5, and $6 need to be promoted: $2 ->`t0`,  $5 ->`t1`, and $6 ->`t2`.
```js
[1] $2 = LoadGlobal(global) foo
[2] $3 = LoadLocal bar$1
scope 0:
  [3] $4 = Call $2(<unknown> $3)
scope 1:
  [4] $5 = Object {  }
scope 2:
  [5] $6 = Object { a: $4, b: $5 }
[6] $8 = StoreLocal Const x$7 = $6
```

#### Dependencies
`ReactiveScope.dependencies` records the set of (read-only) values that a reactive scope is dependent on. This is currently limited to just variables (named variables from source and promoted temporaries) and property-loads.
All dependencies we record need to be hoistable -- i.e. reordered to just before the ReactiveScope begins. Not all PropertyLoads are hoistable.

In this example, we should not evaluate `obj.a.b` without before creating x and checking `objIsNull`.
```js
// reduce-reactive-deps/no-uncond.js
function useFoo({ obj, objIsNull }) {
  const x = [];
  if (isFalse(objIsNull)) {
    x.push(obj.a.b);
  }
  return x;
}
```

While other memoization strategies with different constraints exist, the current compiler requires that `ReactiveScope.dependencies` be re-orderable to the beginning of the reactive scope. But.. `PropertyLoad`s from null values will throw `TypeError`. This means that evaluating hoisted dependencies should throw if and only if the source program throws. (It is also a bug if source throws and compiler output does not throw. See facebook/react-forget#2709)

---
### Rough high level overview
1. Pass 1
Walk over instructions to gather every temporary used outside of its defining scope (same as ReactiveFunction version). These determine the sidemaps we produce, as temporaries used outside of their declaring scopes get promoted to named variables later (and are not considered hoistable rvals).
2. Pass 2 (collectTemporariesSidemap)
Walk over instructions to generate a sidemap of temporary identifier -> named variable and property path (e.g. `$3 -> {obj: props, path: ["a", "b"]}`)
2. Pass 2 (collectHoistablePropertyLoads)
  a. Build a sidemap of block -> accessed variables and properties (e.g. `bb0 -> [ {obj: props, path: ["a", "b"]} ]`)
  b. Propagate "non-nullness" i.e. variables and properties for which we can safely evaluate `PropertyLoad`.
  A basic block can unconditionally read from identifier X if any of the following applies:
    - the block itself reads from identifier X
    - all predecessors of the block read from identifier X
    - all successors of the block read from identifier X
4. Pass 3: (collectDependencies)
Walks over instructions again to record dependencies and declarations, using the previously produced sidemaps. We do not record any control-flow here
5. Merge every scope's recorded dependencies with the set of hoistable PropertyLoads

Tested by syncing internally and (1) checking compilation output differences ([internal link](https://www.internalfb.com/intern/everpaste/?handle=GPCfUBt_HCoy_S4EAJDVFJyJJMR0bsIXAAAB)), running internally e2e tests ([internal link](https://fburl.com/sandcastle/cs5mlkxq))

---
### Followups:
1. Rewrite function expression deps
This change produces much more optimal output as the compiler now uses the function CFG to understand which variables / paths are assumed to be non-null. However, it may exacerbate [this function-expr hoisting bug](https://github.com/facebook/react/blob/main/compiler/packages/babel-plugin-react-compiler/src/__tests__/fixtures/compiler/bug-invalid-hoisting-functionexpr.tsx). A short term fix here is to simply call some form of `collectNonNullObjects` on every function expression to find hoistable variable / paths. In the longer term, we should refactor out `FunctionExpression.deps`.

2. Enable optional paths
(a) don't count optional load temporaries as dependencies (e.g. `collectOptionalLoadRValues(...)`).
(b) record optional paths in both collectHoistablePropertyLoads and dependency collection

ghstack-source-id: e1fe5d80076df15c5277c6761138f45f267e11dd
Pull Request resolved: #30894
mofeiZ added a commit that referenced this pull request Sep 11, 2024
Resubmission of #30079 -- core logic unchanged, but needed to rebase past #30573

### Quick background
#### Temporaries

The compiler currently treats temporaries and named variables (e.g. `x`) differently in this pass.
- named variables may be reassigned (in fact, since we're running after LeaveSSA, a single named identifier's IdentifierId may map to multiple `Identifier` instances -- each with its own scope and mutable range)
- temporaries are replaced with their represented expressions during codegen. This is correct (mostly correct, see #29878) as we're careful to always lower the correct evaluation semantics. However, since we rewrite reactive scopes entirely (to if/else blocks), we need to track temporaries that a scope produces in `ReactiveScope.declarations` and later promote them to named variables.
In the same example, $4, $5, and $6 need to be promoted: $2 ->`t0`,  $5 ->`t1`, and $6 ->`t2`.
```js
[1] $2 = LoadGlobal(global) foo
[2] $3 = LoadLocal bar$1
scope 0:
  [3] $4 = Call $2(<unknown> $3)
scope 1:
  [4] $5 = Object {  }
scope 2:
  [5] $6 = Object { a: $4, b: $5 }
[6] $8 = StoreLocal Const x$7 = $6
```

#### Dependencies
`ReactiveScope.dependencies` records the set of (read-only) values that a reactive scope is dependent on. This is currently limited to just variables (named variables from source and promoted temporaries) and property-loads.
All dependencies we record need to be hoistable -- i.e. reordered to just before the ReactiveScope begins. Not all PropertyLoads are hoistable.

In this example, we should not evaluate `obj.a.b` without before creating x and checking `objIsNull`.
```js
// reduce-reactive-deps/no-uncond.js
function useFoo({ obj, objIsNull }) {
  const x = [];
  if (isFalse(objIsNull)) {
    x.push(obj.a.b);
  }
  return x;
}
```

While other memoization strategies with different constraints exist, the current compiler requires that `ReactiveScope.dependencies` be re-orderable to the beginning of the reactive scope. But.. `PropertyLoad`s from null values will throw `TypeError`. This means that evaluating hoisted dependencies should throw if and only if the source program throws. (It is also a bug if source throws and compiler output does not throw. See facebook/react-forget#2709)

---
### Rough high level overview
1. Pass 1
Walk over instructions to gather every temporary used outside of its defining scope (same as ReactiveFunction version). These determine the sidemaps we produce, as temporaries used outside of their declaring scopes get promoted to named variables later (and are not considered hoistable rvals).
2. Pass 2 (collectTemporariesSidemap)
Walk over instructions to generate a sidemap of temporary identifier -> named variable and property path (e.g. `$3 -> {obj: props, path: ["a", "b"]}`)
2. Pass 2 (collectHoistablePropertyLoads)
  a. Build a sidemap of block -> accessed variables and properties (e.g. `bb0 -> [ {obj: props, path: ["a", "b"]} ]`)
  b. Propagate "non-nullness" i.e. variables and properties for which we can safely evaluate `PropertyLoad`.
  A basic block can unconditionally read from identifier X if any of the following applies:
    - the block itself reads from identifier X
    - all predecessors of the block read from identifier X
    - all successors of the block read from identifier X
4. Pass 3: (collectDependencies)
Walks over instructions again to record dependencies and declarations, using the previously produced sidemaps. We do not record any control-flow here
5. Merge every scope's recorded dependencies with the set of hoistable PropertyLoads

Tested by syncing internally and (1) checking compilation output differences ([internal link](https://www.internalfb.com/intern/everpaste/?handle=GPCfUBt_HCoy_S4EAJDVFJyJJMR0bsIXAAAB)), running internally e2e tests ([internal link](https://fburl.com/sandcastle/cs5mlkxq))

---
### Followups:
1. Rewrite function expression deps
This change produces much more optimal output as the compiler now uses the function CFG to understand which variables / paths are assumed to be non-null. However, it may exacerbate [this function-expr hoisting bug](https://github.com/facebook/react/blob/main/compiler/packages/babel-plugin-react-compiler/src/__tests__/fixtures/compiler/bug-invalid-hoisting-functionexpr.tsx). A short term fix here is to simply call some form of `collectNonNullObjects` on every function expression to find hoistable variable / paths. In the longer term, we should refactor out `FunctionExpression.deps`.

2. Enable optional paths
(a) don't count optional load temporaries as dependencies (e.g. `collectOptionalLoadRValues(...)`).
(b) record optional paths in both collectHoistablePropertyLoads and dependency collection

ghstack-source-id: 6c62db14213284bc61869b8db57198546c399f7f
Pull Request resolved: #30894
mofeiZ added a commit that referenced this pull request Sep 12, 2024
Resubmission of #30079 -- core logic unchanged, but needed to rebase past #30573

### Quick background
#### Temporaries

The compiler currently treats temporaries and named variables (e.g. `x`) differently in this pass.
- named variables may be reassigned (in fact, since we're running after LeaveSSA, a single named identifier's IdentifierId may map to multiple `Identifier` instances -- each with its own scope and mutable range)
- temporaries are replaced with their represented expressions during codegen. This is correct (mostly correct, see #29878) as we're careful to always lower the correct evaluation semantics. However, since we rewrite reactive scopes entirely (to if/else blocks), we need to track temporaries that a scope produces in `ReactiveScope.declarations` and later promote them to named variables.
In the same example, $4, $5, and $6 need to be promoted: $2 ->`t0`,  $5 ->`t1`, and $6 ->`t2`.
```js
[1] $2 = LoadGlobal(global) foo
[2] $3 = LoadLocal bar$1
scope 0:
  [3] $4 = Call $2(<unknown> $3)
scope 1:
  [4] $5 = Object {  }
scope 2:
  [5] $6 = Object { a: $4, b: $5 }
[6] $8 = StoreLocal Const x$7 = $6
```

#### Dependencies
`ReactiveScope.dependencies` records the set of (read-only) values that a reactive scope is dependent on. This is currently limited to just variables (named variables from source and promoted temporaries) and property-loads.
All dependencies we record need to be hoistable -- i.e. reordered to just before the ReactiveScope begins. Not all PropertyLoads are hoistable.

In this example, we should not evaluate `obj.a.b` without before creating x and checking `objIsNull`.
```js
// reduce-reactive-deps/no-uncond.js
function useFoo({ obj, objIsNull }) {
  const x = [];
  if (isFalse(objIsNull)) {
    x.push(obj.a.b);
  }
  return x;
}
```

While other memoization strategies with different constraints exist, the current compiler requires that `ReactiveScope.dependencies` be re-orderable to the beginning of the reactive scope. But.. `PropertyLoad`s from null values will throw `TypeError`. This means that evaluating hoisted dependencies should throw if and only if the source program throws. (It is also a bug if source throws and compiler output does not throw. See facebook/react-forget#2709)

---
### Rough high level overview
1. Pass 1
Walk over instructions to gather every temporary used outside of its defining scope (same as ReactiveFunction version). These determine the sidemaps we produce, as temporaries used outside of their declaring scopes get promoted to named variables later (and are not considered hoistable rvals).
2. Pass 2 (collectTemporariesSidemap)
Walk over instructions to generate a sidemap of temporary identifier -> named variable and property path (e.g. `$3 -> {obj: props, path: ["a", "b"]}`)
2. Pass 2 (collectHoistablePropertyLoads)
  a. Build a sidemap of block -> accessed variables and properties (e.g. `bb0 -> [ {obj: props, path: ["a", "b"]} ]`)
  b. Propagate "non-nullness" i.e. variables and properties for which we can safely evaluate `PropertyLoad`.
  A basic block can unconditionally read from identifier X if any of the following applies:
    - the block itself reads from identifier X
    - all predecessors of the block read from identifier X
    - all successors of the block read from identifier X
4. Pass 3: (collectDependencies)
Walks over instructions again to record dependencies and declarations, using the previously produced sidemaps. We do not record any control-flow here
5. Merge every scope's recorded dependencies with the set of hoistable PropertyLoads

Tested by syncing internally and (1) checking compilation output differences ([internal link](https://www.internalfb.com/intern/everpaste/?handle=GPCfUBt_HCoy_S4EAJDVFJyJJMR0bsIXAAAB)), running internally e2e tests ([internal link](https://fburl.com/sandcastle/cs5mlkxq))

---
### Followups:
1. Rewrite function expression deps
This change produces much more optimal output as the compiler now uses the function CFG to understand which variables / paths are assumed to be non-null. However, it may exacerbate [this function-expr hoisting bug](https://github.com/facebook/react/blob/main/compiler/packages/babel-plugin-react-compiler/src/__tests__/fixtures/compiler/bug-invalid-hoisting-functionexpr.tsx). A short term fix here is to simply call some form of `collectNonNullObjects` on every function expression to find hoistable variable / paths. In the longer term, we should refactor out `FunctionExpression.deps`.

2. Enable optional paths
(a) don't count optional load temporaries as dependencies (e.g. `collectOptionalLoadRValues(...)`).
(b) record optional paths in both collectHoistablePropertyLoads and dependency collection

ghstack-source-id: 2507f6ea751dce09ad1dccd353ae6fc7cf411582
Pull Request resolved: #30894
mofeiZ added a commit that referenced this pull request Sep 12, 2024
Resubmission of #30079 -- core logic unchanged, but needed to rebase past #30573

### Quick background
#### Temporaries

The compiler currently treats temporaries and named variables (e.g. `x`) differently in this pass.
- named variables may be reassigned (in fact, since we're running after LeaveSSA, a single named identifier's IdentifierId may map to multiple `Identifier` instances -- each with its own scope and mutable range)
- temporaries are replaced with their represented expressions during codegen. This is correct (mostly correct, see #29878) as we're careful to always lower the correct evaluation semantics. However, since we rewrite reactive scopes entirely (to if/else blocks), we need to track temporaries that a scope produces in `ReactiveScope.declarations` and later promote them to named variables.
In the same example, $4, $5, and $6 need to be promoted: $2 ->`t0`,  $5 ->`t1`, and $6 ->`t2`.
```js
[1] $2 = LoadGlobal(global) foo
[2] $3 = LoadLocal bar$1
scope 0:
  [3] $4 = Call $2(<unknown> $3)
scope 1:
  [4] $5 = Object {  }
scope 2:
  [5] $6 = Object { a: $4, b: $5 }
[6] $8 = StoreLocal Const x$7 = $6
```

#### Dependencies
`ReactiveScope.dependencies` records the set of (read-only) values that a reactive scope is dependent on. This is currently limited to just variables (named variables from source and promoted temporaries) and property-loads.
All dependencies we record need to be hoistable -- i.e. reordered to just before the ReactiveScope begins. Not all PropertyLoads are hoistable.

In this example, we should not evaluate `obj.a.b` without before creating x and checking `objIsNull`.
```js
// reduce-reactive-deps/no-uncond.js
function useFoo({ obj, objIsNull }) {
  const x = [];
  if (isFalse(objIsNull)) {
    x.push(obj.a.b);
  }
  return x;
}
```

While other memoization strategies with different constraints exist, the current compiler requires that `ReactiveScope.dependencies` be re-orderable to the beginning of the reactive scope. But.. `PropertyLoad`s from null values will throw `TypeError`. This means that evaluating hoisted dependencies should throw if and only if the source program throws. (It is also a bug if source throws and compiler output does not throw. See facebook/react-forget#2709)

---
### Rough high level overview
1. Pass 1
Walk over instructions to gather every temporary used outside of its defining scope (same as ReactiveFunction version). These determine the sidemaps we produce, as temporaries used outside of their declaring scopes get promoted to named variables later (and are not considered hoistable rvals).
2. Pass 2 (collectTemporariesSidemap)
Walk over instructions to generate a sidemap of temporary identifier -> named variable and property path (e.g. `$3 -> {obj: props, path: ["a", "b"]}`)
2. Pass 2 (collectHoistablePropertyLoads)
  a. Build a sidemap of block -> accessed variables and properties (e.g. `bb0 -> [ {obj: props, path: ["a", "b"]} ]`)
  b. Propagate "non-nullness" i.e. variables and properties for which we can safely evaluate `PropertyLoad`.
  A basic block can unconditionally read from identifier X if any of the following applies:
    - the block itself reads from identifier X
    - all predecessors of the block read from identifier X
    - all successors of the block read from identifier X
4. Pass 3: (collectDependencies)
Walks over instructions again to record dependencies and declarations, using the previously produced sidemaps. We do not record any control-flow here
5. Merge every scope's recorded dependencies with the set of hoistable PropertyLoads

Tested by syncing internally and (1) checking compilation output differences ([internal link](https://www.internalfb.com/intern/everpaste/?handle=GPCfUBt_HCoy_S4EAJDVFJyJJMR0bsIXAAAB)), running internally e2e tests ([internal link](https://fburl.com/sandcastle/cs5mlkxq))

---
### Followups:
1. Rewrite function expression deps
This change produces much more optimal output as the compiler now uses the function CFG to understand which variables / paths are assumed to be non-null. However, it may exacerbate [this function-expr hoisting bug](https://github.com/facebook/react/blob/main/compiler/packages/babel-plugin-react-compiler/src/__tests__/fixtures/compiler/bug-invalid-hoisting-functionexpr.tsx). A short term fix here is to simply call some form of `collectNonNullObjects` on every function expression to find hoistable variable / paths. In the longer term, we should refactor out `FunctionExpression.deps`.

2. Enable optional paths
(a) don't count optional load temporaries as dependencies (e.g. `collectOptionalLoadRValues(...)`).
(b) record optional paths in both collectHoistablePropertyLoads and dependency collection

ghstack-source-id: 2507f6ea751dce09ad1dccd353ae6fc7cf411582
Pull Request resolved: #30894
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed React Core Team Opened by a member of the React Core Team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants