27.0-rc1 compiler takes ~8x longer on single large module #8140

Closed
okeuday opened this issue Feb 16, 2024 · 4 comments · Fixed by #8162
Labels: bug, team:VM

Comments

@okeuday
Contributor

okeuday commented Feb 16, 2024

Describe the bug
The compiler appears to be slower, and the slowdown is most noticeable on large modules. In the specific example below, compilation time goes from 3m23.649s (with OTP-26.2.2) to 27m0.187s (with OTP-27.0-rc1). This may be another manifestation of the behavior in issue #8132; it is reported separately to provide another way of testing, since it is unclear how much the two issues overlap.

To Reproduce

git clone https://github.com/CloudI/CloudI.git
cd CloudI/src
time erlc -DTEST -Ilib/cloudi_core/include lib/cloudi_core/src/cloudi_statistics.erl

When using the machine described below with Ubuntu 20.04.6 LTS

Dell PowerEdge R710
Xeon X5650 2.66GHz 2 cpu, 6 core/cpu, 2 hts/core
L1:384KB L2:1.5MB L3:12MB RAM:128GB:DDR3-1333MT/s
Westmere-EP (LGA 1366/Socket B)

I obtained the results below

$ # when using OTP-26.2.2
$ time erlc -DTEST -Ilib/cloudi_core/include lib/cloudi_core/src/cloudi_statistics.erl 

real    3m23.649s
user    3m0.257s
sys     0m27.194s

$ # when using OTP-27.0-rc1
$ time erlc -DTEST -Ilib/cloudi_core/include lib/cloudi_core/src/cloudi_statistics.erl 

real    27m0.187s
user    22m47.023s
sys     4m48.055s

Expected behavior
The 3 minute execution time was expected, despite the module cloudi_statistics.erl having 278472 total lines.

Affected versions
OTP-27.0-rc1

Additional context
Most of the cloudi_statistics.erl module is a list of floats used for eunit testing, and that list is presumably responsible for the slowdown.

@okeuday okeuday added the bug Issue is reported as a bug label Feb 16, 2024
@IngelaAndin IngelaAndin added the team:VM Assigned to OTP team VM label Feb 19, 2024
@frej
Contributor

frej commented Feb 19, 2024

Does compiling with +no_ssa_opt_alias +no_ssa_opt_destructive_update bring back the compilation time to normal? If so it's probably the same issue. Anyway I'm on it.

@frej
Contributor

frej commented Feb 19, 2024

> Does compiling with +no_ssa_opt_alias +no_ssa_opt_destructive_update bring back the compilation time to normal?

It does.
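
For anyone repeating this check, here is a minimal sketch of the comparison, meant to be run from CloudI/src as in the reproduction steps. The two `+no_ssa_opt_...` options are the ones quoted above. The `compile_cmd` helper and the `RUN_COMPARE` variable are names made up for this sketch and are not part of erlc or CloudI.

```shell
# Hypothetical helper: print the erlc command line from the report, with any
# extra compiler options (for example "+no_ssa_opt_alias") placed after "erlc".
compile_cmd() {
    cmd="erlc"
    for opt in "$@"; do
        cmd="$cmd $opt"
    done
    echo "$cmd -DTEST -Ilib/cloudi_core/include lib/cloudi_core/src/cloudi_statistics.erl"
}

# Guarded so that loading this file compiles nothing by accident; each
# compile can take many minutes. RUN_COMPARE is an assumption of this sketch.
if [ "${RUN_COMPARE:-0}" = "1" ]; then
    # Baseline compile with all passes enabled.
    time sh -c "$(compile_cmd)"
    # Same compile with the alias analysis and destructive update passes off.
    time sh -c "$(compile_cmd +no_ssa_opt_alias +no_ssa_opt_destructive_update)"
fi
```

If the second timing is close to the OTP-26.2.2 figure while the first is not, that points at the same passes discussed in this thread.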

frej added a commit to frej/otp that referenced this issue Feb 20, 2024
The alias analysis pass tracks the structure and aliasing status of
values given as arguments and returned from functions. The information
is kept in the graph provided by the beam_ssa_ss module and is built
when the alias analysis processes instructions constructing
terms. When a literal is given as an argument to a function, the graph
is extended to have the same structure as if the literal was
constructed programmatically.

To handle self-recursive functions which, for example, build a list,
there is a cutoff which limits the depth to which the graph is
constructed. This allows a fixpoint to be reached without implementing
a solver for recurrence equations about the structure of the
arguments, but also limits the size of the graph.

This patch corrects an omission in the beam_ssa_ss:merge_in_arg/4
function where the cutoff was not enforced for list literals.

Closes erlang#8140
@frej
Contributor

frej commented Feb 20, 2024

The fix is in #8162; combine it with #8153 and you should be close to the OTP-26.2.2 compile time, @okeuday.

By the way, if you change your input file a little (please ignore the broken indentation of random_value_list/0; doing it correctly would make the patch enormous), you can reduce the compilation time from the OTP-26 baseline of ~3m20 to ~13s. The gain comes from tricking the compiler into not inlining the ginormous list literal:

diff --git a/src/lib/cloudi_core/src/cloudi_statistics.erl b/src/lib/cloudi_core/src/cloudi_statistics.erl
index b2393ef8..f4383923 100644
--- a/src/lib/cloudi_core/src/cloudi_statistics.erl
+++ b/src/lib/cloudi_core/src/cloudi_statistics.erl
@@ -673,7 +673,7 @@ module_test_() ->
 %
 % Created with:
 % file:write_file("random_value.txt", io_lib:format("~80p", [lists:foldl(fun(_, L) -> [cloudi_x_quickrand:strong_float() | L] end, [], lists:seq(1, 1000000))])).
--define(RANDOM_VALUE_LIST,
+random_value_list() ->
 [0.7779093128442551,0.5602545451350247,0.7766852376495581,0.9456642438377318,
  0.5990680481881936,0.047728379072267346,0.6586301304420381,
  0.012254783431982719,0.043885545402663206,0.6954762542392063,
@@ -278147,7 +278147,7 @@ module_test_() ->
  0.059138035555632684,0.6680978116580759,0.42114572935754124,
  0.3686091664879384,0.6016074865504191,0.9221404144761567,0.6575440311922187,
  0.03963151998105385,0.5753451392396954,0.5852318676826616,
- 0.39998303185334383]).
+ 0.39998303185334383].
 
 uncache(Result) ->
     element(1, Result).
@@ -278179,10 +278179,10 @@ normal_distribution_value([Uniform1, Uniform2 | L], Mean, StandardDeviation) ->
 normal_distribution() ->
     Mean = 0,
     StandardDeviation = 1.0,
-    normal_distribution_value(?RANDOM_VALUE_LIST, Mean, StandardDeviation).
+    normal_distribution_value(random_value_list(), Mean, StandardDeviation).
 
 t_random_value() ->
-    L = ?RANDOM_VALUE_LIST,
+    L = random_value_list(),
     % standard deviation of a uniform distribution [a .. b]
     %   with a = 0.0 and b = 1.0 : ((b - a)^2 / 12)^0.5 = 0.288675...
     % skewness of a uniform distribution is 0.0
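
To try the macro-versus-function difference without the full CloudI checkout, here is a rough sketch that generates a small stand-in module in either style. Everything in it is hypothetical: the `gen_module` helper, the module names, and the `VALUES` macro are invented for illustration. Nothing in this thread confirms that a synthetic module of any given size shows the same timings as cloudi_statistics.erl.

```shell
# Hypothetical generator: write $1.erl containing $2 pseudo-random floats,
# either behind a macro (style "macro") or inside a function (style "function").
# The list is used from two call sites, like the two uses touched by the diff.
gen_module() {
    name=$1; count=$2; style=$3
    {
        echo "-module($name)."
        echo "-export([sum/0, len/0])."
        if [ "$style" = "macro" ]; then
            echo "-define(VALUES,"
        else
            echo "values() ->"
        fi
        # Emit "[F1,\n F2,\n ... Fn]" with no trailing newline.
        LC_ALL=C awk -v n="$count" 'BEGIN {
            srand(1)
            printf "["
            for (i = 1; i <= n; i++)
                printf "%s%.16f", (i > 1 ? ",\n " : ""), rand()
            printf "]"
        }'
        if [ "$style" = "macro" ]; then
            echo ")."
            echo "sum() -> lists:sum(?VALUES)."
            echo "len() -> length(?VALUES)."
        else
            echo "."
            echo "sum() -> lists:sum(values())."
            echo "len() -> length(values())."
        fi
    } > "$name.erl"
}
```

For example, `gen_module big_macro 1000000 macro` and `gen_module big_fun 1000000 function` write two files that can each be timed with `time erlc big_macro.erl` and `time erlc big_fun.erl`. The element count matches the 1000000 in the "Created with" comment of the diff, but is otherwise arbitrary.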

@okeuday
Contributor Author

okeuday commented Feb 20, 2024

@frej Thank you for the fixes! I confirmed #8162 combined with #8153 resolves the issue and the use of the random_value_list/0 function improves compilation speed further.
