Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scalafmt fails with Build without the bytes #1525

Open
gergelyfabian opened this issue Nov 6, 2023 · 20 comments
Open

Scalafmt fails with Build without the bytes #1525

gergelyfabian opened this issue Nov 6, 2023 · 20 comments

Comments

@gergelyfabian
Copy link
Contributor

gergelyfabian commented Nov 6, 2023

When running a target like this:

# Important to clean, as the files can stay from a previous bazel build with a lower Bazel version.
bazel clean
bazel run //utils/smth:library.format-test

I receive:

utils/smth/src/main/scala/com/dummy/smth/Smth.scala
diff: /home/gege/.cache/bazel/_bazel_user/7f83a0b1f0940f88aec1f4530f927169/execroot/repo/bazel-out/k8-fastbuild/bin/utils/smth/utils/smth/src/main/scala/com/dummy/smth/Smth.scala.fmt.output: No such file or directory

It seems the action responsible for creating this file executes (I have tested disabling writing the output there, but then obviously it fails due to not generating an output), but at a later step the action running scalafmt from the manifest cannot see the scalafmt outputs.

One difference I can see in the bazel-bin folder between Bazel 6.4.0 and Bazel 7.0rc2 (after running bazel run //utils/smth:library.format-test):

# 6.4.0:
ls bazel-bin/utils/smth/utils/smth/src/main/scala/com/dummy/smth/
Smth.scala.fmt.output
# 7.0rc2:
ls bazel-bin/utils/smth/utils/smth/src/main/scala/com/dummy/smth/
[empty]
@gergelyfabian
Copy link
Contributor Author

Is this a Bazel bug or a rules_scala one?

@gergelyfabian
Copy link
Contributor Author

Added reproduction in https://github.com/gergelyfabian/bazel-scala-example/ (master branch):

# Change .bazelversion to 7.0.0rc2 and then run:
bazel run //example-maven:example-maven.format-test

@gergelyfabian
Copy link
Contributor Author

gergelyfabian commented Nov 6, 2023

Checking whether this is a bazel issue:

export BAZELISK_CLEAN=true
bazel --bisect=6.4.0..release-7.0.0rc2 run //example-maven:example-maven.format-test

@gergelyfabian
Copy link
Contributor Author

gergelyfabian commented Nov 6, 2023

Checking whether this is a bazel issue:

export BAZELISK_CLEAN=true
bazel --bisect=6.4.0..release-7.0.0rc2 run //example-maven:example-maven.format-test

Trick is that I had to turn off using a java toolchain and remotejdk_17 first, as remotejdk_17 does not work for some commits as bisecting.

@gergelyfabian
Copy link
Contributor Author

--- Bisect Result

first bad commit is https://github.com/bazelbuild/bazel/commit/9c96529fca4a135c162e35ce2882834b879438fb

That is:

Build without the bytes by default

by changing the default value of `--remote_download_outputs` to `toplevel`.

@gergelyfabian
Copy link
Contributor Author

Thank you, bazelisk, indeed this seems to be the answer.
This works:

bazel run --remote_download_all //example-maven:example-maven.format-test

@gergelyfabian
Copy link
Contributor Author

Forming the question again, scalafmt internal outputs not working with "build without the bytes" is a bug of Bazel or rules_scala?

@gergelyfabian
Copy link
Contributor Author

Starting from Bazel 7, the default download mode will be changed from --remote_download_all to --remote_download_toplevel. That is, Bazel will no longer download intermediate outputs of remote actions into the local output base. In return, your remote builds will be faster.

"Using a remote cache or even a disk cache will make actions "remote" from this point of view." (thanks @fmeum)

In my case, disk cache is enabled.

@gergelyfabian
Copy link
Contributor Author

The information I received on the Bazel slack was that if rules_scala emitted correct information (DefaultInfo(runfiles = ...)) for the outputs it generates (the .fmt.output files), then those should be visible also when using Build without the bytes.
Unfortunately I cannot find yet whether the scalafmt phase is generating proper DefaultInfo.

@liucijus
Copy link
Collaborator

liucijus commented Nov 7, 2023

The information I received on the Bazel slack was that if rules_scala emitted correct information (DefaultInfo(runfiles = ...)) for the outputs it generates (the .fmt.output files), then those should be visible also when using Build without the bytes. Unfortunately I cannot find yet whether the scalafmt phase is generating proper DefaultInfo.

@gergelyfabian could you check if anything might be missing here: https://github.com/bazelbuild/rules_scala/blob/master/scala/private/phases/phase_scalafmt.bzl#L18?

@gergelyfabian
Copy link
Contributor Author

The current state where I'm at is this. phase_scalafmt seems to be adding the proper information (on the line you quoted):

        # Return a depset containing all the relevant files, so a wrapping `sh_test` can successfully access them.
        return struct(runfiles = depset([manifest] + files + srcs))

Then, we go to the phase_default_info, and runfiles properly includes the .fmt.output files there (added a print before that):

https://github.com/bazelbuild/rules_scala/blob/master/scala/private/phases/phase_default_info.bzl#L46

But, when api.bzl runs (https://github.com/bazelbuild/rules_scala/blob/master/scala/private/phases/api.bzl#L85) it does not get the DefaultInfo that phase_default_info was supposed to return to it. I don't get why.

@gergelyfabian
Copy link
Contributor Author

Debug output like this:

DEBUG: /home/user/.cache/bazel/_bazel_user/64527517dacf7170f7d914889f83481c/external/io_bazel_rules_scala/scala/private/phases/phase_default_info.bzl:38:10: Returning DefaultInfo with runfiles: depset([<generated file example-maven/example-maven.jar>, <generated file external/maven/com/twitter/algebird-core_2.13/0.13.7/processed_algebird-core_2.13-0.13.7.jar>, <generated file external/maven/com/googlecode/javaewah/JavaEWAH/1.1.7/processed_JavaEWAH-1.1.7.jar>, <generated file external/maven/org/scala-lang/modules/scala-collection-compat_2.13/2.1.6/processed_scala-collection-compat_2.13-2.1.6.jar>, <generated file external/maven/org/scala-lang/scala-library/2.13.12/processed_scala-library-2.13.12.jar>, <generated file external/maven/org/scala-lang/scala-reflect/2.13.12/processed_scala-reflect-2.13.12.jar>, <generated file external/maven/org/typelevel/algebra_2.13/2.0.0/processed_algebra_2.13-2.0.0.jar>, <generated file external/maven/org/typelevel/cats-kernel_2.13/2.0.0/processed_cats-kernel_2.13-2.0.0.jar>, <generated file example-maven/format/example-maven/manifest.txt>, <generated file example-maven/example-maven/src/main/scala/mypackage/Maven.scala.fmt.output>, <source file example-maven/src/main/scala/mypackage/Maven.scala>])
DEBUG: /home/user/.cache/bazel/_bazel_user/64527517dacf7170f7d914889f83481c/external/io_bazel_rules_scala/scala/private/phases/api.bzl:85:22: Detected external providers: {"DefaultInfo": struct(data_runfiles = None, default_runfiles = <unknown object com.google.devtools.build.lib.analysis.Runfiles>, files = depset([<generated file example-maven/example-maven.jar>]), files_to_run = None)}

Debugging code is here: https://github.com/gergelyfabian/rules_scala/tree/bazel-7.0-scalafmt-bug

@gergelyfabian
Copy link
Contributor Author

Btw. the following command also fails for Bazel 6.4.0: bazel run --remote_download_toplevel //example-maven:example-maven.format-test

@gergelyfabian gergelyfabian changed the title Scalafmt fails on Bazel 7.0rc2 Scalafmt fails with Build without the bytes Nov 8, 2023
@gergelyfabian
Copy link
Contributor Author

Maybe the necessary information is lost here:

    print("DefaultInfo with runfiles: %s" % depset(direct = direct, transitive = runfiles))
    print("DefaultInfo runfiles after wrapping: %s" % ctx.runfiles(transitive_files = depset(direct = direct, transitive = runfiles), collect_data = True))

Added these debug lines just before the return statement in phase_default_info.bzl.

Prints:

DEBUG: /home/user/.cache/bazel/_bazel_user/64527517dacf7170f7d914889f83481c/external/io_bazel_rules_scala/scala/private/phases/phase_default_info.bzl:38:10: DefaultInfo with runfiles: depset([<generated file example-maven/example-maven.jar>, <generated file external/maven/com/twitter/algebird-core_2.13/0.13.7/processed_algebird-core_2.13-0.13.7.jar>, <generated file external/maven/com/googlecode/javaewah/JavaEWAH/1.1.7/processed_JavaEWAH-1.1.7.jar>, <generated file external/maven/org/scala-lang/modules/scala-collection-compat_2.13/2.1.6/processed_scala-collection-compat_2.13-2.1.6.jar>, <generated file external/maven/org/scala-lang/scala-library/2.13.12/processed_scala-library-2.13.12.jar>, <generated file external/maven/org/scala-lang/scala-reflect/2.13.12/processed_scala-reflect-2.13.12.jar>, <generated file external/maven/org/typelevel/algebra_2.13/2.0.0/processed_algebra_2.13-2.0.0.jar>, <generated file external/maven/org/typelevel/cats-kernel_2.13/2.0.0/processed_cats-kernel_2.13-2.0.0.jar>, <generated file example-maven/format/example-maven/manifest.txt>, <generated file example-maven/example-maven/src/main/scala/mypackage/Maven.scala.fmt.output>, <source file example-maven/src/main/scala/mypackage/Maven.scala>])
DEBUG: /home/user/.cache/bazel/_bazel_user/64527517dacf7170f7d914889f83481c/external/io_bazel_rules_scala/scala/private/phases/phase_default_info.bzl:39:10: DefaultInfo runfiles after wrapping: <unknown object com.google.devtools.build.lib.analysis.Runfiles>

@gergelyfabian
Copy link
Contributor Author

gergelyfabian commented Nov 8, 2023

I'm wondering whether collect_data does what we expect it to do, especially if I read the comment:

https://github.com/bazelbuild/rules_scala/blob/master/scala/private/phases/phase_default_info.bzl#L43-L46

@gergelyfabian
Copy link
Contributor Author

gergelyfabian commented Nov 8, 2023

I've added some more debugging and I think api.bzl is returning proper DefaultInfo for the rules.

Having https://github.com/bazelbuild/rules_scala/blob/master/scala/private/phases/api.bzl#L91 (that sets up the final rule output):

I have added a debug line just before that:

print("DefaultInfo before the end: %s" % acculmulated_external_providers["DefaultInfo"].default_runfiles.files)

This prints for me:

DEBUG: /home/user/.cache/bazel/_bazel_user/64527517dacf7170f7d914889f83481c/external/io_bazel_rules_scala/scala/private/phases/api.bzl:93:10: DefaultInfo before the end: depset([<generated file example-maven/example-maven.jar>, <generated file external/maven/com/twitter/algebird-core_2.13/0.13.7/processed_algebird-core_2.13-0.13.7.jar>, <generated file external/maven/com/googlecode/javaewah/JavaEWAH/1.1.7/processed_JavaEWAH-1.1.7.jar>, <generated file external/maven/org/scala-lang/modules/scala-collection-compat_2.13/2.1.6/processed_scala-collection-compat_2.13-2.1.6.jar>, <generated file external/maven/org/scala-lang/scala-library/2.13.12/processed_scala-library-2.13.12.jar>, <generated file external/maven/org/scala-lang/scala-reflect/2.13.12/processed_scala-reflect-2.13.12.jar>, <generated file external/maven/org/typelevel/algebra_2.13/2.0.0/processed_algebra_2.13-2.0.0.jar>, <generated file external/maven/org/typelevel/cats-kernel_2.13/2.0.0/processed_cats-kernel_2.13-2.0.0.jar>, <generated file example-maven/format/example-maven/manifest.txt>, <generated file example-maven/example-maven/src/main/scala/mypackage/Maven.scala.fmt.output>, <source file example-maven/src/main/scala/mypackage/Maven.scala>], order = "postorder")

So, this includes the files that I think should be there, specifically: Maven.scala.fmt.output

So I think the rule definition returns a DefaultInfo that contains what is necessary. Even though it's under default_runfiles.files.
When setting up ctx.runfiles (that ends up in DefaultInfo) I tried adding these files into ctx.runfiles(files = ...) , but still it did not help, in either way those files end up in .default_runfiles.files

Is the DefaultInfo constructed by rules_scala faulty, or it's a Bazel bug?

Updated debugging code in https://github.com/gergelyfabian/rules_scala/tree/bazel-7.0-scalafmt-bug.

@gergelyfabian
Copy link
Contributor Author

Opened Bazel bug: bazelbuild/bazel#20099

@gergelyfabian
Copy link
Contributor Author

Quoting a response from the bazel issue:

There are a couple of odd things going on here, and I'm not sure that any of them are Bazel's fault.

It appears that you're invoking bazel run on a generated file, not an executable rule:

$ bazel query --output=label_kind //example-maven:example-maven.format-test
generated file //example-maven:example-maven.format-test

I don't know why bazel run allows a plain file as an argument, but it seems reasonable to me that running a file won't give access to the runfiles; files have no associated runfiles, rules do.

That brings us to another problem, which is that scala_library isn't an executable rule, so it's not possible to bazel run it either:

$ bazel query --output=label_kind //example-maven:example-maven
scala_library rule //example-maven:example-maven
$ bazel run //example-maven:example-maven
ERROR: Cannot run target //example-maven:example-maven: Not executable

So my suggestion would be to (1) mark scala_library as an executable rule, (2) return the formatter in the DefaultInfo.executable field alongside the runfiles, (3) make sure the formatter looks for the runfiles in the right location (perhaps by using a runfiles library for your implementation language), and finally (4) bazel run the scala_library directly.

As to why you're only running into this now: as it currently stands, your bazel run command is effectively non-hermetic, but bazel run(unlike build actions) doesn't provide a safeguard against non-hermeticity; if Bazel downloads all build outputs, your program will find them anyway, but Bazel has no idea that the dependency is there, and won't realize that they must be downloaded when building without the bytes.

Originally posted by @tjgq in bazelbuild/bazel#20099 (comment)

@alexeagle
Copy link

Hey @gergelyfabian I'm curious if you've tried this with https://github.com/aspect-build/rules_lint
We support scalafmt and shouldn't have any of these bugs that Tiago pointed out there.

@gergelyfabian
Copy link
Contributor Author

I experienced some issues when trying to apply the latest version of rules_lint, and also it seems it's ~3 times slower on our monorepo than running rules_scala's scalafmt (with parallelized targets), so I think we'll be staying with that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants