Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: copy_directory copies .DS_Store leading to remote cache misses #887

Open
calebmer opened this issue Jul 25, 2024 · 8 comments
Open
Assignees
Labels
bug Something isn't working

Comments

@calebmer
Copy link

calebmer commented Jul 25, 2024

What happened?

.DS_Store is a file sometimes added to a directory on MacOS by the operating system (it has to do with file system indexing I think). Sometimes these .DS_Store sneak themselves into npm package repositories (generated by rules_js) then copied with this action here:

https://github.com/aspect-build/rules_js/blob/d0ff155c73e3c7fee5d72485e00775bca1fde10a/npm/private/npm_package_store.bzl#L222-L232

I’m some debugging remote cache misses on MacOS and when I look at the execution log, I’m seeing .DS_Store appearing in the diff. Example from one of the execution log diffs I generated following “Debugging Remote Cache Hits for Remote Execution”:

...

@@ -2229270,14 +2229002,6 @@
     hash_function_name: "SHA-256"
   }
   is_tool: true
-}
-inputs {
-  path: "external/npm__phosphor-react__1.4.1__-440667795/package/.DS_Store"
-  digest {
-    hash: "14dcaaf35f98cb101936646924d50e812d07e260ef2ccea177c1d268e1561200"
-    size_bytes: 6148
-    hash_function_name: "SHA-256"
-  }
 }
 inputs {
   path: "external/npm__phosphor-react__1.4.1__-440667795/package/LICENSE"
@@ -2262980,14 +2262704,6 @@
 cacheable: true
 mnemonic: "CopyDirectory"
 actual_outputs {
-  path: "bazel-out/darwin_arm64-fastbuild/bin/node_modules/.aspect_rules_js/[email protected]_-440667795/node_modules/phosphor-react/.DS_Store"
-  digest {
-    hash: "14dcaaf35f98cb101936646924d50e812d07e260ef2ccea177c1d268e1561200"
-    size_bytes: 6148
-    hash_function_name: "SHA-256"
-  }
-}
-actual_outputs {
   path: "bazel-out/darwin_arm64-fastbuild/bin/node_modules/.aspect_rules_js/[email protected]_-440667795/node_modules/phosphor-react/LICENSE"
   digest {
     hash: "bd618db104d07526fe78a9c28dbc2cc4c4286d756aa8b428e51b82de0f8d6aeb"

...

I know this is copy_directory related since when I go to the log file this is from and look for the command I see this snippet:

...

---------------------------------------------------------

command_args: "external/copy_directory_darwin_arm64/copy_directory"
command_args: "external/npm__phosphor-react__1.4.1__-440667795/package"
command_args: "bazel-out/darwin_arm64-fastbuild/bin/node_modules/.aspect_rules_js/[email protected]_-440667795/node_modules/phosphor-react"
platform {
}
inputs {
  path: "external/copy_directory_darwin_arm64/copy_directory"
  digest {
    hash: "eddc1d4e6a5142106850e1466e4cd384a343ca8be04579dc0add52e5d4f63ac7"
    size_bytes: 1507746
    hash_function_name: "SHA-256"
  }
  is_tool: true
}
inputs {
  path: "external/npm__phosphor-react__1.4.1__-440667795/package/.DS_Store"
  digest {
    hash: "14dcaaf35f98cb101936646924d50e812d07e260ef2ccea177c1d268e1561200"
    size_bytes: 6148
    hash_function_name: "SHA-256"
  }
}
inputs {
  path: "external/npm__phosphor-react__1.4.1__-440667795/package/LICENSE"
  digest {
    hash: "bd618db104d07526fe78a9c28dbc2cc4c4286d756aa8b428e51b82de0f8d6aeb"
    size_bytes: 1092
    hash_function_name: "SHA-256"
  }
}

...

How could I configure copy_directory() to ignore .DS_Store files if they’re present to prevent remote caching misses? Ideally I think copy_directory() should probably ignore .DS_Store files automatically since it’s very unlikely they’re contributing to the build.

I haven’t proven this is the incompatibility between the two MacOS machines which is causing the remote cache miss but figured it may be worth addressing whether or not it’s the incompatibility causing my remote caching misses.

Version

Development (host) and target OS/architectures: MacOS arm64

Output of bazel --version:

Bazelisk version: v1.20.0
Build label: 7.2.0
Build target: @@//src/main/java/com/google/devtools/build/lib/bazel:BazelServer
Build time: Mon Jun 10 13:04:55 2024 (1718024695)
Build timestamp: 1718024695
Build timestamp as int: 1718024695

Version of the Aspect rules, or other relevant rules from your
WORKSPACE or MODULE.bazel file:

  • aspect_bazel_lib v1.42.3
  • rules_nodejs v5.8.4
  • aspect_rules_js v1.42.3

Language(s) and/or frameworks involved: Node.js, JavaScript, pnpm

@calebmer calebmer added the bug Something isn't working label Jul 25, 2024
@gregmagolan
Copy link
Collaborator

Interesting one. It does look like that .DS_Store file snuck into that the external repository directory external/npm__phosphor-react__1.4.1__-440667795/package/ as it is not found in the package archive: https://unpkg.com/browse/[email protected]/

I could see some blanket ignores in the copy_directory rule to avoid copying files such as .DS_Store that are automatically injected by the OS... In the latest rules_js this should be less of an issue since most npm packages should be extracted directly into the output tree from the download tar unless they have a patch applied or have a lifecycle hook.

@calebmer
Copy link
Author

In this case I am patching phosphor-react. (I think the -440667795 in the repository name is a patch hash?)

@matthewjh
Copy link

matthewjh commented Jul 30, 2024

Interesting find. I have noticed more than cache misses onMacOS than expected, but did not yet drill into why myself.

"Ideally I think copy_directory() should probably ignore .DS_Store files automatically since it’s very unlikely they’re contributing to the build."

I do agree with this.

@matthewjh
Copy link

@gregmagolan any more thoughts or updates on this? I keep on seeing cache misses on Macos even when the source has not changed. I haven't checked what is generating those misses but it could plausibly be DS_Store.

@gregmagolan
Copy link
Collaborator

Yes, this looks like something we can fix. The principled fix is to stop using copy_directory for the lifecycle hook actions and instead untar the tarball so that only the tarball is the input. In the meantime, we could add a feature to the copy_directory rule to filter out files. We control the golang implementation so it would be relatively easy to add that feature to bazel-lib and then consume it in rules_js.

No time for this right now for me as I'm in scramble mode on multiple fronts but I would be happy to review PRs with the changes. FYSA @jbedard

@matthewjh
Copy link

matthewjh commented Nov 12, 2024

It seems better to address this at the copy_directory level than lifecycle hook, no? Anyone who opens any directory in their workspace or external repos folder in MacOS Finder is going to inadvertently introduce a .DS_Store file, which will generate downstream cache misses. If you move the lifecycle hooks off copy_directory, this still affects every other use case of that rule for Mac OS users.

I suppose perhaps copy_directory is not that widely used versus copy_to_directory.

@gregmagolan
Copy link
Collaborator

gregmagolan commented Nov 12, 2024

The problematic case with the npm packages is unique as it takes a source directory as an input. This means that everything in the directory which isn't managed by Bazel ends up being copied into the output TreeArtifact, including any .DS_Store files.

Source directory inputs are pretty rare from what I've seen in the wild. We used them here in rules_js since enumerating each file as an individual file inputs from an extracted npm package won't work if files have spaces or other special characters that Bazel doesn't like. Other uses of copy_directory that take a TreeArtifact instead of a source directory would not suffer from this issue. Its just source directory inputs that have the problem here and I don't think copy_directory with source directory inputs is a common case outside of rules_js npm packages.

Either way, we should solve the both in both spots since copy_directory should do the right thing with source directory inputs and extracting a tar instead of copying a directory is more efficient for lifecycle hooks.

@matthewjh
Copy link

Thanks for elaborating. I see what you mean and agree with your points.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants