Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Watchman-backed DiffAwareness implementation #22615

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

fmeum
Copy link
Collaborator

@fmeum fmeum commented Jun 3, 2024

Watchman is a mature, standalone file watching tool that supports Linux, macOS and Windows. Crucially, its watches can be reused by multiple clients, such as VCS tools, with no additional overhead.

The current commit uses Watchman's JSON API over a Unix socket or Windows named pipe for simplicity. The more compact BSER API could be used in the future if the protocol overhead turns out to be noticeable.

@fmeum fmeum requested a review from haxorz June 3, 2024 11:48
@github-actions github-actions bot added the awaiting-review PR is awaiting review from an assigned reviewer label Jun 3, 2024
@fmeum
Copy link
Collaborator Author

fmeum commented Jun 3, 2024

@haxorz Based on #22417 (comment), it seems necessary to me to outsource file system watching in Bazel to a maintained and standard alternative. This could be a first step towards this. I haven't added tests yet since that requires adding watchman to CI, but would look into that if this is generally viewed favorably.

@fmeum
Copy link
Collaborator Author

fmeum commented Jun 3, 2024

CC @sluongng

@sluongng
Copy link
Contributor

sluongng commented Jun 3, 2024

Some context on Watchman:

  • The project is open-source by Meta and works across a wide range of platforms.

  • Git's integration with Watchman was added in 2017 by Microsoft, following a similar integration in Mercurial, and was maintained/improved over time by folks from Dropbox, Microsoft, and myself (representing Booking.com at the time). (0)

  • Companies with big monorepo, such as Stripe(1) or Canva(2), also make uses of Watchman extensively. Most of these companies are also using Bazel, or have been migrating to Bazel in recent years.

Although the development of Watchman has been relatively quiet since Meta has been transitioning to EdenFS (a FUSE for their Sapling VCS), it's still the most robust file-watching solution to date with multiple strategies and fallbacks and is still actively used in newer projects such as Buck2. So I think this is a good addition to Bazel.

(0): https://github.com/git/git/commits/master/templates/hooks--fsmonitor-watchman.sample
(1): https://blog.nelhage.com/post/stripe-dev-environment/
(2): https://www.canva.dev/blog/engineering/we-put-half-a-million-files-in-one-git-repository-heres-what-we-learned/

@sgowroji sgowroji added the team-Core Skyframe, bazel query, BEP, options parsing, bazelrc label Jun 4, 2024
@fmeum fmeum force-pushed the 13226-watchman branch 2 times, most recently from 907d653 to abd53c2 Compare June 4, 2024 10:15
@fmeum
Copy link
Collaborator Author

fmeum commented Jun 4, 2024

I pushed a new commit that adds Windows support. I will not make any further changes for now.

@haxorz
Copy link
Contributor

haxorz commented Jun 5, 2024

Hi @fmeum thanks for the POC and the thoughtful PR.

Just wanted to let you know that @lberki, @meisterT, have been discussing it the past few days. Didn't want you to think the lack of response was due to us not seeing it!

@meisterT
Copy link
Member

meisterT commented Jun 6, 2024

I would like to see more user feedback whether a solution that requires installing watchman is acceptable.

Tagging a few users here, but we might also ask on bazel-discuss or slack.

cc @konste (due to #1931 (comment))

cc @keith (due to #13226 (comment))

cc @sushain97 (due to #18363)

@fmeum I assume addressing #5270 would be theoretically possible with watchman?

@fmeum
Copy link
Collaborator Author

fmeum commented Jun 6, 2024

@meisterT It would, although I expect the work for that to be mostly orthogonal to the DiffAwareness implementation.

@konste
Copy link

konste commented Jun 6, 2024

Does Watchman support Directory Junctions on Windows? It is NOT supposed to follow Directory Junctions in the same way as it skips symlinks .It is important for the case when --symlink_prefix is used and points to the directory (like.build). Those "convenience symlinks" in that directory are Junctions not the actual directory symlinks. We must not follow those.

Another question is how Watchman supposed to get to the build machines? Will Bazel install it as part of its own unpacking?

@fmeum
Copy link
Collaborator Author

fmeum commented Jun 6, 2024

Does Watchman support Directory Junctions on Windows? It is NOT supposed to follow Directory Junctions in the same way as it skips symlinks .It is important for the case when --symlink_prefix is used and points to the directory (like.build). Those "convenience symlinks" in that directory are Junctions not the actual directory symlinks. We must not follow those.

Watchman doesn't follow junctions, just like it doesn't follow symlinks.

Another question is how Watchman supposed to get to the build machines? Will Bazel install it as part of its own unpacking?

As far as I understand, watchman spawns a server process per user similar to what Bazel does. I do not know what happens when you launch different watchman binaries though (@sluongng Do you happen to know this?). It may be necessary to have a single watchman binary for Bazel and all other clients, which would make bundling less attractive.

@fmeum
Copy link
Collaborator Author

fmeum commented Jun 6, 2024

If watchman is not installed, Bazel could also just fall back to the current DiffAwareness implementations and show a warning.

@sluongng
Copy link
Contributor

sluongng commented Jun 6, 2024

Another question is how Watchman supposed to get to the build machines? Will Bazel install it as part of its own unpacking?

AFAIK, the current implementation makes watchman optional. It's something that you would need to opt-in and won't work automatically.

So there is no need for tight coupling when doing bazel and watchman DEB/RPM packaging.

If you are packaging everything in-house today, I would treat this the same way as how git is being packaged: There is an opt-in Git hook that enables Watchman integration. Some org would enable that by default and have git depend on watchman, some won't. And as Fabian said, if watchman is not available, things will just work like how it is today.

@sluongng
Copy link
Contributor

sluongng commented Jun 6, 2024

I do not know what happens when you launch different watchman binaries though (@sluongng Do you happen to know this?).

watchman has the same client-server architecture as Bazel.
A new binary would look for the existing unix socket to communicate with the server.

You should not need to launch a second server since one server can watch multiple file paths. But there are ways to configure multiple servers, just like Bazel with different output paths.

@meisterT
Copy link
Member

meisterT commented Jun 6, 2024

If watchman is not installed, Bazel could also just fall back to the current DiffAwareness implementations and show a warning.

I was hoping that after some migration / testing period we could remove the current implementations, at least on Mac/Windows so that our maintenance surface in an area that is not well covered already is not increased.

@konste
Copy link

konste commented Jun 6, 2024

Except it doesn't quite work on Windows now specifically because of the problem with --symlink_prefix which current implementation does not handle and wanders into Bazel output tree.

@meisterT
Copy link
Member

meisterT commented Jun 6, 2024

It may be necessary to have a single watchman binary for Bazel and all other clients, which would make bundling less attractive.

Bundling would also increase Bazel binary size, by 10MB or something in that ballpark.

@sluongng
Copy link
Contributor

sluongng commented Jun 6, 2024

@konste Could you please describe how you tested this on Windows?

AFAIK, both watchman and Java NIO WindowsWatchService use Windows API ReadDirectoryChangesW underneath, which ignores junctions.

Comparing the 2 implementations, watchman handles the overflow case correctly (by falling back to traversing the tree) while the Java implementation simply ignores overflow. So I think there is a small chance that you might successfully trigger this overflow, which could cause some traversal to happen. Still, I don't expect the traversal to follow symlink/junction at all here. So some reproduction steps would be helpful.

@konste
Copy link

konste commented Jun 6, 2024

@sluongng please take a look at #12054 It describes the problem pretty clearly. Convenience symlinks at the root of the source tree are handled properly, but in subfolder they cause problems which manifests as nearly immediate overflow for any sizable project.

@keith
Copy link
Member

keith commented Jun 6, 2024

I would like to see more user feedback whether a solution that requires installing watchman is acceptable.

Tagging a few users here, but we might also ask on bazel-discuss or slack.

cc @keith (due to #13226 (comment))

I'm supportive of this, i imagine watchman would sidestep the issues we saw (I think since I posted that comment it was suggested somewhere that it could have been because too many files changed at once)

@sushain97
Copy link
Contributor

sushain97 commented Jun 7, 2024

👋 We (Stripe) would definitely try this feature. We already use watchman extensively on our development machines and regularly run out of inotify watches even with a dramatically higher-than-default sysctl. We're fine with it requiring the system to have watchman installed out-of-band.

@sluongng
Copy link
Contributor

@konste

I am a bit confused with your particular issue.
First of all, according to your comment in (a), your workspace setup is a "virtual monorepo" with minimal source files. Instead, it patches multiple smaller repos together using http_archive and git_repository. In such case, --watchfs will not benefit you as much, regardless of the implementation of --watchfs, because it will only watch the source files inside the main workspace.

Changes to http_archive and git_repository should always invalidate the repository rules, which trigger Bazel to re-execute the repo rule and rescan the dir after. So if such a condition is still true, I think it would be better for you to turn off --watchfs entirely. 🤔


Secondly, Watchman does come with its own config file, in which you could exclude directories from being watched (b). This is applicable on the watchman level, so events from ReadDirectoryChangesW matching the ignore dirs will be discarded. In case an overflow happens, Watchman will issue a re-crawl of the watch project, which will also respect the ignore dirs.

If you still need --watchfs, I think adding a .watchmanconfig file with ignore_dirs set to your --symlink_prefix will help. I would be interested to learn how often overflows happen in your case if you could share with us your watchman log file.

(a): 0cdd71f#commitcomment-44572088
(b): https://facebook.github.io/watchman/docs/config#ignore_dirs

@konste
Copy link

konste commented Jun 10, 2024

@sluongng let me provide some comments:
About "virtual monorepo" - this comment was written four years ago. By now the picture is substantially simplified and we got real monorepo, pretty standard one. We can forget about that kink as it is not actual anymore.

Watchman does come with its own config file, in which you could exclude directories from being watched

This is very nice to hear and practically speaking it sufficient to solve the problem with .build folder exclusion. Still it would be better if Bazel knows to exclude that folder automatically based on it being the value of --symlink_prefix flag or alternatively it can be auto-skipped because it only contains directory junctions (AKA convenience symlinks).

Now there is another important question - what is the relationship between the set of folders configured for Watchman and the set of folders Bazel currently scans using its own rules. What would happen if some folder supposed to be scanned by Bazel but excluded in Watchman config? Will it still be scanned by Bazel (but not watched)? We get two sources of truth here and it can be taken for granted that early or later they diverge. We need to make one of them to follow the other or otherwise a mechanism to discover discrepancies in those folder sets and alert the user. .bazelignore file only adds to the confusion as it is potentially the third source of truth. What do you think would be the right solution here?

And by the way thanks a lot for looking into it! Very much appreciated.

@sluongng
Copy link
Contributor

What would happen if some folder supposed to be scanned by Bazel but excluded in Watchman config? Will it still be scanned by Bazel (but not watched)? We get two sources of truth here and it can be taken for granted that early or later they diverge. We need to make one of them to follow the other or otherwise a mechanism to discover discrepancies in those folder sets and alert the user. .bazelignore file only adds to the confusion as it is potentially the third source of truth. What do you think would be the right solution here?

In my personal opinion, this is not something that could be fixed. It's an inherited downside of using an external file-watching service instead of building it inside Bazel code base.

We could provide some guardrails in the future to help detect and warn users about the diff between .bazelignore and .watchmanconfig directories, and perhaps even help create a default .watchmanconfig if one was missing. But fundamentally, it's still gonna be 2 different sets of configs to maintain. Luckily, .watchmanconfig and .bazelignore are not updated often, at least from what I have experienced. I bet that most big orgs who have experimented with Watchman in the past should also have some .watchmanconfig file setup with some sane default, so this will be very easy for them to onboard. For the rest of the user base who are new to Watchman, GitHub Code Search could be a good source of inspiration https://github.com/search?type=code&q=path%3A.watchmanconfig+%2Fignore_dirs%2F.

@konste
Copy link

konste commented Jun 12, 2024

It's an inherited downside of using an external file-watching service instead of building it inside Bazel code base.

Not necessarily. Watchman must have some kind of API to be controlled by the clients like Bazel, right? Using that API we should be able to exclude a folder automatically based on it being the value of --symlink_prefix flag or alternatively it can be auto-skipped because it only contains directory junctions (AKA convenience symlinks).

But let me repeat my question to clarify the current design. What would happen if some folder supposed to be scanned by Bazel but excluded in Watchman config? Will it still be scanned by Bazel (but not watched)?

@fmeum
Copy link
Collaborator Author

fmeum commented Jun 14, 2024

But let me repeat my question to clarify the current design. What would happen if some folder supposed to be scanned by Bazel but excluded in Watchman config? Will it still be scanned by Bazel (but not watched)?

With the current state of the implementation, such a folder would indeed be seen by Bazel but changes to it wouldn't be tracked. If we decide to go forward with this implementation, we could have Bazel read the list of directories ignored by Watchman and emit a warning or fail if it doesn't ignore all of them.

Not necessarily. Watchman must have some kind of API to be controlled by the clients like Bazel, right? Using that API we should be able to exclude a folder automatically based on it being the value of --symlink_prefix flag or alternatively it can be auto-skipped because it only contains directory junctions (AKA convenience symlinks).

Yes and no. There are two ways to ignore directories with watchman: the .watchmanconfig file allows specific directories to be ignored and any client of the watchman server can supply additional exclusions based on a filter language. The difficult part is that depending on the OS and it's low-level file watching API, Watchman may still need to watch parts of the file systems that it will then ignore. This is especially relevant on Windows, where the default strategy is to watch entire directory trees with a single Windows API call. This could be improved in Watchman if necessary.

@konste
Copy link

konste commented Jun 14, 2024

and emit a warning or fail if it doesn't ignore all of them.

THIS is extremely important! Whatever design you end up implementing please try to detect inconsistencies in configuration and give a big honking warning explaining the problem and probably reference a page with the guidance how it can be fixed.
I guess I am beating a dead horse here but just wanted to confirm how crucially important it is for the widely spread and loosely controlled Bazel environments.

fmeum added 2 commits August 6, 2024 15:51
Watchman is a mature, standalone file watching tool that supports Linux, macOS and Windows. Crucially, its watches can be reused by multiple clients, such as VCS tools, with no additional overhead.

The current commit uses Watchman's JSON API over a Unix socket for simplicity. The more compact BSER API could be used in the future if the protocol overhead turns out to be noticeable.
@fmeum
Copy link
Collaborator Author

fmeum commented Aug 6, 2024

This PR is ready for review. It doesn't have tests yet as those require Watchman to be installed in all CI environments, which I could work on once the prod parts have been approved.

Areas of improvement left for follow-up changes:

  1. Verify that all paths ignored by .watchmanconfig are also ignored by Bazel.
  2. If found to be necessary in practice, improve Watchman so that it can ignore events to certain subtrees already on the FS level.

@meisterT

@meisterT
Copy link
Member

meisterT commented Aug 6, 2024

@haxorz from a product point of view, we would be willing to accept this PR as long as it is clearly marked as experimental functionality (I see it is guarded behind and --experimental_* flag). Would you be willing to do the technical review?

@fmeum
Copy link
Collaborator Author

fmeum commented Oct 2, 2024

@haxorz Friendly ping

@lberki
Copy link
Contributor

lberki commented Oct 3, 2024

@haxorz do you think you can attend to this review in time for Bazel 8?

@sluongng
Copy link
Contributor

sluongng commented Oct 4, 2024

There is a recent build meetup with a talk about Watchman by @arxanas

https://blog.waleedkhan.name/incremental-watchman/

It might help get folks to get up to speed on the what/how of Watchman.

@meisterT
Copy link
Member

I wanted to give this a try today, and failed with getting watchman to run.

The installation instructions (https://facebook.github.io/watchman/docs/install#linux) mention prebuilt debs and binaries that do no longer exist apparently. Then I tried to build it from source, but ran into this error:

CMake Error at /usr/share/cmake-3.29/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find OpenSSL, try to set the path to OpenSSL root folder in the
  system variable OPENSSL_ROOT_DIR (missing: OPENSSL_CRYPTO_LIBRARY) (found
  suitable version "3.3.2", minimum required is "1.1.1")
Call Stack (most recent call first):
  /usr/share/cmake-3.29/Modules/FindPackageHandleStandardArgs.cmake:600 (_FPHSA_FAILURE_MESSAGE)
  /usr/share/cmake-3.29/Modules/FindOpenSSL.cmake:686 (find_package_handle_standard_args)
  CMakeLists.txt:135 (find_package)

This seems similar to facebook/watchman#1225 but I already have that library installed.

@meisterT
Copy link
Member

I found a linux binary in this (failed) action: https://github.com/facebook/watchman/actions/runs/11827666245/job/32956253441

But sadly it is linked against libglog.so.0 while my system provides libglog.so.1.

@ahornby
Copy link

ahornby commented Nov 27, 2024

@meisterT @sluongng to build watchman locally from source its similar to building folly or fbthrift: there is a getdeps.py script coordinating the various cmake and cargo calls (folly README has background, essentially it exists to stitch back together the various bits from the internal monorepo)

First you need toolchain, you might already have these installed, but just in case:

# c++ toolchain and openssl install (you probably already have these of course)
sudo apt install libssl-dev g++ pkg-config m4
# rust toolchain installed and on PATH, if you don't then use something like https://rustup.rs/, which you of course will read carefully and then run like so:
sh ~/rustup-install.sh

Then the watchman build steps:

# clone with git (or sapling!) and cd
git clone https://github.com/facebook/watchman.git && cd watchman
# install system dependencies that getdeps.py thinks the distro can provide (things like cmake, boost, xz)
./build/fbcode_builder/getdeps.py install-system-deps --recursive watchman
# fetch and build the source dependencies (things like cpptoml, folly, fbthrift), and then build watchman itself
./build/fbcode_builder/getdeps.py --allow-system-packages build --src-dir=. watchman
# now you have fresh binaries, ask getdeps where they are:
./build/fbcode_builder/getdeps.py --allow-system-package show-inst-dir watchman

Once you have binaries, optionally run tests. State at time of writing: clean pass on centos stream 9 and fedora 40, 2 erroring tests on ubuntu 22.04 that you'll need to take a view on if important to you.

# optionally run the tests 
./build/fbcode_builder/getdeps.py --allow-system-packages test --src-dir=. watchman

now you are getdeps.py expert prepared to build fbthrift, folly, watchman et al, and hopefully have a working binary!

let me know if this works for you or not, and I'll see if I can help update the watchman docs. I don't have time to debug/fix the test fails on ubuntu at the moment, but if you do I can probably help review PRs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting-review PR is awaiting review from an assigned reviewer team-Core Skyframe, bazel query, BEP, options parsing, bazelrc
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants