Skip to content

Conversation

arohner
Copy link
Contributor

@arohner arohner commented Aug 7, 2025

Significantly improve compile time, CPU usage and memory usage. Significant rework of internals.

The previously observed memory "leak" was not a leak. To avoid race conditions and to get parallelism, we were creating a classloader per thread and compiling in every thread. This wasn't leaking memory, just using more memory than a dev laptop has available. Wall clock time was also poor because reloading the same classes once per core (8-10x) is inefficient.

We instead use a single classloader shared by all threads. We avoid the previous need to discard a classloader (causing more wasted work!) by instead reloading downstream dependencies, the same way a human would at a REPL. Parallelism is handled via agents. We create an agent for each namespace in the project, and send the commands to it. Before compiling a namespace, we send require to all of its dependents. If the namespace contains deftype, definterface or defprotocol, we send reload to all downstream dependencies, to avoid AOT hell.

@arohner arohner force-pushed the speed branch 3 times, most recently from 7e9d37d to 3b4d55c Compare August 12, 2025 16:56
@arohner
Copy link
Contributor Author

arohner commented Aug 12, 2025

Oh, for posterity: this has known issues with potemkin, manifold and byte-streams. Those libraries use the funky definterface+ et. al, which contain

(defmacro definterface+ [name & body]
  (when-not (resolve name)
    `(definterface ~name ~@body)))

If the namespace is already present on the classpath, calling compile on this code will not output .class files to be placed in the jar. The problem is exacerbated by bazel sometimes compiling jars twice, once for compile-time deps, and once for runtime. The two solutions are: use the griffinbank/* forks of these libraries, or in the future, use bazel's path-mapping feature to avoid the double compile.

@miridius miridius mentioned this pull request Aug 13, 2025
@miridius miridius requested a review from a team August 13, 2025 06:45
Copy link

@miridius miridius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

first pass complete, 4 files left to review, will finish them later today

Comment on lines 125 to 128
(do
(doseq [p classpath]
(add-url classloader p))
classloader)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a risk that this will cause some things to successfully compile even though they depend on something no longer in the classpath?

@miridius
Copy link

first pass complete, 4 files left to review, will finish them later today

oh yeah I finished this yesterday but didn't have any more comments to add :)

@arohner arohner force-pushed the speed branch 8 times, most recently from 96abf51 to f25ed9f Compare August 18, 2025 19:49
Copy link

@miridius miridius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking much cleaner now! very nice

;; files. If we reload after compiling, that breaks some OSS code
;; doing janky things. Therefore, cache every compile request, and if
;; bazel asks for the same ns again, just copy the files. We
;; fingerprint namespaces using the SHA of the file contents

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍


(defn agent-compile [all-ns-decls dest-dir ns]
((ensure-agent ns)
(fn [_]
Copy link

@miridius miridius Aug 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we're never using the agent state, we're using it to do to 3 things:

  1. run thunks one at a time per ns
  2. run thunks in different nses in parallel
  3. wait for those to be done

Could we:

  1. create a lock per ns
  2. require and compile would be (future (locking ns-lock (do-stuff)))
  3. deref the returned futures

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ lock + future works if we don't care about ordering of the submitted requests, which maybe we don't (because of sha?). But if we do, I think a better idea (also mentioned in Slack) is to create a (Executors/newSingleThreadExecutor) per ns and then (.submit ns-exec ^Runnable (bound-fn [] ...)). that will return a future that you can deref to await

@arohner arohner force-pushed the speed branch 8 times, most recently from 9731425 to 9e6cc59 Compare August 21, 2025 20:28
src_dir = None

compile_classpath = compile_info.transitive_runtime_jars.to_list() + ctx.files.compiledeps + [classes_dir]
compile_classpath = compile_info.transitive_runtime_jars.to_list() + ctx.files.compiledeps
Copy link
Contributor Author

@arohner arohner Aug 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Classes dir is no longer a declared output of the build, which means:

  • downstream targets can't read from it
  • we've moved classes dir to a per-request temp dir, which means other rules-clojure processes can't read it either (fixes a race condition where a second rules clojure process would read out the classes dir from the other process, because it was on the classpath)

"worker.clj",
"parse.clj",
"gen_build.clj",
"namespace/parse.cljc",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vendor tools.reader, java.classpath, and finish vendoring tools.namespace. This is to avoid causing incompatibility with user code, and adds a few opportunities for caching.

"@rules_clojure_maven_deps//:org_clojure_data_json",
"@rules_clojure_maven_deps//:org_clojure_java_classpath",
"@rules_clojure_maven_deps//:org_clojure_tools_namespace"])
"@rules_clojure_maven_deps//:org_clojure_tools_deps_alpha"])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to vendor tools.deps.alpha because it's used by gen-build, not rules-clojure.compile

outs=["libcompile.jar"])

genrule(
name="bootstrap-libfs",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started being more strict about never reloading code, which caused problems compiling rules-clojure code using the rules clojure compiler, so bootstrap more of it.

(filter identity)
(apply merge)))

(defn ->lib->jar
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused code

[clojure.tools.namespace.find :as find]
[rules-clojure.fs :as fs]
[rules-clojure.parse :as parse])
[rules-clojure.namespace.find :as find]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just minor tweaks in gen-build to accommodate vendoring tools.namespace. Also drive-by delete some unused code

@arohner arohner force-pushed the speed branch 7 times, most recently from c45708e to d0eb1ea Compare August 29, 2025 14:56
(assert (loaded? ns) (print-str ns "should have already been compiled"))
(copy-classes (fs/->path cache-dir) (fs/->path dest-dir))))

(def no-aot '#{;; specs are too big to AOT
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This moved from banksy's deps.edn because any require on the source tree can trigger it. The proper way to fix is to move it into a bazel toolchain, but that's a lot more work

@arohner arohner force-pushed the speed branch 7 times, most recently from fce9c70 to 7c5bc2f Compare August 30, 2025 00:50
@arohner arohner force-pushed the speed branch 5 times, most recently from 82e1a93 to 09dcc95 Compare October 9, 2025 18:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants