Releases: arrdem/katamari
Incremental rebuilds
Release #17
This version of Katamari achieves incremental rebuilding of java-library and clojure-library targets, assuming targets are honest about reporting their own identifiers.
This release hinges on the katamari.roll.extensions/rule-id function. rule-id is a function of pretty much everything which would go into rule-build, except that instead of actually building the rule to products, it returns a string (I suggest a UUID or a shasum) which can be used to uniquely identify the rule as a function of its inputs.
Consider a java-library which depends only on one file. If that one file (and the parameters with which it is javac'd) haven't changed, then there's nothing to be done. javac is presumed to be a pure function of its inputs, so if we hash all the inputs we can generate a consistent key by which to address the product of the rule, even if the rule itself may not be reproducible. Even within the context of a larger build, that rule need not be rebuilt unless its dependencies or other inputs change.
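As a rough illustration of the idea (in Python, and not Katamari's actual implementation), the sketch below derives a stable identifier from a rule's inputs: its type, the contents of its source files, its options, and the IDs of its dependencies. The function name `rule_id`, the namespace value and the argument layout are all assumptions made for this example, and using a name-based (version 5) UUID is only one way to follow the "UUID or shasum" suggestion.

```python
import hashlib
import json
import uuid
from pathlib import Path

# Hypothetical namespace for this example; not a value Katamari uses.
EXAMPLE_NAMESPACE = uuid.UUID("00000000-0000-0000-0000-000000000000")


def rule_id(rule_type, source_files, options, dep_ids):
    """Return a stable ID that changes whenever any build input changes."""
    digest = hashlib.sha256()
    digest.update(rule_type.encode("utf-8"))
    # Hash file names and contents in sorted order so the ID is deterministic.
    for path in sorted(str(p) for p in source_files):
        digest.update(path.encode("utf-8"))
        digest.update(Path(path).read_bytes())
    # Canonical JSON keeps the options hash independent of key order.
    digest.update(json.dumps(options, sort_keys=True).encode("utf-8"))
    # A dependency's ID already summarizes its inputs, so including it
    # makes this rule's ID change when anything upstream changes.
    for dep in sorted(dep_ids):
        digest.update(dep.encode("utf-8"))
    return str(uuid.uuid5(EXAMPLE_NAMESPACE, digest.hexdigest()))
```

Because nothing here depends on the time or on the build output, two calls with unchanged inputs return the same string, which is what lets the ID double as a cache key.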
This release modifies the core katamari.roll.core/roll function so that, rather than building all rules in a buildgraph, it uses the rule-id function to try to hit a filesystem cache before performing any computation. This allows Katamari to address products - like uberjars - by content and avoid repeatedly building them whenever possible. The presumption is that the I/O time to interface with the cache is cheaper by far than any amount of recomputation.
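The cache-first behaviour can be sketched roughly as follows, in Python for illustration only. The `<root>/<first two characters>/<id>` layout mirrors the cache paths shown in the demo; everything else (the function names, the `(name, deps)` input shape, and the `compute_id` and `build` callables standing in for rule-id and rule-build) is an assumption of this example rather than Katamari's API.

```python
import shutil
from pathlib import Path


def product_dir(cache_root, rule_id):
    """Cache location for a product: <root>/<first two chars of id>/<id>."""
    return Path(cache_root) / rule_id[:2] / rule_id


def roll(ordered_rules, cache_root, compute_id, build):
    """Build rules in dependency order, skipping any whose product is cached.

    ordered_rules: list of (name, deps) pairs, dependencies listed first.
    compute_id:    callable(name, dep_ids) -> str, standing in for rule-id.
    build:         callable(name, out_dir), standing in for rule-build.
    """
    products = {}
    for name, deps in ordered_rules:
        dep_ids = [products[d]["id"] for d in deps]
        rid = compute_id(name, dep_ids)
        out = product_dir(cache_root, rid)
        hit = out.is_dir()
        if not hit:
            # Cache miss: pay for the real build, then keep the result
            # under its content-derived key.
            out.mkdir(parents=True)
            try:
                build(name, out)
            except Exception:
                # Never leave a half-built product behind as a false hit.
                shutil.rmtree(out, ignore_errors=True)
                raise
        products[name] = {"id": rid, "path": str(out), "cached": hit}
    return products
```

Running `roll` twice over the same rules should invoke `build` only on the first pass; the second pass is a handful of directory checks.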
There are also a lot of changes to the logging here. For detailed information on what Katamari is seeing and doing, check out the logs, by default in .kat.d/kat.log.
Demo
Ok. First we clean the Katamari cache.
This lets us show that we're rebuilding things.
Cache cleaning is in terms of milliseconds - here we use 0 to purge everything.
Ideally Katamari would monitor your cache and its hit rate, trying to find where some re-use probability boundary is and automatically trimming the cache for you.
$ ./kat clean-cache 0
{
"intent": "json",
"deleted-keys": [
"1c180065-b16c-5c44-93b8-97a53b201988",
"36f9c458-aafe-5b60-a115-b7d4a2047d96",
"3e3b4a85-dbbb-568e-8c32-d082c012b690",
"3c054aff-d403-5203-8551-ec5c956b1dd4",
"0440703f-ffa6-5590-810f-c06183b98c58",
"0e132c54-b388-5703-ac2b-37f807bc0a75",
"3342903d-7564-5b7d-a5da-7f7d9a37e880"
]
}
So the cache is empty.
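For intuition, an age-based purge like this could look roughly like the Python sketch below. It assumes the argument is a maximum age in milliseconds, that age is measured from each entry's modification time, and that entries live at `<root>/<prefix>/<key>/`; none of these details are confirmed here, and `clean_cache` is a hypothetical name.

```python
import shutil
import time
from pathlib import Path


def clean_cache(cache_root, max_age_ms, now=None):
    """Delete cached products at least max_age_ms old; return deleted keys."""
    now = time.time() if now is None else now
    deleted = []
    root = Path(cache_root)
    if not root.is_dir():
        return deleted
    # Assumed layout: <root>/<prefix>/<key>/
    for entry in sorted(root.glob("*/*")):
        if not entry.is_dir():
            continue
        age_ms = (now - entry.stat().st_mtime) * 1000.0
        if age_ms >= max_age_ms:
            shutil.rmtree(entry)
            deleted.append(entry.name)
    return deleted
```

With a threshold of 0 every entry qualifies, which matches the "purge everything" usage above.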
Let's build the example javac target.
$ time ./kat compile example/javac
{
"example/javac": {
"type": "katamari.roll.extensions.jvm/product",
"from": "example/javac",
"mvn/manifest": "roll",
"deps": null,
"paths": [
"/home/arrdem/doc/dat/git/arrdem/katamari/.kat.d/buildcache/36/36f9c458-aafe-5b60-a115-b7d4a2047d96/target"
],
"id": "36f9c458-aafe-5b60-a115-b7d4a2047d96"
},
"intent": "json"
}
./kat compile example/javac 0.05s user 0.02s system 15% cpu 0.463 total
Cool.
And if we check that target directory out -
$ tree /home/arrdem/doc/dat/git/arrdem/katamari/.kat.d/buildcache/36/36f9c458-aafe-5b60-a115-b7d4a2047d96/target
/home/arrdem/doc/dat/git/arrdem/katamari/.kat.d/buildcache/36/36f9c458-aafe-5b60-a115-b7d4a2047d96/target
└── demo
    └── Demo.class
1 directory, 1 file
Awesome!
We built our classfile and it's in the product cache at the ID 36f9c458-aafe-5b60-a115-b7d4a2047d96.
If we try to compile the same rule again, we should get the same product back.
$ ./kat compile example/javac
{
"example/javac": {
"type": "katamari.roll.extensions.jvm/product",
"from": "example/javac",
"mvn/manifest": "roll",
"deps": null,
"paths": [
"/home/arrdem/doc/dat/git/arrdem/katamari/.kat.d/buildcache/36/36f9c458-aafe-5b60-a115-b7d4a2047d96/target"
],
"id": "36f9c458-aafe-5b60-a115-b7d4a2047d96"
},
"intent": "json"
}
Yep! We got the same 36f9c458-aafe-5b60-a115-b7d4a2047d96 key twice, so we got to skip shelling out to javac the second time around (not that it's really that much faster for a single input file and no plugins etc.).
And if we go build an uberjar containing our test code -
$ time ./kat compile example/clj+uberjar
{
"example/javac": {
"type": "katamari.roll.extensions.jvm/product",
"from": "example/javac",
"mvn/manifest": "roll",
"deps": null,
"paths": [
"/home/arrdem/doc/dat/git/arrdem/katamari/.kat.d/buildcache/36/36f9c458-aafe-5b60-a115-b7d4a2047d96/target"
],
"id": "36f9c458-aafe-5b60-a115-b7d4a2047d96"
},
"example/clj": {
"type": "katamari.roll.extensions.clj/product",
"from": "example/clj",
"mvn/manifest": "roll",
"paths": [
"/home/arrdem/doc/dat/git/arrdem/katamari/example/src/main/clj"
],
"deps": {
"example/javac": null,
"org.clojure/clojure": null
},
"id": "3e3b4a85-dbbb-568e-8c32-d082c012b690"
},
"example/clj+uberjar": {
"type": "katamari.roll.extensions.jar/product",
"from": "example/clj+uberjar",
"mvn/manifest": "roll",
"paths": [
"/home/arrdem/doc/dat/git/arrdem/katamari/.kat.d/buildcache/1c/1c180065-b16c-5c44-93b8-97a53b201988/target/clj-standalone.jar"
],
"deps": {
"example/javac": null,
"example/clj": null
},
"id": "1c180065-b16c-5c44-93b8-97a53b201988"
},
"intent": "json"
}
./kat compile example/clj+uberjar 0.05s user 0.02s system 5% cpu 1.263 total
We see that we hit the cache for our javac@36f9c458-aafe-5b60-a115-b7d4a2047d96 target, which is good. The example/clj@3e3b4a85-dbbb-568e-8c32-d082c012b690 target had to be rebuilt because we blew the cache away, but it's a Clojure target so there's actually no work to be done. We did, however, have to create a new uberjar for example/clj+uberjar@1c180065-b16c-5c44-93b8-97a53b201988, which created a jarfile at .kat.d/buildcache/1c/1c180065-b16c-5c44-93b8-97a53b201988/target/clj-standalone.jar per the :jar-name of the rule and the cache key.
If we build that uberjar target again, we'll see the ID and consequently the build product repeated.
$ ./kat compile example/clj+uberjar | jq '.["example/clj+uberjar"].id'
"1c180065-b16c-5c44-93b8-97a53b201988"
Awesome.
If we use Katamari to uberjar itself twice, we should really be able to see the difference due to caching the jar the second time around.
$ export kt=me.arrdem/katamari+uberjar
$ time ./kat compile $kt | jq ".[\"$kt\"].id"
"3c054aff-d403-5203-8551-ec5c956b1dd4"
./kat compile $kt 0.09s user 0.04s system 1% cpu 8.333 total
jq ".[\"$kt\"].id" 0.00s user 0.00s system 0% cpu 8.332 total
$ time ./kat compile $kt | jq ".[\"$kt\"].id"
"3c054aff-d403-5203-8551-ec5c956b1dd4"
./kat compile $kt 0.06s user 0.02s system 72% cpu 0.103 total
jq ".[\"$kt\"].id" 0.01s user 0.00s system 5% cpu 0.103 total
Same command twice - same products twice.
Just an 8.2 second difference in how long it takes.
👿
Limitations
This release ONLY captures the incremental build algorithm based on the current state of the filesystem. It has no ability to introspect the consequences of changes. For instance one could imagine asking the build system to report on what should be rebuilt "in comparison to" some previous version control state. This is the heart of any incremental testing strategy.
One could also imagine asking what targets were impacted given a map of {<path> : #{:added :changed :deleted}}, let alone a richer diffing structure like the one behind the Rollfile parser.
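To make that hypothetical query concrete, here is a small Python sketch of what "impacted targets" could mean: any target owning a touched path is dirty, and so is everything that transitively depends on it. This is not something the release implements; the function name, the input shapes and the exact-path matching are all assumptions of the example (a real tool would need directory or glob ownership to notice added files).

```python
from collections import deque


def impacted_targets(changes, target_paths, target_deps):
    """Return the targets affected by a set of path changes.

    changes:      {path: "added" | "changed" | "deleted"}
    target_paths: {target: set of source paths it owns}
    target_deps:  {target: set of targets it depends on}
    """
    # Invert the dependency edges so we can walk from a target to its users.
    dependents = {}
    for target, deps in target_deps.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(target)

    # Any kind of change to an owned path dirties the owning target.
    changed = set(changes)
    dirty = {t for t, paths in target_paths.items() if paths & changed}

    # Everything downstream of a dirty target is dirty too.
    queue = deque(dirty)
    while queue:
        current = queue.popleft()
        for user in dependents.get(current, ()):
            if user not in dirty:
                dirty.add(user)
                queue.append(user)
    return dirty
```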
As the roll function just goes and does a build, what gets built versus what gets extracted from a cache isn't currently well captured. Really extracting that trace information would require interacting with (intercepting or wrapping) the logging system.
There's also a known (fatal 😬) bug in the change detection algorithm. At present, Java and Clojure targets only detect changes in their direct :deps map and other structure. They DON'T detect changes in what that deps map resolves to, which means that if you bump your global version pinning, nothing gets rebuilt. See #16, which tracks this.
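The shape of the bug can be reproduced in a few lines of Python. The sketch below is only an illustration: `declared_id` and `resolved_id` are hypothetical stand-ins for the two ways of computing an ID, and the pinned versions are made-up example values.

```python
import hashlib
import json


def fingerprint(value):
    """Hash any JSON-serializable value in a key-order-independent way."""
    canonical = json.dumps(value, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def declared_id(deps, pins):
    # Hashes only what the rule declares; the pins are never consulted.
    return fingerprint(deps)


def resolved_id(deps, pins):
    # Hashes what each declared dep resolves to under the current pins.
    resolved = {name: version or pins.get(name) for name, version in deps.items()}
    return fingerprint(resolved)


deps = {"org.clojure/clojure": None}          # version left to global pinning
old_pins = {"org.clojure/clojure": "1.9.0"}   # hypothetical pin values
new_pins = {"org.clojure/clojure": "1.10.0"}

# Blind to the bump: same ID before and after, so the stale product is reused.
same_key = declared_id(deps, old_pins) == declared_id(deps, new_pins)
# Sensitive to the bump: the ID changes, which would force a rebuild.
new_key = resolved_id(deps, old_pins) != resolved_id(deps, new_pins)
```

Here `same_key` and `new_key` both come out true, which is the difference between the current behaviour and the one a fix would need.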
Generalized rolling
This release replaces the previous special purpose uberjar and classpath tasks with a single, unified compile task which correctly (re)builds the entire reached build rule tree. This makes incremental compilation using content addressing of build products possible and provides the API required for custom compilation targets like efficient docker image creation.
See the much updated architecture document for an exploration of how all this works.
Also included in this release:
- A new java-library target which will javac source files to produce classfiles
- A new jar target which only jars the :paths of compiled direct dependencies, instead of jarring all paths of all transitives like uberjar does.
Extension based tasks, server intents & fuzzy hinting
Intents
The major blocker on using Katamari for "real" work is the present lack of a vehicle for implementing recognizable tasks such as repl or test. Because Katamari attempts to defer the majority of its heavy lifting to the server, there wasn't an obvious way to implement tasks which would occur in or take over the user's shell.
Katamari 0.0.3 introduces the concept of "response intents". Katamari's server handlers (should) respond uniformly with JSON payloads. The "intent" key in the response JSON map is special, and may be used by server side tasks to communicate to the kat script a desired interpretation of the response. For instance the "msg" or "message" intent may be used to display the "msg" field of the response as a message to the user. Other examples of intents include the "sh" or "subshell" intent, which may be used to execute a BASH script in the user's environment, and "exec", which causes the kat script to exec into another program.
Intents are a secondary layer by which the server can provide default behavior. The kat response handling flags -r, -m and -j and their long equivalents all take precedence over the server's suggested interpretation. This also allows the user to debug and inspect server responses.
This makes it possible to implement recognizable repl and test tasks in the future, and enables the new restart-server command which leverages the user's shell to bounce the server.
$ ./kat restart-server
Attempting to subshell...
{"intent":"sh","sh":"$KAT stop-server; $KAT start-server"}
Scheduling shutdown
Starting server ...
Waiting for it to become responsive \
Started server!
http port: 3636
nrepl port: 3637
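The precedence between user flags and server intents can be sketched as a small decision function. This is Python for illustration, not the logic of the kat script itself: `choose_action` is a hypothetical name, the "exec" field name and the fallback to pretty-printed JSON are assumptions, and the function returns a description of what to do rather than doing it.

```python
import json


def choose_action(raw_response, flag=None):
    """Decide how a client might present one server response.

    flag is "raw", "json", "message" or None, mirroring -r, -j and -m.
    Returns an (action, payload) pair instead of performing side effects.
    """
    body = json.loads(raw_response)
    # User flags take precedence over whatever the server suggests.
    if flag == "raw":
        return ("print", raw_response)
    if flag == "json":
        return ("print", json.dumps(body, indent=2))
    if flag == "message":
        return ("print", body.get("msg", ""))

    intent = body.get("intent")
    if intent in ("msg", "message"):
        return ("print", body.get("msg", ""))
    if intent in ("sh", "subshell"):
        return ("subshell", body.get("sh", ""))
    if intent == "exec":
        # Assumed field name; the release notes do not show an exec payload.
        return ("exec", body.get("exec", ""))
    # "json" and unrecognized intents fall back to pretty-printed JSON.
    return ("print", json.dumps(body, indent=2))
```

Fed the restart-server response shown above, this yields a subshell action carrying the `$KAT stop-server; $KAT start-server` script, while passing `flag="raw"` returns the untouched JSON for inspection.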
Extension based tasks
Previously, Katamari's server handlers were pretty magical. The handler stack was hard coded in web_server.clj, which made live development difficult because both the handlers and the server had to be reloaded all the time. It also meant the handlers themselves were not particularly first class, as a user couldn't inject more handlers as they pleased.
Katamari 0.0.3 introduces a middleware registry of sorts, into which dynamically loaded extensions can register their own middlewares either directly or using the defwrapper and defhandler macros. These two macros capture the main patterns for existing Katamari request middlewares: wrappers wrap an entire request, either monkeypatching the response or providing request context, while handlers handle requests. The middleware registry solves the interactive development problem by re-compiling all the handlers and wrappers into a single middleware stack when changes to the registry occur.
The request server now requests the current compiled handler from the registry.
In extreme circumstances, the registry can be flushed using reset-middleware!. This is mostly useful for dealing with handlers which should be removed from the registry, or changes to the root handler.
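The registry pattern described here can be sketched in a few dozen lines. The Python below is a loose analogy only: the class and method names are invented for the example, and conventions such as "a handler returns None to decline a request" are assumptions rather than how defhandler and defwrapper behave.

```python
class MiddlewareRegistry:
    """Keeps named handlers and wrappers, recompiling the stack on change."""

    def __init__(self, root_handler):
        self._root = root_handler
        self._handlers = {}   # name -> fn(request) -> response or None
        self._wrappers = {}   # name -> fn(next_handler) -> handler
        self._compiled = None
        self._compile()

    def register_handler(self, name, fn):
        # Re-registering a name replaces the old entry, so reloading
        # a handler during development does not stack up duplicates.
        self._handlers[name] = fn
        self._compile()

    def register_wrapper(self, name, fn):
        self._wrappers[name] = fn
        self._compile()

    def reset(self):
        """Drop every registration, loosely analogous to reset-middleware!."""
        self._handlers.clear()
        self._wrappers.clear()
        self._compile()

    def current(self):
        """The server asks for this on each request, so edits apply live."""
        return self._compiled

    def _compile(self):
        handlers = list(self._handlers.values())
        root = self._root

        def dispatch(request):
            # First handler to return a non-None response wins.
            for handler in handlers:
                response = handler(request)
                if response is not None:
                    return response
            return root(request)

        stack = dispatch
        for wrap in self._wrappers.values():
            stack = wrap(stack)
        self._compiled = stack
```

Because the server fetches `current()` on every request, a redefined handler takes effect on the next request without restarting anything.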
All existing handlers and wrappers have been successfully refactored over to this interface, except for the help handler which needs some exceptional behavior.
$ ./kat help
Katamari - roll up your software into artifacts!
Usage:
./kat [-r|-j|-m] [command] [flags] [targets]
Flags:
-r, --raw - print the raw JSON of Katamari server responses
-j, --json - format the JSON of Katamari server responses
-m, --message - print only the message part of server responses
Commands:
meta:
Collect metadata for available tasks and handlers.
Tasks and handlers are expected to participate in handling requests starting
with `meta` by conjing their metadata into the response `[:body :metadata]`. By
failing to participate in this protocol, tasks and handlers become invisible to
reflective tools like `list-tasks` and `help` which rely on `meta`.
start-server:
A task which will cause the server to be started.
Reports the ports on which the HTTP and nREPL servers are running.
help:
Describe a command in detail, or sketch all commands
list-tasks:
Enumerate the available tasks.
Do not report their help information.
restart-server:
Reboot the server, reloading the config and other options.
show-request:
Show the request and config context as seen by the server (for debugging)
classpath:
Usage:
./kat classpath [deps-options] -- [target ...]
Compute a classpath and libs mapping for selected target(s)
uberjar:
Usage:
./kat uberjar [target]
Given a single target, produce an uberjar according to the target's config.
WARNING: As this is a special case of the compile task, it may be removed.
stop-server:
Shut down the server after all outstanding requests complete.
list-targets:
Enumerate all the available Rollfile targets.
Fuzzy hinting
As a minor UX improvement and a proof that the new middleware/handler machinery is up to the job, Katamari now ships with a request wrapper which offers suggestions of fuzzily matched possible commands when the user inputs a command for which no handler is found.
$ ./kat list
No handler found for request:
["list"]
Did you mean one of:
list-tasks
list-targets
classpath
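How Katamari scores its fuzzy matches isn't described here, but the general technique is easy to sketch. The Python below uses the standard library's `difflib.get_close_matches`; the `suggest` name, the limit of three and the 0.4 cutoff are choices made for this example, not Katamari's.

```python
import difflib

# Command names as listed in the help output above.
COMMANDS = [
    "meta", "start-server", "help", "list-tasks", "restart-server",
    "show-request", "classpath", "uberjar", "stop-server", "list-targets",
]


def suggest(command, known=COMMANDS, limit=3, cutoff=0.4):
    """Return up to `limit` known commands that resemble `command`."""
    # The cutoff is an arbitrary choice for this sketch; difflib's default
    # of 0.6 would reject short prefixes such as "list".
    return difflib.get_close_matches(command, known, n=limit, cutoff=cutoff)


suggestions = suggest("list")
```

Results come back ordered from most to least similar, so the closest candidates are offered first.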
Initial improvements
- Improved bootstrapping: deps defaults files are created for you now
- Improved help messages: added some usage & longer help strings
- Fixed ./kat help <task> so that it actually works
- Made the kat script smarter about response formatting, updated usage accordingly
- Cut over to incremental buildgraph creation from whole buildgraph every time
Initial bootstrapped state
This is the first state of Katamari which has been bootstrapped onto itself. This release has been cut as much for posterity as for anything. Katamari itself is still fairly woefully incomplete, and offers no meaningful user or developer API for implementing compilation or injecting middleware.
That will change! But we're bootstrapped! So 👏 🎉 🍰