Release Incremental rebuilds · arrdem/katamari

Release #17

This version of Katamari achieves incremental rebuilding of java-library and clojure-library targets - assuming targets are honest about reporting their own identifiers.

This release hinges on the katamari.roll.extensions/rule-id function. rule-id is a function of pretty much everything that would go into rule-build, except that instead of actually building the rule to products, it returns a string (I suggest a UUID or a shasum) which can be used to uniquely identify the rule as a function of its inputs.

Consider a java-library which depends on only one file. If that one file (and the parameters with which it is javac'd) hasn't changed, then there's nothing to be done: javac is presumed to be a pure function of its inputs. So if we hash all the inputs, we can generate a consistent key by which to address the rule's product, even if the rule itself may not be reproducible. Even within the context of a larger build, that rule need not be rebuilt unless its dependencies or other inputs change.
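To make that concrete, here's a minimal Python sketch of a rule-id-style function. It is not Katamari's code (Katamari is Clojure). The IDs printed in the demo below have the shape of version-5 (name-based, SHA-1) UUIDs, so the sketch assumes that scheme; the namespace UUID, the input field names, and the function names are all hypothetical. The important part is that each dependency's own ID is folded into the hash, so a change anywhere upstream changes every downstream ID.

```python
import hashlib
import json
import uuid
from pathlib import Path

# Hypothetical namespace UUID; Katamari's actual scheme may differ.
RULE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "urn:example:katamari-rule-id")


def file_digest(path: Path) -> str:
    """SHA-256 of a file's bytes, so the key tracks content, not timestamps."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def rule_id(target: str, rule_type: str, params: dict,
            srcs: list, dep_ids: dict) -> str:
    """Derive a deterministic ID from every input that feeds the build.

    dep_ids maps each dependency to *its* rule ID, so a change anywhere
    upstream propagates into this rule's ID (and everything downstream).
    """
    inputs = {
        "target": target,
        "type": rule_type,
        "params": params,
        "srcs": {str(p): file_digest(p) for p in sorted(srcs)},
        "deps": dep_ids,
    }
    # Canonical JSON: sorted keys and fixed separators give a stable string.
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return str(uuid.uuid5(RULE_NAMESPACE, canonical))
```

Editing Demo.java or changing a javac option yields a new ID for the javac rule, and therefore a new ID for anything that lists it in dep_ids.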

This release modifies the core katamari.roll.core/roll function so that, rather than building every rule in a buildgraph, it uses the rule-id function to try to hit a filesystem cache before performing any computation. This allows Katamari to address products such as uberjars by content and avoid rebuilding them whenever possible. The presumption is that the I/O cost of consulting the cache is far cheaper than any amount of recomputation.
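Here's a minimal sketch of that cache-first step, again in Python rather than Katamari's Clojure. The buildcache/<first two characters of the ID>/<ID>/ layout is taken from the paths in the demo below; product_dir, cached_build, and the build callback are hypothetical names. The build writes into a scratch directory that is then renamed into place, so an interrupted or failed build can't leave a partial entry that would later read as a cache hit.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Tuple


def product_dir(cache_root: Path, rule_id: str) -> Path:
    """Cache entry for an ID, mirroring the demo's buildcache/<xx>/<id> layout."""
    return cache_root / rule_id[:2] / rule_id


def cached_build(cache_root: Path, rule_id: str,
                 build: Callable[[Path], None]) -> Tuple[Path, bool]:
    """Return (entry directory, cache_hit); build() runs only on a miss."""
    final = product_dir(cache_root, rule_id)
    if final.is_dir():
        return final, True  # hit: no recomputation at all
    final.parent.mkdir(parents=True, exist_ok=True)
    # Build into a scratch directory next to the final location...
    scratch = Path(tempfile.mkdtemp(dir=str(final.parent)))
    try:
        build(scratch)
    except BaseException:
        shutil.rmtree(scratch, ignore_errors=True)  # never cache a failed build
        raise
    # ...then publish it with a single rename, so readers never see a partial entry.
    try:
        scratch.rename(final)
    except OSError:
        shutil.rmtree(scratch, ignore_errors=True)
        if not final.is_dir():
            raise
        # Another builder published the same ID first; its product is equivalent.
    return final, False
```

In a roll-style loop, the caller would compute each rule's ID once its dependencies' IDs are known, then pass a closure that runs the real rule and writes (for example) target/ into the directory it is handed.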

This release also makes a lot of changes to logging. For detailed information on what Katamari is seeing and doing, check out the logs, by default .kat.d/kat.log.

Demo

Ok. First we clean the Katamari cache.
This lets us show that we're rebuilding things.
Cache cleaning is expressed in terms of milliseconds; here we use 0 to purge everything.
Ideally Katamari would monitor your cache and its hit rate, trying to find where some re-use probability boundary is and automatically trimming the cache for you.
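A sketch of what age-based cleaning could look like. The release doesn't say how an entry's age is measured, so this assumes it's the time since the entry directory was last modified; clean_cache and its parameters are hypothetical, and the layout follows the buildcache/<xx>/<id> paths shown in the demo.

```python
import shutil
import time
from pathlib import Path
from typing import List, Optional


def clean_cache(cache_root: Path, max_age_ms: int,
                now_ms: Optional[float] = None) -> List[str]:
    """Delete entries older than max_age_ms and return their IDs.

    Assumption: an entry's age is measured from its directory's mtime.
    max_age_ms <= 0 purges everything, like `./kat clean-cache 0` in the demo.
    """
    if now_ms is None:
        now_ms = time.time() * 1000
    deleted = []
    # Entries live at <cache_root>/<first two chars of ID>/<ID>.
    for entry in sorted(cache_root.glob("*/*")):
        if not entry.is_dir():
            continue
        age_ms = now_ms - entry.stat().st_mtime * 1000
        if max_age_ms <= 0 or age_ms >= max_age_ms:
            shutil.rmtree(entry)
            deleted.append(entry.name)
    return deleted
```

The returned list plays the role of the "deleted-keys" array in the JSON output below.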

$ ./kat clean-cache 0
{
  "intent": "json",
  "deleted-keys": [
    "1c180065-b16c-5c44-93b8-97a53b201988",
    "36f9c458-aafe-5b60-a115-b7d4a2047d96",
    "3e3b4a85-dbbb-568e-8c32-d082c012b690",
    "3c054aff-d403-5203-8551-ec5c956b1dd4",
    "0440703f-ffa6-5590-810f-c06183b98c58",
    "0e132c54-b388-5703-ac2b-37f807bc0a75",
    "3342903d-7564-5b7d-a5da-7f7d9a37e880"
  ]
}

So the cache is empty.
Let's build the example javac target.

$ time ./kat compile example/javac
{
  "example/javac": {
    "type": "katamari.roll.extensions.jvm/product",
    "from": "example/javac",
    "mvn/manifest": "roll",
    "deps": null,
    "paths": [
      "/home/arrdem/doc/dat/git/arrdem/katamari/.kat.d/buildcache/36/36f9c458-aafe-5b60-a115-b7d4a2047d96/target"
    ],
    "id": "36f9c458-aafe-5b60-a115-b7d4a2047d96"
  },
  "intent": "json"
}
./kat compile example/javac  0.05s user 0.02s system 15% cpu 0.463 total

Cool.
And if we check that target directory out -

$ tree /home/arrdem/doc/dat/git/arrdem/katamari/.kat.d/buildcache/36/36f9c458-aafe-5b60-a115-b7d4a2047d96/target
/home/arrdem/doc/dat/git/arrdem/katamari/.kat.d/buildcache/36/36f9c458-aafe-5b60-a115-b7d4a2047d96/target
└── demo
    └── Demo.class

1 directory, 1 file

Awesome!
We built our classfile and it's in the product cache at the ID 36f9c458-aafe-5b60-a115-b7d4a2047d96.
If we try to compile the same rule again, we should get the same product back.

$ ./kat compile example/javac
{
  "example/javac": {
    "type": "katamari.roll.extensions.jvm/product",
    "from": "example/javac",
    "mvn/manifest": "roll",
    "deps": null,
    "paths": [
      "/home/arrdem/doc/dat/git/arrdem/katamari/.kat.d/buildcache/36/36f9c458-aafe-5b60-a115-b7d4a2047d96/target"
    ],
    "id": "36f9c458-aafe-5b60-a115-b7d4a2047d96"
  },
  "intent": "json"
}

Yep! We got the same 36f9c458-aafe-5b60-a115-b7d4a2047d96 key twice, so we got to skip shelling out to javac the second time around (not that it's really that much faster for a single input file and no plugins etc.)

And if we go build an uberjar containing our test code -

$ time ./kat compile example/clj+uberjar
{
  "example/javac": {
    "type": "katamari.roll.extensions.jvm/product",
    "from": "example/javac",
    "mvn/manifest": "roll",
    "deps": null,
    "paths": [
      "/home/arrdem/doc/dat/git/arrdem/katamari/.kat.d/buildcache/36/36f9c458-aafe-5b60-a115-b7d4a2047d96/target"
    ],
    "id": "36f9c458-aafe-5b60-a115-b7d4a2047d96"
  },
  "example/clj": {
    "type": "katamari.roll.extensions.clj/product",
    "from": "example/clj",
    "mvn/manifest": "roll",
    "paths": [
      "/home/arrdem/doc/dat/git/arrdem/katamari/example/src/main/clj"
    ],
    "deps": {
      "example/javac": null,
      "org.clojure/clojure": null
    },
    "id": "3e3b4a85-dbbb-568e-8c32-d082c012b690"
  },
  "example/clj+uberjar": {
    "type": "katamari.roll.extensions.jar/product",
    "from": "example/clj+uberjar",
    "mvn/manifest": "roll",
    "paths": [
      "/home/arrdem/doc/dat/git/arrdem/katamari/.kat.d/buildcache/1c/1c180065-b16c-5c44-93b8-97a53b201988/target/clj-standalone.jar"
    ],
    "deps": {
      "example/javac": null,
      "example/clj": null
    },
    "id": "1c180065-b16c-5c44-93b8-97a53b201988"
  },
  "intent": "json"
}
./kat compile example/clj+uberjar  0.05s user 0.02s system 5% cpu 1.263 total

We see that we hit the cache for example/javac@36f9c458-aafe-5b60-a115-b7d4a2047d96. Good. The example/clj@3e3b4a85-dbbb-568e-8c32-d082c012b690 target had to be built because we blew the cache away, but it's a Clojure target, so there's actually no work to be done: its product just points at the source directory. We did, however, have to create a new uberjar for example/clj+uberjar@1c180065-b16c-5c44-93b8-97a53b201988, which produced a jarfile at .kat.d/buildcache/1c/1c180065-b16c-5c44-93b8-97a53b201988/target/clj-standalone.jar, named per the rule's :jar-name and located by the cache key.

If we build that uberjar target again, we'll see the ID and consequently the build product repeated.

$ ./kat compile example/clj+uberjar | jq '.["example/clj+uberjar"].id'
"1c180065-b16c-5c44-93b8-97a53b201988"

Awesome.

If we use Katamari to uberjar itself twice, we should really be able to see the difference due to caching the jar the second time around.

$ export kt=me.arrdem/katamari+uberjar
$ time ./kat compile $kt | jq ".[\"$kt\"].id"
"3c054aff-d403-5203-8551-ec5c956b1dd4"
./kat compile $kt  0.09s user 0.04s system 1% cpu 8.333 total
jq ".[\"$kt\"].id"  0.00s user 0.00s system 0% cpu 8.332 total
$ time ./kat compile $kt | jq ".[\"$kt\"].id"
"3c054aff-d403-5203-8551-ec5c956b1dd4"
./kat compile $kt  0.06s user 0.02s system 72% cpu 0.103 total
jq ".[\"$kt\"].id"  0.01s user 0.00s system 5% cpu 0.103 total

Same command twice - same products twice.
Just an 8.2 second difference in how long it takes.

👿

Limitations

This release ONLY captures the incremental build algorithm based on the current state of the filesystem. It has no ability to introspect the consequences of changes. For instance, one could imagine asking the build system to report what should be rebuilt relative to some previous version control state. That is the heart of any incremental testing strategy.

Nor can it answer which targets are impacted by a set of changes such as a map of {<path> : #{:added :changed :deleted}}, let alone by a richer diffing structure like the one behind the Rollfile parser.
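The "which targets are impacted" query is essentially a reverse-dependency closure. Here's a hypothetical Python sketch, assuming each target declares a list of source roots and a list of dependency names; none of these names are Katamari APIs. In the test, example/src/main/clj comes from the demo output, while the Java source root is made up.

```python
from collections import deque
from typing import Dict, Iterable, Set


def owns(root: str, path: str) -> bool:
    """True if path is the source root itself or lies underneath it."""
    root = root.rstrip("/")
    return path == root or path.startswith(root + "/")


def impacted_targets(changes: Dict[str, Set[str]],
                     src_roots: Dict[str, Iterable[str]],
                     deps: Dict[str, Iterable[str]]) -> Set[str]:
    """Targets owning a changed path, plus everything that transitively depends on them.

    changes maps a path to the kinds of change seen, e.g. {"added"} or {"deleted"}.
    """
    changed = [p for p, kinds in changes.items() if kinds]
    direct = {t for t, roots in src_roots.items()
              if any(owns(r, p) for r in roots for p in changed)}
    # Invert the dependency graph: dependency -> targets that depend on it.
    dependents: Dict[str, Set[str]] = {}
    for target, ds in deps.items():
        for d in ds:
            dependents.setdefault(d, set()).add(target)
    # Breadth-first walk along the reverse edges.
    impacted = set(direct)
    queue = deque(direct)
    while queue:
        for parent in dependents.get(queue.popleft(), ()):
            if parent not in impacted:
                impacted.add(parent)
                queue.append(parent)
    return impacted
```

Treating :added, :changed, and :deleted alike is the conservative choice; a richer diff could let some kinds of change skip downstream targets.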

Because the roll function simply performs a build, which targets get built and which get extracted from the cache isn't currently captured well. Really extracting that trace information would require interacting with (intercepting or wrapping) the logging system.
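One way to intercept rather than parse the logs, sketched with Python's standard logging module as a stand-in for whatever logging library Katamari actually uses: attach a handler that pulls structured fields off each record. The logger name and the target/outcome fields are hypothetical; the build loop would have to log them for this to work.

```python
import logging
from typing import Dict


class BuildTrace(logging.Handler):
    """Records each target's cache outcome from structured log records.

    Assumes the builder logs with extra={"target": ..., "outcome": ...};
    records without those fields are ignored.
    """

    def __init__(self) -> None:
        super().__init__()
        self.outcomes: Dict[str, str] = {}

    def emit(self, record: logging.LogRecord) -> None:
        target = getattr(record, "target", None)
        outcome = getattr(record, "outcome", None)
        if target is not None and outcome is not None:
            self.outcomes[target] = outcome


log = logging.getLogger("kat.roll")  # hypothetical logger name
log.setLevel(logging.INFO)

trace = BuildTrace()
log.addHandler(trace)
try:
    # Stand-ins for what a cache-aware build loop would emit.
    log.info("cache hit", extra={"target": "example/javac", "outcome": "hit"})
    log.info("built", extra={"target": "example/clj+uberjar", "outcome": "built"})
    log.info("unrelated message")
finally:
    log.removeHandler(trace)
```

The trade-off is that the trace is only as complete as the log calls; returning a hit/miss flag from the build step itself would be sturdier.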

There's also a known (fatal 😬) bug in the change detection algorithm. At present, Java and Clojure targets only detect changes in their direct :deps map and other structure. They DON'T detect changes in what that dep map resolves to, which means that if you bump your global version pinning, nothing gets rebuilt. See #16, which tracks this.
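The bug is easy to see in miniature. In the demo output, example/clj's deps map has null values ("org.clojure/clojure": null) that are resolved elsewhere, so a key built only from that map can't change when the pins do. The hypothetical sketch below contrasts that with also hashing what each dep currently resolves to; the function names and version strings are invented for illustration.

```python
import hashlib
import json
from typing import Dict, Optional


def digest(payload: dict) -> str:
    """SHA-256 over canonical JSON of the payload."""
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode("utf-8")).hexdigest()


def declared_only_id(declared: Dict[str, Optional[str]]) -> str:
    """The buggy shape: only the declared map, e.g. {"org.clojure/clojure": None}."""
    return digest({"deps": declared})


def resolved_id(declared: Dict[str, Optional[str]], resolution: Dict[str, str]) -> str:
    """Also hash what each dep resolves to right now: a pinned version for a
    Maven coordinate, or the dependency's own rule ID for an in-repo target."""
    resolved = {name: resolution[name] for name in declared}
    return digest({"deps": declared, "resolved": resolved})
```

Bumping the (made-up) pin from 1.10.0 to 1.10.1 leaves declared_only_id untouched, because nothing it hashes moved, while resolved_id changes.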
