Allow caching arbitrary directories during build #44

Open
docent opened this issue Sep 20, 2018 · 6 comments

@docent

docent commented Sep 20, 2018

In order to facilitate at least partially incremental builds in Gradle, it would be good to cache some data between builds. One way to do that would be to utilize Gradle's Build Cache, but that's a more advanced approach which requires that the tasks support caching. (I suppose build cache support could be added to this buildpack at some point, as an option.)

For now I propose creating a "mirror" of the cached directories under $CACHE_DIR/.gradle-additional-cache-dirs by creating corresponding symlinks, if the environment variable $GRADLE_ADDITIONAL_CACHE_DIRS is set. The variable would contain the relative directories to be cached, separated by :, e.g. build:src/main/frontend/node_modules

Some caveats:

  • This solution is not perfect, as, in many cases, Gradle's up-to-date checks rely on absolute paths, which are, unfortunately, different on every Heroku build. But it's still a step forward.
  • No directories with : in the directory name (the same limitation as with, e.g., the $PATH variable)
  • No validation whatsoever (e.g. nested directories, etc.)
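
The mirroring idea above could be sketched roughly as follows. This is not the code from the linked commit: link_cached_dirs is a hypothetical helper, and its three arguments (app build directory, cache directory, colon-separated list) are assumptions made for illustration. Like the proposal, it does no validation.

```shell
#!/bin/sh
# Hypothetical helper (not the code from the commit mentioned in this issue).
# $1 = app build directory, $2 = cache directory,
# $3 = colon-separated relative directories, e.g. the value of
#      GRADLE_ADDITIONAL_CACHE_DIRS.
link_cached_dirs() {
  build_dir="$1"
  cache_dir="$2"
  dirs="$3"
  mirror="$cache_dir/.gradle-additional-cache-dirs"

  old_ifs="$IFS"
  IFS=':'          # split the list on colons, as with $PATH
  set -f           # do not glob-expand entries while splitting
  for rel in $dirs; do
    [ -n "$rel" ] || continue
    # Create the real directory inside the cache mirror.
    mkdir -p "$mirror/$rel"
    # Make sure the parent of the link exists in the app tree.
    mkdir -p "$(dirname "$build_dir/$rel")"
    # Assumption: anything already at that path in the app is discarded.
    rm -rf "${build_dir:?}/$rel"
    # Point the app path at the cached directory.
    ln -s "$mirror/$rel" "$build_dir/$rel"
  done
  set +f
  IFS="$old_ifs"
}

# Possible call from a compile script (variable names assumed):
#   link_cached_dirs "$BUILD_DIR" "$CACHE_DIR" "$GRADLE_ADDITIONAL_CACHE_DIRS"
```

Because the app paths become symlinks into the cache, whatever the build writes there survives to the next build, but it is also not a real directory in the app tree.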

Code in this commit. I can create a PR if necessary.

This also solves #19

@jkutner
Contributor

jkutner commented Sep 21, 2018

@docent thanks for pulling this together. In general, I like this approach, but a few concerns come to mind:

  • Isn't the build/ dir required at runtime? If it's a symlink to the cache, it won't be included in the slug.
  • Are the node_modules directories needed at runtime? Probably not, in which case your approach is good.

So we might actually need a different mechanism for each of these. I think we should just cache the build/ dir by default (whether the user asks for it or not). But because it's needed in the slug, we can use the cache_copy function from lib/common.sh. You can see an example of this in the Scala buildpack, where it copies the target directory into the app and then copies the target directory back out.
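
A copy-based variant of this could look roughly like the sketch below. copy_rel_dir is a hypothetical stand-in written for illustration; the actual cache_copy in lib/common.sh may have a different signature or behavior. The point is that the app ends up with a real build/ directory rather than a symlink, so it stays in the slug.

```shell
#!/bin/sh
# Hypothetical stand-in for a cache_copy-style helper (the real function in
# lib/common.sh may differ).
# $1 = relative directory, $2 = source root, $3 = destination root.
copy_rel_dir() {
  rel_dir="$1"
  from_dir="$2"
  to_dir="$3"
  # Drop any stale copy at the destination first.
  rm -rf "${to_dir:?}/$rel_dir"
  # Copy only if the source exists, e.g. the cache is empty on a first build.
  if [ -d "$from_dir/$rel_dir" ]; then
    mkdir -p "$to_dir/$rel_dir"
    cp -pr "$from_dir/$rel_dir/." "$to_dir/$rel_dir"
  fi
}

# Possible use around the Gradle invocation (variable names assumed):
#   copy_rel_dir build "$CACHE_DIR" "$BUILD_DIR"   # restore before the build
#   ... run the Gradle build ...
#   copy_rel_dir build "$BUILD_DIR" "$CACHE_DIR"   # save after the build
```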

But then I think you should keep the GRADLE_ADDITIONAL_CACHE_DIRS stuff for things like your node_modules in a sub dir.

What do you think?

@docent
Author

docent commented Sep 21, 2018

@jkutner Yeah, you are right about the build dir, and I was going to mention that, but forgot about it. To be precise, the build directory itself is not required, but, e.g., for a Spring Boot app, the uber-jar (which is the only file that is required at runtime, as it contains all the libraries, classes, etc.) is placed in that directory (typically in build/libs), so, as you said, it would not be available at runtime.

The way I solved it for my example (Spring Boot) project was this:

task(stage, type: Copy) {
    dependsOn(bootJar)
    from bootJar
    into "$project.rootDir/dist"
}

So basically my stage task copies the uber-jar to the dist directory in the root directory, so it's available at runtime. No need to copy the build dir around.

Now, of course, we cannot anticipate all the possible usages that come to people's minds. Someone might have a project where the entire build directory is required at runtime. But that means that they would need to copy all the libraries to that dir (as they "disappear" after the build, when the cache is not available). But that is not a common situation, I would say. To summarize, I need to put a bit more thought into it. Your idea to copy the build dir is correct, at the cost of copying some files, I suppose.

Also, bear in mind that people might change the build directory from within their Gradle configuration files. Again, this is rather atypical, albeit possible.

Speaking of node_modules, they are indeed not required at runtime, so we are safe here.

@jkutner
Contributor

jkutner commented Oct 1, 2018

I definitely don't want to remove the build dir from the slug by default. Lots of non-Spring-Boot apps need stuff from it (for example, Ratpack writes a build/install/*/bin script to start the app).

I think copying the build dir into the cache is fine. We do this for the Scala buildpack. It adds a small amount of time to the build, but it's probably worth the gain from incremental compile.

@docent
Author

docent commented Oct 2, 2018

@jkutner I did some more tests and, unfortunately, achieving incremental builds might be difficult with the current Heroku build system, which makes a new, differently named directory for each build.

Even if we link or copy the build directory as you suggest, Gradle still thinks the files were removed, e.g.:

> Task :compileJava
Deleting stale output file: /tmp/build_923540ac65964f881d15b6eca4390a2d/build/classes/java/main
Task ':compileJava' is not up-to-date because:
Output property 'destinationDir' file /tmp/build_b94c9c5ca6d2be4eded04c8c53aad9a0/build/classes/java/main has been removed.

That's because in many cases Gradle uses absolute paths for up-to-date checks.

The only way I was able to achieve almost incremental builds was by setting the build directory via GRADLE_OPTS to reside directly in the build cache, as then the path does not change between builds. It does not solve the issue with node_modules though, as they must be in the source tree, as previously mentioned.

To sum up:

  • Copying the build dir will not gain us much at this point, unless the builds happened in the same directory every time, which I believe is impossible (I was even thinking about using chroot... not sure it would work)
  • Doing linking tricks with additional cache dirs is still useful

@jkutner
Contributor

jkutner commented Oct 2, 2018

@docent in the Scala buildpack we move everything into a consistently named directory, but I have to say that I would be wary of doing this with Gradle by default. It's hard to maintain (notice the GEM_PATH to account for apps that use Scala with the Ruby buildpack). Incremental compile also means users need to know when to run a clean build, and they often have the experience of pushing, failing a build, reconfiguring to run clean, and then pushing again, which is painful at times.

So yea, maybe we should put incremental compile aside for now.

This issue is about cached dirs anyways :) The Scala buildpack again has a good example (albeit not configurable). The question would be what env var to use.

I think HEROKU_GRADLE_CACHED_DIRS is clear. Maybe a little verbose, but ensures no collisions. Maybe GRADLE_CACHED_DIRS is sufficient.

@docent
Author

docent commented Oct 5, 2018

@jkutner Yeah, I think GRADLE_CACHED_DIRS would be good. I just hope it conveys the intent well: Gradle has its own build cache, and it would be bad if people confused the two.
