Make npmInstall and npmSetup tasks cacheable #81

bercik · 2020-03-06T13:11:49Z

Currently npmInstall and npmSetup tasks are not cacheable by gradle build cache, thus when running CI pipelines on different machines those tasks needs to fully run every time which consumes a lot of time.

Please consider making those tasks cacheable. Instructions can be found here:
https://docs.gradle.org/current/userguide/build_cache.html#enable_caching_of_non_cacheable_tasks

deepy · 2020-03-06T13:55:18Z

Long story short, those tasks don't cache very well, however, if you can get a local mirror of node and npm you should be able to get the same experience for nodeSetup/npmSetup.

And if you're using webpack or similar you can set up a task for that with inputs/outputs and cache it, which should help speed things up

bsautel · 2020-03-08T15:33:32Z

Let's have a look to these tasks (I add nodeSetup to the list):

nodeSetup download Node.js. It uses for that a Gradle repository that automatically caches downloaded artifacts (exactly the same way as it does for Maven artifacts). So it is downloaded only once, but when the task is not up-to-date, it unpacks the archive (which is not very different in terms of performance as using a cache).
npmSetup uses npmInstall to install a specific version of npm. See below.
npmInstall: as of version 5 (as far as I remember), it caches the downloaded packages to avoid downloading them each time they are installed. This task is also already cached, not by Gradle directly, but by npm.

All those tasks are not directly cacheable by Gradle, but they rely on tools that use cache.

I know this have some limitations. For instance, if the build runs inside a container whose file system is not persistent, the cache will not be reused for next builds by default. This will be also the case for Gradle, but this can be fixed by changing its cache configuration. It would be necessary to do the same thing for also for npm and it is much simpler if all the cache is managed by Gradle. Once the Gradle cache is configured, everything can be cached.

We can assume (it would be interesting to test it though), that npmInstall for instance would be even faster if it was fully cached by Gradle. Indeed, npm install does not only extract the package contents. It also executes some post install scripts (for instance a node-gyp compile). Caching the node_modules directory would enable us to skip this whole process. But I think it could cause multiple issues:

Do the post install scripts make change only to the node_modules directory? I don't really know whether there are some conventions regarding this behavior. If it's not true, this means that each npmInstall user would be in charge of configuring perfectly the task output to ensure the cached result is perfectly right.
As far as I know, the node_modules directory does not contain exactly the same thing in the different platforms. For instance, node-gyp packages contain some platform-specific binaries. We should probably add the platform as a cache key to ensure the node_modules directory will not be reused across different platforms.
There are probably other things I didn't think about.

I think we should make some experiments to ensure that make them cacheable really makes sense and does not create some unwanted side effects.

deepy · 2020-03-09T07:44:13Z

nodeSetup and npmSetup is going to be interesting to cache as they both access the same directories and create symbolic links (which are going to be replaced with the file they point to if we use the Gradle cache).

Luckily even if you run on an entire clean machine nodeSetup and npmSetup aren't that heavy, especially if you got distBaseUrl and .npmrc pointing to local mirrors or ones close to you.

bercik · 2020-03-09T08:23:02Z

Thanks for replying.

You are right, nodeSetup and npmSetup don't seem to heavy, but the npmInstall does. We did the following in our project:

We've added custom npmCi task that caches node_modules folder:

task npmCi(type: NpmTask, dependsOn: [npmSetup]) {
    if (System.env['CI'] != null && System.env['CI'] == "TRUE") {
        args = ['ci']
    } else {
        args = ['install']
    }

    inputs.file(file("${project.projectDir}/frontend/package.json")).withPathSensitivity(PathSensitivity.RELATIVE)
    inputs.file(file("${project.projectDir}/frontend/package-lock.json")).withPathSensitivity(PathSensitivity.RELATIVE)

    outputs.dir "${project.projectDir}/frontend/node_modules"
    outputs.cacheIf { true }
}

In .gitlab-ci we are running the task as follows:
./gradlew npmCi --build-cache
We are caching .gradle folder in gitlab-ci:

  cache:
    paths:
      - .gradle

What do you think of that solution? It seems to be working, but I'm a little afraid about the possible problems you've mentioned above.

nukesz · 2020-04-02T14:25:12Z

Hi,

I've come up with the following code and it works for me:

npmInstall {
    inputs.files(project.files('package.json'))
    outputs.dir("${project.buildDir}/node_modules/")
}

task buildUI(type: NpmTask, dependsOn: npmInstall) {
    inputs.files(project.files('src'))
    outputs.dir("${project.buildDir}/dist/")
    args = ['run', 'build']
}

build.dependsOn buildUI

I'm running the gradle build and it's running npm install only when the package.json is changed and it runs npm build when the src folder is changed.

bsautel · 2020-04-03T12:24:06Z

@nukesz normally you don't need to configure the npmInstall as you did. The default configuration already contains those inputs and outputs.

I also would advice you to add the node_modules directory as an input of your buildUi task since a change in your project library will probably change the build output. I would also add the package.json file since you rely on its scripts when running run build. You can also use the NpxTask to avoid needing to declare the build script in the package.json file.

@nukesz, you are not talking about the same thing as @bercik. This issue is about caching, not just task execution avoidance. Task execution avoidance is the fact of skipping the task execution if the inputs did not change since last build and the outputs are still present and did not change. Caching is one step further. As Gradle knows which are the input and outputs of the task, it can save the output in a cache (which can be shared between multiple computers) and reuse the same output stored in the cache instead of running the task when the inputs are identical.

bsautel · 2020-04-03T12:25:54Z

@bercik is your experiment successful? Does it work as expected? Do you use only local cache or do you share the cache between multiple hosts?

remmeier · 2020-04-23T06:11:23Z

I would recommend merging npmInstall and build tasks. For example by using in package.json:

"build": "npm ci && ng build

This gives a single task in Gradle. So if the result (the compiled webapp) is already available, the build server can jump over setting up the node_modules directory. Which can save a lot of time.

And npm install should not be used, as it is not reproducible. Rather use npm ci.

deepy · 2020-04-23T07:48:23Z

@remmeier if the compiled web app is already available then both npmInstall (which can be configured to use npm ci) and your build task would be skipped as your build task is already up-to-date.

Though in your case there's a caveat with npm ci, which is that it'll delete your node_modules before running.

remmeier · 2020-04-23T08:14:40Z

does npm ci not do up-to-date checking? Actually in our project we make use of yarn, but apply the same techniques. This works well. Was hoping for NPM to get to the same level in that regard. With our single task yarn setup we manage to save around 30-60 second with every build per frontend project (which adds up, in particular for multi-project builds).

(there is a second new npm-based project on my side, i probably have some more insights soon in that regard)

bsautel · 2020-04-23T08:31:35Z

I did some experiments in another issue to know which one of Gradle up-to-date checking or npm install up-to-date checking and it appeared that Gradle was much faster than npm install (see here for Linux and here for Windows). Of course, it probably also depends on the project size (mainly the number of dependencies). But this was not really the result we were expecting.

I did not do this experiment with npm ci nor with yarn, that would be interesting to do it.

bsautel · 2020-04-23T08:33:32Z

And it sounds like npm ci always starts by removing the node_modules directory. So it does not seem to do any up-to-date checking attempt.

Here is the first line that is printed by npm ci

npm WARN prepare removing existing node_modules/ before installation

remmeier · 2020-04-23T08:39:31Z

I will have a look later and file a bug report if it really does not to up-to-date checking. I really like yarn & Gradle...

npmInstall and gradle cache restore will heavily depend on where the data is coming from. yarn is able to cache it locally, so it can be fast. The same for gradle. A close-by Gradle remote cache is also not so bad. But the public NPM registry should be avoided at all cost I think for anybody in need of r robust/fast builds. Your Gradle cache results look good for sure.

remmeier · 2020-04-23T08:40:48Z

the one things really good at yarn is that yarn install is super fast if all is up-to-date (around a second).

deepy · 2020-04-23T10:43:45Z

the one things really good at yarn is that yarn install is super fast if all is up-to-date (around a second).

I don't think realistically it can ever beat gradle doing up-to-date checks on your compile webapp task unless you either have a massive app or no dependencies.

bsautel · 2020-04-26T16:13:47Z

I did this experiment again using an Angular project with a multiple dependencies (902 packages at node_modules root) on an Ubuntu 20.04 laptop.

Note that I used the version 3 of the plugin (which will be released soon) because I discovered an issue in the version 2.2.3 regarding the yarn task outputs declaration (the task does not run when I remove the node_modules directory, whereas it should).

Command	Command execution time	Gradle up-to-date check
`yarn`	1 second	1 second
`npm install`	8 seconds	1 second
`npm ci`	40 seconds	1 second

We can see that yarn's up-to-date checking is fast, but not faster that Gradle. For npm install and npm ci, Gradle is clearly faster.

bsautel · 2020-04-27T07:58:06Z

I could investigate the issue I encountered, and it appears that the up-to-date checking of the yarn task works as expected, I probably did an error in my experiment.

But I discovered that the node_modules directory is declared twice as output, once as a directory and once as a file tree (unless you use the node_modules output filter). It is useless and slows down the Gradle's up-to-date checking. It takes 3 seconds instead of 1 (with the version 3 of the plugin).

I created the issue #96 and fixed it. It now also takes 1 second with the version 2.2.3 and the fix.

sergeykad · 2021-07-05T18:56:42Z

What about Yarn tasks ( YarnSetupTask, YarnInstallTask, YarnTask)? Can build cache be enabled for them?

deepy · 2021-07-05T20:32:03Z

Your own ad-hoc YarnTasks can absolutely be cached, where I work all our npm run dist takes nearly a minute each (and we got 4 in one project)
They cache extremely well, especially when paired with a remote build-cache

doctorpangloss · 2022-01-25T23:23:10Z

The problem is that yarnSetup, npmSetup and nodeSetup reference absolute paths, busting the cache. Just don't do that. gradle shouldn't be installing those things anyway.

To fix caching, add this to your root build.gradle:

plugins {
  id "com.github.node-gradle.node" // specify version!
}

import com.github.gradle.node.task.NodeSetupTask
import com.github.gradle.node.yarn.task.YarnSetupTask
import com.github.gradle.node.npm.task.NpmSetupTask

afterEvaluate {
  // disable all setup tasks because they bust the cache
  allprojects {
    [YarnSetupTask.class, NpmSetupTask.class, NodeSetupTask.class].forEach({ setupTask ->
      tasks.withType(setupTask).configureEach({ task ->
        task.configure {
          onlyIf { false }
        }
      })
    })
  }
}

deepy · 2022-01-26T16:54:21Z

There's some other workarounds in gradle/gradle#3525 as well, but with symbolic links being resolved to their file instead of being stored I'm not sure we ever could support the build cache in those tasks.

So if you're using download = false you'll get the same effect as the snippet above (but lose out on perfectly reproducible builds)

Cerber-Ursi · 2023-10-31T08:48:29Z

Bumping the thread, since the question I've run into seems to be related.

On Gradle 7.4, npmSetup task shows the following deprecation warning:

Cannot access output property 'npmDir' of task ':npmSetup' (see --info log for details). Accessing unreadable inputs or outputs has been deprecated. This will fail with an error in Gradle 8.0. Declare the task as untracked 
by using Task.doNotTrackState(). Consult the upgrading guide for further information: https://docs.gradle.org/7.4/userguide/upgrading_version_7.html#declare_unreadable_input_output

This is shown only on Windows - on Linux, even in WSL, there's no problem. I guess the source of the error is the symbolic links in bin, since they can't be opened by the ordinary Windows programs too. Does this mean that npmSetup on Windows will always be effectively uncacheable?

deepy · 2023-12-19T16:35:01Z

@Cerber-Ursi I'm unable to reproduce that issue using the example project, with or without download enabled, with or without a npmVersion set
https://github.com/node-gradle/gradle-node-plugin/blob/main/examples/simple-node/npm/build.gradle

Can you open a separate issue and provide a standalone project that reproduces the issue?

doctorpangloss · 2023-12-19T17:54:09Z

in my experience since my last update, it's better to never cache npmInstall and npmSetup. they are too large for gradle to cache efficiently anyway. however, the yarn-state or yarn-integrity files or similar should be specified as outputs, and if you want to be extra disciplined, node_modules must be set to read only. with yarn, use yarn 2+ and plug'n'play if you want caching.

Cerber-Ursi · 2023-12-20T05:13:44Z

Interesting. I've tried to do this again and wasn't able to reproduce the original problem either, including in the real project where it was initially noticed. Will try again later, not sure what exactly has changed in between.

adam-enko · 2024-01-30T11:19:10Z

EDIT: I made a mistake, the information below is incorrect.

~~I tried to enable caching by specifying outputs.cacheIf { true }~~

tasks.npmInstall {
    outputs.cacheIf { true }
}

~~I think it's relevant that I also have a custom NpmTask~~

val npmRunBuild by tasks.registering(NpmTask::class) {
  npmCommand.set(listOf("run", "build"))
  inputs.files("package.json", "webpack.config.js")
    .withPathSensitivity(RELATIVE)
  inputs.dir("src/main")
    .withPathSensitivity(RELATIVE)
  inputs.dir(layout.projectDirectory.dir("node_modules"))
    .withPathSensitivity(RELATIVE)
    outputs.dir("dist")
}

~~If I enable additional build cache logging (add org.gradle.caching.debug=true to gradle.properties) and run ./gradlew clean && ./gradlew assemble I can see this warning on the npmInstall task~~

> Task :npmInstall
[...]
Non-cacheable because Gradle does not know how file 'node_modules/.package-lock.json' was created (output property 'nodeModulesPackageLock'). Task output caching requires exclusive access to output paths to guarantee correctness (i.e. multiple tasks are not allowed to produce output in the same location). [OVERLAPPING_OUTPUTS]

~~Specifying the package-lock file as an output seems to help:~~

tasks.npmInstall {
    outputs.file(layout.projectDirectory.file("node_modules/.package-lock.json"))
        .withPropertyName("nodeModulesPackageLock")
    outputs.cacheIf { true }
}

~~I can then run~~

~~1. ./gradlew assemble > assemble1.log~~
~~2. ./gradlew clean -q~~
~~3. ./gradlew assemble > assemble2.log~~

~~In assemble2.log I can see that npmInstall is cached.~~

> Task :npmInstall FROM-CACHE

doctorpangloss · 2024-01-30T17:11:37Z

tasks.npmInstall {
    outputs.file(layout.projectDirectory.file("node_modules/.package-lock.json"))
        .withPropertyName("nodeModulesPackageLock")
    outputs.cacheIf { true }
}

I don't think this does what you think it does. Unless npmInstall contains node_modules/ as an output directory...

delete node_modules/
run npmInstall
observe gradle will restore just node_modules/.package-lock.json from the cache

adam-enko · 2024-01-30T18:15:56Z

I don't think this does what you think it does.

You're right! It wasn't caching as I expected, I misunderstood what was going on.

adam-enko · 2024-01-31T09:11:49Z

Hi 👋

I've add an example test of the behaviour I'd like to see in #301

Currently I see warning messages from Gradle that explain why npmInstall can't be cached:

Non-cacheable because Gradle does not know how file 'package-lock.json' was created (output property 'packageLockFileAsOutput'). Task output caching requires exclusive access to output paths to guarantee correctness (i.e. multiple tasks are not allowed to produce output in the same location). [OVERLAPPING_OUTPUTS]

Non-cacheable because Gradle does not know how file 'node_modules/.package-lock.json' was created (output property 'nodeModulesPackageLock'). Task output caching requires exclusive access to output paths to guarantee correctness (i.e. multiple tasks are not allowed to produce output in the same location). [OVERLAPPING_OUTPUTS]

From a quick look removing these properties as outputs should work

packageLockFileAsOutput conflicts with the input property, packageLockFileAsInput
nodeModulesPackageLock should be tracked by the getNodeModulesDirectory output

Do these changes seem reasonable? I'm not familiar with NPM development so I would appreciate any insights.

deepy · 2024-01-31T10:11:48Z

Conceptually you've got the right idea @adam-enko, but there's a few things that makes npm install cache extremely badly or outright broken

Let's start with the easy one, Gradle currently breaks symlink and this makes node_modules unhappy, this is currently being worked on and might get resolved by the linked PR in gradle/gradle#3525

Next one is harder because I don't have the data any longer and never thought to create benchmarks, but when your node_modules becomes "moderately large" you may find that Gradle spends minutes doing the up-to-date check
This is especially funny if it turns out to be out-of-date and you then have to wait for npm install to do the same up-to-date check

These might be fine locally, but if you have a remote build cache then publishing a 1-2GB node_modules is likely to cause issues
And even at 100-200MB, the cost of pulling it from the cache will likely be bad, especially if you have a npm mirror close to you

And with a remote build cache, what happens if you pull a node_modules using native components built for the wrong platform?
Adding the platform to the cache key by default would help this, but unless your node_modules has no native components then I suspect caching it won't make much sense for the above reasons

So to list the problems I remember:

npm install already does this check and is pretty fast at this point
Gradle's up-to-date checking can get really slow with large outputs and node_modules can easily become large
Some tools in the JS ecosystem happily read and write to a folder of their own in node_modules, this used to upset me but I can't find any modern tools doing this so this might not be relevant any longer
Every project I've personally come in contact with has included native components, making node_modules platform dependent, this used to upset me because a lot of these didn't build on FreeBSD
Caching the build of your app tends to make more sense as it's generally slow and npm install on a moderately up-to-date node_modules can be very quick

With these in mind, most people who provided feedback (and most people polled) preferred the solution where Gradle mostly leaves everything to npm and just keeps track of whether it has run successfully with the current package.json (and lockfiles) and only expecting node_modules/.package-lock.json as output
See the fastNpmInstall for more information

gradle-node-plugin/src/main/kotlin/com/github/gradle/node/NodeExtension.kt

Lines 127 to 134 in 16ecc0f

    
                * Use fast NpmInstall logic, excluding node_modules for output tracking resulting in a significantly faster 
        
                * npm install/ci configuration at the cost of slightly decreased correctness in certain circumstances. 
        
                * 
        
                * In practice this means that if you change node_modules through other means than npm install/ci 
        
                * NpmInstall tasks will continue being up-to-date, but if you're modifying node_modules through 
        
                * other tools you may have other correctness problems and surfacing them here may be preferred. 
        
                * 
        
                * https://docs.npmjs.com/cli/v8/configuring-npm/package-lock-json#hidden-lockfiles

So to wrap this up, do you really want to cache node_modules or are you ok with recreating it initially and having it be reported up-to-date like other tasks?
i.e. do you want it cached or are you interested in fast builds that work well with the remote cache but with the downside that your CI needs to do an npm install?

deepy · 2024-01-31T10:24:32Z

This doesn't mean there's no use for this, it's just that all evidence I know of suggests that people think they want this but they really might want something else

There might be a legitimate use-case for this, or you might want to look into your build and consider the setup
If you're an open-source project under a permissive license (like the MIT, BSD, Apache) I'll happily take a look at setup, if you're GPL I'll grudgingly take a look at the setup, if you're AGPL or commercial feel free to send me an email and we can discuss options

adam-enko · 2024-01-31T10:39:49Z

Thanks for the detailed response @deepy! It's a lot more clear to me now.

I'm working on Dokka - which uses node-gradle in one subproject https://github.com/Kotlin/dokka/blob/e502b2c1b1aa168bb3cd10d4d5f3ab883daffabd/dokka-subprojects/plugin-base-frontend/build.gradle.kts

The main issue is that npmInstall will run even though there haven't been any changes, which can take 1+ minute on CI even if there are no changes https://ge.jetbrains.com/s/7z6wysb7gfgf6/timeline#xhxfjcjtknpfe

I have made some changes in Kotlin/dokka#3479 that I expect will help.

Regarding the caching of NpmInstallTask, I can think of two options:

NpmInstallTask should be annotated with @DisableCachingByDefault, and update the KDoc to add the reasons you explained so clearly.

Perhaps go even further: the risk of having platform-dependent files in node_modules is too likely, and the resulting issues too subtle, and has a high risk of poor performance, and NpmInstallTask should be marked with doNotTrackState()?
Take my proposal PR NPM Install caching #301 further, so that if someone wants to make NpmInstallTask cacheable they can.

There's a risk that someone, like myself(!), might enable caching without fully understanding the risks and costs.

adam-enko · 2024-01-31T12:51:51Z

Has @LocalState been considered for node_modules/? It looks like a good fit.

https://docs.gradle.org/8.5/userguide/custom_tasks.html#sec:storing_incremental_task_state

However, if the state files are non-relocatable, then they can’t be shared via the build cache. Indeed, when the task is loaded from cache, any such state files must be cleaned up to prevent stale state from confusing the tool during the next execution. Gradle can ensure such stale files are removed if they are declared via task.localState.register() or if a property is marked with the @LocalState annotation.

sergeykad · 2024-01-31T15:23:45Z

@adam-enko That's an interesting proposal, but is not the directory too big for that? IIRC the default artifact size limit for caching is 100MB.

deepy · 2024-01-31T22:32:10Z

Haven't had the time to look into this properly yet, but on Windows I get a 218MB node_modules from the plugin-base-frontend which I guess isn't too bad
But running npm install locally with no caches whatsoever takes 15 seconds on my machine, which is desktop equipped with an ok Ryzen CPU, a NVMe SSD, a gigabit connection, and way too much memory so not exactly a very useful comparison
Though I'm curious on where the bottleneck is, because you might see even worse performance with the remote cache

Good news is we can easily test that by creating a snapshot and comparing the builds
Do you have a maven repository you could use for the snapshot? If not I can probably create a public one on my private nexus instance

deepy · 2024-01-31T22:36:14Z

But before that, can you add the --production flag to npmInstall or set the environment variable NODE_ENV to production and do a comparison?

deepy · 2024-01-31T22:41:04Z

That's apparently --omit=dev now

The npmInstall task is non-cachable node-gradle/gradle-node-plugin#81, and being always activated even for non-relevant task execution paths. Force it to stop being activated during the offline mode. KTI-2149

adam-enko · 2025-02-28T14:00:13Z

I've been thinking about this a lot recently, and I think there's a way to use Gradle to cache NPM's dependencies, meaning while npmInstall still won't be cacheable, at least it can run much faster.

IIUC node_modules/ is essentially just unpacked packages from NPM. Since the dependencies are just unpacked files, it should be possible and much faster to run npm install if the dependencies were pre-loaded into NPM's cache.

Gradle has a really good dependency caching system (even if only supports Maven and Ivy schemas). So what if we used Gradle to cache the NPM dependencies?

Before running npmInstall, first check how npm install --offline completes.

If it succeeds, then there's no problem, finish.
If it fails, then the plugin needs to download missing dependencies. Continue to step 2...

Get a list of all dependencies. Run npm list --all --json and parse the the result.
Add a custom Ivy repository to Gradle, and create Configurations for resolving the NPM dependencies.
Resolve the dependencies, and for each one add them to the NPM cache: npm cache add lib.tgz
Now run npm install --offline again. Because the dependencies are cached, it should work, and run much faster.

Gradle will still ignore node_modules/, avoiding problems with build cache overload or symlinks

What do you think, could it help?

deepy · 2025-03-01T11:36:06Z

It's an interesting solution, the immediate thing that comes to mind is that modeling this might be a little trickier, thinking especially of how to propagate the output of npm list --all --json to the subsequent task(s)
(For npmSetup this might work well with just an intermediate task whose input is the version and output the list)

But at a glance it looks like this is going to either need to run npm during configuration time or run the build twice when in a clean scenario (once to get all the necessary data and fail, and the second time to make use of the dependencies)

And the majority of the benefits would rely on the local cache being available (or pre-seeded, which seems more likely)

I'm on a train right now and my stop is coming up, so I'll get back to this a bit later
But have you compared it with the "fast npm install" functionality?

It doesn't make the tasks cacheable, but makes npmInstall participate in up-to-date checks by tracking its inputs (package.json, lockfiles, etc) and comparing them with the outputs (lockfiles, node_modules/.package-lock.json etc) https://docs.npmjs.com/cli/v10/configuring-npm/package-lock-json/?v=true#hidden-lockfiles
This has no benefits on CI or clean runs, but if you run multiple gradle builds it means everything will stay up-to-update (major benefit locally, benefits CI if you have multiple steps or retain the workspace between builds)

adam-enko · 2025-03-08T12:26:42Z

I've looked a lot more into this, and I think the focus on making the npm precursor tasks (nodeSetup, npmSetup, npmInstall) could be looked at in another way. Instead of trying to make them cacheable, instead can we make it so they only run if necessary?

(Also, my last message about caching dependencies via Gradle would help, but it's a separate improvement that is unrelated to the current thread, so I'll try to make a new issue.)

Scenario

The scenario that I think highlights the problem is a project that's built on GitHub CI with Gradle build cache enabled.

In terms of features it's the 'best case' scenario, because build cache is enabled and supported. However, because it's a fresh checkout, no local files are available. The tasks must either run, or be loaded from build cache. And because it's a fresh checkout fastNpmInstall can't help.

Let's say there's a custom npmBuild task that produces some output files, and it's completely cacheable.

// build.gradle.kts

val npmBuild by tasks.registering(NpxTask::class) {
  dependsOn(tasks.npmInstall)

  command.set("someCmd")
  inputs.dir("src")

  outputs.file("build/result.js")
  outputs.cacheIf { true }
}

Even though npmBuild is fully cacheable, because it depends on the npm precursor tasks they must always run, and they cannot be cached. However, really there's no reason to run them, they're not required.

> Task :nodeSetup SUCCESS
> Task :npmSetup SUCCESS
> Task :npmInstall SUCCESS
> Task :npmBuild FROM-CACHE

Just to clarify, I'm not criticising. I think this is the most optimal set up possible at present. However, here's my big idea: what if the precursor tasks weren't tasks? I think there's an alternative...

Setting up without tasks?

Recently an ex-Gradler came up with a tool for installing external tools https://github.com/jjohannes/gradle-demos/tree/main/toolchain-management/sample.

It uses some internal Gradle tools to make downloading and caching more convenient, but I don't think that's relevant here (the publicly available API can work just fine, even if they're a little less convenient). The most interesting thing to me is how toolchain-management avoids installing the tool unless the task that uses the task actually runs. It does this by using a BuildService that

I found this a bit confusing at first, because wouldn't you want to make sure the tool is installed? But actually, it makes a lot more sense in terms of work avoidance.

Code example

To give a quick code-based demo, here's how it works currently:

val npmSetup by tasks.registering {
  description = "Installs node exe. Cannot be cached."
}
val npmInstall by tasks.registering {
  description = "Installs npm dependencies into node_modules dir. Cannot be cached."
}
val actualNpmTaskThatDoesSomething by tasks.registering(NpmTask::class) {
  dependsOn(npmSetup, npmInstall)
  // Potentially cachable
}

Because npmSetup and npmInstall are non-cacheable tasks they will always need to run on a fresh CI checkout, despite actualNpmTaskThatDoesSomething being cacheable.

Instead, what if NpmTask would set up the executables and run npm install if required?

abstract class NpmTask2 : NpmTask() {
  
  @TaskAction
  fun exec2() {
    npmSetup() // Checks if npm exe is installed, downloading and installing it if necessary.
    npmInstall() // Runs `npm install` if necessary (i.e. `npm install --offline` fails).
    
    // now run the actual command
    runNpmCommand(...)
  }
}

val actualNpmTaskThatDoesSomething by tasks.registering(NpmTask::class) {
  
}

This skips over a lot of potential issues (e.g. what about tasks running in parallel? How to check if the executables are installed and up to date?). But hopefully it gives a good high level idea?

I think this results is a more idiomatic usage of npm install. Instead of npm install being a task, I would see it more as being a 'dependency resolution and transformation' tool. It takes the .tgz packages and unpacks them into node_modules. By itself, it doesn't do any 'work', it just sets up some dependencies.

Technical solution

I'm running a bit short on time so I'll keep this section short, but tl;dr let's make a NpmExecValueSource. Then we could experiment with the above.

A ValueSource will permit running npm commands in Gradle's configuration phase, and also make it possible to run multiple npm command during a single Task.

bsautel mentioned this issue Apr 27, 2020

yarn task's node_modules output directory is declared twice in version 2.2.3 #96

Closed

tschmidt01 mentioned this issue May 13, 2021

Support for Gradle build cache #170

Closed

adam-enko mentioned this issue Jan 31, 2024

NPM Install caching #301

Draft

edgarvonk mentioned this issue May 13, 2024

chore: enable Gradle build cache infonl/dimpact-zaakafhandelcomponent#1093

Merged

Make npmInstall and npmSetup tasks cacheable #81

Make npmInstall and npmSetup tasks cacheable #81

Comments

bercik commented Mar 6, 2020 • edited Loading

deepy commented Mar 6, 2020

bsautel commented Mar 8, 2020

deepy commented Mar 9, 2020

bercik commented Mar 9, 2020

nukesz commented Apr 2, 2020

bsautel commented Apr 3, 2020

bsautel commented Apr 3, 2020

remmeier commented Apr 23, 2020

deepy commented Apr 23, 2020

remmeier commented Apr 23, 2020 • edited Loading

bsautel commented Apr 23, 2020

bsautel commented Apr 23, 2020

remmeier commented Apr 23, 2020 • edited Loading

remmeier commented Apr 23, 2020

deepy commented Apr 23, 2020

bsautel commented Apr 26, 2020 • edited Loading

bsautel commented Apr 27, 2020

sergeykad commented Jul 5, 2021

deepy commented Jul 5, 2021

doctorpangloss commented Jan 25, 2022

deepy commented Jan 26, 2022

Cerber-Ursi commented Oct 31, 2023

deepy commented Dec 19, 2023

doctorpangloss commented Dec 19, 2023

Cerber-Ursi commented Dec 20, 2023

adam-enko commented Jan 30, 2024 • edited Loading

doctorpangloss commented Jan 30, 2024

adam-enko commented Jan 30, 2024

adam-enko commented Jan 31, 2024

deepy commented Jan 31, 2024

deepy commented Jan 31, 2024

adam-enko commented Jan 31, 2024

adam-enko commented Jan 31, 2024 • edited Loading

sergeykad commented Jan 31, 2024

deepy commented Jan 31, 2024

deepy commented Jan 31, 2024

deepy commented Jan 31, 2024

adam-enko commented Feb 28, 2025

deepy commented Mar 1, 2025

adam-enko commented Mar 8, 2025

Scenario

Setting up without tasks?

Code example

Technical solution

bercik commented Mar 6, 2020 •

edited

Loading

remmeier commented Apr 23, 2020 •

edited

Loading

remmeier commented Apr 23, 2020 •

edited

Loading

bsautel commented Apr 26, 2020 •

edited

Loading

adam-enko commented Jan 30, 2024 •

edited

Loading

adam-enko commented Jan 31, 2024 •

edited

Loading