This repository has been archived by the owner on Oct 23, 2019. It is now read-only.

Questions regarding persistent caching #12

Open
donasaur opened this issue Dec 24, 2015 · 11 comments

Comments

@donasaur

How does the persistent caching of this module work?

for the case of:
"Each file dependency's timestamp is checked against the cached output's start time"
Does this mean that once the process running the build script stops, when we decide to run the build script again and subsequently change a file, we have to do a complete (and not incremental) rebuild again? This would imply that the behavior differs from what you'd get if watch's cache were stored on disk and loaded back. I'm OK with this; I'm just curious since I read somewhere that it was really hard to serialize/deserialize watch's cache anyway. However, I just want to verify my understanding.

"The emitted assets are checked for existence"
By emitted assets, do you mean checking for the presence of the assets created in the filesystem from the last compilation?

What is the cache composed of? Is it just whatever webpack exposes to its provided callback (e.g., stats object) as well as some recorded timestamps for making the checks described here? Can this cache layer mimic the ideal behavior you would get if you were able to store and load watch's cache?

Sorry if this is a lot of questions!

@markfinger
Owner

"Each file dependency's timestamp is checked against the cached output's start time"
Does this mean that once the process running the build script stops, when we decide to run the build script again and subsequently change a file, we have to do a complete (and not incremental) rebuild again? This would imply that the behavior differs from what you'd get if watch's cache were stored on disk and loaded back. I'm OK with this; I'm just curious since I read somewhere that it was really hard to serialize/deserialize watch's cache anyway. However, I just want to verify my understanding.

Before the cache serves up any data, it checks to make sure that every file that was used to build the source is older than the start time of the build that generated the data. This is intended to prevent stale data being served up.
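
A minimal sketch of that timestamp check, not webpack-build's actual code. The entry shape (`startTime`, `fileDependencies`) and the function name are assumptions made for illustration; only `fs.statSync` and `mtimeMs` are real Node.js APIs.

```javascript
const fs = require("fs");

// Hypothetical cache entry shape (an assumption, not the module's real format):
//   { startTime: <ms since epoch>, fileDependencies: [<absolute file paths>] }
function dependenciesAreOlderThanBuild(entry, statSync = fs.statSync) {
  return entry.fileDependencies.every((file) => {
    try {
      // A file modified at or after the build's start time may not be
      // reflected in the cached output, so the entry is treated as stale.
      return statSync(file).mtimeMs < entry.startTime;
    } catch (err) {
      // A dependency that is missing or unreadable also invalidates the entry.
      return false;
    }
  });
}
```

The `statSync` parameter is only there so the check can be exercised without touching the filesystem.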

Yeah, caching webpack's a bit of a nightmare. Its internal cache has too many circular structures to serialize without a massive amount of pre- and post-processing, plus the cache stores a heap of objects with prototypes (which are difficult to re-hydrate from a serialized state). I brain-dumped a little in webpack/webpack#250 (comment)
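
To illustrate the two problems in general terms, here is a small sketch using plain `JSON`. `FakeModule` is a made-up stand-in, not one of webpack's real classes; the point is only how prototyped and circular objects behave when serialized.

```javascript
// A made-up stand-in for a prototyped object held in an in-memory cache.
class FakeModule {
  constructor(name) {
    this.name = name;
    this.dependents = [];
  }
  describe() {
    return "module " + this.name;
  }
}

const a = new FakeModule("a");
const b = new FakeModule("b");

// Problem 1: the prototype does not survive a JSON round trip.
const revived = JSON.parse(JSON.stringify(a));
console.log(revived instanceof FakeModule); // false
console.log(typeof revived.describe); // "undefined"

// Problem 2: once objects reference each other, JSON.stringify throws.
a.dependents.push(b);
b.dependents.push(a);
let circularError = null;
try {
  JSON.stringify(a);
} catch (err) {
  circularError = err; // a TypeError about a circular structure
}
console.log(circularError instanceof TypeError); // true
```

Working around either problem means walking the whole object graph before saving and again after loading, which is the pre- and post-processing cost mentioned above.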

I think webpack would require a massive refactor for caching to be viable. Unfortunately, webpack's mostly a one-man show and Tobias - the guy who's built most of it - seems to have far too much work to find the time for a massive refactor of both the codebase and the ecosystem.

So, I'm not sure if it'll ever become viable to perform granular caching. That leaves us only the ability to cache at a high level and do some simple file timestamp checks to invalidate it. The primary disadvantage of caching at such a high level is that every time a single file is changed, we need to rebuild the entire thing :(


"The emitted assets are checked for existence"
By emitted assets, do you mean checking for the presence of the assets created in the filesystem from the last compilation?

Yeah, it makes sure that the emitted files still exist. This prevents the cache serving up references to files which may not exist any longer.
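
A sketch of what such an existence check could look like. The function name and the idea of a flat list of asset paths are assumptions for illustration; `fs.existsSync` is the real Node.js call.

```javascript
const fs = require("fs");

// `assetPaths` is assumed to be a list of absolute paths recorded when the
// build emitted its files. The entry is only usable if all of them remain.
function emittedAssetsExist(assetPaths, exists = fs.existsSync) {
  return assetPaths.every((assetPath) => exists(assetPath));
}
```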


What is the cache composed of? Is it just whatever webpack exposes to its provided callback (e.g., stats object) as well as some recorded timestamps for making the checks described here? Can this cache layer mimic the ideal behavior you would get if you were able to store and load watch's cache?

The persistent cache stores the stats object from webpack, as well as a bunch of metadata (which is mostly used to detect when it's no longer valid). I don't believe there's a reasonable way to rehydrate webpack's internal cache so that it can resume. It's technically possible, but ends up being too CPU intensive to rebuild the circular structures, and too complicated to rehydrate the prototyped objects.
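
As a rough sketch of a high-level cache of this kind: one JSON file holding the plain stats data plus the metadata needed for validation. The field names, function names, and file layout are all assumptions, not the module's real on-disk format.

```javascript
const fs = require("fs");

// Hypothetical layout: plain-object stats (for example, the result of
// webpack's `stats.toJson()`) stored next to the validation metadata.
function writeCacheEntry(cacheFile, stats, meta) {
  const entry = {
    startTime: meta.startTime, // when the build began, in ms since epoch
    fileDependencies: meta.fileDependencies, // inputs to timestamp-check
    assets: meta.assets, // emitted files to existence-check
    stats: stats,
  };
  fs.writeFileSync(cacheFile, JSON.stringify(entry));
}

// Returns the stored entry, or null if the file is missing or unparsable.
function readCacheEntry(cacheFile) {
  try {
    return JSON.parse(fs.readFileSync(cacheFile, "utf8"));
  } catch (err) {
    return null;
  }
}
```

Because only plain data is stored, reading the entry back is cheap, but nothing in it lets a compiler resume an incremental build.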

@markfinger
Owner

Oh, no worries about the questions. It's been a while since I've worked on this project, so it's nice to refresh myself on it.

@donasaur
Author

Ah ok, thanks. Appreciate it!

These followup explanations were really clear. What you're saying is that it's super hard to reproduce webpack's internal cache without a webpack rewrite: once the webpack process is completely stopped so that the internal cache disappears from memory, even if you reboot the webpack process w/ watch mode again, you can never reclaim the original internal cache and are only left with the high-level cache, correct? Basically, those wrappers you described in your other response wrote to 2 kinds of cache: webpack's internal cache and the high-level cache that is your creation, and we lost the internal one when the master process was stopped.

The remaining high-level cache (served from disk) and the build logic are set up so that changing even a single file makes webpack-build rebuild the entire thing, regardless of whether you're watching or whether the cache passed all of the checks when webpack-build first booted. This follows from the high-level cache's limitations: because of the rehydration problem, it only stores and processes high-level information about the bundle(s).

@markfinger
Owner

Yeah, exactly.

@markfinger
Owner

When it serves cached content and watch is true, it'll serve the cached content immediately, but also start a compiler in the background. This is mostly to avoid having to wait for the initial build after a small change.
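
The control flow described here could be sketched roughly as follows. Every name (`serveBuild`, `readValidCache`, `compile`, `startWatching`) is hypothetical and passed in as a stub; this is not webpack-build's actual API.

```javascript
// `deps` holds hypothetical collaborators:
//   readValidCache(options)    -> cached build data, or null if none is valid
//   compile(options, callback) -> runs a full build
//   startWatching(options)     -> starts a watching compiler in the background
function serveBuild(options, deps, callback) {
  const cached = deps.readValidCache(options);
  if (cached === null) {
    // Nothing usable on disk: the caller has to wait for a full build.
    deps.compile(options, callback);
    return;
  }
  // Answer immediately from the cache...
  callback(null, cached);
  // ...and warm up a compiler so a later change can be rebuilt incrementally
  // instead of paying for an initial build at that point.
  if (options.watch) {
    deps.startWatching(options);
  }
}
```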

@elsigh

elsigh commented Mar 19, 2016

This thread is really helpful, thanks for the clarifications. @markfinger Do you have any tips for how to leverage the persistent cache in a team/build environment? It seems like it only helps provided nothing has changed from build to build in the JS world (which still counts!)

@markfinger
Owner

@elsigh can you clarify what you mean by team/build environment?

@markfinger
Owner

As a bit of an aside, I've spent the last few months tinkering with an alternative take on persistent caching: webpack/webpack#250 (comment). It's mostly based on lessons learnt while reimplementing parts of webpack for a personal project.

@elsigh

elsigh commented Mar 22, 2016

For a team/build environment, I suspect there's maybe nothing as useful as the ability to apply a cache to mitigate webpack's first-build startup time. Right now, any change to a core part of our codebase means that we need to do several builds in order to then run unit tests and ensure that nothing was broken by the change. Additionally, whenever we start our local dev server (which crashes a fair amount) we need to wait for the build from scratch currently.

At deploy time, we run each of these builds and run the tests again just to be sure, so our build process takes on the order of 4-5 minutes. We use CircleCI, so we can actually persist a cache across builds, which makes your persistent cache work very interesting. So thank you for your work on this.

@markfinger
Owner

Cool. The cache wasn't designed for portability, but I imagine it would function well enough across the CI servers as long as the absolute paths match and the built artifacts are preserved across runs.

Not certain about Circle, but I think most CI services just spawn VM images, so the environments should match up and the cache should function. It's mostly the absolute path of the cwd that'll be of interest.

@elsigh

elsigh commented Mar 22, 2016

I'll give it a try on Circle and report back on path/portability. Implemented there as is, it would only benefit changes to our backend that don't touch the frontend, because any change to any JS file would invalidate the cache and put us back at square one, which is why I'm interested in your other implementation. Still, I like the backend guys on my team - sometimes I'm even one of them ;)
