What's the intended workflow of the tarball cache? #393

Closed
wycats opened this issue Sep 21, 2016 · 41 comments

Comments

@wycats
Member

wycats commented Sep 21, 2016

I understand how the feature works, but as I'm not using it, I'm not quite sure what the exact intended workflow is.

@bestander @skevy @kittens

@bestander
Member

You mean the offline-mirror setting?

@wycats
Member Author

wycats commented Sep 21, 2016

@bestander yeah

@bestander
Member

bestander commented Sep 21, 2016

So we use it for offline installs: CI and internal projects.

  1. .npmrc contains yarn-offline-mirror key with path to a folder with .tar.gz files
  2. When you add dependencies via yarn add, they are downloaded from the npm registry, but yarn.lock stores the local path to the .tar.gz file instead of "https://registry.npmjs.org/..."
  3. The mirror folder is checked in (that is ~1000 files and 30 MB for React Native which is fine)
  4. When you do yarn install, node_modules are installed without going to network
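
The four steps above can be sketched as follows. This is only an illustration, not Yarn's implementation: `locate_in_mirror` is a hypothetical helper, and the idea that the lockfile records the tarball's file name followed by `#<hash>` is an assumption of this sketch.

```python
from pathlib import Path


def locate_in_mirror(mirror_dir: Path, resolved: str) -> Path:
    """Map a lockfile 'resolved' value to a tarball inside the mirror folder."""
    # Assumed lockfile value: "<tarball file name>#<hash>".
    file_name = resolved.split("#", 1)[0]
    tarball = mirror_dir / file_name
    if not tarball.is_file():
        # Offline install: fail loudly instead of falling back to the registry.
        raise FileNotFoundError(f"{file_name} not found in {mirror_dir}")
    return tarball
```

Because the lookup never touches the network, a missing tarball surfaces as an immediate error, which is the behaviour an offline CI job would want.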

@bestander
Member

Personally, this is the single reason why I joined the effort on this project :)
BTW a related topic to discuss #394

@skevy
Contributor

skevy commented Sep 21, 2016

This is particularly useful in a monorepo setting (we have one at Exponent and then of course FB had theirs). All the packages and applications in the repo can share this same module cache. It makes CI/CD (especially monorepo CI/CD) great again. :-)

@wycats
Member Author

wycats commented Sep 22, 2016

Personally, this is the single reason why I joined the effort on this project :)

Seems legit. 😄

Just so I'm clear, the primary goal is to make it possible to take a package.json that references packages from the registry, check the tarballs of those packages into version control together with a yarn.lock, and have yarn install use those checked-in tarballs instead of the registry. You would also like this to work for indirect dependencies.

Is that correct?

@bestander
Member

That is correct

@bestander
Member

As my colleague @kentaromiura pointed out, we don't have to check in the .tar.gz files. Originally we considered using shared storage for a project, but a source control system was just as good.

@conartist6

So I know the NPM guys intentionally made life difficult because they used the resolved field (when present) as the complete and only identity for a package. Their reasoning had to do with private registries. Somebody not targeting a public registry at all may have complete module name overlap. Even more ominously, a package caching server (so far the only sane way to deal with trying to use npm in a high dependability environment) could have been configured to selectively shadow some packages (or provide them past the point in time when the original source had deleted them). To complete the chaos, the npm registry owners proved that they are willing, under appropriate pressure, to themselves reassign a module name to a different project, as happened with kik prompting the famed left-pad disaster.

This makes the logic around upgrading a previously locked package awfully tricky.

Npm basically doesn't allow it. Either you keep everything locked exactly as it is, or you lose all your locked down versions at the same time. Or you manually chop out a big chunk of shrinkwrap and splice in an updated version. o_O

How are we handling this? Do we see ourselves as having a certain contract with the user like, say, never accidentally at upgrade time replacing an installed package with something that is from a completely different codebase?

@wycats
Member Author

wycats commented Sep 22, 2016

The reason I asked this question is that I want to propose a slightly different workflow that I think satisfies the requirements that people had when designing it in the first place, but with fewer rough edges.

The basic idea is that the yarn.lock would continue to store the original name, version and sha from the original npm registry, and that the offline mirror configuration would instruct the registry resolver to use the mirror instead.

(In Cargo, the mirroring configuration happens at a level above the individual sources, as I describe below)

This is how we designed the mirror feature in Cargo, and it has a few nice properties:

  1. If you just need the offline mirror for production deploys (this is true of some, but not all people), you can continue to use the regular workflow in development.
  2. It scales nicely to other kinds of mirrors. The way Cargo structures this is that you can specify, in the equivalent of .npmrc, that you would like a mirror to replace a particular source. This makes it pretty easy for build bots and other environments along those lines to strictly enforce whatever requirements they need (using whichever resolver strategy they want) without imposing the restriction unnecessarily on developers.
  3. On the other hand, if the tarballs are checked into version control, it's still possible to use them in development by specifying the local mirror.
  4. It also has the nice property of tracking the distinction between "version locked from a particular source" and the physical location of a mirror, which could change based on configuration.
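
A minimal sketch of that separation, with hypothetical names throughout (`LockedPackage`, `fetch_location`, and the mirror path are illustrations, not Yarn or Cargo APIs): the lockfile entry records only the original source, and configuration decides where the bytes actually come from.

```python
from dataclasses import dataclass
from typing import Mapping

NPM_REGISTRY = "https://registry.npmjs.org"


@dataclass(frozen=True)
class LockedPackage:
    name: str
    version: str
    sha: str     # integrity information recorded at lock time
    source: str  # where the package originally came from


def fetch_location(pkg: LockedPackage, mirrors: Mapping[str, str]) -> str:
    """Return the configured mirror for the package's source, else the source itself."""
    return mirrors.get(pkg.source, pkg.source)


left_pad = LockedPackage("left-pad", "1.1.1", "<sha>", NPM_REGISTRY)

# Development machine: no mirror configured, so the registry is used.
dev_location = fetch_location(left_pad, {})

# Build bot: the same lockfile entry, redirected by configuration alone.
ci_location = fetch_location(left_pad, {NPM_REGISTRY: "./npm-offline-packages"})
```

The lockfile entry is identical in both environments; only the mirror mapping differs.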

@conartist6 I'm trying to understand the problem you're describing.

How are we handling this? Do we see ourselves as having a certain contract with the user like, say, never accidentally at upgrade time replacing an installed package with something that is from a completely different codebase?

The way bundler and Cargo handle (what I believe you mean by) this problem is by using a "precise" version for every dependency in the lockfile that includes enough information to precisely identify it (and its source) but not including mirror information (which is supplied by configuration).

In Cargo, mirrors are required to share precisely the same sha as the original upstream source, and any replacements that change the source code are specified in Cargo.toml (Cargo's equivalent of package.json) using [replace] sections:

[replace]
"foo:0.1.0" = { git = 'https://github.com/example/foo' }
"bar:1.0.2" = { path = 'my/local/bar' }

This means: "if you see foo v0.1.0 in the dependency graph, replace it with the code located at this git repo, and if you see bar v1.0.2 in the dependency graph, replace it with this local code I checked in." It works with any kind of resolver that normally works in Cargo, so it's pretty flexible. The rationale is that mirrors can be largely transparent to development if they share a SHA (and are largely operational concerns), while changing the code is a development concern and should be specified in the manifest.

Aside: Bundler works a little bit differently, but along the same lines: because bundler only allows a single name/version dependency in the entire dependency graph, a specified dependency in the top-level Gemfile always supersedes the registry. In other words, if you specify a dependency in a Gemfile, it's as if you had said [replace] in Cargo.

In Cargo's (and bundler's) case, we also require that replacements share a name and version number with the original package they're replacing, and the feature is largely used for emergency patches or things like "the bug is fixed on master but the author hasn't gotten around to publishing it yet".
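
The two rules described in this comment (a mirror must serve the same sha, a replacement must keep the name and version) can be written down as simple checks. This is a sketch with hypothetical names, not code from Cargo or bundler.

```python
from typing import NamedTuple


class Package(NamedTuple):
    name: str
    version: str
    sha: str


def valid_mirror_copy(locked: Package, fetched: Package) -> bool:
    # A mirror is transparent: it must serve exactly the locked content.
    return fetched == locked


def valid_replacement(locked: Package, replacement: Package) -> bool:
    # A replacement may change the code (and so the sha), but it must keep
    # the name and version of the package it stands in for.
    return (replacement.name, replacement.version) == (locked.name, locked.version)
```

A patched `foo 0.1.0` with a different sha would therefore be rejected as a mirror copy but accepted as a replacement.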

I'm not entirely sure whether any of that directly targets the issue you're talking about. Can you clarify it a bit?

@conartist6

Yes, yes it definitely does target the issue I'm describing. Npm lacks hashes, and with that restriction they were forced to treat source URLs as the best guarantee of authenticity.

The setup that you are describing sounds quite attractive because it understands (on multiple levels) the difference between a cached copy and an override. Npm, infuriatingly, can't, which is why upgrading a cached package is such a nightmare.

@bestander
Member

bestander commented Sep 22, 2016

@wycats do you propose moving resolved lines from yarn.lock into a registry file that would map left-pad:1.1.0 to a remote or local location of a tarball?

@bestander
Member

bestander commented Sep 22, 2016

For large single-repo projects the experience looks like this.
A developer wants to add left-pad to a project.
There is an .npmrc at the root that has kpm-offline-mirror=./npm-offline-packages.

The developer writes

yarn add left-pad@~1.1.1

The new dependency is added to package.json and yarn.lock file and the tarball is downloaded to npm-offline-packages.

The nice thing about this approach is its simplicity: it is easy to review and easy to connect the dots.

If there is another project in the repository and another developer does

yarn add left-pad@~1.1.1

Then the existing tarball will be reused and yarn.lock will refer to it.

What would be different with the proposed approach?

@conartist6

conartist6 commented Sep 22, 2016

If I understand what @wycats is saying, nothing in the workflow you describe is different for the user. The major difference is what data is stored in yarn.lock.

I understand your earlier response to suggest that yarn.lock would contain something concrete like:

{
    location: "${npm-offline-packages}/leftpad-1.2.1.tgz",
    dependencies: structure recurses...
}

This is the npm approach. The suggestion here is to, in yarn.lock, store:

{
    descriptor: "leftpad-1.2.1",
    hash: "d672jef2",
    sourceRepository: "https://registry.npmjs.org",
    dependencies: structure recurses...
}

This way the cache directory is searched instead of being directly referenced, which means it is trivial to change the cache directory configuration, either as a one-off or between dev/prod/test.

It also means that the program can easily know that if the user says yarn upgrade left-pad, this means that it should go out to the npm central registry, fetch the newest left-pad available there, store it in the local cache, and update the descriptor and hash in yarn.lock.
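
A sketch of "searched instead of directly referenced". It assumes, hypothetically, that cached tarballs are named `<descriptor>.tgz` and that the lockfile hash is a SHA-1 of the tarball bytes; neither detail is settled in this thread, and `find_cached` is an illustrative name.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Optional


def find_cached(descriptor: str, expected_hash: str,
                cache_dirs: Iterable[Path]) -> Optional[Path]:
    """Return the first cached tarball whose content matches the locked hash."""
    for cache_dir in cache_dirs:
        candidate = cache_dir / f"{descriptor}.tgz"
        if not candidate.is_file():
            continue
        digest = hashlib.sha1(candidate.read_bytes()).hexdigest()
        if digest == expected_hash:
            return candidate
    # Nothing usable locally; the caller may go to the original source.
    return None
```

Since the lockfile holds no path, switching the cache location between dev, test and prod only means passing a different `cache_dirs` list.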

@wycats
Member Author

wycats commented Sep 22, 2016

@wycats do you propose moving resolved lines from yarn.lock into a registry file that would map left-pad:1.1.0 to a remote or local location of a tarball?

Not quite. The resolved lines would go away, and a package would be identified uniquely by its "precise version" (which can be a tarball sha, but could also be things like git sha for example).

With no configuration, we'd use the "default remote" for a particular package. If you configure a mirror in .npmrc, we'd map the original package to that mirror instead during fetching and confirm that the fetched package matches the integrity information (sha).


A developer wants to add left-pad to a project.
There is an .npmrc at the root that has kpm-offline-mirror=./npm-offline-packages.

The developer writes

yarn add left-pad@~1.1.1

The new dependency is added to package.json and yarn.lock file and the tarball is downloaded to npm-offline-packages.

The main difference so far is that the way to specify npm-offline-packages would be a little more general, allowing you to specify a replacement mirror for any source, not just the npm registry.

The nice thing about this approach is its simplicity: it is easy to review and easy to connect the dots.

I agree, it's nice 😄

If there is another project in the repository and another developer does

yarn add left-pad@~1.1.1

Then the existing tarball will be reused and yarn.lock will refer to it.

What would be different with the proposed approach?

The main distinction is that the yarn.lock would have the original source rather than the resolved tarball, together with enough information to uniquely identify it (see @ConArtist above) and the .npmrc is responsible for mapping the "npm registry" to the local mirror in the monorepo.

You can still look at the integrity information in the yarn.lock, and you can still look at the in-repo cache of packages to connect the dots. You could also very easily move the location of the in-repo cache (or add additional ones at appropriate places in the hierarchy).

For people who are not using mono-repos, it makes it possible to use the same feature for production deploys without disturbing the normal development workflow, as well as paves the way for other kinds of mirrors that can work together with the in-repo mirror strategy. In other words, it's just a more general way of describing the same thing.

Finally, it also helps to rationalize what's going on with npm link and the local yarn cache (there's no good reason that the local yarn cache behaves so differently from the offline mirror). Linking a package on your machine wouldn't disturb the lockfile, but would rather register a local mirror for the original package. At least for me, I really want uses of npm link on a local machine to be invisible to other developers.

Generally, decoupling the "original source + unique identification" from "where we actually get the packages in practice" makes interactions between mirrors, links, and other similar features more reliable, but doesn't really change any fundamental capabilities.

@bestander
Member

That does make sense, it would also solve #394.
What would be used as a key in the replacement map?

In yarn.lock we use strings like yeoman-welcome@^1.0.0 which are specific to a particular package.json of a direct or transitive dependency.
Should it be the full http path?

@wycats
Member Author

wycats commented Sep 23, 2016

@bestander the way Cargo works is that there is a notion of "package id", which is a fully qualified package name that is guaranteed to be unique (each resolver gets to decide what is required for uniqueness).

Here's an example Cargo.toml I just whipped up:

[package]
name = "ohai"
version = "0.1.0"
authors = ["Yehuda Katz <[email protected]>"]

[dependencies]

libc = "*"

And here's the lockfile Cargo generates:

[root]
name = "ohai"
version = "0.1.0"
dependencies = [
 "libc 0.2.16 (registry+https://github.com/rust-lang/crates.io-index)",
]

[[package]]
name = "libc"
version = "0.2.16"
source = "registry+https://github.com/rust-lang/crates.io-index"

Cargo uses the word "source" to mean roughly the same thing as Yarn uses the word "resolver" for.

In this case, since the registry doesn't allow people to mutate existing crates, the fully resolved name of the registry, plus the package's name and version are sufficient.

For illustration, let me add another package to the Cargo.toml, this one a git dependency:

[package]
name = "ohai"
version = "0.1.0"
authors = ["Yehuda Katz <[email protected]>"]

[dependencies]

libc = "*"
docopt = { git = "https://github.com/docopt/docopt.rs" }

Here's the output from cargo build (more or less the equivalent of yarn install):

$ cargo build
    Updating git repository `https://github.com/docopt/docopt.rs`
    Updating registry `https://github.com/rust-lang/crates.io-index`
   Compiling lazy_static v0.2.1
   Compiling regex-syntax v0.3.5
   Compiling utf8-ranges v0.1.3
   Compiling memchr v0.1.11
   Compiling winapi-build v0.1.1
   Compiling aho-corasick v0.5.3
   Compiling kernel32-sys v0.2.2
   Compiling strsim v0.5.1
   Compiling rustc-serialize v0.3.19
   Compiling winapi v0.2.8
   Compiling thread-id v2.0.0
   Compiling thread_local v0.2.7
   Compiling regex v0.1.77
   Compiling docopt v0.6.83 (https://github.com/docopt/docopt.rs#be283ce2)
   Compiling ohai v0.1.0 (file:///C:/Code/ohai)
    Finished debug [unoptimized + debuginfo] target(s) in 89.36 secs

And the updated Cargo.lock:

[root]
name = "ohai"
version = "0.1.0"
dependencies = [
 "docopt 0.6.83 (git+https://github.com/docopt/docopt.rs)",
 "libc 0.2.16 (registry+https://github.com/rust-lang/crates.io-index)",
]

[[package]]
name = "aho-corasick"
version = "0.5.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
 "memchr 0.1.11 (registry+https://github.com/rust-lang/crates.io-index)",
]

[[package]]
name = "docopt"
version = "0.6.83"
source = "git+https://github.com/docopt/docopt.rs#be283ce2a00305998e89d98122cdad06e59dede4"
dependencies = [
 "lazy_static 0.2.1 (registry+https://github.com/rust-lang/crates.io-index)",
 "regex 0.1.77 (registry+https://github.com/rust-lang/crates.io-index)",
 "rustc-serialize 0.3.19 (registry+https://github.com/rust-lang/crates.io-index)",
 "strsim 0.5.1 (registry+https://github.com/rust-lang/crates.io-index)",
]

[[package]]
name = "kernel32-sys"
version = "0.2.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
 "winapi 0.2.8 (registry+https://github.com/rust-lang/crates.io-index)",
 "winapi-build 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)",
]

[[package]]
name = "lazy_static"
version = "0.2.1"
source = "registry+https://github.com/rust-lang/crates.io-index"

[[package]]
name = "libc"
version = "0.2.16"
source = "registry+https://github.com/rust-lang/crates.io-index"

[[package]]
name = "memchr"
version = "0.1.11"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
 "libc 0.2.16 (registry+https://github.com/rust-lang/crates.io-index)",
]

[[package]]
name = "regex"
version = "0.1.77"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
 "aho-corasick 0.5.3 (registry+https://github.com/rust-lang/crates.io-index)",
 "memchr 0.1.11 (registry+https://github.com/rust-lang/crates.io-index)",
 "regex-syntax 0.3.5 (registry+https://github.com/rust-lang/crates.io-index)",
 "thread_local 0.2.7 (registry+https://github.com/rust-lang/crates.io-index)",
 "utf8-ranges 0.1.3 (registry+https://github.com/rust-lang/crates.io-index)",
]

[[package]]
name = "regex-syntax"
version = "0.3.5"
source = "registry+https://github.com/rust-lang/crates.io-index"

[[package]]
name = "rustc-serialize"
version = "0.3.19"
source = "registry+https://github.com/rust-lang/crates.io-index"

[[package]]
name = "strsim"
version = "0.5.1"
source = "registry+https://github.com/rust-lang/crates.io-index"

[[package]]
name = "thread-id"
version = "2.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
 "kernel32-sys 0.2.2 (registry+https://github.com/rust-lang/crates.io-index)",
 "libc 0.2.16 (registry+https://github.com/rust-lang/crates.io-index)",
]

[[package]]
name = "thread_local"
version = "0.2.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
 "thread-id 2.0.0 (registry+https://github.com/rust-lang/crates.io-index)",
]

[[package]]
name = "utf8-ranges"
version = "0.1.3"
source = "registry+https://github.com/rust-lang/crates.io-index"

[[package]]
name = "winapi"
version = "0.2.8"
source = "registry+https://github.com/rust-lang/crates.io-index"

[[package]]
name = "winapi-build"
version = "0.1.1"
source = "registry+https://github.com/rust-lang/crates.io-index"

[metadata]
"checksum aho-corasick 0.5.3 (registry+https://github.com/rust-lang/crates.io-index)" = "ca972c2ea5f742bfce5687b9aef75506a764f61d37f8f649047846a9686ddb66"
"checksum docopt 0.6.83 (git+https://github.com/docopt/docopt.rs)" = "<none>"
"checksum kernel32-sys 0.2.2 (registry+https://github.com/rust-lang/crates.io-index)" = "7507624b29483431c0ba2d82aece8ca6cdba9382bff4ddd0f7490560c056098d"
"checksum lazy_static 0.2.1 (registry+https://github.com/rust-lang/crates.io-index)" = "49247ec2a285bb3dcb23cbd9c35193c025e7251bfce77c1d5da97e6362dffe7f"
"checksum libc 0.2.16 (registry+https://github.com/rust-lang/crates.io-index)" = "408014cace30ee0f767b1c4517980646a573ec61a57957aeeabcac8ac0a02e8d"
"checksum memchr 0.1.11 (registry+https://github.com/rust-lang/crates.io-index)" = "d8b629fb514376c675b98c1421e80b151d3817ac42d7c667717d282761418d20"
"checksum regex 0.1.77 (registry+https://github.com/rust-lang/crates.io-index)" = "64b03446c466d35b42f2a8b203c8e03ed8b91c0f17b56e1f84f7210a257aa665"
"checksum regex-syntax 0.3.5 (registry+https://github.com/rust-lang/crates.io-index)" = "279401017ae31cf4e15344aa3f085d0e2e5c1e70067289ef906906fdbe92c8fd"
"checksum rustc-serialize 0.3.19 (registry+https://github.com/rust-lang/crates.io-index)" = "6159e4e6e559c81bd706afe9c8fd68f547d3e851ce12e76b1de7914bab61691b"
"checksum strsim 0.5.1 (registry+https://github.com/rust-lang/crates.io-index)" = "50c069df92e4b01425a8bf3576d5d417943a6a7272fbabaf5bd80b1aaa76442e"
"checksum thread-id 2.0.0 (registry+https://github.com/rust-lang/crates.io-index)" = "a9539db560102d1cef46b8b78ce737ff0bb64e7e18d35b2a5688f7d097d0ff03"
"checksum thread_local 0.2.7 (registry+https://github.com/rust-lang/crates.io-index)" = "8576dbbfcaef9641452d5cf0df9b0e7eeab7694956dd33bb61515fb8f18cfdd5"
"checksum utf8-ranges 0.1.3 (registry+https://github.com/rust-lang/crates.io-index)" = "a1ca13c08c41c9c3e04224ed9ff80461d97e121589ff27c753a16cb10830ae0f"
"checksum winapi 0.2.8 (registry+https://github.com/rust-lang/crates.io-index)" = "167dc9d6949a9b857f3451275e911c3f44255842c1f7a76f33c55103a909087a"
"checksum winapi-build 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)" = "2d315eee3b34aca4797b2da6b13ed88266e6d612562a0c46390af8299fc699bc"

The GitHub package we added produced the following entry (plus all of its dependencies, of course):

[[package]]
name = "docopt"
version = "0.6.83"
source = "git+https://github.com/docopt/docopt.rs#be283ce2a00305998e89d98122cdad06e59dede4"
dependencies = [
 "lazy_static 0.2.1 (registry+https://github.com/rust-lang/crates.io-index)",
 "regex 0.1.77 (registry+https://github.com/rust-lang/crates.io-index)",
 "rustc-serialize 0.3.19 (registry+https://github.com/rust-lang/crates.io-index)",
 "strsim 0.5.1 (registry+https://github.com/rust-lang/crates.io-index)",
]

We include the name and version of course, but also a fully qualified source name, which in the case of git repositories, includes the precise revision at the point where the lockfile was generated. Also note that all of the package versions in the lockfile are precise versions, rather than a version range, which makes the dependency graph easier to work with.

This also allows users to tighten versions (from "*" to "1.3.0") without causing Cargo to believe that the lockfile has changed and trigger updates.

The bottom of the lockfile is a series of checksums in a single, non-source-specific form (SHA256):

[metadata]
"checksum aho-corasick 0.5.3 (registry+https://github.com/rust-lang/crates.io-index)" = "ca972c2ea5f742bfce5687b9aef75506a764f61d37f8f649047846a9686ddb66"
"checksum docopt 0.6.83 (git+https://github.com/docopt/docopt.rs)" = "<none>"
"checksum kernel32-sys 0.2.2 (registry+https://github.com/rust-lang/crates.io-index)" = "7507624b29483431c0ba2d82aece8ca6cdba9382bff4ddd0f7490560c056098d"
"checksum lazy_static 0.2.1 (registry+https://github.com/rust-lang/crates.io-index)" = "49247ec2a285bb3dcb23cbd9c35193c025e7251bfce77c1d5da97e6362dffe7f"
"checksum libc 0.2.16 (registry+https://github.com/rust-lang/crates.io-index)" = "408014cace30ee0f767b1c4517980646a573ec61a57957aeeabcac8ac0a02e8d"
"checksum memchr 0.1.11 (registry+https://github.com/rust-lang/crates.io-index)" = "d8b629fb514376c675b98c1421e80b151d3817ac42d7c667717d282761418d20"
"checksum regex 0.1.77 (registry+https://github.com/rust-lang/crates.io-index)" = "64b03446c466d35b42f2a8b203c8e03ed8b91c0f17b56e1f84f7210a257aa665"
"checksum regex-syntax 0.3.5 (registry+https://github.com/rust-lang/crates.io-index)" = "279401017ae31cf4e15344aa3f085d0e2e5c1e70067289ef906906fdbe92c8fd"
"checksum rustc-serialize 0.3.19 (registry+https://github.com/rust-lang/crates.io-index)" = "6159e4e6e559c81bd706afe9c8fd68f547d3e851ce12e76b1de7914bab61691b"
"checksum strsim 0.5.1 (registry+https://github.com/rust-lang/crates.io-index)" = "50c069df92e4b01425a8bf3576d5d417943a6a7272fbabaf5bd80b1aaa76442e"
"checksum thread-id 2.0.0 (registry+https://github.com/rust-lang/crates.io-index)" = "a9539db560102d1cef46b8b78ce737ff0bb64e7e18d35b2a5688f7d097d0ff03"
"checksum thread_local 0.2.7 (registry+https://github.com/rust-lang/crates.io-index)" = "8576dbbfcaef9641452d5cf0df9b0e7eeab7694956dd33bb61515fb8f18cfdd5"
"checksum utf8-ranges 0.1.3 (registry+https://github.com/rust-lang/crates.io-index)" = "a1ca13c08c41c9c3e04224ed9ff80461d97e121589ff27c753a16cb10830ae0f"
"checksum winapi 0.2.8 (registry+https://github.com/rust-lang/crates.io-index)" = "167dc9d6949a9b857f3451275e911c3f44255842c1f7a76f33c55103a909087a"
"checksum winapi-build 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)" = "2d315eee3b34aca4797b2da6b13ed88266e6d612562a0c46390af8299fc699bc"

We added this after the initial release of Cargo, and it ensures that we have a secure hash for any source, even though there are theoretical risks associated with the hashing strategy used by git, for example.

Cargo also has a command that you can use to get the fully qualified name of a package in the Cargo.lock:

$ cargo pkgid docopt
https://github.com/docopt/docopt.rs#docopt:0.6.83

This package id contains just enough information to uniquely identify a package in the dependency graph (it's the identifier used in the dependency graph structure, in fact). When describing a replacement, it's always fine to use a more general name (like docopt) as long as it uniquely identifies the package in the lockfile. If the specified name is ambiguous (which is rare -- it can only happen when --flat wouldn't work in Yarn and you're referencing a duplicated package), Cargo helps you identify unambiguous names to use.
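
To illustrate how a short name can stand in for a full package id as long as it is unambiguous, here is a sketch that works on the `name version (source)` strings seen in the Cargo.lock above. `parse_entry` and `resolve_spec` are hypothetical helpers; this is not Cargo's actual resolution code.

```python
import re
from typing import List, Tuple

# Matches entries such as "libc 0.2.16 (registry+https://...)".
ENTRY = re.compile(r"^(?P<name>\S+) (?P<version>\S+) \((?P<source>.+)\)$")


def parse_entry(entry: str) -> Tuple[str, str, str]:
    """Split a lockfile dependency string into (name, version, source)."""
    match = ENTRY.match(entry)
    if match is None:
        raise ValueError(f"unrecognised lockfile entry: {entry!r}")
    return match["name"], match["version"], match["source"]


def resolve_spec(name: str, entries: List[str]) -> str:
    """Return the only entry called `name`; fail if none or several match."""
    matches = [e for e in entries if parse_entry(e)[0] == name]
    if len(matches) != 1:
        raise LookupError(f"{name!r} matches {len(matches)} packages; be more specific")
    return matches[0]
```

With the lockfile above, `resolve_spec("docopt", ...)` finds exactly one entry; it would only fail if two versions of the same package appeared in the graph.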

@bestander
Member

@wycats, thanks for giving some background info.
Let's think how we can improve the current situation with yarn.

In the lock file we have name (implied), version and where it gets resolved to.
I suppose we can have a separate file with resolution replacements:

yarn-resolutions.lock (a file located at the monorepo root)

https://registry.npmjs.org/yeoman-welcome/-/yeoman-welcome-1.0.1.tgz#f6cf198fd4fba8a771672c26cdfb8a64795c84ec ./local-mirror/yeoman-welcome-1.0.1.tgz

Would that be on par with Cargo features?
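
Reading one line of such a file could look like the sketch below. The format assumed is exactly the single example line above (`<url>#<hash>`, whitespace, local path); `Resolution` and `parse_resolution` are hypothetical names.

```python
from typing import NamedTuple


class Resolution(NamedTuple):
    url: str         # original remote tarball URL
    sha: str         # hash fragment that followed '#'
    local_path: str  # replacement location inside the repository


def parse_resolution(line: str) -> Resolution:
    """Parse '<url>#<hash> <local path>' into its three parts."""
    remote, local_path = line.split()
    url, _, sha = remote.partition("#")
    return Resolution(url, sha, local_path)
```

Keeping the hash next to the URL means the replacement can still be checked against the original tarball's integrity information.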

@bestander
Member

ping @wycats

@glenjamin

I'm unclear from my limited use of yarn how much of a role the resolved field plays when installing from lockfile on a second machine - but the npm behaviour of always phoning home for this was very frustrating. It led me to always remove this field from the generated npm-shrinkwrap.

The specific flow we wanted was that I would have my development machine point at the public registry, but CI would go via a proxy.

I think the way @wycats describes storing a reference to the package source separately from the package location would help enable this workflow.

As a strawman, something along the lines of this might work:

The checked-in lock file states the expected source and a hash

abab@^1.0.0:
  version "1.0.3"
  source "registry"
  hash "b81de5f7274ec4e756d797cd834f303642724e5d"

Those sources would have default locations, and then separate environment-specific not-checked in config could override source locations - possibly giving an ordered list?

@mgcrea
Contributor

mgcrea commented Nov 15, 2016

I've been testing a custom yarn-cache folder for a few weeks, but I'm encountering a lot of "Tarball is not in network and can not be located in cache" errors (on gitlab-ci, etc.) that are only solved with yarn --no-lockfile or rm yarn.lock; yarn. By any chance, have you encountered such errors?

"error \"bunyan-1.8.4.tgz\": Tarball is not in network and can not be located in cache (\"/srv/player/.yarn-cache/bunyan-1.8.4.tgz\")", "stdout": "yarn install v0.17.0\n[1/4] Resolving packages...\n[2/4] Fetching packages...\ninfo Visit https://yarnpkg.com/en/docs/cli/install for documentation about this command."

@UnrememberMe

Somewhat related, though it strays a little from the topic: are there any thoughts about dealing with node module installation scripts? Plenty of node modules download additional code during installation, so the results of yarn install are still non-repeatable even with the offline mirror.

@bestander
Member

The problem is that Node.js install scripts can execute any bash script, there is no way to reliably achieve offline mode without authors' cooperation.
The only thing that we can do is not use such modules or help the authors to think of offline mode.

@UnrememberMe

How about caching the post-install results instead of the pre-install results? I understand that will not work well with any code that has native platform dependencies, but that is not a problem any of our current solutions address either.

The worst that can happen is that we cannot use those badly behaving modules, which is no worse than the current situation.

@bestander
Member

Then it is as good as saving node_modules somewhere, for example, checking them into source control.

@UnrememberMe

UnrememberMe commented Feb 8, 2017

I agree that will be equivalent, with the added benefits that the offline mirror currently provides:

  • only one checksummed tgz file is stored vs. hundreds (if not thousands) of files;
  • the stored node_modules can be shared among different projects;
  • code review for updating a dependency remains reasonably sane.

@bestander
Member

Well, it might work but it may be complex.

Yarn is already tracking a diff between caches and whatever happens after install scripts, see phantomFiles https://github.com/yarnpkg/yarn/blob/master/src/package-linker.js#L124 and beforeFiles in https://github.com/yarnpkg/yarn/blob/master/src/package-install-scripts.js#L280.

If you feel that you could make sense of it and have some sort of offline storage for build artifacts, go ahead and send an RFC.
My concern is that packages may reach beyond their own limits and even use their own cache folders, basically everything that a bash script can do, although I don't know if any significant number do so.

@bestander
Member

@UnrememberMe, better discuss this in a separate issue/RFC

@UnrememberMe

Agreed. Will open a separate issue/RFC.

@gregsheremeta

@UnrememberMe did you happen to open one? Could you link to it?

@UnrememberMe

I have started but have not finished the RFC yet. I should submit the pull request for RFC no later than Thursday 2/16/17. @gregsheremeta

@jackhamburger

Is there a concern that the offline mirror will start to bloat if it is stored in source control? Every minor version change will leave the old .tgz files behind.
Is there a plan for a cleanup command that empties the offline mirror of module .tgz files that are not used in the yarn.lock?
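
One possible shape for such a cleanup, as a sketch only: `unused_tarballs` is a hypothetical helper, and it assumes the tarball file names referenced by yarn.lock have already been extracted by some other step.

```python
from typing import Iterable, List


def unused_tarballs(mirror_files: Iterable[str],
                    referenced: Iterable[str]) -> List[str]:
    """List mirror tarballs that no lockfile entry refers to."""
    keep = set(referenced)
    # Only .tgz files are considered; anything else in the folder is left alone.
    return sorted(f for f in mirror_files if f.endswith(".tgz") and f not in keep)
```

In a repository with several projects sharing one mirror, `referenced` would have to be the union over every yarn.lock, otherwise tarballs still needed by another project would be listed for removal.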

@bestander
Member

bestander commented Feb 17, 2017 via email

@UnrememberMe

@gregsheremeta The RFC was posted on Feb 16, 2017.

@gregsheremeta

@UnrememberMe do you have a link to it? I don't see it in https://github.com/yarnpkg/rfcs

@UnrememberMe

@gregsheremeta I updated the title for yarnpkg/rfcs#50 The initial RFC title was not correct.

@Vanuan

Vanuan commented Apr 28, 2017

That's something I'm interested in too. I'll leave my thoughts here:

AFAIK, current yarn workflow is the following:

  1. Set $HOME to isolated location for each build
  2. Use yarn install --pure-lock-file to make sure people check in yarn.lock
  3. Use yarn add/remove commands to change package.json and yarn.lock
  4. When there are conflicts, remove yarn.lock and do clean yarn install :( <--- not sure if it's correct

Each step could be improved:

  1. Yarn should be able to use some kind of cache corruption prevention by default (is it possible to do using a file in $HOME?)
  2. Pure lock file should be used by default, otherwise people wouldn't commit yarn.lock and conflicts are inevitable
  3. Some people edit package.json manually, but it isn't easy to edit yarn.lock, so yarn install --pure-lock-file should still install packages present in package.json, but missing from yarn.lock
  4. Some kind of smart yarn git-resolve would be nice

How is it related to the tarball cache feature? Not much. Its intended use case is to completely avoid yarn install during CI and to save all dependencies in the repository.

As we can see, local tarball cache would resolve all CI issues but somewhat complicate developer workflow issues.

Ideally, we'd want online repository but cacheable.
Yarn already has .cache thus tarball cache doesn't look useful.
I.e. tarball cache wouldn't be needed if each yarn install updated $HOME/ cache without corruption. And committing dependencies to git wouldn't be needed if it was guaranteed that npm registry is always online in case cache is corrupted.

TL;DR It would be nice if workflow wouldn't change while achieving performance and stability improvement.

@raix

raix commented Sep 27, 2017

@Vanuan in case of merge conflicts, there's a merged PR that makes yarn install resolve them (#3544)

@Vanuan

Vanuan commented Sep 27, 2017

Yeap, I'm aware of that. That comment was back in April.

Still not clear whether you should use yarn install --pure-lock-file in this case or whether yarn install changes package versions or just resolves conflicts.

@BYK
Member

BYK commented Oct 26, 2017

Closing this issue since it seems to be resolved, mostly by #2970 but possibly with other PRs.

Please create a new issue if you want to propose more features or fixes around this.

@varemenos

varemenos commented Jan 3, 2020

The last part of this blog post is referencing this ticket. Is it still valid or should the blog post be updated?
