Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: relative cache support #114

Open
wants to merge 11 commits into
base: main
Choose a base branch
from
203 changes: 203 additions & 0 deletions designs/2023-relative-cache/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,203 @@
Repo: eslint/eslint
- Start Date: 2023-02-14
- RFC PR: <https://github.com/eslint/rfcs/pull/106>
- Authors: Christian Schulz (@cschulz) and cs6cs6

# Relative cache location strategy

## Summary

To be able to share the created eslint cache between different systems f.e. local machine and CI, there is a need for a relative path stored in the cache.

## Motivation

The overall motivation for doing this is performance during the eslint process.

Most people are running eslint on local developer machine and additional on the CI workers.
In high changing repositories there are several pull requests at the same time changing different files requiring linting all the time.
This is a time consuming process depending on the enabled rules like type checking rules.

If the CI worker is able to cache its last run, it can just pick this cache up and use it.
Also developers can reuse this cache or commit it as part of a repository for the CI.

## Detailed Design

I have a branch with the following design already. It is not fully tested, so I'm not 100% sure this will be the final implementation but I think it is a good start.

The LintResultCache takes a file path as parameter to find or store a file cache entry.

The suggested approach is to truncate the given absolute path of the file to a relative path in the cache.

### Example existing cache file

Here is an example existing cache file for an extremely small project with two source files. It has been formatted for readability.

You can see that the full path of the file is included in the filenames, '/home/USER/git/samplecode'. There is also an obscure field, the value
1k0siqs. That is the hash-value of a stringified ConfigArray object that contains all the configuration for the entire eslint run. It is there so
that any rule changes or configuration changes will cause a complete eslint recheck on all files.

```
[
{
"/home/USER/git/samplecode/src/formatter.ts": "1",
"/home/USER/git/samplecode/src/vite-env.d.ts": "2"
},
{
"size": 2584,
"mtime": 1695823503788,
"results": "3",
"hashOfConfig": "4"
},
{
"size": 75,
"mtime": 1695814197580,
"results": "5",
"hashOfConfig": "4"
},
{
"filePath": "6",
"messages": "7",
"suppressedMessages": "8",
"errorCount": 0,
"fatalErrorCount": 0,
"warningCount": 0,
"fixableErrorCount": 0,
"fixableWarningCount": 0
},
"1k0siqs",
{
"filePath": "9",
"messages": "10",
"suppressedMessages": "11",
"errorCount": 0,
"fatalErrorCount": 0,
"warningCount": 0,
"fixableErrorCount": 0,
"fixableWarningCount": 0
},
"/home/USER/git/samplecode/src/formatter.ts",
[],
[],
"/home/USER/git/samplecode/src/vite-env.d.ts",
[],
[]
]

```

### Suggested updated cache file

Here is a suggested updated cache file, with relative paths instead of full paths. The `.` is the current working
directory. Typically, eslint is always run from the same place, so using a relative path should not cause a problem.

Note that the obscure hash value is still there, and is now Ak0sovs. This hash value will need to be calculated using
relative paths only.

```
[
{
"./src/formatter.ts": "1",
"./src/vite-env.d.ts": "2"
},
{
"size": 2584,
"mtime": 1695823503788,
"results": "3",
"hashOfConfig": "4"
},
{
"size": 75,
"mtime": 1695814197580,
"results": "5",
"hashOfConfig": "4"
},
{
"filePath": "6",
"messages": "7",
"suppressedMessages": "8",
"errorCount": 0,
"fatalErrorCount": 0,
"warningCount": 0,
"fixableErrorCount": 0,
"fixableWarningCount": 0
},
"Ak0sovs",
{
"filePath": "9",
"messages": "10",
"suppressedMessages": "11",
"errorCount": 0,
"fatalErrorCount": 0,
"warningCount": 0,
"fixableErrorCount": 0,
"fixableWarningCount": 0
},
"./src/formatter.ts",
[],
[],
"./src/vite-env.d.ts",
[],
[]
]
```

### Adding the command line parameter
- `conf/default-cli-options.js`: add the property `shareableCache` with a default of false. It should be put in the section with the other cache variables, below `cacheStrategy`.
- `docs/src/use/command-line-interface.md`: Add an explanation of the `shareable-cache` variable with the other cache variables: "By default, an eslintcache contains full file paths and thus cannot readily be shared across developers or ci machines. "False" is that default. "True" changes the internal storage to store a relative path to the process execution directory, thus making the eslintcache shareable across developers and ci machines. . If you change this setting, you must regenerate your entire eslint cache."
- `eslint-helpers.js`: add `shareableCache` variable (set to false) to the `processOptions`, and make sure it must be a boolean.
- `lib/options.js`: Add an option `shareable-cache` of type Boolean with a nice description for people to read: By default, an eslintcache contains full file paths and thus cannot readily be shared across developers or ci machines. "False" is that default. "True" changes the internal storage to store a relative path to the process execution directory, thus making the eslintcache shareable across developers and ci machines. . If you change this setting, you must regenerate your entire eslint cache.
- `lib/cli.js`: Add the `shareableCache` option, defaulted to false, in the `translateOptions `function.

cs6cs6 marked this conversation as resolved.
Show resolved Hide resolved
### Changing cache file serialization
- `lib/cli-engine/lint-result-cache.js`: Add the properties `cwd` and `shareableCache` to the `LintResultCache` class. `cwd` is a string, the current working directory. `shareableCache` is a boolean, the result of the user's passed in command line parameter. Use these values in two main places:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should clarify what we mean by the current working directory that will be passed to the LintResultCache class. Is it process.cwd(), or cwd that was passed to ESLint/FlatESLint constructor? In the latter case, is there a way to pass cwd to file-entry-cache or we could end up calculating paths differently?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another question is does file-entry-cache support relative paths? All tests seem to be with absolute paths:

https://github.com/jaredwray/file-entry-cache/blob/master/test/specs/cache.js

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will. I am just going to push up the code review sometime today. I have a complete and working branch on this task, and most of your questions wuold be resolved by the code review/better answered looking at the code changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget to answer the question regarding what cwd is from the previous comment.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it process.cwd(), or cwd that was passed to ESLint/FlatESLint constructor?

When created in cli-engine.js, it's options.cwd. It is also referred to as 'cwd' in the code.
When created in flat-eslint.js, it's processedOptions.cwd.

Copy link
Member

@fasttime fasttime Jun 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think that cwd would be a better base location to store relative paths?

Do you mean in all cases? If so, the answer is no. I think it makes the most sense to have it relative to the cache file when --cache-location is not used.

I'm asking because when --cache-location is not set, the cache file location should be already cwd. I think the advantage of keeping the paths relative to the cache file directory is that they don't change if the current directory changes, for example in a monorepo. Do you have a particular scenario in mind where storing paths relative to cwd would be advantageous?

Copy link
Member

@mdjermanovic mdjermanovic Jun 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, as @fasttime noticed, when --cache-location is not used, the cache file defaults to .eslintcache in cwd. So if the logic would be to use the location of the cache file when --cache-location is not used, and cwd when --cache-location is used, then it's basically the same as - always use cwd.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a use case where a monorepo has a single eslint cache file that's checked-in and shared between developers, and developers work and run eslint in project subdirectories (using --cache-location with a path to the single cache file somewhere in the root), I believe the only solution would be that the paths are relative to the cache file directory.

For other use cases, e.g., when the cache file is generated in CI and shared between CI runs (I believe that was the original use case in eslint/eslint#16493?), I'm not sure, it depends on where and how the cache file will be stored and used. It might be good to get more input from people requesting this feature.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should just make a decision and see what happens when people test with it. This PR has been open since October of last year, and it seems like this is the last thing to decide on, so I don't think there's value in drawing this out further when we have a binary decision to make.

So I'd like to propose that we say the paths are always relative to location of the .eslintcache file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I'd like to propose that we say the paths are always relative to location of the .eslintcache file.

I agree.

1. Whenever we store or retrieve file results in the cache, use the results of the new class function, `getFileKey(filePath)`. If `shareableCache` is false, it uses a full file path as the key. If `shareableCache` is true, uses a relative file path.
2. Update the function `hashOfConfigFor`, so that when the configuration object is hashed into a hash key and put into the eslintcache, if `shareableCache` is set to true, the file paths hashed are all relative paths. Implementation detail:
```js
function hashOfConfigFor(config, cwd) {
if (!configHashCache.has(config)) {
// replace the full directory path in the config string to make the hash location-neutral
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this feature will only be implemented for the new config system, I believe there is no need for this because there are no absolute paths in configs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A place where we do have absolute paths is LintResult#filePath, so that's something we'll need to convert from absolute to relative and vice versa when cache entries are stored and retrieved.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A place where we do have absolute paths is LintResult#filePath, so that's something we'll need to convert from absolute to relative and vice versa when cache entries are stored and retrieved.

Or, we could omit LintResult#filePath property from cache entries and add it back when cache entries are retrieved.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A reminder that the question about how LintResult#filePath will be handled should be addressed in this RFC.

// implementation TBD; please note configHashCache is a complex object with over 100 fields.
}
```
cs6cs6 marked this conversation as resolved.
Show resolved Hide resolved
- lib/cli-engine/cli-engine.js: Update the constructor call to LintResultCache to add the new parameters cwd and shareableCache.
- lib/eslint/flat-eslint.js: Update the constructor call to LintResultCache to add the new parameters cwd and shareableCache.

## Documentation

Add the new CLI flag to the CLI documentation. The new cli flag will be called shareable-cache.

## Drawbacks
cs6cs6 marked this conversation as resolved.
Show resolved Hide resolved

Adding a new CLI flag adds some complexity for the end user. They will need to understand why the feature exists.

Furthermore, defaulting to the existing behavior does help make upgrading less painful, but it is possible most people expect a shareable cache by default. This may cause confusion.

## Backwards Compatibility Analysis

The new flag, shareable-cache, passed in on the command line, will default to false. Unless people explicitly set it to true, then the old algorithms will
be used. Users of the eslint cache will still need to regenerate their cache no matter how they set this flag, because the current implementation forces a cache rebuild if there are
any configuration changes. Adding a new property counts as a change.
Copy link
Member

@fasttime fasttime May 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify the last sentence? Is a new property some field in the cache, in the config, or what exactly? Will removing a property or changing its value also count as a change?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any new version of ESLint invalidates cache entries stored by a previous version, so I think this is not a concern.

https://github.com/eslint/eslint/blob/d2d83e045ad03f024d1679275708054d789ebe20/lib/cli-engine/lint-result-cache.js#L50


## Alternatives

A CLI tool which translates the existing cache to the current folder structure. This could be any sort of script that will basically do a replace-all on the paths in
the .eslintcache file and also recalculates the hash so that they are the new location rather than the old. A downside to this is that any tool based on string
replacement would be fragile at best.

## Open Questions

Question: Should we make every cache shareable, or keep the shreable-cache flag default to false?

Answer: We don't know how people are currently using the cache file, so we can't make assumptions about what is and isn't safe to change. Even if we made a shareable cache the default (which would be a breaking change), we'd still need a way to enable the previous behavior in case there is a use case we aren't aware of..
cs6cs6 marked this conversation as resolved.
Show resolved Hide resolved

## Help Needed

None at this time.

## Frequently Asked Questions

TBD

## Related Discussions

[Change Request: Eslintcache relative #16493](https://github.com/eslint/eslint/issues/16493)