Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potentially the biggest performance improvement that can be done #46

Closed
SuperchupuDev opened this issue Sep 28, 2024 · 4 comments · Fixed by #69
Closed

Potentially the biggest performance improvement that can be done #46

SuperchupuDev opened this issue Sep 28, 2024 · 4 comments · Fixed by #69
Labels
help wanted Extra attention is needed performance Potential performance improvement

Comments

@SuperchupuDev
Copy link
Owner

SuperchupuDev commented Sep 28, 2024

I'm not even sure how to approach the problem, but implementing this means that all of the weird patterns that currently avoid all optimizations would be really optimized along with literally everything else.

By default, fdir crawls all subdirectories and files of a root, which can result in extra processing work that's not necessary, harming performance. tinyglobby tries to apply some optimizations by inferring a common root.

fdir exposes a exclude function that can be used to exclude directories from crawling. It's being currently used on the ignore patterns to... not crawl those ignored patterns?

What if, we took the matching patterns (basically the patterns that aren't meant to be ignored), we did some weird transformations to them, and used them in the exclude matcher?

For example, let's say we have the following usage:

import { glob } from 'tinyglobby';

await glob(['src/files/index.ts', 'scripts/*.ts']);

with the following file structure:

- node_modules
  ^big
- plugins
  - myPlugins
    | plugin.ts
- scripts
  - utils
    | index.ts
  | deploy.ts
  | run.ts
- src
  - files
    | index.ts
  - other
    | index.ts

Basically, we need a picomatch matcher that returns true for every directory that we don't want to crawl, in this case node_modules, plugins, scripts/utils, and src/other. This could be implemented with the following picomatch usage:

import picomatch from 'picomatch';

const exclude = picomatch('**/*', {
  ignore: ['src/files', 'scripts/*/**']
});

Great! It now only crawls the directories needed (hopefully, I haven't checked). Now the question is how to implement something that converts ['src/files/index.ts', 'scripts/*.ts' into ['src/files', 'scripts/*/**'], which is the whole point of this issue. If we figure it out, tinyglobby should be nearly as fast as possible.

Some notes:

  • Whatever thing is implemented needs to take into account how <pattern>/** patterns work. It not only matches subdirectories of <pattern>, but it also matches <pattern> itself. this is why the second processed pattern in the example isn't scripts/**, as that would match scripts making fdir not crawl it.
  • Ideally this processing should happen in the normalizePattern function, so that we don't need to loop through the patterns extra times.
@SuperchupuDev SuperchupuDev added help wanted Extra attention is needed performance Potential performance improvement labels Sep 28, 2024
@webpro
Copy link
Contributor

webpro commented Oct 25, 2024

Knip does something similar, but just for .gitignore files only. Pointers:

The key is to have functionality like this deepFilter. tinyglobby uses fdir which has filter which is maybe what we need here?

@webpro
Copy link
Contributor

webpro commented Oct 25, 2024

Scratch that, I mixed up filter and exclude.

Still, the deepFilter example might be useful as I think it does the same as fdir#exclude.

@SuperchupuDev
Copy link
Owner Author

an initial implementation has been merged in #69! the code can be improved further though, which is what i'll try to do before a new release

@SuperchupuDev
Copy link
Owner Author

current status of this issue: while an initial implementation was merged (and still not released yet!), it broke some use cases (#80). #76 was opened, which not only fixes this use case but it also works better at optimizing. however, there is still one use case that breaks in both: using / inside glob symbols (for example the pattern {a/b,c}. in theory i could just convert those kinds of parts to ** in the optimizer but it wouldn't be ideal as it would mean no optimizations are applied in those cases. i'm looking for solutions and open for help in #76

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed performance Potential performance improvement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants