Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(glob): correctly handle relative patterns #2620

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Conaclos
Copy link
Member

@Conaclos Conaclos commented Apr 27, 2024

Summary

Fix #2421

Biome relies on an internal fork of the glob crate.
This crate converts relative glob patterns into recursive glob patterns.
This means that ./file' is equivalent to ./**/file'.

This is surprising behavior and has been reported as a bug by our users.
Biome no longer converts relative glob patterns to recursive glob patterns.

EDIT: see the next comment.

Test Plan

I updated the unit tests of the glob pattern utility.

@github-actions github-actions bot added A-Project Area: project A-Changelog Area: changelog labels Apr 27, 2024
@Conaclos
Copy link
Member Author

Conaclos commented Apr 27, 2024

The issue is more complex that I though.

Let's assume that our project is in a folder /temp/.
Assuming that the working directory is /temp/, the following commands should be equivalent:

biome lint ./
biome lint /temp/

Unfortunately this is not the case, because Biome doesn't normalize the paths and pass them as is.

Giving the following biome config:

{
	"overrides": [
		{
			"include": ["./file.js"],
			"linter": {
				"enabled": false
			}
		}
	]
}

The first command biome lint . ignores ./file.js because Biome matches the path ./file.js against the pattern ./file.js.
The second command doesn't ignore ./file.js because Biome matches the path /temp/file.js against the pattern ./file.js.

Ideally we should normalize all path to absolute path and turn relative glob patterns into absolute glob patterns. Or we should remove the working directory of an absolute path. Not sure how symlinks play there.

Otherwise, we could assume that users always pass relative paths on the CLI.
in this case, the current fix works.
We could warn against absolute paths.

Anyway, I think we should take some time to think about that and delay the fix to a minor or major version.

@Conaclos Conaclos marked this pull request as draft April 27, 2024 11:55
@Sec-ant
Copy link
Member

Sec-ant commented Apr 27, 2024

Ideally we should normalize all path to absolute path and turn relative glob patterns into absolute glob patterns.

I think this is the right direction to go. But I don't think we need to implement everything ourselves.

I did some search the other day for rust crates that handle glob patterns: The glob crate which we currently vendor has some issues like it doesn't match the trailing slash (/) to directories, doesn't understand symlinks, doesn't work with absolute paths, doesn't handle UTF-8 filenames very well, doesn't support alternative syntax, etc. If we're to fix the problems related to globs in our codebase, we should at least be aware of the limitations it has.

That's when I find another crate globset, which solves many of the problems above. It is developed by the same author of regex and ripgrep, and is also used in ripgrep and ruff. I think this crate is aiming to be more align with the glob syntax used in .gitignore (which I personally think is great) and it is also a nice choice for us because we do have vcs options that intends to use .gitignore to ignore paths. I also think most users, whether from different OSes and having different language backgrounds, are all familiar with the .gitignore glob syntaxes, and it is well-defined. Nevertheless, there're also descrepencies and bugs in globset that we should be aware of, and the author is seemingly currently revamping the globset crate (judging by his github profile status).

While surfing the issues of globset, I learned that there's another project called gitoxide which is an implementation of git written in Rust. In this project, there're two crates that may be interesting and promising to consider: gix-glob and gix-ignore. According to its README, the .gitignore parsing feature is complete, so maybe we can also learn somthing from that project.

Anyway, I'm sharing these only to hope they can be helpful when tackling the glob matching issue. :)

@ematipico
Copy link
Member

@Sec-ant

Few months ago I tried to use globset and that broke a lot of user's code. We had to revert that. It misses an important functionality that vendored glob has.

@Sec-ant
Copy link
Member

Sec-ant commented Apr 27, 2024

@Sec-ant

Few months ago I tried to use globset and that broke a lot of user's code. We had to revert that. It misses an important functionality that vendored glob has.

Oh, I see. That may explain why ruff has both glob and globset as dependencies. Do you still remember what's the important functionality it misses?

@Conaclos
Copy link
Member Author

Conaclos commented Apr 27, 2024

We could try to switch to a different glob library for Biome 2.0 or later. I would like to work on that in the future.
The impact on users is too large to make the change in a patch or minor release.

@Sec-ant
Copy link
Member

Sec-ant commented Apr 27, 2024

Absolutely, just share some findings!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Changelog Area: changelog A-Project Area: project
Projects
None yet
3 participants