Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Audit how we determine whether a file is a "Python source file" #13691

Open
AlexWaygood opened this issue Oct 9, 2024 · 8 comments
Open

Audit how we determine whether a file is a "Python source file" #13691

AlexWaygood opened this issue Oct 9, 2024 · 8 comments
Labels
cli Related to the command-line interface
Milestone

Comments

@AlexWaygood
Copy link
Member

I noticed in #13682 that there's some inconsistency regarding how we determine whether a file is a "Python source file" currently. In the code for ruff server (and the red-knot port of the server), we take care to do case-insensitive matching when figuring out whether something is a notebook file or not:

} else if Path::new(url.path())
.extension()
.map_or(false, |ext| ext.eq_ignore_ascii_case("ipynb"))

} else if Path::new(url.path())
.extension()
.map_or(false, |ext| ext.eq_ignore_ascii_case("ipynb"))
{
DocumentKey::Notebook(url)

Elsewhere, however, we mostly use case-sensitive matching:

/// Infers the source type from the file extension.
pub fn try_from_extension(extension: &str) -> Option<Self> {
let ty = match extension {
"py" => Self::Python,
"pyi" => Self::Stub,
"ipynb" => Self::Ipynb,
_ => return None,
};
Some(ty)
}

pub(crate) fn push(&mut self, component: &str) {
if let Some(component_extension) = camino::Utf8Path::new(component).extension() {
assert!(
self.relative_path.extension().is_none(),
"Cannot push part {component} to {self:?}, which already has an extension"
);
if self.is_standard_library() {
assert_eq!(
component_extension, "pyi",
"Extension must be `pyi`; got `{component_extension}`"
);
} else {
assert!(
matches!(component_extension, "pyi" | "py"),
"Extension must be `py` or `pyi`; got `{component_extension}`"
);
}
}
self.relative_path.push(component);
}

For places like the red-knot module resolver, it's likely correct to do case-sensitive matching (Python recognises foo.py as an importable module, but not foo.PY), but in other places it may not be. We should audit the code to make sure we're using case-sensitive matching and case-insensitive matching for file extensions in the correct places. We should also add comments to the places where there might be a subtle reason why case-(in)sensitive matching is required, rather than vice versa.

@MichaReiser
Copy link
Member

For places like the red-knot module resolver, it's likely correct to do case-sensitive matching (Python recognises foo.py as an importable module, but not foo.PY

Note: I think that depends on the file system's case sensitivity.

Other than that. I think we should ignore casing when matching files because that's what most desktop environments to. They use the same application to open a test.jpg or test.JPG.

@MichaReiser MichaReiser added the cli Related to the command-line interface label Oct 9, 2024
@AlexWaygood
Copy link
Member Author

AlexWaygood commented Oct 9, 2024

Here, for example, we might want to continue to do case-sensitive matching, since the rule only applies to the names of Python modules that are intended to be importable by other Python modules:

/// N999
pub(crate) fn invalid_module_name(
path: &Path,
package: Option<&Path>,
ignore_names: &IgnoreNames,
) -> Option<Diagnostic> {
if !PySourceType::try_from_path(path).is_some_and(PySourceType::is_py_file_or_stub) {
return None;
}

If a FOO.PY module can't be imported due to the fact that it uses a .PY extension on a case-sensitive file system, then the name of the file isn't really relevant to any PEP-8 concerns, as it's not a module (even if it might be a Python file). The rule is invalid-module-name, not invalid-python-file-name.

@dhruvmanila
Copy link
Member

There's also this issue astral-sh/ruff-vscode#574 related to the server although not sure if this affects that.

@hauntsaninja
Copy link
Contributor

Fwiw mypy has some special handling in a few places to be case sensitive even on case insensitive file systems

@MichaReiser
Copy link
Member

MichaReiser commented Feb 28, 2025

I think this is the relevant code in mypy:

https://github.com/python/mypy/blob/c32d11e60f04a0798292c438186199ccdec0db61/mypy/fscache.py#L185-L243

Both methods are called from within module resolution. @AlexWaygood and @carljm any chance you know how Python detects the real casing of a file in its module resolver?

@MichaReiser
Copy link
Member

I think I found it:

https://github.com/python/cpython/blob/9f0879baf15fdcf89f1b85cc244d596d4d0f4e47/Lib/importlib/_bootstrap_external.py#L1411-L1440

It seems Python indexes module on a per directory basis and uses listdir to get the real casing of files.

It does seem that the extension casing is ignored, at least on windows

https://github.com/python/cpython/blob/9f0879baf15fdcf89f1b85cc244d596d4d0f4e47/Lib/importlib/_bootstrap_external.py#L1425-L1438

Python assumes entire platforms as case sensitive or insensitive even though a mounted FAT partition on linux is case sensitive and macOs and both NTFS and macos can be configured to be case sensitive

https://github.com/python/cpython/blob/9f0879baf15fdcf89f1b85cc244d596d4d0f4e47/Lib/importlib/_bootstrap_external.py#L54-L76

@carljm
Copy link
Contributor

carljm commented Feb 28, 2025

Python assumes entire platforms as case sensitive or insensitive

I don't think this is really true, or it's only sort of true. The platform checking code you linked does not automatically determine any default case handling. It is purely for determining whether the legacy PYTHONCASEOK environment variable (which makes imports case insensitive) is available or not. Historically this was platform specific and there's no desire to make it more broadly available, because it's discouraged to use it. So some platforms just don't support that option.

The case insensitive suffix handling on Windows is a similar story. It was a legacy feature added on windows a long time ago and it is preserved on windows to avoid breakage there, but there is no desire to ever expand it to any other platform.

So in both these cases, there aren't assumptions about case sensitivity made that would cause a bug if there's a case sensitive file systems on Windows or non case sensitive file systems on Linux. There are just certain case-related legacy features that are documented as only available on certain platform(s).

I think if you ignore these two legacy features, the story for Python imports is pretty simple: imports are always case sensitive, based on needing to match what listdir returns.

@MichaReiser
Copy link
Member

Oh right, there's an os.environ check when building the relaxed thingy.

@MichaReiser MichaReiser modified the milestones: v0.10, v0.11 Mar 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cli Related to the command-line interface
Projects
None yet
Development

No branches or pull requests

5 participants