Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gazelle seems to not like PEP 695 – Type Parameter Syntax #2396

Open
dougthor42 opened this issue Nov 12, 2024 · 9 comments Β· May be fixed by #2413
Open

Gazelle seems to not like PEP 695 – Type Parameter Syntax #2396

dougthor42 opened this issue Nov 12, 2024 · 9 comments Β· May be fixed by #2413

Comments

@dougthor42
Copy link
Contributor

TL;DR: I think we need to bump go-tree-sitter for python 3.12 support.

🐞 bug report

Affected Rule

Gazelle

Is this a regression?

I could argue it either way, lol. I'm going to go with "probably" because it's related to #1895.

Description

PEP 695 added a Type Parameter syntax that looks like:

# Before
from typing import TypeVar

_T = TypeVar("_T")

def func(a: _T, b: _T) -> _T:
    ...

# After
def func[T](a: T, b: T) -> T:
    ...

If a python file uses this new def func[T](...) syntax, and the file has if __name__ == "__main__", then Gazelle will incorrectly generate a py_library target rather than py_binary.

πŸ”¬ Minimal Reproduction

  1. Make a python file:
# foo.py
def func[T](a: T, b: T) -> T:
    ...

if __name__ == "__main__":
    print("hi")
  1. Run gazelle bazel run //:gazelle

Expected Result

py_binary(
    name = "foo",
    srcs = ["foo.py"],
)

Actual Result

py_library(
    name = "foo",
    srcs = ["foo.py"],
)

πŸ”₯ Exception or Error

None

🌍 Your Environment

Operating System:

gLinux

Output of bazel version:

$ bazel version
Bazelisk version: v1.20.0
Build label: 7.3.2
Build target: @@//src/main/java/com/google/devtools/build/lib/bazel:BazelServer
Build time: Tue Oct 1 17:46:05 2024 (1727804765)
Build timestamp: 1727804765
Build timestamp as int: 1727804765

Rules_python version:

$ bazel mod graph                                                                                                       
<root> ([email protected])                    
β”œβ”€β”€β”€[email protected]            
β”‚   └─── ...
β”œβ”€β”€β”€[email protected]                 
β”‚   └─── ...
β”œβ”€β”€β”€[email protected]             
β”‚   └─── ...     
β”œβ”€β”€β”€[email protected]                 
β”‚   └─── ...
β”œβ”€β”€β”€[email protected] 
β”‚   └─── ...
β”œβ”€β”€β”€[email protected] 
β”‚   β”œβ”€β”€β”€[email protected] (*) 
β”‚   β”œβ”€β”€β”€[email protected] (*) 
β”‚   β”œβ”€β”€β”€[email protected] (*) 
β”‚   β”œβ”€β”€β”€[email protected] (*) 
β”‚   β”œβ”€β”€β”€[email protected] (*) 
β”‚   β”œβ”€β”€β”€[email protected] 
β”‚   β”‚   └───[email protected] (*) 
β”‚   └───[email protected] 
β”‚       └───[email protected] (*) 
└───[email protected] 
    β”œβ”€β”€β”€[email protected] (*) 
    β”œβ”€β”€β”€[email protected] (*) 
    β”œβ”€β”€β”€[email protected] (*) 
    └───[email protected] (*)

Anything else relevant?

I haven't been able to find a human-readable changelog of tree-sitter or go-tree-sitter that explicitly says when it added support for python 3.12...

@dougthor42 dougthor42 changed the title Gazelle seems to lot like PEP 695 – Type Parameter Syntax Gazelle seems to not like PEP 695 – Type Parameter Syntax Nov 12, 2024
@dougthor42
Copy link
Contributor Author

ah, dang:

// at github.com/smacker/go-tree-sitter@latest (after v0.0.0-20240422154435-0628b34cbf9c we used)
// "__main__" is the second child of b. But now, it isn't.
// we cannot use the latest go-tree-sitter because of the top level reference in scanner.c.
// https://github.com/smacker/go-tree-sitter/blob/04d6b33fe138a98075210f5b770482ded024dc0f/python/scanner.c#L1

I just tried this locally and got the same issue. @hunshcn do you have any additional insight for this?

@dougthor42
Copy link
Contributor Author

Opened smacker/go-tree-sitter#175.

@maffoo
Copy link

maffoo commented Nov 12, 2024

I think type parameter support was added to the tree-sitter grammer in tree-sitter/tree-sitter-python@71977ea, and should be included in grammer versions 0.20.4 and later. It looks like this was first included in go-tree-sitter in smacker/go-tree-sitter@8516e80.

@dougthor42
Copy link
Contributor Author

This is looking more and more like an issue with tree-sitter, but I may have to update Gazelle to fallback to the old python-based parser because of limitations elsewhere.


In our code base, I narrowed things down to a single line diff:

# get_deps.py
# using this line is fine - tree-sitter correctly parses the module
def search_one_more_level(graph: dict[T, set[T]], seen: set[T], routes: list[list[T]], target: T) -> list[T] | None:

# using this line cause tree-sitter to fail parsing - see below
def search_one_more_level[T](graph: dict[T, set[T]], seen: set[T], routes: list[list[T]], target: T) -> list[T] | None:

The only difference between the two lines is the addition of [T] after the function name.

I added some debugging prints to gazelle/python/file_parser.go:

--- a/gazelle/python/file_parser.go
+++ b/gazelle/python/file_parser.go
@@ -17,6 +17,7 @@ package python
 import (
        "context"
        "fmt"
+       "log"
        "os"
        "path/filepath"
        "strings"
@@ -184,6 +185,10 @@ func (p *FileParser) Parse(ctx context.Context) (*ParserOutput, error) {
        if err != nil {
                return nil, err
        }
+       if strings.Contains(p.relFilepath, "get_deps.py") {
+               log.Printf("after parsing file, got rootNode=%q", rootNode)
+       }
 
        p.output.HasMain = p.parseMain(ctx, rootNode)

And ran gazelle.

# the "good" code logs:
gazelle: after parsing file, got rootNode="(module (comment) (comment) (comment) (expression_statement (string)) (import_statement name: (dotted_name (identifier))) ...

# the "bad" code logs:
gazelle: after parsing file, got rootNode="(ERROR (comment) (comment) (comment) (expression_statement (string)) (import_statement name: (dotted_name (identifier))) ...

So something is up with tree-sitter where it errors out but doesn't return an err. The net result of this is that the subsequent call to FileParser.parseMain fails and the file is incorrectly labeled as a library (instead of a binary).

@dougthor42
Copy link
Contributor Author

I think type parameter support was added to the tree-sitter grammer in tree-sitter/tree-sitter-python@71977ea, and should be included in grammer versions 0.20.4 and later. It looks like this was first included in go-tree-sitter in smacker/go-tree-sitter@8516e80.

@maffoo yes I agree; the 3.12 grammar was added to tree-sitter a while ago. rules_python is currently pinning to the version just before go-tree-sitter updated the python grammar. rules_python pins 0628b34 and the python grammar was updated in the next commit 8516e80 just like you said.

image


I tried forking go-tree-sitter and hacking things to use the old scanner.cc, which doesn't reference ../array.h, but that naive approach sadly did not work.

@maffoo
Copy link

maffoo commented Nov 15, 2024

As I understand it we are using the standard rules_go rules to import the go-tree-sitter repository:

go_repository(
name = "com_github_smacker_go_tree_sitter",
importpath = "github.com/smacker/go-tree-sitter",
sum = "h1:7QZKUmQfnxncZIJGyvX8M8YeMfn8kM10j3J/2KwVTN4=",
version = "v0.0.0-20240422154435-0628b34cbf9c",
)

And then we use two targets which are created automatically by this import:

"@com_github_smacker_go_tree_sitter//:go-tree-sitter",
"@com_github_smacker_go_tree_sitter//python",

Is it possible to define a more custom way to build that instead of relying on targets generated automatically by rules_go, since clearly rules_go is not able to build this package by default? I guess rules_go does more than just invoking go build under the hood, but maybe there's a way to customize it a bit, e.g. by telling it that the @com_github_smacker_go_tree_sitter//python target should include some files from the parent directory.

@maffoo
Copy link

maffoo commented Nov 15, 2024

Actually, looks like the go_repository rules comes from gazelle itself, not from rules_go: https://github.com/bazel-contrib/bazel-gazelle/blob/master/repository.md#go_repository. There are a lot of options there, so maybe there's a way to customize it to work with the newer go-tree-sitter.

@maffoo
Copy link

maffoo commented Nov 15, 2024

I checked out the go-tree-sitter repo and ran gazelle to generate build files as described in the rules_go docs. To get //python to build, I added a filegroup in the root BUILD.bazel:

filegroup(
    name = "common",
    srcs = ["alloc.c", "alloc.h", "api.h", "array.h"],
    visibility = [":__subpackages__"],
)

and then added a reference to this in the go_library in python/BUILD.bazel:

go_library(
    name = "python",
    srcs = [
        "//:common",
        "binding.go",
        ...

This was enough to get //python to build. Given that, it seems like one way to get things working with the new go-tree-sitter would be to add these changes as patches when we bring in go-tree-sitter as a go_repository.

@dougthor42
Copy link
Contributor Author

Thanks Matthew! My next attempt was going to be to just move/copy those C flies into the python dir and tweak headers accordingly (also via a patch).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants